Differential expression (DE) analysis is commonly used to identify biomarker candidates that have significant changes in their expression levels between distinct biological groups. One drawback of DE analysis is that it only considers the changes on single biomolecular level. In differential network (DN) analysis, network is typically built based on the correlation and biomarker candidates are selected by investigating the network topology. However, correlation tends to generate over-complicated networks and the selection of biomarker candidates purely based on network topology ignores the changes on single biomolecule level. Thus, we have proposed a novel method INDEED, which considers both the changes on single biomolecular and network levels by integrating DE and DN analysis. INDEED has been published in Methods journal (PMID: 27592383). This is the R package that implements the algorithm.
This R package will generate a list of dataframes containing information such as p-values, node degree and activity score for each biomolecule. A higher activity score indicates that the corresponding biomolecule has more neighbors connceted in the differential network and their p-values are more statistically significant. It will also generate a network display to aid users’ biomarker selection.
You can install INDEED from github with:
# install.packages("devtools")
devtools::install_github("ressomlab/INDEED")
Load the package.
# load INDEED
library(INDEED)
# Loading required package: glasso
A testing dataset has been provided to the users to get familiar with INDEED R package. It contains the expression levels of 39 metabolites from 120 subjects (CIRR: 60; HCC: 60) with CIRR group named as group 0 and HCC group named as group 1.
# Data matrix contains the expression levels of 39 metabolites from 120 subjects
# (6 metabolites and 10 subjects are shown)
head(Met_GU[, 1:10])
# Group label for each subject (40 subjects are shown)
Met_Group_GU[1:40]
# Metabolite KEGG IDs (10 metabolites are shown)
Met_name_GU[1:10]
non_partial_cor()
In non partical correlation method, user only need to run
non_partial_cor()
function. No rho number will be needed.
In this method, user input dataframe with expression level, class label,
id, method, number of permutation and p-value similar to partical
correaltion but in one step fashion. Result will be saved in a list of
two dataframes: activity_score and diff_network. Activity_score contains
a dataframe with biomolecules ranked by activity score calculated from
p-value and node degree. Diff_network contains a dataframe of connection
weight between two nodes.
The following example demonstrates how to use
non_partial_cor()
function:
non_partial_cor(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU, method = "spearman")
select_rho_partial()
To use the function select_rho_partial()
, users will
need to have these data sets in hand:
partial_cor()
In partical correlation method, user will need to preprocess the data
using select_rho_partial()
function, followed by using
partial_cor()
and choosing desired rho number and complete
the analysis. User can also choose to provide p-value table and number
of permutations in partial_cor()
function. Result will be
saved in a list of two dataframes: activity_score and diff_network.
Activity_score contains a dataframe with biomolecules ranked by activity
score calculated from p-value and node degree. Diff_network contains a
dataframe of connection weight between two nodes.
The following example demonstrates how to use
select_rho_partial()
and
partial_cor()
function:
pre_data <- select_rho_partial(data = Met_GU, class_label = Met_Group_GU, id = Met_name_GU,
error_curve = "YES")
result <- partial_cor(data_list = pre_data, rho_group1 = 'min', rho_group2 = "min",
permutation = 1000, p_val = pvalue_M_GU, permutation_thres = 0.05)
In this case, the sparse differential network is based on partial correlation and p-value for each biomolecule is calculated for users. Also, rho is picked based on minimum rule and number of permutations is set to 1000.
network_display()
An interactive tool to assist in the visualization of the results from INDEED functions non_partial_corr() or patial_corr(). The size and the color of each node can be adjusted by users to represent either the Node_Degree, Activity_Score, Z_Score, or P_Value. The color of the edge is based on the binary value of either 1 corresonding to a positive correlation change dipicted as green or a negative correlation change of -1 dipicted as red. The user also has the option of having the width of each edge be proportional to its weight value. The layout of the network can also be customized by choosing from the options: ‘nice’, ‘sphere’, ‘grid’, ‘star’, and ‘circle’. Nodes can be moved and zoomed in on. Each node and edge will display extra information when clicked on. Secondary interactions will be highlighted as well when a node is clicked on.
The following example demonstrates how to use the
network_display()
function:
network_display(results = result, nodesize= 'Node_Degree', nodecolor= 'Activity_Score',
edgewidth= 'NO', layout= 'nice')