The signet R package implements a method to detect selection in biological
pathways. The general idea is to search for gene subnetworks within biological
pathways that present unusual features, using a heuristic approach
(simulated annealing).
More specifically, we consider a list of genes with previously defined scores (e.g. a differentiation measure such as Fst) and we want to find gene networks whose score is higher than expected under the null hypothesis.
To do so, we use biological pathway databases converted to gene networks and search these graphs for high-scoring subnetworks.
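To make the search idea concrete, here is a toy sketch of simulated annealing on a small hand-made graph. It is not the signet implementation: the graph, the gene scores, the scoring function (sum of scores divided by the square root of the subnetwork size), the cooling schedule and all function names are assumptions chosen for illustration only.

```r
set.seed(1)

# Toy undirected gene network (6 genes), stored as an adjacency matrix.
genes <- paste0("g", 1:6)
adj <- matrix(0, 6, 6, dimnames = list(genes, genes))
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6), c(2, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Invented gene scores, for illustration only.
gene_score <- c(g1 = -0.5, g2 = 2.1, g3 = 1.8, g4 = -1.0, g5 = 2.4, g6 = 0.1)

# Assumed subnetwork score: sum of gene scores scaled by sqrt(size).
subnet_score <- function(active) sum(gene_score[active]) / sqrt(sum(active))

# Breadth-first search: are the active nodes connected to each other?
is_connected <- function(active) {
  idx <- which(active)
  if (length(idx) <= 1) return(TRUE)
  seen <- idx[1]
  frontier <- idx[1]
  while (length(frontier) > 0) {
    nb <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & active)
    frontier <- setdiff(nb, seen)
    seen <- c(seen, frontier)
  }
  length(seen) == length(idx)
}

# Start from a connected pair of genes and toggle one random node per step.
start <- rep(FALSE, 6)
start[c(1, 2)] <- TRUE
active <- start
best <- start
temp <- 1
for (iter in 1:500) {
  cand <- active
  i <- sample(6, 1)
  cand[i] <- !cand[i]
  if (sum(cand) >= 2 && is_connected(cand)) {
    delta <- subnet_score(cand) - subnet_score(active)
    # Always accept improvements; accept worse states with a probability
    # that shrinks as the temperature decreases.
    if (delta > 0 || runif(1) < exp(delta / temp)) active <- cand
    if (subnet_score(active) > subnet_score(best)) best <- active
  }
  temp <- temp * 0.99
}

genes[best]          # genes in the best subnetwork found
subnet_score(best)   # its score
```

Accepting some score-decreasing moves early on lets the search leave local optima; as the temperature drops, the procedure becomes close to a greedy search.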
Details about the algorithm can be found in Gouy et al. (2017).
Please cite this paper if you use signet for your project.
signet takes as input a data frame of gene scores. The first column must
contain the gene ID (e.g. Entrez) and the second column the gene
score (a single value per gene).
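As an illustration of this layout, the sketch below builds a small gene score table by hand. The object name my_scores, the column names and the values are hypothetical, and storing the identifiers as character strings is an assumption.

```r
# Hypothetical gene scores: column 1 = gene ID, column 2 = one score per gene.
my_scores <- data.frame(
  gene  = c("1", "10", "100", "1000"),
  score = c(0.92, 1.60, 1.69, 3.33),
  stringsAsFactors = FALSE
)

# Basic sanity checks: two columns, unique gene IDs, numeric scores.
stopifnot(
  ncol(my_scores) == 2,
  !anyDuplicated(my_scores$gene),
  is.numeric(my_scores$score)
)

head(my_scores)
```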
The other input is a list of biological pathways (gene networks) in the
graphNEL format. We advise using the graphite package to get the
pathway data:
library(graphite)
# pathwayDatabases() #to have a look at pathways and species available
# get the pathway list:
paths <- graphite::pathways("hsapiens", "kegg")
# convert the first 3 pathways to graphs:
kegg_human <- lapply(paths[1:3], graphite::pathwayGraph)
head(kegg_human)
Note that gene identifiers must be the same between the gene scores data frame
and the pathway list (e.g. Entrez). graphite provides a function to convert
gene identifiers.
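A quick way to catch identifier mismatches is to check how many pathway genes also appear in the score table. The sketch below uses two invented identifier vectors; in practice they would be taken from the first column of the score data frame and from the node names of a pathway graph. All object names are hypothetical.

```r
# Invented identifiers, for illustration only.
score_ids   <- c("572", "2475", "2885", "9999")
pathway_ids <- c("572", "2475", "2885", "3815", "3845")

# Genes present in both inputs, and the fraction of the pathway they cover.
shared   <- intersect(score_ids, pathway_ids)
coverage <- length(shared) / length(pathway_ids)

shared
coverage
```

A coverage close to zero usually means that the two inputs use different identifier types.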
An example dataset from Daub et al. (2013, MBE) as well as human KEGG pathways are provided:
library(signet)
data(daub13)
head(scores) # gene scores
## gene score
## 1 1 0.9200665
## 2 10 1.5974385
## 3 100 1.6885589
## 4 1000 3.3314333
## 5 10000 1.6668512
## 6 10001 1.3529425
We first have to search for high-scoring subnetworks within the provided biological pathways, using simulated annealing:
# Run simulated annealing on the first 3 KEGG pathways:
HSS <- searchSubnet(kegg_human, scores)
This function returns, for each pathway, the highest-scoring subnetwork found, its score and other information about the simulated annealing run.
Then, to test the significance of the high-scoring subnetworks, we generate a null distribution of high-scores:
#Generate the empirical null distribution
null <- nullDist(kegg_human, scores, n = 1000)
Note that the null object is a simple vector of null high-scores (here, 1000).
Therefore, you can run other iterations afterwards and concatenate the output
with the previous vector if you want to compute more precise p-values.
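The sketch below illustrates this concatenation with simulated numbers standing in for two batches of null high-scores. The p-value formula shown is one common empirical estimator and is an assumption here; it is not necessarily the one used internally by signet. All object names and values are hypothetical.

```r
set.seed(42)

# Stand-ins for two separate batches of null high-scores.
null_first  <- rnorm(1000)
null_second <- rnorm(1000)

# Concatenate the batches into a single, larger null distribution.
null_all <- c(null_first, null_second)

# Assumed empirical p-value: share of null scores at least as large as
# the observed score, with a +1 correction to avoid a p-value of zero.
observed <- 2.5
p_val <- (sum(null_all >= observed) + 1) / (length(null_all) + 1)
p_val
```

With more null scores, the smallest p-value that can be reported decreases, which is why adding iterations gives more precise p-values.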
This distribution is finally used to compute p-values and update the
signet object:
HSS <- testSubnet(HSS, null)
When p-values have been computed, you can generate a summary table (one row per pathway):
# Results: generate a summary table
tab <- summary(HSS)
head(tab)
## pathway net.size subnet.size subnet.score
## 1 Acute myeloid leukemia 53 9 2.75399072301884
## 2 Adherens junction 66 8 3.25243964604843
## 3 Adipocytokine signaling pathway 62 9 2.63809122980381
## p.val subnet.genes
## 1 0.105 572 2475 2885 3815 3845 5291 5295 6654 10000
## 2 0.038 87 1495 1496 4008 4301 7082 7414 29119
## 3 0.128 1374 2538 4852 5562 5563 6774 53632 57818 92579
# you can write the summary table as follows:
# write.table(tab,
# file = "signet_output.tsv",
# sep = "\t",
# quote = FALSE,
# row.names = FALSE)
Note that searching for high-scoring subnetworks and generating the null distribution can take a few hours. However, these steps are easy to parallelize on a cluster because the iterations are independent of each other.
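As a rough sketch of this idea, the null distribution can be generated in independent chunks that are concatenated afterwards. The example below uses the parallel package from base R and a hypothetical stand-in function (null_chunk) that simply draws random numbers; in a real analysis, each chunk would instead call nullDist() on the pathways and scores. The chunk size and number of cores are arbitrary choices.

```r
library(parallel)

# Hypothetical stand-in for one chunk of null iterations.
# In practice, this would wrap a call such as nullDist(kegg_human, scores, n = n).
null_chunk <- function(n) rnorm(n)

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Run 4 independent chunks of 250 iterations and concatenate the results.
chunks   <- mclapply(rep(250, 4), null_chunk, mc.cores = n_cores)
null_par <- unlist(chunks)

length(null_par)
```

On a cluster, the same pattern applies with one job per chunk, each job writing its vector to disk before the results are concatenated.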
Cytoscape (www.cytoscape.org) is an external software package dedicated to network
visualization. signet can generate an XGMML file to be loaded in
Cytoscape (File > Import > Network > File…).
This file can be written to your working directory with the
function writeXGMML.
If the input of the function is a single signet object, the whole pathway will
be represented and nodes belonging to the highest-scoring subnetwork (HSS)
will be highlighted in red.
writeXGMML(HSS[[1]], filename = "cytoscape_input.xgmml")
If a list of pathways (signetList) is provided, all subnetworks with a p-value below a given threshold (default: 0.01) are merged and represented. Note that in this case, only the nodes belonging to HSS are kept for representation.
writeXGMML(HSS, filename = "cytoscape_input.xgmml", threshold = 0.01)
The representation can then be finely customised in Cytoscape.