Time Incorporated miR-mRNA Generation of Networks (TimiRGeN) is aimed at researchers who wish to explore interactions in time series microRNA(miR/miRNA)-mRNA expression data. This package integrates, functionally analyses and generates small networks for hypothesis generation. To achieve data reduction without reducing biological signal, the TimiRGeN package utilises several published packages and employs their functions in a synergistic fashion for time series multi-omic analysis. The following packages have been built upon for several functions in the TimiRGeN package:
rWikiPathways [1], clusterProfiler [2], DOSE [3], biomaRt [4], RCy3 [5], Mfuzz [6], igraph [7].
TimiRGeN is very selective and only uses miR-mRNA interaction data from databases curated within the last 4 years.
TargetScans[8], miRDB[9], miRTarBase[10].
TimiRGeN does have the capability to generate networks in R, however this package is uniquely open ended, as the output can be easily be exported to cytoscape [11] or pathvisio [12] for better visualisation options.
TimiRGeN solely uses WikiPathways for functional pathway analysis, and is the first tool to allow this for time series data. WikiPathways is a user curated pathway database that contains 1000s of mechanistic signalling pathways from multiple species [13]. Furthermore, WikiPathways works very well with PathVisio which is our recommended tool for GRN (gene regulatory network) design. Please read the (Pathvisio_GRN_guide.pdf)[https://github.com/Krutik6/TimiRGeN/issues/2] for a step-by-step tutorial for a GRN creation process.
Currently the package can analyse most vertebrate model organisms e.g. human, mouse, rat, and zebrafish. It can also analyse miR and mRNA data combined or separately, and can use entrez or ensembl gene IDs for pathway analysis. This is because most WikiPathways are annotated with either entrez IDs or ensembl gene IDs. This tool can be best used after differential expression (DE) analysis, and has potential to become a staple part of any miR-mRNA expression data study. A number of longitudinal analysis methods have been included in this package including correlation analysis, cross-correlation, and hierarchichal clustering. Furthermore, there are now options to investigate miRNA-mRNA pairs using regression analysis and even multi-miRNA/mRNA regression prediction over time.
TimiRGeN dependencies loaded with the package can be further investigated from these sources [1-7, 14-24].
Depending on the data to be analysed, please load the appropriate org.db package e.g. org.Mm.eg.db
for mouse data or org.Hs.eg.db
for human data.
In this section the combined method will be used to analyse a mouse kidney fibrosis data set. The mRNA data was published in Craciun et al (2016) [25] which was downloaded from GSE65267. The associated miR data was published in Pellegrini et al (2016) [26] and this was downloaded from GSE61328.
Notice the standard nomenclature used in the column names. Do follow the this standard for your own input data. The time point should come first and is followed by a .
. The time point should consist of alphabetical characters followed by numerical characters e.g. D1, H6, TP3. After the .
, the column name should continue to display the specific result types from differential expression analysis.
Note. There should only be one .
in each column name and no _
characters. Having more than one .
or any _
characters will confuse some functions.
Note. There should be no NAs in your miR and mRNA data files.
TimiRGeN uses MultiAssayExperiment
(MAE) to contain information. The dataframes and matrices will be stored as assays
, S4 objects will be stored as Experiments
and the lists will be stored as metadata
.
If unfamiliar with MultiAssayExpriments please read through the vignette to understand how data can be accessed or go through the user guide which can be found on the MultiAssayExperiment bioconductor page [24].
MAE <- getIdsMir(MAE = MAE, assay(MAE, 1), orgDB = org.Mm.eg.db, miRPrefix = "mmu")
MAE <- getIdsMrna(MAE = MAE, assay(MAE, 2), mirror = "useast", species = "mmusculus", orgDB = org.Mm.eg.db)
## [1] "BiomaRt server connection was established."
Using getIds
functions will produce dataframes containing entrezgene and ensembl ID annotations for genes. This is useful for downstream analysis.
Due to the nature of miRs, many NAs may be found in the output of getIdsMir
functions. Entrezgene IDs and ensemble IDs are insensitive to miRs with -3p and -5p strands. Therefore, adjusted entrezgene IDs and ensemble IDs are also created.
Note. For getIdsMrna
functions, if a connection time out error occurs or if downloads are very slow, try to use other mirrors e.g. mirror = "useast"
. Incase BiomaRt
servers do not connect, clusterProfiler
will be used instead. Generally, the former method finds more annotations.
MAE <- combineGenes(MAE = MAE, miR_data = assay(MAE, 1), assay(MAE, 2))
MAE <- genesList(MAE = MAE, method = "c", genetic_data = assay(MAE, 9), timeString = "D")
MAE <- significantVals(MAE = MAE, method = "c", geneList = metadata(MAE)[[1]], maxVal = 0.05, stringVal = "adjPVal")
mRNA and miR data can be combined using combineGenes
function.
The genesList
function will transform the large dataframe into multiple nested dataframes within a list. The data will be separated by the timeString
parameter. In this example by D
(days), because it was the non-numeric character before the .
in the column names.
Significantly differentially expressed genes can be retrieved from each nested dataframe using the significantVals
function. In this example. only genes which had an adjusted P value of less than 0.05
would remain in the list.
MAE <- addIds(MAE = MAE, method = "c", filtered_genelist = metadata(MAE)[[2]], miR_IDs = assay(MAE, 3), mRNA_IDs = assay(MAE, 7))
MAE <- eNames(MAE = MAE, method = "c", gene_IDs = metadata(MAE)[[3]])
Now entrezgene IDs or ensembl IDs which were created before can be integrated into the filtered dataframes of genes using addIds
. In this example entrezgene IDs were added.
Lists of entrez IDs/ ensembl IDs can be extracted for further analysis using the eNames
function.
Once we have a list of significant genes per time point we can put this through gene set enrichment analysis to find enriched pathways in each time point in the data. TimiRGeN uses WikiPathways [13] for overrepresentation analysis.
This is standard overrepresentation analysis, here the enrichWiki
function wraps around enrichment functions from DOSE and clusterProfiler [2,3] but applies these functions for time series analysis with WikiPathways.
Note. Making multiple separate MAE objects makes it easier to work with all the generated data files.
MAE2 <- enrichWiki(MAE = MAE2, method = "c", ID_list = metadata(MAE)[[4]], orgDB = org.Mm.eg.db, path_gene = assay(MAE2, 1), path_name = assay(MAE2, 2), ID = "ENTREZID", universe = assay(MAE2, 1)[[2]])
path_gene
and path_name
can be found as output from the dloadGmt
function.
For an alternative test, a unique universe
can be used e.g. all probes found in a microarray or all known genes expressed in a cell type. This is done in section 3.
To plot results from Overrepresentation analysis, the savePlots
function can save all plots in the current working directory, and the plots can be saved to file in a variety of formats.
From here a WikiPathway can be selected for miR-mRNA interaction analysis. If a pathway of interest is found, move onto part 2.7
of the vignette.
TimiRGeN also offers a supervised soft clustering method using Mfuzz [6, 27]. This provides a global view of temporal patterns based on the common genes found between the each time point of the data and each pathway.
This method starts by importing all mouse WikiPathways and associated gene IDs. The L
symbol will give us entezgene IDs. Use the En
symbol for ensembl IDs.
wikiMatrix
will produce a matrix of common genes found between each time point (rows) and each pathway (columns). A final row representing the total number of genes in each pathway is added to the bottom of the matrix.
turnPercent
will normalise the number of genes found in each pathway by converting to percentages. This will account for pathways of differing sizes.
MAE2 <- createClusters(MAE = MAE2, method = "c", percentMatrix = assay(MAE2, 5), noClusters = 4, variance = 0.99)
createClusters
uses functions from Mfuzz [6] to separate the percentMatrix
into a number of clusters based on temporal changes. In our example we created 12 clusters. Please refer to the Mfuzz source material for more information [27].
Note. The message “genes excluded” refers to number of pathways excluded in TimiRGeN.
A standard deviation plot is created to show the variance found in each pathway over the time course. A PCA plot can also be created, this will show how close each of the clusters are. Both plots are from Mfuzz functions [6].
Once we are happy about the number of clusters, the fuzzy clusters can be plotted. Standardized values representing abundance are on the y axis and the time points are on the x axis.
quickFuzz(Mfuzzdata = experiments(MAE2)[[7]], Clusters = metadata(MAE2)[[3]], W = FALSE, background = "white", subcol = "black", labelcol = "black", axiscol = "black", axisline = "black")
Each cluster represents a different temporal behaviour, and each cluster will consist of lines which will fit to varying degrees to each temporal behaviour. Colours of the lines will represent how well a particular pathway fits into a temporal behaviour. Red lines fit very well, orange lines fit fairly well, yellow lines somewhat fit and purple lines do not fit well.
Note. Clear plots after quickFuzz, as there could be visual defects in future plots.
MAE2 <- diffExpressRes(MAE = MAE2, df = assay(MAE, 1), dataType = "Log2FC", genes_ID = assay(MAE, 3), name = "miR_log2fc")
MAE2 <- diffExpressRes(MAE = MAE2, df = assay(MAE, 2), dataType = "Log2FC", genes_ID = assay(MAE, 7), name = "mRNA_log2fc")
The diffExpressRes
function will retrieve one specific result type from differential expression, along with either entrez gene IDs or ensembl gene IDs from a getIDs
function. It is recommend to use a DE results type which represents abundance e.g. Log2FC.
Add a name
to differentiate the results from diffExpressRes
. Objects within MAEs cannot share the same name.
Note. Parameter df
of diffExpressRes
should not contain any NAs. Parameter genes_ID
can contain NAs.
MAE2 <- reduceWiki(MAE = MAE2, path_data = assay(MAE2, 3), stringWiki = "Lung fibrosis")
MAE2 <- wikiMrna(MAE2, mRNA_express = assay(MAE2, 10), singleWiki = assay(MAE2, 11), stringWiki = "Lung fibrosis")
Now we can identify which genes from a pathway of interest are in common with the input data. The Lung fibrosis
pathway is used in this example.
MAE3 <- MultiAssayExperiment()
MAE3 <- mirMrnaInt(MAE = MAE3, miR_express = assay(MAE2, 9), GenesofInterest = assay(MAE2, 12), maxInt = 5)
mirMrnaInt
creates a correlation matrix for every possible miR-mRNA interaction between the GenesofInterest
created from wikiMrna
and all the input miRs.
The maxInt
parameter must be equal to the number of time points in the input data. The corMeth
option allows users to select between pearson, spearman or kendall, default being pearson.
TargetScans v7.2, miRDB v6.0 and miRtaRBase v8.0 [8-10] can be used to determine which of the potential miR-mRNA interactions have been predicted/ functionally validated.
The following code will download the latest versions of the databases and format them for species specific miR-mRNA interaction mining.
For human analysis use "hsa"
and org.Hs.eg.db
.
Note. If timeout error occurs during downloads; increase it.
MAE3 <- dataMiningMatrix(MAE = MAE3, corrTable = assay(MAE3, 1), targetscan = assay(MAE3, 2), mirdb = assay(MAE3, 3), mirtarbase = assay(MAE3, 4))
The dataMiningMatrix
function will identify if any of the potential miR-mRNA interactions from the mirMrnaInt
function are predicted/ functionally assessed, and display this as integers.
MAE3 <- matrixFilter(MAE = MAE3, miningMatrix = assay(MAE3, 5), negativeOnly = TRUE, predictedOnly = FALSE, threshold = 2, maxCor = -0.5)
Potential interactions with no-little evidence can be filtered out using parameters on matrixFilter
.
NegativeOnly
being TRUE
will only filter for interactions with a negative average correlation. PredictedOnly
being TRUE
will ignore interactions found on miRTarBase. THRESHOLD
being 2
will filter for interactions that have been predicted/ validated by two or more databases. maxCor
being -0.5
will remove any interactions that have an average correlation greater than -0.5
.
Utilising igraph [7] the filtered miR-mRNA interactions can be displayed. The colour of the edges represents the level of correlations between the miRs and mRNAs.
Note. Adjustments to the size of the plot window may be needed to access the key in the bottom left hand side.
Note. Having too many interactions will be very difficult to see in R, so it’s recommend exporting this data into Cytoscape in these cases.
Note. Filtered miR-mRNA interactions must consist of at least two miR-mRNA interactions or this plot will not work and input time series must have at least three time points.
Once a pathway of interest is selected and miR-mRNA interactions have been filtered, a suite of longitudinal analysis tools are available to inspect relationships between the miRNA-mRNA interactions. Firstly hierarchical clustering can be performed on the genes of interest.
quickPathwayTC(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), morethan = TRUE, threshold = 1, pathwayname = "Lung fibrosis")
quickDendro(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), pathwayname = "Lung fibrosis")
quickDMap(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), pathwayname = "Lung fibrosis")
quickHClust(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), pathwayname = "Lung fibrosis", k = 3, cluster = 1)
quickPathwayTC
will generate a time course consisting of each gene of interest from the selected pathway. Certain genes can be highlighted based on their relative expression levels. Code for this section was ammended from this blog [31].
quickDendo
will generate a dendrogram of the mRNAs and miRNAs involved with the pathway of interest.
From here quickHClust
will plot the genes from a selected cluster. In this examples 3 (k
) clusters were created.
For analysis of individual miRNA-mRNA interactions, TimiRGeN provides options to generate correlation plots and cross-correlation plots. These pair analysis methods is ideal for longer time courses with at least five time points. Shorter time series are not appropriate and warning messages will display this.
quickMap(filt_df = assay(MAE3, 6), numpairs = 11)
quickTC(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), pair = 1, scale = FALSE, Interpolation = TRUE, timecourse = 14)
quickCrossCorr(filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10), pair = 1, scale = FALSE, Interpolation = FALSE)
quickMap
will show all miRNA-mRNA interaction pairs in descending order of correlation.
quickTC
will contrast the correlation over time for a selected pair and quickCrossCorr
will generate of cross-correlation plot to analyse the relationship between the two time series (miRNA and mRNA). Specifically at which point the two time series are best matched.
TimiRGeN
can also plot regression for miRNA-mRNA interactions. Furthermore, based on the expression levels of an interacting miR, the mRNA expression levels can be predicted and the accuracy calculated. This can also be done for a miR. Interestingly this can also be performed for multi miRNA/ mRNA interactions, e.g. Igf1
has many miRNA binding partners and this can be explored by regression.
MAE3 <- multiReg(MAE = MAE3, gene_interest = "Igf1", mRNAreg = TRUE, filt_df = assay(MAE3, 6), miRNA_exp = assay(MAE2, 9), mRNA_exp = assay(MAE2, 10))
model1 <- linearRegr(mreg = assay(MAE3, 7), colpair = 3, alterpairs = c(4, 5))
summary(model1$regression)
model2 <- linearRegr(mreg = assay(MAE3, 7), colpair = 5)
summary(model2$regression)
quickTCPred(model = model1, reg_df = assay(MAE3, 7))
quickReg(reg_df = assay(MAE3, 7), colselect = 3)
quickReg(reg_df = assay(MAE3, 7), colselect = 6)
multiReg
will generate a matrix of a gene of interest (mRNA or miRNA) and all it’s predicted binding partners. The linearRegr
can be used to create a linear model out of several of the genes used in multiReg
.
Note. If many genes are used in linearRegr
NANs may appear. For this reason try testing out different combination of formulas.
quickTCPred
generates a multi-regression plot from the formula made by linearRegr
. This plot will have data from the analysis and a predicted simulation. R.squared and P value is generated and pasted onto the plot.
Lasty, quickReg
will generate a simple regression plot between a miRNA-mRNA interacting pair of choice. The odds-ratio and confidence intervals for 95% will be calculated and pasted onto the plot.
Note Regression and correlation analysis is suitable for longer and more regular time series. Section 5 discusses how longer time series datasets may be with TimiRGeN if users do not wish to perform pair-wise DE.
MAE3 <- makeMapp(MAE3, filt_df = assay(MAE3, 6), miR_IDs_adj = assay(MAE, 5), dataType = "L")
MAE3 <- makeDynamic(MAE = MAE3, miR_expression = assay(MAE2, 9), mRNA_expression = assay(MAE2, 10), miR_IDs_adj = assay(MAE, 5), dataType = "L")
write.table(assay(MAE3, 8), "MAPP.txt", quote = FALSE, row.names = FALSE, col.names = FALSE, sep = "\t")
write.csv(assay(MAE3, 9), "Dynamics.csv", row.names = TRUE, quote = FALSE)
TimiRGeN creates two types of files which can be imported by PathVisio [12]. Firstly, MAPP data which contains miR information. This can be imported into PathVisio via the MAPPbuilder app. This will allow all the miRs to be imported at once. From here some manual adjustments are necessary to properly place the miRs where they are predicted to be effecting the signalling pathway. Next is the Dynamic data which can be imported as an expression data set to visualise the changes that occur to the pathway along the time course.
Please follow the instructions in the (Pathvisio_GRN_guide.pdf)[https://github.com/Krutik6/TimiRGeN/issues/2] for the reccomended method for GRN construction. If any entrezgene/ ensembl IDs are missing, it is recommended to annotate them manually in the MAPP and Dynamics files before importing them into PathVisio. It is also recommended to go through PathVisio tutorials [12] for better usage of the network visualisation tool.
RCy3::cytoscapePing()
cytoMake(assay(MAE3, 6), titleString = "Lung Fibrosis Pathway", collectionString = "PathwaysforKidneyFibrosis")
The internal R network generated by quickNet
can be exported into Cytoscape [11] by using functions from RCy3 [5]. TimiRGeN has wrapped around several of these functions to make the process quicker. This will be useful if many miR-mRNA interactions have been found and the internal R graphics cannot cope with this.
Make sure Cytoscape version 3.7 or newer is opened, and check if R recognises that Cytoscape is open by using using the cytoscapePing()
function. From here further investigation is possible by using cytoscape apps.
TimiRGeN also offers the option of analysing the miR and mRNA data separately. Many functions are similar to the combined method, but the "s"
mode is used for some functions, rather than the "c"
mode.
To go through this we are using a Human Breast Cancer dataset which had Hypoxia induced. This data was published in Camps et al (2014) [28]. The mRNA data was downloaded from GSE47533 and the miR data was downloaded from GSE47532.
library(org.Hs.eg.db)
data("hs_miR")
rownames(hs_miR) <- gsub(rownames(hs_miR), pattern = "\\.", replacement = "-")
rownames(hs_miR) <- sub("-$", "*", rownames(hs_miR))
For TimiRGeN to work the rownames and colnames must adhere to naming conventions. As mentioned above, the time point should be the first part of each column name and the result type from differential expression analysis (Log2FC, adjPVal) should be the second part of the column name. These two parts must be separated by a .
, which can be the only .
in the column name string.
Furthermore, there should be no _
characters in the column names. Acceptable names: D1.log2fc, H5.ajdpval, TP8.zscore ect. Not acceptable names: D.1.log2fc, H5_adjpval, TP8zscore.
For microRNAs (because naming conventions vary quite a bit), TimiRGeN uses it’s own naming convention which must be adhered to. A -
should separate parts of the microRNA name e.g. mmu-miR-193a, mmu-miR-140-5p, hsa-let-7a. The human miR data loaded in this example is from a dataset which uses .
instead of -
, in the miR names. Depending on your own data, you may have to change the nomenclature to run the data through TimiRGeN, like what was done above.
Some miR data will have a .
at the end of the gene names because there can be multiple transcripts with the same name. It is best to keep these extra genes. It is recommended to replace the final .
with a *
symbol to make this more obvious.
data("hs_mRNA")
MAE <- startObject(hs_miR, hs_mRNA)
MAE <- getIdsMir(MAE = MAE, assay(MAE, 1), orgDB = org.Hs.eg.db, miRPrefix = "hsa")
MAE <- getIdsMrna(MAE = MAE, assay(MAE, 2), mirror = "useast", species = "hsapiens", orgDB = org.Hs.eg.db)
## [1] "BiomaRt server connection was established."
MAE <- addPrefix(MAE = MAE, gene_df = assay(MAE, 1), prefixString = "miR")
MAE <- addPrefix(MAE = MAE, gene_df = assay(MAE, 2), prefixString = "mRNA")
addPrefix
function will differentiate the miR and mRNA column names. Add a unique prefixString
to the miR and mRNA data to do this. This must be used for "s"
analysis.
Here we use ensemble data instead of entrezgene ID. Use the gmtEnsembl
function to get WikiPathways data which is curated with ensembl IDs.
MAE2 <- gmtEnsembl(MAE = MAE2, path_gene = assay(MAE2, 1), path_data = assay(MAE2, 3), orgDB = org.Hs.eg.db)
Since this dataset was from microarray data, it is useful to use the probes as the backround/ universe. This will tell us, out of all the genes captured by the microarrays, which are significantly overexpressed within WikiPathways. For this, users will need to create their own universe (list of annotation IDs stored as characters). For this dataset a list of available annotation IDs was created using downloadable data from Agilent-019118 Human miRNA Microarray 2.0 G4470B (miRNA ID version) and Illumina HumanWG-6 v3.0 expression beadchip.
data("hs_probes")
MAE2 <- enrichWiki(MAE = MAE2, method = "s", ID_list = metadata(MAE)[[4]], orgDB = org.Hs.eg.db, path_gene = assay(MAE2, 4), path_name = assay(MAE2, 2), ID = "ENSEMBL", universe = hs_probes$ensembl_gene_id)
Note. If there are not enough enriched pathways found with the "s"
method, over representation analysis will fail. This occurs often with microRNA data because not many pathways have microRNAs within them. If this occurs, try to use the "c"
method for miR-mRNA enrichment analysis.
MAE2 <- wikiMatrix(MAE = MAE2, ID_list = metadata(MAE)[[4]], wp_list = metadata(MAE2)[[2]])
MAE2 <- turnPercent(MAE = MAE2, wikiMatrix = assay(MAE2, 6))
Insert miR
or mRNA
as Data_string
for createClusters
. This should be the same as the prefixString
added before.
MAE2 <- createClusters(MAE = MAE2, method = "s", dataString = "mRNA", percentMatrix = assay(MAE2, 7), noClusters = 2, variance = 0.99)
clusterCheck(Clusters = metadata(MAE2)[[3]], W = FALSE)
quickFuzz(Mfuzzdata = experiments(MAE2)[[9]], Clusters = metadata(MAE2)[[3]], W = FALSE)
MAE2 <- returnCluster(MAE2, clusterData = assay(MAE2, 8), whichCluster = 1, fitCluster = 0.5)
From here, once a pathway of interest is found, continue on from section 2.7
to generate miR-mRNA interaction networks.
If only mRNA or miR data is available, this can also be put through TimiRGeN for time course analysis.
It is easier to run mRNA data by itself, rather than miR data because very few miRs are annotated within signalling pathways.
Note. Conditional data should work here as well.
data("mm_mRNA")
Data <- startObject(miR = NULL, mRNA = mm_mRNA)
Data <- getIdsMrna(MAE = Data, mRNA = assay(Data, 2), mirror = "useast", species = "mmusculus", orgDB = org.Mm.eg.db)
Data <- genesList(Data, method = "c", genetic_data = assay(Data, 2), timeString = "D")
Data <- significantVals(Data, method = "c", geneList = metadata(Data)[[1]], maxVal = 0.05, stringVal = "adjPVal")
Data <- addIds(MAE = Data, method = "c", filtered_genelist = metadata(Data)[[2]], miR_IDs = assay(Data, 3), mRNA_IDs = assay(Data, 3))
Data <- eNames(MAE = Data, method = "c", gene_IDs = metadata(Data)[[3]])
Data2 <- MultiAssayExperiment()
Data2 <- dloadGmt(MAE = Data2, species = "Mus musculus")
Data2 <- enrichWiki(MAE = Data2, method = "c", ID_list = metadata(Data)[[4]], orgDB = org.Mm.eg.db, path_gene = assay(Data2, 1), path_name = assay(Data2, 2), ID = "ENTREZID", universe = assay(Data2, 1)[[2]])
quickBar(X = metadata(Data2)[[1]][[1]], Y = names(metadata(Data2)[[1]][1]))
If users have longer time courses they may wish to perform differential expression over a cubic spline or use the “LRT” method in DESeq2, among other options. In this case, they will not be able to retrieve a Log2FC and adjusted P value per time point. TimiRGeN can still accommodate for this. After a non pair-wise based DE experiment, users should filter out significantly DE genes from averaged normalised or raw counts/ expression values. Naming conventions for the column names and microRNA names will be the same as for pair-wise DE analysis input. There should only be one column for each time point.
For a less stringent analysis it is recommended to use all the microRNAs available, rather than only the significantly differentially expressed ones.
In this example a subset of data from a breast cancer dataset is used. It can be downloaded from GSE78169 [29].
library(TimiRGeN)
library(org.Hs.eg.db)
data(long_data)
miRNA <- long_data[c(1:105), ]
mRNA <- long_data[-c(1:105), ]
MAE <- startObject(miRNA, mRNA)
MAE <- getIdsMir(MAE = MAE, miR = MAE[[1]], orgDB = org.Hs.eg.db, miRPrefix = "hsa")
MAE <- getIdsMrna(MAE = MAE, mRNA = MAE[[2]], mirror = "www", species = "hsapiens", orgDB = org.Hs.eg.db)
MAE <- combineGenes(MAE = MAE, miR_data = MAE[[1]], mRNA_data = MAE[[2]])
MAE <- genesList(MAE = MAE, method = "c", genetic_data = MAE[[9]], timeString = "H", miR_data = )
MAE <- addIds(MAE = MAE, method = "c", filtered_genelist = metadata(MAE)[[1]], miR_IDs = MAE[[3]], mRNA_IDs = MAE[[7]])
MAE <- eNames(MAE = MAE, method = "c", gene_IDs = metadata(MAE)[[2]])
MAE2 <- MultiAssayExperiment()
MAE2 <- dloadGmt(MAE = MAE2, species = "Homo sapiens")
MAE2 <- enrichWiki(MAE = MAE2, method = "c", ID_list = metadata(MAE)[[3]], orgDB = org.Hs.eg.db, path_gene = MAE2[[1]], path_name = MAE2[[2]], ID = "ENTREZID", universe = MAE2[[1]][[2]])
quickBar(metadata(MAE2)[[1]][[1]], Y = "Enriched Pathways")
Here pathways that are enriched over the entire time course can be identified.
Alternatively, users that use averaged expression data as an input can create temporal clusters from their miRNA-mRNA data. From here genes which associate with a temporal cluster can be sorted and used as input for overrepresentation analysis.
library(TimiRGeN)
library(org.Hs.eg.db)
data(long_data)
miRNA <- long_data[c(1:105), ]
mRNA <- long_data[-c(1:105), ]
MAE <- startObject(miRNA, mRNA)
MAE <- getIdsMir(MAE = MAE, miR = MAE[[1]], orgDB = org.Hs.eg.db, miRPrefix = "hsa")
MAE <- getIdsMrna(MAE = MAE, mRNA = MAE[[2]], mirror = "www", species = "hsapiens", orgDB = org.Hs.eg.db)
MAE <- combineGenes(MAE = MAE, miR_data = MAE[[1]], mRNA_data = MAE[[2]])
MAE <- createClusters2(MAE = MAE, genetic_data = assay(MAE, 9))
quickFuzz(Mfuzzdata = MAE[[11]], Clusters = metadata(MAE)[[1]], ylab = "Standardised Gene Expression")
MAE <- clusterList(MAE = MAE, clusterData = assay(MAE, 10), fitCluster = 0.5, miR_IDs = assay(MAE, 3), mRNA_IDs = assay(MAE, 7))
MAE2 <- MultiAssayExperiment()
MAE2 <- dloadGmt(MAE = MAE2, species = "Homo sapiens")
MAE2 <- enrichWiki(MAE = MAE2, method = "c", ID_list = metadata(MAE)[[2]], orgDB = org.Hs.eg.db, path_gene = MAE2[[1]], path_name = MAE2[[2]], ID = "ENTREZID", universe = MAE2[[1]][[2]])
savePlots(largeList = metadata(MAE2)[[1]], maxInt = 5, fileType = "png")
Once a pathway of interest has been found, users can continue on miRNA-mRNA pathways analysis from section 2.7
.
Some datasets may include time as one of several interventions. TimiRGeN “vanilla” usage considers time as the only variable. However, the downstream analysis can be used as a form of meta-analysis between multiple interventions over time.
A time series dataset of surgically induced mouse kidney fibrosis was also generated and downloaded from this repository GSE118341 [30]. The time series datasets (Unilateral ureter obstruction (UUO) was run through TimiRGeN (following the code found in section 2). Folic acid data was also run through the same analysis in section 2. Results from the analysis can be contrasted to find differences between kidney fibrosis via FA and UUO.
Both FA (left) and UUO (right) show that Igf1
is targetted by mmu-miR-18a-5p
. Regression analysis can contrast this miRNA-mRNA pair in both datasets.
It seems we can be confident that the Ifg1-mmu-miR-18a-5p
interaction has stronger relationship in the FA dataset than in the UUO dataset due to the higher odd-ratio and the lower range of the 95% confidence intervals.
Notably this example also shows warnings from the UUO dataset because it only has three time points. Such datasets will produce warning messages when users try to perform cross-correlation analysis, interpolation and regression analysis.