The day-to-day development version can be installed from the GitHub repository; the release version can be installed from Bioconductor.
# From GitHub
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("robogeno/SVMDO")
# From Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("SVMDO")
#library(SVMDO)
#Main screen of GUI
#runGUI()
# OR
#SVMDO::runGUI()
Transcriptome-based supervised classification can be difficult because the datasets contain a vast number of uninformative features that reduce the efficiency of the process. To mitigate this problem, feature selection methods have been applied [1]. Ensemble approaches to feature selection have been preferred mainly for their better classification accuracy [2]. Cancers have been reported to co-occur with several chronic diseases [3] and oncogenic viral infections [4]. Disease Ontology (DO) enrichment analysis can therefore act as a filter based on the reported diseases that interact with the gene sets used [5]. Wilks' lambda criterion is one of the moderate techniques for feature selection. The method adds features (genes) step by step according to their contribution to the discriminating power of the overall model [6]. Thus, disease-related genes with better discriminatory characteristics can be obtained.

SVMDO is an easy-to-use GUI that uses disease information to detect, among differentially expressed genes, gene sets that discriminate tumor from normal samples. Our approach is based on an iterative algorithm that filters genes with Disease Ontology enrichment analysis and Wilks' lambda criterion, coupled with the construction of an SVM classification model. Along with gene set extraction, SVMDO also provides individual prognostic marker detection. The algorithm is designed for FPKM- and RPKM-normalized RNA-Seq transcriptome datasets. During the development of our algorithm, Bioconductor provided a robust means of acquiring significant disease-gene interaction information from the Disease Ontology database.
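To make the stepwise use of Wilks' lambda more concrete, here is a minimal base-R sketch of greedy forward selection on the built-in `iris` data. It is an illustration only and not SVMDO's implementation: the helper names `wilks_lambda` and `forward_wilks` are hypothetical, and stopping after a fixed number of features is a simplifying assumption (stepwise procedures usually stop on a significance test instead).

```r
# Wilks' lambda = det(W) / det(T), where W is the pooled within-group
# sums-of-squares-and-cross-products (SSCP) matrix and T is the total SSCP.
# Values near 0 indicate strong group separation; values near 1 indicate none.
wilks_lambda <- function(x, groups) {
  x <- as.matrix(x)
  total_sscp <- crossprod(scale(x, center = TRUE, scale = FALSE))
  within_sscp <- matrix(0, ncol(x), ncol(x))
  for (g in unique(groups)) {
    xg <- x[groups == g, , drop = FALSE]
    within_sscp <- within_sscp +
      crossprod(scale(xg, center = TRUE, scale = FALSE))
  }
  det(within_sscp) / det(total_sscp)
}

# Greedy forward selection: at each step, add the feature that gives the
# smallest Wilks' lambda together with the features already chosen.
forward_wilks <- function(x, groups, n_features = 2) {
  selected <- character(0)
  candidates <- colnames(x)
  while (length(selected) < n_features && length(candidates) > 0) {
    lambdas <- vapply(
      candidates,
      function(f) wilks_lambda(x[, c(selected, f), drop = FALSE], groups),
      numeric(1)
    )
    best <- names(which.min(lambdas))
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

features <- iris[, 1:4]
chosen <- forward_wilks(features, iris$Species, n_features = 2)
chosen_lambda <- wilks_lambda(features[, chosen], iris$Species)
print(chosen)
print(chosen_lambda)
```

In a transcriptome setting the columns of `features` would be genes and `groups` would hold the tumor/normal labels. The klaR package listed among the dependencies offers a stepwise Wilks' lambda selection of its own, which would be the more natural choice in practice.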
SVMDO was developed with the shiny package. It is available for Windows and Linux from the Bioconductor website. SVMDO requires the following R packages: shiny, shinyFiles, golem, nortest, e1071, BSDA, data.table, sjmisc, klaR, caTools, dplyr, caret, survival, DOSE, AnnotationDbi, org.Hs.eg.db.
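Installing through `BiocManager::install()` normally pulls in these dependencies automatically. If an installation fails partway, a quick check such as the sketch below can show which packages are still missing. Package names are case-sensitive, so the vector uses the names as published on CRAN and Bioconductor (`shiny`, `shinyFiles`); the variable names are illustrative.

```r
# Packages named in the requirements list above.
required <- c(
  "shiny", "shinyFiles", "golem", "nortest", "e1071", "BSDA",
  "data.table", "sjmisc", "klaR", "caTools", "dplyr", "caret",
  "survival", "DOSE", "AnnotationDbi", "org.Hs.eg.db"
)

# system.file() returns "" for a package that is not installed,
# and it does not load the package as a side effect.
is_installed <- vapply(
  required,
  function(pkg) nzchar(system.file(package = pkg)),
  logical(1)
)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any names reported as missing can then be passed to `BiocManager::install()`, which handles both CRAN and Bioconductor packages.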
RNA-Seq cancer transcriptome and clinical datasets must be prepared before they are used in SVMDO. The datasets must be supplied in txt format. The expected layouts of the transcriptome and clinical datasets are shown in Table-1 and Table-2, respectively.
id | tissue_type | A1BG | A2M |
---|---|---|---|
TCGA-AA-3662 | Normal | Exp_1 | Exp_1 |
TCGA-AA-3514 | Normal | Exp_2 | Exp_2 |
TCGA-D5-6541 | Tumour | Exp_3 | Exp_3 |
… | … | … | … |
id | days_to_death | vital_status |
---|---|---|
TCGA-AA-3662 | 49 | Alive |
TCGA-AA-3514 | 1331 | Dead |
TCGA-D5-6541 | 225 | Dead |
… | … | … |
The column names tissue_type and id are mandatory for the tissue information and the TCGA sample id, respectively. If survival analysis is not required, the clinical dataset and the id column of the transcriptome dataset are optional.
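The layout described above can be sketched in R as follows. This is only an illustration under stated assumptions: the expression values are made up, the files are assumed to be tab-delimited (the text only specifies txt format), and `check_columns` is a hypothetical helper, not an SVMDO function.

```r
# Made-up expression values, for illustration only.
transcriptome <- data.frame(
  id = c("TCGA-AA-3662", "TCGA-AA-3514", "TCGA-D5-6541"),
  tissue_type = c("Normal", "Normal", "Tumour"),
  A1BG = c(1.2, 0.8, 3.4),
  A2M = c(10.5, 12.1, 4.7)
)
clinical <- data.frame(
  id = c("TCGA-AA-3662", "TCGA-AA-3514", "TCGA-D5-6541"),
  days_to_death = c(49, 1331, 225),
  vital_status = c("Alive", "Dead", "Dead")
)

# Hypothetical helper: stop with a clear message if a required column is absent.
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Write the transcriptome table as tab-delimited text (an assumption),
# then read it back the way a user would before loading it into the GUI.
expr_file <- tempfile(fileext = ".txt")
write.table(transcriptome, expr_file, sep = "\t", quote = FALSE,
            row.names = FALSE)
reloaded <- read.delim(expr_file, stringsAsFactors = FALSE)

check_columns(reloaded, c("id", "tissue_type"))
check_columns(clinical, c("id", "days_to_death", "vital_status"))
print(reloaded)
```

Checking the column names before starting the GUI catches the most common formatting mistake early, since the mandatory names must match exactly.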
The main dialog box is shown in Figure-1. The GUI page consists of two main sections, “Analysis” and “Result”. The Analysis section covers the steps for acquiring the discriminative gene set and preparing individual survival plots. The Result section covers visualization and download of the discriminative gene set and the survival plots. At each step, the necessary variables are saved as objects in the workspace environment for use in the following steps. To help users become familiar with the GUI, a test section is also included; it runs a dummy example using SummarizedExperiment objects of transcriptome (reduced) and clinical datasets.
To open the GUI, type SVMDO::runGUI() in the R console. If the library has already been loaded, typing runGUI() is sufficient.
(Important: Except for the file input steps, each step shows a message box reporting whether the process succeeded or failed. The box disappears after clicking anywhere on the GUI screen. This is necessary to continue with the next step.)
SVMDO includes test datasets that provide dummy examples for gaining experience with the GUI. The test datasets consist of SummarizedExperiment objects containing expression and clinical data. These objects are saved in RDA format and loaded from the extdata folder of the package. The expression test files are simplified forms of TCGA-COAD (COAD) and TCGA-LUSC (LUSC) with 400 genes. In the test-based analysis, predetermined expression and clinical datasets are uploaded into the GUI automatically, and a predefined input size (n=50) is applied as well. Users should therefore continue with DO Analysis after DEG Analysis without adjusting the input size.
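For readers who want to look at the test objects outside the GUI, the sketch below locates the extdata folder with `system.file()` and lists its contents. The file names are not stated here, so nothing is hard-coded; the `.rda` extension used in the filter is an assumption based on the RDA format mentioned above.

```r
# system.file() returns "" when the package (or the folder) is not available.
extdata_dir <- system.file("extdata", package = "SVMDO")
test_files <- character(0)

if (nzchar(extdata_dir)) {
  test_files <- list.files(extdata_dir, full.names = TRUE)
  print(basename(test_files))
} else {
  message("SVMDO is not installed, so there is no extdata folder to list.")
}

# Load the first RDA file, if any, into a separate environment so that
# the objects do not overwrite anything in the global workspace.
rda_files <- test_files[grepl("\\.rda$", test_files, ignore.case = TRUE)]
if (length(rda_files) > 0) {
  test_env <- new.env()
  loaded_names <- load(rda_files[1], envir = test_env)
  print(loaded_names)
}
```

Loading into a dedicated environment keeps these objects apart from the variables that the GUI itself creates during an analysis.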
When the task is completed, click the Clear Environment button to remove the global variables created during the algorithm steps. This is necessary to prevent errors the next time the GUI is used. It can be applied at any time, without completing all steps of the algorithm.
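The idea behind such a clean-up can be sketched in plain R. The object names below are hypothetical, since the names of the variables that SVMDO creates are not listed here, and a scratch environment stands in for the global workspace so that running the sketch deletes nothing from a real session.

```r
# Scratch environment used in place of the global workspace.
workspace <- new.env()
assign("deg_table", data.frame(gene = "A2M"), envir = workspace)  # hypothetical
assign("input_size", 50, envir = workspace)                       # hypothetical
assign("my_notes", "keep me", envir = workspace)                  # unrelated object

# Hypothetical helper: remove only those listed objects that actually exist,
# so it is safe to call before every analysis step has been run.
clear_objects <- function(object_names, envir) {
  present <- object_names[vapply(object_names, exists, logical(1),
                                 envir = envir, inherits = FALSE)]
  rm(list = present, envir = envir)
  invisible(present)
}

removed <- clear_objects(
  c("deg_table", "input_size", "not_created_yet"),
  workspace
)
print(removed)
print(ls(workspace))
```

Removing objects by an explicit list of names, rather than clearing everything, leaves the user's own objects untouched and works even when only some of the steps were completed.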
RStudio requires R 3.3.0+. Choose a version of R that matches your computer’s operating system.
**R-base Install:**
**RStudio Install:**
#> R version 4.3.0 RC (2023-04-13 r84269)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.2 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.17-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] BiocStyle_2.28.0
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.31 R6_2.5.1 codetools_0.2-19
#> [4] bookdown_0.33 fastmap_1.1.1 xfun_0.39
#> [7] cachem_1.0.7 knitr_1.42 htmltools_0.5.5
#> [10] rmarkdown_2.21 cli_3.6.1 sass_0.4.5
#> [13] jquerylib_0.1.4 compiler_4.3.0 highr_0.10
#> [16] tools_4.3.0 evaluate_0.20 bslib_0.4.2
#> [19] yaml_2.3.7 BiocManager_1.30.20 jsonlite_1.8.4
#> [22] rlang_1.1.0