element;intro
#welcome;Welcome to the <b>Data Input</b> tour for <code>GeDi</code>.
#Step1;The tour will take you through the relevant elements of the UI. Each section has its own tour, which you can start or restart at any time by clicking the dedicated button. You can exit the guided introduction at any time by clicking outside of the highlighted area.
#tab_data_upload; In this tab you will provide your <b>data</b> which will be used to calculate the <b>geneset distance scores</b>.
#ui_panel_data_input; In <b>Step 1</b> you provide your data. You can supply it as a <b>data.frame</b> stored in an RDS object, as a plain text file, or as a CSV file. As mentioned in the <b>Welcome panel</b>, there are only minimal format requirements for the input. The data is expected to contain two specific columns: one named <b>Genesets</b>, containing identifiers for the individual genesets in your data, and another named <b>Genes</b>, listing the genes that belong to each geneset. For additional information, please refer to the package's vignette.
#btn_loaddemo; If you are not entirely sure what the input data should look like, or you want to test <code>GeDi</code> before generating your own data, you can click this button to load some demo data and use it to explore the app.
#btn_showDataStructure; With the <b>Have a look at the data structure</b> button, you can view a screenshot of the expected data layout. It shows how the columns of your input should be organized, so that you can check your own data against it.
#Genesets_preview; After inputting your data, you can view it by expanding the collapsed <b>Geneset preview</b> box. Simply click the plus sign located at the right corner of the box to expand it. Likewise, to collapse the expanded element, click the sign again (once expanded, the plus sign will change to a minus sign).
#ui_filter_data; After successfully loading input data, an <b>Optional Filtering Step</b> box will become visible. You are encouraged to use this feature to identify large and generic genesets in your data, which may not contribute significantly to the overall analysis results but could adversely affect runtime. The choice of an appropriate filtering criterion is left to you. Possible criteria include removing the largest 10% of genesets or removing genesets with more than 100 associated genes. Filtering is optional and can be skipped.
#ui_panel_specify_species; Once you've successfully loaded and optionally filtered your data, you can proceed to <b>Step 2</b>. In this step, you have the option to specify the species of your data to download the corresponding <b>Protein-Protein Interaction (PPI)</b> matrix. This PPI matrix is required for computing the <b>pMM score</b>, one of the available distance scores in the <b>Distance Scores</b> panel. You can choose the species of your data from a list of preselected species or type the name of your species in the provided box. If you're uncertain about the spelling or whether your organism of interest has an associated PPI matrix, you can utilize the link provided in the box to access the <a href="https://string-db.org/STRING">STRING</a> website. It's important to note that this step is optional, as the PPI is not required for all available scores. You can skip this step and return later to download the PPI if you intend to use the pMM score.
#ui_panel_download_ppi; Once you have specified the species of your data, you can proceed to download the corresponding PPI matrix in <b>Step 3</b>. Please note that this process may take some time. You can monitor the progress of your download in the lower right corner of the page.
#PPI_preview; Once the PPI matrix has been downloaded, you can review it by expanding the collapsed <b>PPI preview</b> box. The PPI matrix is represented as a <code>data.frame</code> and comprises three columns: <b>Gene1</b> and <b>Gene2</b>, which contain the gene symbols of the interacting proteins, and a column named <b>combined_score</b>, indicating the confidence level of the interaction. This score is calculated based on the number of known interactions between two proteins and is normalized to the [0, 1] interval using the formula: <math display="block"><mi>combined_score</mi><mo>=</mo><mfrac><mrow><mi>#interactions</mi><mo>-</mo><mi>min</mi></mrow> <mrow><mi>max</mi><mo>-</mo><mi>min</mi></mrow></mfrac></math> where <b>min</b> and <b>max</b> represent the minimum and maximum number of interactions, respectively.
#ui_panel_download_ppi; Instead of downloading a PPI in Step 3, you also have the option to load a previously saved PPI matrix from your device. The PPI matrix should adhere to the specified data structure with three columns: Gene1, Gene2, and combined_score. Depending on the size of the matrix, the upload process may take some time. Please monitor the upload progress using the progress bar that appears after selecting the file.
#save_ppi; To minimize download times in each <code>GeDi</code> session, you have the option to save a downloaded PPI matrix to your device using the <b>Save PPI matrix</b> button. Subsequently, rather than re-downloading the PPI matrix in a new <code>GeDi</code> session, you can upload the locally saved version using the <b>Upload a PPI matrix</b> functionality provided above.
#sidebar; Once you have successfully completed all three steps of the data input, you can navigate to the next panel via the sidebar.
#Thanks;Thank you for taking the <b>Data Input</b> tour of <code>GeDi</code>!
