element;intro
#welcome;Welcome to the <b>Clustering Graph</b> tour for <code>GeDi</code>.
#Step1;The tour will take you through the relevant elements of the UI. You can start the tour in each section, and restart it at any time, by clicking the dedicated button. You can exit the guided introduction at any time by clicking outside of the highlighted area.
#sidebar; This panel requires that you first calculate distance scores on your genesets. If you have not done so yet, please use the sidebar to navigate to the <b>Distance Scores</b> panel and calculate distance scores.
#clustering_selection_box; Once you have calculated distance scores for your input data, you can start clustering the genesets based on these scores. Several clustering algorithms are available.
#clustering_selection_box; The first clustering algorithm available is the <b>Louvain</b> algorithm. The algorithm is widely used in biological network analysis. You can specify a similarity threshold via the slider. All genesets with a distance smaller than the threshold will be considered for forming a cluster. In the Louvain algorithm, each geneset can only be part of one cluster.
#clustering_selection_box; The second clustering algorithm is the <b>Markov</b> algorithm, which is also widely used in biological network analyses. In this algorithm, a node can be part of several clusters. There is also a slider to select the threshold which determines if two genesets are considered similar. Genesets are similar if their distance score is smaller than the threshold.
#clustering_selection_box; The third clustering algorithm is a <b>Fuzzy</b> clustering algorithm in which a node can be part of several clusters. In comparison to the other two algorithms, this one is a little more complex and has three thresholds which together determine if genesets are considered similar and should be clustered together. The first parameter is the <b>Similarity Threshold</b>. This threshold determines which genesets are classified as similar by evaluating for which pairs of genesets it holds that <math display="block"><mi>distance_score(A, B)</mi><mo>&#8804</mo><mi>simThreshold</mi></math> The default value of this threshold is 0.3. The second parameter is the <b>Membership Threshold</b>. This parameter indicates what fraction of the members of a proposed cluster need to have a close relationship, i.e. <math display="block"><mi>distance_score(A, B)</mi><mo>&#8804</mo><mi>simThreshold</mi></math> to each other in order for the cluster to be formed. For example, if the Membership Threshold is set to 0.5 (50%), at least half of the members in a cluster need to have a close relationship to each other. If this is not the case, the cluster will not be formed. The default value of this threshold is 0.5. The last parameter to set is the <b>Clustering Threshold</b>. This threshold determines how many members two clusters need to share in order for them to be merged into one larger cluster. Hence, this threshold can strongly influence the resulting number of clusters. The default value of this threshold is 0.5.
#clustering_selection_box; The last clustering algorithm available is <b>PAM</b> (Partitioning Around Medoids) clustering, which is closely related to the well-known k-means algorithm. The algorithm divides the data into a specified number k of clusters. You can select a value for k via the slider.
#cluster_data; Once you have selected an algorithm and set the appropriate threshold(s), you can calculate the clusters of your data by clicking the <b>Cluster the Genesets</b> button. This operation can take some time depending on the number of genesets in your input data and your selected thresholds, so please have a look at the progress bar in the lower right corner, which will inform you about the progress of the clustering.
#tabcard_cluster; Once the clusters of your data are determined, they will be visualized in this box.
#tabcard_cluster; In the <b>Geneset Graph</b> card, the individual clusters will be visualized as a graph. The nodes are individual genesets, and edges are drawn between genesets which belong to the same cluster. You can select and highlight individual nodes by using the <b>Select by id</b> functionality on the left, or you can highlight all nodes of a cluster by selecting the respective cluster via the <b>Select by cluster</b> option. In this graph, only genesets belonging to at least one cluster will be shown, so please do not be surprised if you cannot find a specific geneset of your input data.
#tabcard_cluster; To provide additional information in the <b>Geneset Graph</b>, you can color the nodes by specific parameters of your input data. You can select one of the available options via the <b>Color the graph by</b> drop-down menu. The options in the menu depend on the information provided with your input data. If your input data only contains geneset ids and genes, this option will not provide additional value. However, if your data contains more information on the genesets, you can select from those.
#tabcard_cluster; You can also interact with the shown network by relocating individual nodes. To do so, select one of the nodes, hold the left mouse button and drag the node to a place of your choice. In this way you can change the placement of individual nodes and clusters. This is especially helpful in cluttered graphs and crowded areas of the network.
#tabcard_cluster; The <b>Cluster-Geneset Bipartite Graph</b> is a bipartite visualization of the clusters. Here, the nodes are clusters and genesets, and edges are drawn between cluster nodes and their respective geneset members. Upon hovering over the nodes, you will also get additional information about the data. For the cluster nodes, the members of the respective cluster will be listed, while for the geneset nodes the genes belonging to the respective geneset will be shown.
#tabcard_cluster; The <b>Cluster Enrichment Terms Word Cloud</b> will visualize the most commonly used terms for each cluster. This visualization is especially valuable if your data also contains a short description of the genesets besides the required input data. Via the <b>Select a cluster</b> drop-down menu, you can specify the cluster. You can also hover over the word cloud and select individual terms to see how many times the term appeared in the descriptions of the genesets in this cluster.
#card_clustering; The cluster information is also summarized in a table-like format in the <b>Clustering graph summaries</b> box. The table will show each geneset and the cluster the respective geneset belongs to. The table also has a search function, so you can easily look for a geneset of interest.
#Thanks;Thank you for taking the <b>Clustering Graph</b> tour of <code>GeDi</code>!
