--- title: "glycoTraitR Quick Start" author: "Bingyuan Zhang" date: "`r Sys.Date()`" abstract: > glycoTraitR provides a streamlined workflow for extracting, summarizing, and analyzing glycan structural traits from glycoproteomics GPSM data generated by pGlyco3 or Glyco-Decipher. The package represents glycan traits using SummarizedExperiment data structures, enabling seamless downstream statistical analysis and visualization. output: BiocStyle::html_document vignette: > %\VignetteIndexEntry{Quick Start} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::knitr} --- ```{r style, echo = FALSE, results = 'asis'} BiocStyle::markdown() ``` # Introduction glycoTraitR is designed to provide users a streamlined workflow for extracting, summarizing, and analyzing glycan structural traits from GPSM data generated by **pGlyco3** or **Glyco-Decipher**. Building on the Bioconductor core data infrastructure by representing glycan trait data using the `SummarizedExperiment` class, enabling interoperability with downstream Bioconductor analysis tools, including packages such as `limma` and related differential analysis frameworks. Rather than focusing on individual glycoforms, glycoTraitR converts glycan structures into biologically interpretable composition- and structure-based traits, and summarizes these traits at the glycopeptide (site) or protein level. In this quick start guide, we demonstrate how to: - parse glycan structures from common glycoproteomics formats (WURCS 2.0 and pGlyco3 glycan strings) - compute glycan composition and structural traits - summarize glycan–PSM information at site/protein level (`SummarizedExperiment`) - performing differential trait analysis and visualizing trait changes ## Installation The glycoTraitR package can be installed using **BiocManager**. ```{r install, eval=FALSE} if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("matsui-lab/glycoTraitR") ``` After installation, we load the package into the current R session. All functions used in the following examples are provided by *glycoTraitR*. ```{r load} library(glycoTraitR) ``` # Quick Start In this quick start tutorial, we walk through a typical *glycoTraitR* analysis workflow using example GPSM data generated by **pGlyco3**. The goal is to demonstrate how glycan structures can be transformed into biologically interpretable traits, summarized at the protein/site level, and analyzed by statistical comparison between groups. ## 1. Load example data We begin by loading example GPSM data and accompanying sample metadata. For reproducibility and convenience, this vignette uses a small toy dataset bundled with the package. The commented code above shows how users can instead download and analyze the full public dataset hosted on **Zenodo**. ```{r load-example-data} # The following is an example file from Zenodo # url <- "https://zenodo.org/records/17756830/files/pGlycoDB-GP-FDR-Pro-Quant-Site.txt?download=1" # download.file(url, "pGlyco3_GPSM.txt", mode = "wb") # gpsm <- read_pGlyco3_gpsm("pGlyco3_GPSM.txt") # meta_url <- "https://zenodo.org/records/17759790/files/meta_mcp_2022_100433.rds?download=1" # download.file(meta_url, "meta_example.rds", mode = "wb") # meta <- readRDS("meta_example.rds") # The following is a smaller toy example files from package path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR") gpsm_toyexample <- read_pGlyco3_gpsm(path) data("meta_toyexample") head(gpsm_toyexample) head(meta_toyexample) ``` The `gpsm_toyexample` object contains glycopeptide-spectrum matches, including glycan structures encoded in pGlyco3 format. The `meta_toyexample` object provides sample-level annotations that will later be used to define experimental groups. ## 2. Build glycan trait matrices Next, we convert GPSM-level glycan information into glycan trait matrices. This step parses glycan structures, computes composition and structural traits, and summarizes the resulting values at the **protein level** using a `SummarizedExperiment` object. ```{r build-trait} trait_se <- build_trait_se( gpsm_toyexample, from = "pGlyco3", motifs = NULL, level = "protein", meta = meta_toyexample ) trait_se ``` The resulting `trait_se` object stores one assay matrix per glycan trait. Rows correspond to proteins, columns correspond to GPSMs, and values represent trait-specific glycan abundances. ## 3. Differential analysis We now perform differential analysis to identify glycan traits that differ significantly between experimental groups. As an example, we compare samples labeled as *Normal* and *Symptomatic* based on the `Diagnosis` column in the sample metadata. ```{r DA-trait} changed_traits <- analyze_trait_changes( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), min_psm = 20 ) head(changed_traits) ``` The output table reports glycan traits and corresponding proteins for which either the Welch t-test or Levene’s variance test is significant (p < 0.05). Each row represents a specific trait–protein combination showing evidence of differential behavior between the two groups. ## 4. Trait distribution visualization We can visualize its distribution at the GPSM level. Here, we select the *Hexose* trait for the protein *MOG* as an example. ```{r plot-trait} p <- plot_trait_distribution( trait_se = trait_se, group_col = "Diagnosis", group_levels = c("Normal", "Symptomatic"), trait_name = "Hexose", feature = "MOG" ) p$p_hist p$p_box ``` The histogram shows the distribution of GPSM-level trait abundances across groups, while the boxplot summarizes group-wise differences and variability. # Define Your Own Glycan Motifs In addition to pre-defined glycan traits, *glycoTraitR* allows users to define custom glycan motifs of biological interest. Next we demonstrate how to define the glycan motifs of users' own interests. We show how *glycoTraitR* detects these motif occurrences within glycan structures. We will: - Parse glycan structures into graph trees - Visualize full glycans & motifs - Perform **subgraph isomorphism** matching using *igraph* functions - Count occurrences of motif patterns ```{r load-igraph, message=FALSE} library(igraph) ``` ----- ## 1. Parse and plot glycan trees We first parse glycan structures into tree representations. Internally, glycans are represented as graphs, where nodes correspond to monosaccharides and edges represent glycosidic linkages. ```{r example-pGlyco3} # Example: parsed glycan trees from existing GPSM data path <- system.file("extdata", "pGlyco3_gpsm_toyexample.txt", package = "glycoTraitR") gpsm_toyexample <- read_pGlyco3_gpsm(path) data("meta_toyexample") glycans <- lapply(gpsm_toyexample$GlycanStructure, glycoTraitR:::pGlyco3_to_tree) ``` We use the function `build_glycan_igraph()` to visualize an example of glycan structure. ```{r visualize-tree} # visualize example structures igraph_obj <- build_glycan_igraph(glycans[[1]]) plot_glycan_tree(igraph_obj) ``` This visualization highlights the branching structure and residue composition of a glycan, which is essential for understanding motif topology. ------------------------------------------------------------------------ ## 2. A Simple User Motif (Example 1) This example demonstrates how to define and detect a simple glycan motif consisting of three hexose (H) residues connected in series. Here we define a **linear trisaccharide motif**: - H — H — H - Edges: `"a-b"`, `"b-c"` ```{r example1} #### Example 1 #### # (1) Select a glycan from pGlyco3 data tree_full <- glycans[[24]] g_full <- build_glycan_igraph(tree_full) # (2) Define motif manually motif <- list( node = c("H", "H", "H"), edge = c("a-b", "b-c") ) g_motif <- build_glycan_igraph(motif) # (3) Perform subgraph matching n_match <- count_subgraph_isomorphisms( g_motif, g_full, method = "vf2", vertex.color1 = factor(V(g_full)$type), vertex.color2 = factor(V(g_motif)$type) ) # (4) Visualize par(mfrow = c(1, 2)) plot_glycan_tree(g_motif) plot_glycan_tree(g_full) print(paste(n_match, "motif(s) found in the glycan")) ``` The reported match count indicates how many times the specified motif occurs within the full glycan structure. ------------------------------------------------------------------------ ## 3. Motif Containing Fucose (Example 2) More complex motifs can also be defined, including branched structures and specific monosaccharide types such as fucose (F). This example defines a **branched motif** containing Fucose: ```{r example2} #### Example 2 #### # (1) Select a glycan tree_full <- glycans[[1]] g_full <- glycoTraitR::build_glycan_igraph(tree_full) # (2) Symbolic motif definition motif <- list( node = c("H", "N", "F"), edge = c("a-b", "b-c") ) g_motif <- glycoTraitR::build_glycan_igraph(motif) # (3) Subgraph matching n_match <- igraph::count_subgraph_isomorphisms( g_motif, g_full, method = "vf2", vertex.color1 = factor(V(g_full)$type), vertex.color2 = factor(V(g_motif)$type) ) # (4) Visualize motif vs. full glycan par(mfrow = c(1, 2)) plot_glycan_tree(g_motif) plot_glycan_tree(g_full) print(paste(n_match, "motif(s) found in the glycan")) ``` ------------------------------------------------------------------------ ## 4. Integrating Motifs with glycoTraitR (for Trait Computation) Multiple user-defined motifs can be supplied simultaneously. Once defined, these motifs are treated as additional glycan traits and are incorporated into the trait computation workflow. ```{r udmotifs} user_motifs <- list( LinearH3 = list( node = c("H", "H", "H"), edge = c("a-b", "b-c") ), FucBranch = list( node = c("H", "N", "F"), edge = c("a-b", "b-c") ) ) ``` Each user-defined motif appears as a separate assay in the `SummarizedExperiment`, allowing it to be analyzed together by `build_trait_se()` as other built-in glycan traits. ```{r se-show} trait_se <- build_trait_se( gpsm_toyexample, from = "pGlyco3", motifs = user_motifs, level = "protein", meta = meta_toyexample ) SummarizedExperiment::assayNames(trait_se) ``` Your motif counts will automatically appear as **additional glycan traits** in the trait matrices. ------------------------------------------------------------------------ ## Conclusion In this vignette, we provided a step-by-step tutorial demonstrating how *glycoTraitR* integrates glycoproteomics output into a unified trait-based analysis framework. You can now construct your own biologically meaningful glycan motifs and include them directly in *glycoTraitR* analyses. ```{r session-info, echo=FALSE} sessionInfo() ```