---
title: "Analyzing Lipid Feature Tendencies with LipidTrend"
package: LipidTrend
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 3
    number_sections: true
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{LipidTrend}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r style, echo=FALSE, results='asis'}
BiocStyle::markdown(css.files = c('custom.css'))
knitr::opts_chunk$set(collapse=TRUE, comment="#>")
```

# Introduction

`LipidTrend` is an R package designed to identify significant differences in lipidomic feature tendencies between groups. The package includes 3 main functions:

1. **analyzeLipidRegion** – Performs statistical analysis to examine lipid feature tendencies.
2. **plotRegion1D** – Visualizes results for one-dimensional lipid features.
3. **plotRegion2D** – Visualizes results for two-dimensional lipid features.

Additionally, several helper functions are available for viewing the result data in the returned object. For more details, please refer to the [helper function section](#helper).

# Installation

To install `LipidTrend`, ensure that you have R 4.5.0 or later installed (see the R Project at [http://www.r-project.org](http://www.r-project.org)) and are familiar with its usage.

The `LipidTrend` package is available from the Bioconductor repository at [http://www.bioconductor.org](http://www.bioconductor.org). Before installing `LipidTrend`, you must first install the core Bioconductor packages. If you have already installed them, you can skip the following step.

```{r install_Bioconductor, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install()
```

Once the core Bioconductor packages are installed, you can proceed with installing `LipidTrend`.

```{r install_package, eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("LipidTrend")
```

After the installation is complete, you’re ready to start using `LipidTrend`. Now, let’s load the package first.

```{r load, message=FALSE}
library(LipidTrend)
```

# Data preparation

`LipidTrend` requires a SummarizedExperiment object as input data. It must contain the following components:

1. **Assay**: A matrix containing lipid abundance data, where rows represent lipids and columns represent samples.
2. **RowData**: A data frame of lipid features (e.g., double bond count, chain length), where rows are lipids and columns are lipid features (limited to 1 or 2 columns). The order of lipids must match the abundance data.
    * If RowData contains one column, a one-dimensional analysis will be performed.
    * If RowData contains two columns, a two-dimensional analysis will be performed.
3. **ColData**: A data frame containing group information, where rows represent samples and the columns must be sample name, label name, and group, in that order.

If you are already familiar with constructing a SummarizedExperiment object, you can skip the following section. Otherwise, refer to the example in the rest of this section to learn how to build a SummarizedExperiment object.

## Abundance data

The abundance data is a matrix of lipid abundance values, where rows represent lipids and columns represent samples.

```{r abundance}
# load example abundance data
data("abundance_2D")

# view abundance data
head(abundance_2D, 5)
```

## Lipid characteristic table

The lipid characteristic table is a data frame containing information about each lipid's characteristics, such as the number of double bonds and chain length. The order of the lipids in this table must align with the abundance data.
A one-dimensional analysis will be conducted if the table has only one column, and a two-dimensional analysis will be performed if it contains two columns. The table can have at most two columns. In this example, we use data suitable for a two-dimensional analysis.

```{r char_table}
# load example lipid characteristic table (2D)
data("char_table_2D")

# view lipid characteristic table
head(char_table_2D, 5)
```

## Group information table

The group information table is a data frame containing grouping details corresponding to the samples in the lipid abundance data. It must adhere to the following requirements:

1. The columns must be arranged in the following order: `sample_name`, `label_name`, and `group`.
2. All sample names must be unique.
3. The sample names in the `sample_name` column must match those in the lipid abundance data.
4. The columns `sample_name`, `label_name`, and `group` must not contain NA values.

For example:

```{r groupInfo}
# load example group information table
data("group_info")

# view group information table
group_info
```

## Constructing SummarizedExperiment object

Once the abundance data, lipid characteristic table, and group information table are prepared, we can construct the input SummarizedExperiment object. We will use the `SummarizedExperiment` function from `r Biocpkg("SummarizedExperiment")`. Follow the command below to create this object.

```{r build_se}
se_2D <- SummarizedExperiment::SummarizedExperiment(
    assays=list(abundance=abundance_2D),
    rowData=S4Vectors::DataFrame(char_table_2D),
    colData=S4Vectors::DataFrame(group_info))
```

# Initiate `LipidTrend` workflow

The workflow of `LipidTrend` is primarily divided into one-dimensional and two-dimensional lipid feature analyses. The process takes a SummarizedExperiment object as input and, after computation and plotting, returns a statistical results data frame and a plot, respectively.
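
Before starting the workflow, it can help to confirm that a group information table meets the four requirements listed under "Group information table". The following is a minimal base R sketch of such a check. The names `check_group_info`, `toy_abund`, and `toy_group` are hypothetical and invented for this illustration; they are not part of `LipidTrend`, and the check assumes that "matching" sample names means the same set of names as the abundance columns.

```r
# Hypothetical helper (not part of LipidTrend): returns a character vector
# describing violated requirements, or character(0) if none are found.
check_group_info <- function(abundance, group_info) {
    expected <- c("sample_name", "label_name", "group")
    # Requirement 1: columns present in the required order
    if (!identical(colnames(group_info), expected)) {
        return("columns must be sample_name, label_name, group (in this order)")
    }
    problems <- character(0)
    # Requirement 2: unique sample names
    if (anyDuplicated(group_info$sample_name) > 0) {
        problems <- c(problems, "sample names must be unique")
    }
    # Requirement 3: same set of sample names as the abundance columns
    if (!setequal(group_info$sample_name, colnames(abundance))) {
        problems <- c(problems, "sample names must match the abundance columns")
    }
    # Requirement 4: no missing values
    if (any(is.na(group_info))) {
        problems <- c(problems, "NA values are not allowed")
    }
    problems
}

# Toy data, invented for illustration only
toy_abund <- matrix(
    c(1.2, 0.8, 2.1, 1.9, 3.3, 2.7, 0.5, 0.6), nrow=2,
    dimnames=list(c("PC 32:0", "PC 34:1"), c("s1", "s2", "s3", "s4")))
toy_group <- data.frame(
    sample_name=c("s1", "s2", "s3", "s4"),
    label_name=c("ctrl1", "ctrl2", "case1", "case2"),
    group=c("ctrl", "ctrl", "case", "case"))

# An empty result means no problems were detected
check_group_info(toy_abund, toy_group)
```

The same function could be applied to the example objects as `check_group_info(abundance_2D, group_info)`, assuming the sample names are stored as the column names of `abundance_2D`.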

We recommend calling `set.seed()` before starting so that the results of the permutation process are reproducible.

```{r }
set.seed(1234)
```

## One-dimensional analysis {#lipid1d}

The one-dimensional analysis is applicable when the input data contains only a single lipid characteristic. Let’s briefly view the structure of the example input data.

```{r LipidTrend_1D}
# load example data
data("lipid_se_CL")

# quick look at the SE structure
show(lipid_se_CL)
```

### Analyze lipid region

This section calculates p-values for lipidomic features using a permutation test. The region statistics are smoothed with a Gaussian kernel, which integrates information from neighboring features and provides stable test results when the sample size is small.

You can obtain separate results for even-chain and odd-chain lipids by configuring the arguments `split_chain` and `chain_col`. If `split_chain` is set to `TRUE`, the results will be divided into even-chain and odd-chain lipids. By default, `split_chain` is set to `FALSE`; however, we recommend setting it to `TRUE` for more meaningful biological insights.

**Note:**

- When `split_chain=TRUE`, you must specify the column name for chain length in the `chain_col` argument.
- If `split_chain=FALSE`, then `chain_col` should be `NULL`.

```{r countRegion1D}
# run analyzeLipidRegion
res1D <- analyzeLipidRegion(
    lipid_se_CL, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="chain", radius=3, own_contri=0.5, permute_time=1000)

# view result summary
show(res1D)
```

The `analyzeLipidRegion` function produces an extended SummarizedExperiment object called `LipidTrendSE`. To facilitate result extraction, we offer several helper functions that make viewing the resulting data frame easier. For more information on these helper functions, please refer to the [helper function section](#helper).

```{r Region1D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res1D), 5)

# view odd chain result (first 5 lines)
head(odd_chain_result(res1D), 5)
```

### Result visualization

This section uses the `LipidTrendSE` output from the `analyzeLipidRegion` function to visualize the results.

**Note**:

- If `split_chain` is set to `TRUE`, two plots will be generated: one for even-chain lipids and another for odd-chain lipids.
- If `split_chain` is set to `FALSE`, only a single plot will be returned.

```{r plotRegion1D}
# plot result
plots <- plotRegion1D(res1D, p_cutoff=0.05, y_scale='identity')

# even chain result
plots$even_result

# odd chain result
plots$odd_result
```

The visualization illustrates lipid trends and includes the following components:

1. **Blue/Red Ribbons** – Highlight significant regions of difference between groups.
2. **Points** – Display average abundance of each lipid across all samples.

**Color Interpretation**:

* Red areas indicate higher abundance in case samples.
* Blue areas indicate higher abundance in control samples.

## Two-dimensional analysis {#lipid2d}

The main difference between one-dimensional and two-dimensional analysis lies in the provided lipid characteristics table. One-dimensional analysis focuses on a single lipid feature, whereas two-dimensional analysis examines two lipid features simultaneously.

The two-dimensional approach is applicable when the input data includes two lipid characteristics. Now, let’s take a quick look at the structure of the example input data.

```{r LipidTrend_2D}
# load example data
data("lipid_se_2D")

# quick look at the SE structure
show(lipid_se_2D)
```

### Analyze lipid region

This section calculates p-values for lipidomic features using a permutation test. The region statistics are smoothed with a Gaussian kernel, which integrates information from neighboring features and provides stable test results when the sample size is small.
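
To make the idea of a Gaussian-smoothed region statistic with a permutation test more concrete, here is a small base R sketch on toy data. It is an illustration under stated assumptions, not the `LipidTrend` implementation: the per-lipid statistic (a difference in group means), the bandwidth `sigma`, and the helper names `mean_diff` and `region_stat` are all invented for this example, and the package's own `radius` and `own_contri` arguments are not reproduced.

```r
# Illustrative sketch only; NOT the LipidTrend implementation.
set.seed(1)

# Toy data (invented): 6 lipids with two features, 4 control and 4 case samples
feat <- cbind(total_c=c(32, 34, 34, 36, 36, 38),
              total_db=c(0, 1, 2, 2, 3, 4))
abund <- matrix(rnorm(6 * 8, mean=10), nrow=6)
group <- rep(c("ctrl", "case"), each=4)

# Assumed per-lipid statistic: mean abundance in case minus control
mean_diff <- function(x, g) {
    rowMeans(x[, g == "case", drop=FALSE]) -
        rowMeans(x[, g == "ctrl", drop=FALSE])
}

# Gaussian kernel weights from pairwise Euclidean distances in feature
# space; each row is normalised to sum to 1
sigma <- 2  # assumed bandwidth, chosen arbitrarily
d <- as.matrix(dist(feat))
w <- exp(-d^2 / (2 * sigma^2))
w <- w / rowSums(w)

# Smoothed (region) statistic: weighted average over neighbouring lipids
region_stat <- function(x, g) as.vector(w %*% mean_diff(x, g))

observed <- region_stat(abund, group)

# Permutation null: shuffle the group labels and recompute the statistic
n_perm <- 200
null_stats <- replicate(n_perm, region_stat(abund, sample(group)))

# Two-sided permutation p-value per lipid, with the usual +1 correction
p_value <- (rowSums(abs(null_stats) >= abs(observed)) + 1) / (n_perm + 1)
round(p_value, 3)
```

Because each smoothed value pools information from nearby lipids in feature space, a single noisy lipid has less influence on the result than it would in a lipid-by-lipid test, which is the motivation for smoothing when few samples are available.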

You can obtain separate results for even-chain and odd-chain lipids by configuring the arguments `split_chain` and `chain_col`. If `split_chain` is set to `TRUE`, the results will be divided into even-chain and odd-chain lipids. By default, `split_chain` is set to `FALSE`; however, we recommend setting it to `TRUE` for more meaningful biological insights.

**Note:**

- When `split_chain=TRUE`, you must specify the column name for chain length in the `chain_col` argument.
- If `split_chain=FALSE`, then `chain_col` should be `NULL`.

```{r countRegion2D}
# run analyzeLipidRegion
res2D <- analyzeLipidRegion(
    lipid_se_2D, ref_group="sgCtrl", split_chain=TRUE,
    chain_col="Total.C", radius=3, own_contri=0.5, permute_time=1000,
    abund_weight=TRUE)

# view result summary
show(res2D)
```

The `analyzeLipidRegion` function produces an extended SummarizedExperiment object called `LipidTrendSE`. To facilitate result extraction, we offer several helper functions that make viewing the resulting data frame easier. For more information on these helper functions, please refer to the [helper function section](#helper).

```{r Region2D_result}
# view even chain result (first 5 lines)
head(even_chain_result(res2D), 5)

# view odd chain result (first 5 lines)
head(odd_chain_result(res2D), 5)
```

### Result visualization

This section uses the `LipidTrendSE` output from the `analyzeLipidRegion` function to visualize the results.

**Note**:

- If `split_chain` is set to `TRUE`, two plots will be generated: one for even-chain lipids and another for odd-chain lipids.
- If `split_chain` is set to `FALSE`, only a single plot will be returned.

```{r LipidTrend_2D_plot}
# plot result
plot2D <- plotRegion2D(res2D, p_cutoff=0.05)

# even chain result
plot2D$even_result

# odd chain result
plot2D$odd_result
```

This plot visualizes two-dimensional lipid features, highlighting trends and significant differences between case and control groups.
The X-axis represents the total carbon chain length, while the Y-axis shows the total double bond count. The color scale indicates the log2 fold-change (FC) between the case and control groups: red regions indicate higher abundance in case samples, whereas blue regions indicate higher abundance in control samples. Circle size represents the average abundance level, with larger circles indicating higher abundance. Red and blue outlines highlight regions with significant differences between the case and control groups. Asterisks (\*, \*\*, \*\*\*) signify levels of statistical significance, with more asterisks representing greater significance. Through this visualization, you can identify lipidomic patterns and differences across various lipid characteristics.

# Helper Functions – enhancing result viewing {#helper}

`LipidTrend` provides 4 helper functions to enhance the viewing of the `LipidTrendSE` object returned by `analyzeLipidRegion`:

1. `result` – Returns the result data frame.
2. `even_chain_result` – Returns the even-chain result data frame.
3. `odd_chain_result` – Returns the odd-chain result data frame.
4. `show` – Displays a summary of the `LipidTrendSE` object.

**Notes**:

* If `split_chain=TRUE`, use `even_chain_result` and `odd_chain_result` to view the results separately. Otherwise, use `result`.
* To extract assay, rowData, or colData from the `LipidTrendSE` object, use functions from the `r Biocpkg("SummarizedExperiment")` package.

# Session info

```{r sessionInfo, echo=FALSE}
sessionInfo()
```