---
title: 'AWAggregatorData Vignette'
author:
- name: Jiahua Tan
  affiliation:
  - &1 "Department of Chemistry, University of British Columbia, Vancouver, BC, 
  Canada"
- name: Gian L. Negri
  affiliation:
  - &2 "Canada's Michael Smith Genome Sciences Centre, BC Cancer Research 
  Institute, University of British Columbia, Vancouver, BC, Canada"
- name: Gregg B. Morin
  affiliation:
  - *2
  - &3 "Department of Medical Genetics, University of British Columbia, 
  Vancouver, BC, Canada"
- name: David D. Y. Chen
  affiliation:
  - *1
date: '`r format(Sys.Date(), "%B %e, %Y")`'
package: AWAggregatorData
output: 
    BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{AWAggregatorData Vignette}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

# Introduction

The `AWAggregatorData` package contains the data associated with the 
`AWAggregator` R package. It includes two pre-trained random forest models: one 
incorporates the average coefficient of variation (CV) as a feature, and the 
other excludes it. It also contains the PSMs in Benchmark Sets 1 to 3, derived 
from the `psm.tsv` output files generated by FragPipe, which are used to train 
the random forest models.

# Overview of Package Data

Data available in the `AWAggregatorData` package:

-   `regr`: the pre-trained random forest model that incorporates the 
average coefficient of variation (CV) as a feature.

-   `regr.no.CV`: the pre-trained random forest model that does not 
include the average CV as a feature.

-   `benchmark.set.1`, `benchmark.set.2`, `benchmark.set.3`: the PSMs 
in Benchmark Sets 1 to 3, derived from the `psm.tsv` output files generated by 
FragPipe, which are used to train the random forest models. Columns not needed 
by `AWAggregator` have been removed from these data.

# Installation

```{r install, eval=FALSE}
if (!requireNamespace('BiocManager', quietly = TRUE))
    install.packages('BiocManager')

BiocManager::install('ExperimentHub')
BiocManager::install('AWAggregatorData')
```

# Load Data from `ExperimentHub`

The data are stored in and served through the `ExperimentHub` package. 
Information about the available datasets can be retrieved with the `query` 
function:

```{r query datasets}
library(ExperimentHub)
eh <- ExperimentHub()
# Requires Bioconductor version 3.21 or later
query(eh, 'AWAggregatorData')
```
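
The records returned by `query` can also be inspected programmatically. The 
following is a minimal sketch, assuming the standard hub metadata fields 
`title` and `rdataclass` are populated for these records; the variable name 
`res` is chosen here for illustration only.

```{r inspect query, eval=FALSE}
library(ExperimentHub)
eh <- ExperimentHub()
res <- query(eh, 'AWAggregatorData')

# Number of records matching the query
length(res)

# Accession IDs and selected metadata, one row per record
# (`res` is an illustrative name; the fields shown are assumed to be populated)
data.frame(
    id = names(res),
    title = res$title,
    rdataclass = res$rdataclass
)
```

The accession IDs listed in the `id` column are the values used to download 
individual records with `eh[['...']]`.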

Each dataset and pre-trained model can be downloaded by its `ExperimentHub` 
accession ID:

```{r download datasets}
# Each object gets its own name so that later downloads do not overwrite 
# earlier ones
# Benchmark Set 1
benchmark.set.1 <- eh[['EH9637']]
# Benchmark Set 2
benchmark.set.2 <- eh[['EH9638']]
# Benchmark Set 3
benchmark.set.3 <- eh[['EH9639']]
# Pre-trained model incorporating the average coefficient of variation (CV) as 
# a feature
regr <- eh[['EH9640']]
# Pre-trained model excluding CV as a feature
regr.no.CV <- eh[['EH9641']]
```
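
Once downloaded, an object can be examined with base R before it is passed to 
`AWAggregator`. The following is a minimal sketch; the names `bs1` and `model` 
are illustrative, and no particular class or set of columns is assumed for 
either object.

```{r inspect objects, eval=FALSE}
library(ExperimentHub)
eh <- ExperimentHub()

# Benchmark Set 1 (`bs1` is an illustrative name)
bs1 <- eh[['EH9637']]
class(bs1)           # type of the downloaded object
dim(bs1)             # rows and columns, if the object is tabular
head(colnames(bs1))  # first few column names, if any

# Pre-trained model with CV as a feature (`model` is an illustrative name)
model <- eh[['EH9640']]
class(model)
```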

```{r session info}
sessionInfo()
```