---
title: "Description and usage of MsBackendMgf"
output:
    BiocStyle::html_document:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Description and usage of MsBackendMgf}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\VignettePackage{MsBackendMgf}
    %\VignetteDepends{Spectra,BiocStyle,BiocParallel}
---

```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```

**Package**: `r Biocpkg("MsBackendMgf")`
**Authors**: `r packageDescription("MsBackendMgf")[["Author"]] `
**Last modified:** `r file.info("MsBackendMgf.Rmd")$mtime`
**Compiled**: `r date()`

```{r, echo = FALSE, message = FALSE}
library(Spectra)
library(BiocStyle)
```

# Introduction

The `r Biocpkg("Spectra")` package provides a central infrastructure for
handling mass spectrometry (MS) data. The package supports interchangeable use
of different *backends* to import MS data from a variety of sources (such as
mzML files). The `MsBackendMgf` package allows the import of MS/MS data from mgf
([Mascot Generic
Format](http://www.matrixscience.com/help/data_file_help.html)) files. This
vignette illustrates the usage of the `MsBackendMgf` package.

# Installation

To install this package, start `R` and enter:

```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("MsBackendMgf")
```

This installs the package along with any missing dependencies.

# Importing MS/MS data from mgf files

Mgf files store one or more spectra, typically centroided and of MS level 2. In
the short example below, we load two mgf files that are provided with this
package. We first load all required packages and define the paths to the mgf
files.

```{r load-libs}
library(Spectra)
library(MsBackendMgf)

fls <- dir(system.file("extdata", package = "MsBackendMgf"),
           full.names = TRUE, pattern = "mgf$")
fls
```

MS data can be accessed and analyzed through `Spectra` objects. Below we create
a `Spectra` object with the data from these mgf files. To this end we provide
the file names and specify a `MsBackendMgf()` backend as *source* to enable
data import. Note that we also disable parallel processing by explicitly
*registering* serial processing as the default. See `?bpparam` for more details
on parallel processing options with the `r Biocpkg("BiocParallel")` package.

```{r import}
library(BiocParallel)
register(SerialParam())

sps <- Spectra(fls, source = MsBackendMgf())
```

We now have full access to all imported spectra variables, which we list below.

```{r spectravars}
spectraVariables(sps)
```

Besides default spectra variables, such as `msLevel`, `rtime` and
`precursorMz`, we also have additional spectra variables such as the `TITLE` of
each spectrum in the mgf file.

```{r instrument}
sps$rtime
sps$TITLE
```

By default, fields in the mgf file are mapped to spectra variable names using
the mapping returned by the `spectraVariableMapping` function:

```{r spectravariables}
spectraVariableMapping(MsBackendMgf())
```

The names of this `character` vector are the spectra variable names (such as
`"rtime"`) and its values are the fields in the mgf file that contain the
respective information (such as `"RTINSECONDS"`). Note that it is also possible
to overwrite this mapping (e.g. for certain mgf *dialects*) or to add
additional mappings. Below we add the mapping of the mgf field `"TITLE"` to a
spectra variable called `"spectrumName"`.

```{r map}
map <- c(spectrumName = "TITLE", spectraVariableMapping(MsBackendMgf()))
map
```

We can then pass this mapping to the `backendInitialize` method or to the
`Spectra` constructor.

```{r import2}
sps <- Spectra(fls, source = MsBackendMgf(), mapping = map)
```

We can now access the spectrum's title with the newly created spectra variable
`"spectrumName"`:

```{r spectrumName}
sps$spectrumName
```

In addition, we can access the m/z and intensity values of each spectrum.

```{r mz}
mz(sps)
intensity(sps)
```

The `MsBackendMgf` backend also allows exporting data in mgf format. Below we
export the data to a temporary file. To do so, we call the `export` function on
our `Spectra` object, specifying `backend = MsBackendMgf()` to use this backend
for the export. Note that we again use our custom mapping of variables, so that
the spectra variable `"spectrumName"` is exported as the spectrum's title.

```{r export}
fl <- tempfile()
export(sps, backend = MsBackendMgf(), file = fl, mapping = map)
```

We next read the first lines of the exported file to verify that the title was
exported properly.

```{r export-check}
readLines(fl)[1:12]
```

Note that `MsBackendMgf` exports all spectra variables as fields in the mgf
file. To illustrate this, we add a new spectra variable to the object and
export the data again.

```{r}
sps$new_variable <- "A"
export(sps, backend = MsBackendMgf(), file = fl)
readLines(fl)[1:12]
```

We can see that our newly defined variable was exported as well. Also, because
we did not provide our custom variable mapping this time, the variable
`"spectrumName"` was **not** used as the spectrum's title.

Sometimes it is necessary to export only a subset of the spectra variables,
since some exported fields might not be recognized or supported by external
tools. Using the `selectSpectraVariables` function we can reduce the `Spectra`
object to contain only the relevant spectra variables before export. Below we
restrict the data to m/z, intensity, retention time, acquisition number,
precursor m/z and precursor charge and export these to an mgf file. In
addition, some external tools don't support the `"TITLE"` field in the mgf
file. To disable export of the spectrum ID/title, use `exportTitle = FALSE`.

```{r}
sps_ex <- selectSpectraVariables(sps, c("mz", "intensity", "rtime",
                                        "acquisitionNum", "precursorMz",
                                        "precursorCharge"))
export(sps_ex, backend = MsBackendMgf(), file = fl, exportTitle = FALSE)
readLines(fl)[1:12]
```

# Session information

```{r}
sessionInfo()
```