---
title: "Importing/Exporting Data"
author: "Shian Su"
output: html_document
vignette: >
  %\VignetteIndexEntry{Importing/Exporting Data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# preload to avoid loading messages
library(NanoMethViz)
```

```{r}
library(NanoMethViz)
```

In order to use this package, your data must be converted from the output of methylation calling software to a tabix-indexed, bgzip-compressed format. The data must be sorted by genomic position to meet the requirements of the samtools [tabix](http://www.htslib.org/doc/tabix.html) indexing tool. On Linux and macOS systems this is done using the `sort` command-line utility, which is memory efficient, but on Windows this is done by loading the entire table and sorting within R.

We currently support output from

* Nanopolish
* f5c
* Megalodon

If you would like further formats supported, please create an issue at https://github.com/Shians/NanoMethViz/issues.

## Data example

The conversion can be done using the `create_tabix_file()` function. We provide example nanopolish output within the package, which we can inspect to see what the data looks like coming out of nanopolish.

```{r}
methy_calls <- system.file(package = "NanoMethViz",
    c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 6 rows of the first sample's calls
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example
```

We then create a temporary path to store the converted file; this file will be deleted once you exit your R session. Once `create_tabix_file()` is run, it will create a .bgz file along with its tabix index. Because we have a small amount of data, we can read in a small portion of it for inspection. Do not do this with large datasets, as it decompresses all the data and will take a very long time to run.

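
As a rough illustration of the position sorting that tabix requires, the base R sketch below orders a toy table by chromosome and then by position. It is not the code NanoMethViz runs internally; the data frame `toy_calls`, its column names `chr` and `pos`, and its values are hypothetical and chosen only for this example.

```{r}
# toy methylation call positions in arbitrary order (hypothetical values)
toy_calls <- data.frame(
    chr = c("chr2", "chr1", "chr1", "chr2"),
    pos = c(500L, 300L, 100L, 50L),
    stringsAsFactors = FALSE
)

# order rows by chromosome name, then by position within each chromosome
toy_sorted <- toy_calls[order(toy_calls$chr, toy_calls$pos), ]
rownames(toy_sorted) <- NULL

toy_sorted
```

Sorting in memory like this is simple but requires the whole table to be loaded at once, which is why an external sort is preferable for large files where one is available.
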
### Megalodon Data

To import data from Megalodon's modification calls, the [per-read modified bases](https://nanoporetech.github.io/megalodon/file_formats.html#per-read-modified-bases) file must be generated. This can be done either by adding the `--write-mods-text` argument to the Megalodon run or by using the `megalodon_extras per_read_text modified_bases` utility.

## Importing data

```{r, message = FALSE}
methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

methy_data
```

Now `methy_tabix` will be the path to a tabix-indexed file that is ready for use with NanoMethViz. Please head over to the "Introduction" vignette to see how to use this data for visualisation!

## Exporting data

The methylation data can be exported into formats appropriate for bsseq, DSS, or edgeR.

### bsseq and DSS

Both bsseq and DSS make use of the BSseq object, which can be obtained from NanoMethResult objects using the `methy_to_bsseq()` function.

```{r, message = FALSE}
nmr <- load_example_nanomethresult()
bss <- methy_to_bsseq(nmr)

bss
```

### edgeR

edgeR can also be used to perform differential methylation analysis: https://f1000research.com/articles/6-2055. BSseq objects can be converted into the appropriate format using the `bsseq_to_edger()` function. This can be used to count reads on a per-site basis or over regions.

```{r}
gene_regions <- exons_to_genes(NanoMethViz::exons(nmr))
edger_mat <- bsseq_to_edger(bss, gene_regions)

edger_mat
```