---
title: "Importing/Exporting Data"
author: "Shian Su"
output: html_document
vignette: >
  %\VignetteIndexEntry{Importing/Exporting Data}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

# preload to avoid loading messages
library(NanoMethViz)
```

```{r}
library(NanoMethViz)
```

In order to use this package, your data must be converted from the output of
methylation calling software to a tabix indexed bgzipped format. The data needs
to be sorted by genomic position to respect the requirements of the samtools
[tabix](http://www.htslib.org/doc/tabix.html) indexing tool. On Linux and macOS
systems this is done using the bash `sort` utility, which is memory efficient,
but on Windows this is done by loading the entire table and sorting within R.

We currently support output from

* Nanopolish
* f5c
* Megalodon

If you would like any further other formats supported please create an issue
at https://github.com/Shians/NanoMethViz/issues.

## Data example

The conversion can be done using the `create_tabix_file()` function. We provide
example data of nanopolish output within the package, we can look inside to see
how the data looks coming out of nanopolish

```{r}
methy_calls <- system.file(package = "NanoMethViz",
    c("sample1_nanopolish.tsv.gz", "sample2_nanopolish.tsv.gz"))

# have a look at the first 10 rows of methy_data
methy_calls_example <- read.table(
    methy_calls[1], sep = "\t", header = TRUE, nrows = 6)

methy_calls_example
```

We then create a temporary path to store a converted file, this will be deleted
once you exit your R session. Once `create_tabix_file()` is run, it will create
a .bgz file along with its tabix index. Because we have a small amount of data,
we can read in a small portion of it for inspection, do not do this with large
datasets as it decompresses all the data and will take very long to run.

### Megalodon Data

To import data from Megalodon's modification calls, the [per-read modified
bases](https://nanoporetech.github.io/megalodon/file_formats.html#per-read-modified-bases)
file must be generated. This can be done by either adding `--write-mods-text`
argument to Megalodon run or using the `megalodon_extras per_read_text
modified_bases` utility.

## Importing data

```{r, message=F}
methy_tabix <- file.path(tempdir(), "methy_data.bgz")
samples <- c("sample1", "sample2")

# you should see messages when running this yourself
create_tabix_file(methy_calls, methy_tabix, samples)

# don't do this with actual data
# we have to use gzfile to tell R that we have a gzip compressed file
methy_data <- read.table(
    gzfile(methy_tabix), col.names = methy_col_names(), nrows = 6)

methy_data
```

Now `methy_tabix` will be the path to a tabix object that is ready for use with
NanoMethViz. Please head over to the "Introduction" vignette to see how to use
this data for visualisation!

## Exporting data

The methylation data can be exported into formats appropriate for bsseq, DSS, or
edgeR. 

### bsseq and DSS

Both bsseq and DSS make use of the BSSeq object, and these can be obtained from
the NanoMethResult objects using the `methy_to_bsseq()` function.

```{r, message = FALSE}
nmr <- load_example_nanomethresult()
bss <- methy_to_bsseq(nmr)

bss
```

### edgeR

edgeR can also be used to perform differential methylation analysis: https://f1000research.com/articles/6-2055.
BSSeq objects can be converted into the appropriate format using the
`bsseq_to_edger()` function. This can be used to count reads on a per-site
basis or over regions.

```{r}
gene_regions <- exons_to_genes(NanoMethViz::exons(nmr))
edger_mat <- bsseq_to_edger(bss, gene_regions)

edger_mat
```