---
title: "How to Use Fit-Hi-C R Package"
author: "Ruyu Tan"
date: "`r Sys.Date()`"
output:
    html_document:
        toc: true
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Vignette Title}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

## Introduction

[Fit-Hi-C](http://genome.cshlp.org/content/24/6/999) is a tool for assigning
statistical confidence estimates to intra-chromosomal contact maps produced by
genome-wide genome conformation capture assays such as Hi-C as well as newer
technologies such as PLAC-seq, HiChIP and region capture Hi-C. When using
Fit-Hi-C with Hi-C data, we strongly suggest using bias values from matrix
balancing-based normalization methods such as ICE or KR to control for
experimental and techical biases in significance estimation. While using bias
values, please make sure to use RAW counts and NOT the normalized counts as
normalization will be taken into account through bias values. Here we provide an
R implementation of Fit-Hi-C. Compared to its original implementation in Python
(https://noble.gs.washington.edu/proj/fit-hi-c), Fit-Hi-C R port has the
following advantages:

- Fit-Hi-C R package is more efficient than Python original by re-writting some
logic in C/C++
- Fit-Hi-C R package is easy to use for those familiar with R language and
Bioconductor without additional configuration
- Bug fixes on "nan" errors in q-value computation and plotting
- Compatible with output of hicpro2fithic.py script available in
[HiCPro 2.8.1](https://github.com/nservant/HiC-Pro/tree/master/bin/utils)

## Install FitHiC

To install this package, start R and enter

```
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("FitHiC")
```

## Example I: Yeast (S. cerevisiae) Hi-C data at single restriction enzyme (RE) resolution without bias values

__Duan_yeast_EcoRI__

_FRAGSFILE_ and _INTERSFILE_ are located in `system.file("extdata",
"fragmentLists/Duan_yeast_EcoRI.gz", package = "FitHiC")` and `system.file(
"extdata", "contactCounts/Duan_yeast_EcoRI.gz", package = "FitHiC")`,
respectively. When input data is ready, run as follows:

```{r, eval=FALSE}
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Duan_yeast_EcoRI")
FitHiC(fragsfile, intersfile, outdir, libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000)
```

Internally, Fit-Hi-C will successively call `generate_FragPairs`,
`read_ICE_biases`, `read_All_Interactions`, `calculateing_Probabilities`,
`fit_Spline` methods. The execution of Fit-Hi-C will be successfully completed
till the following log appears:

```{r run, echo=FALSE, collapse=TRUE, warning=FALSE}
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Duan_yeast_EcoRI")
FitHiC(fragsfile, intersfile, outdir, libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000)
```

```{r run-visual, echo=FALSE, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_EcoRI.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Duan_yeast_EcoRI")
FitHiC(fragsfile, intersfile, outdir, libname="Duan_yeast_EcoRI",
    distUpThres=250000, distLowThres=10000, visual=TRUE)
```

The output files come from two internal methods called by Fit-Hi-C.

* __calculate_Probabilites__

```{r calculate-probabilities, echo=FALSE, results="asis"}
output <- file.path(getwd(), "Duan_yeast_EcoRI",
    "Duan_yeast_EcoRI.fithic_pass1.txt")
data <- read.table(output, header=TRUE)
knitr::kable(head(data, n=6L), caption="Duan_yeast_EcoRI.fithic_pass1.txt")

output <- file.path(getwd(), "Duan_yeast_EcoRI",
    "Duan_yeast_EcoRI.fithic_pass2.txt")
data <- read.table(output, header=TRUE)
knitr::kable(head(data, n=6L), caption="Duan_yeast_EcoRI.fithic_pass2.txt")
```

* __fit_Spline__

```{r fit-spline, echo=FALSE, results="asis"}
output <- file.path(getwd(), "Duan_yeast_EcoRI",
    "Duan_yeast_EcoRI.spline_pass1.significances.txt.gz")
data <- read.table(gzfile(output), header=TRUE)
knitr::kable(head(data, n=6L), align="crcrrrr",
    caption="Duan_yeast_EcoRI.spline_pass1.significances.txt.gz")

output <- file.path(getwd(), "Duan_yeast_EcoRI",
    "Duan_yeast_EcoRI.spline_pass2.significances.txt.gz")
data <- read.table(gzfile(output), header=TRUE)
knitr::kable(head(data, n=6L), align="crcrrrr",
    caption="Duan_yeast_EcoRI.spline_pass2.significances.txt.gz")
```

If `visual` is set to TRUE, corresponding images will be also outputed:

+-------------------------------------------------------------------------+---------------------------------------------------------------+
| ![](Duan_yeast_EcoRI/Duan_yeast_EcoRI.fithic_pass1.png)                 | ![](Duan_yeast_EcoRI/Duan_yeast_EcoRI.spline_pass1.png)       |
+-------------------------------------------------------------------------+---------------------------------------------------------------+
| ![](Duan_yeast_EcoRI/Duan_yeast_EcoRI.spline_pass1.extractOutliers.png) | ![](Duan_yeast_EcoRI/Duan_yeast_EcoRI.spline_pass1.qplot.png) |
+-------------------------------------------------------------------------+---------------------------------------------------------------+

__Duan_yeast_HindIII__

Similarly, Duan_yeast_HindIII can be run as follows:

```{r, echo=FALSE, message=FALSE, warning=FALSE, results="hide"}
fragsfile <- system.file("extdata", "fragmentLists/Duan_yeast_HindIII.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/Duan_yeast_HindIII.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Duan_yeast_HindIII")
FitHiC(fragsfile, intersfile, outdir, libname="Duan_yeast_HindIII",
    distUpThres=250000, distLowThres=10000)
```

## Example II: Human ESC Hi-C data at 40kb fixed size resolution (only chr1) without bias values
```{r, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata",
    "fragmentLists/Dixon_hESC_HindIII_hg18_w40000_chr1.gz",
    package = "FitHiC")
intersfile <- system.file("extdata",
    "contactCounts/Dixon_hESC_HindIII_hg18_w40000_chr1.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Dixon_hESC_HindIII_hg18_w40000_chr1")
FitHiC(fragsfile, intersfile, outdir,
    libname="Dixon_hESC_HindIII_hg18_w40000_chr1", noOfBins=50,
    distUpThres=5000000, distLowThres=50000, visual=TRUE)
```

+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_w40000_chr1/Dixon_hESC_HindIII_hg18_w40000_chr1.fithic_pass1.png)                 | ![](Dixon_hESC_HindIII_hg18_w40000_chr1/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.png)       |
+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_w40000_chr1/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.extractOutliers.png) | ![](Dixon_hESC_HindIII_hg18_w40000_chr1/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.qplot.png) |
+---------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+

## Example III: Human ESC Hi-C data at 10 consecutive RE resolution (only chr1) without bias values

```{r, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata",
    "fragmentLists/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.gz",
    package = "FitHiC")
intersfile <- system.file("extdata",
    "contactCounts/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Dixon_hESC_HindIII_hg18_combineFrags10_chr1")
FitHiC(fragsfile, intersfile, outdir,
    libname="Dixon_hESC_HindIII_hg18_combineFrags10_chr1", noOfBins=200,
    distUpThres=5000000, distLowThres=50000, visual=TRUE)
```

+-------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_combineFrags10_chr1/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.fithic_pass1.png)                 | ![](Dixon_hESC_HindIII_hg18_combineFrags10_chr1/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.spline_pass1.png)       |
+-------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_combineFrags10_chr1/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.spline_pass1.extractOutliers.png) | ![](Dixon_hESC_HindIII_hg18_combineFrags10_chr1/Dixon_hESC_HindIII_hg18_combineFrags10_chr1.spline_pass1.qplot.png) |
+-------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+

```{r, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata",
    "fragmentLists/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.gz",
    package = "FitHiC")
intersfile <- system.file("extdata",
    "contactCounts/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Dixon_mESC_HindIII_mm9_combineFrags10_chr1")
FitHiC(fragsfile, intersfile, outdir,
    libname="Dixon_mESC_HindIII_mm9_combineFrags10_chr1", noOfBins=200,
    distUpThres=5000000, distLowThres=50000, visual=TRUE)
```

+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_mESC_HindIII_mm9_combineFrags10_chr1/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.fithic_pass1.png)                 | ![](Dixon_mESC_HindIII_mm9_combineFrags10_chr1/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.spline_pass1.png)       |
+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_mESC_HindIII_mm9_combineFrags10_chr1/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.spline_pass1.extractOutliers.png) | ![](Dixon_mESC_HindIII_mm9_combineFrags10_chr1/Dixon_mESC_HindIII_mm9_combineFrags10_chr1.spline_pass1.qplot.png) |
+-----------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+

## Example IV: Human ESC Hi-C data at 40kb fixed size resolution (only chr1) WITH bias values 

```{r, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata",
    "fragmentLists/Dixon_hESC_HindIII_hg18_w40000_chr1.gz",
    package = "FitHiC")
intersfile <- system.file("extdata",
    "contactCounts/Dixon_hESC_HindIII_hg18_w40000_chr1.gz",
    package = "FitHiC")
outdir <- file.path(getwd(), "Dixon_hESC_HindIII_hg18_w40000_chr1.afterICE")
biasfile <- system.file("extdata",
    "biasPerLocus/Dixon_hESC_HindIII_hg18_w40000_chr1.gz",
    package = "FitHiC")
FitHiC(fragsfile, intersfile, outdir, biasfile,
    libname="Dixon_hESC_HindIII_hg18_w40000_chr1", noOfBins=50,
    distUpThres=5000000, distLowThres=50000, visual=TRUE)
```

+------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_w40000_chr1.afterICE/Dixon_hESC_HindIII_hg18_w40000_chr1.fithic_pass1.png)                 | ![](Dixon_hESC_HindIII_hg18_w40000_chr1.afterICE/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.png)       |
+------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+
| ![](Dixon_hESC_HindIII_hg18_w40000_chr1.afterICE/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.extractOutliers.png) | ![](Dixon_hESC_HindIII_hg18_w40000_chr1.afterICE/Dixon_hESC_HindIII_hg18_w40000_chr1.spline_pass1.qplot.png) |
+------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+

## Example V: Human MCF7 HiC-Pro data at 5Mb resolution WITH bias values

```{r, message=FALSE, warning=FALSE, results="hide"}
library("FitHiC")
fragsfile <- system.file("extdata", "fragmentLists/data_5000000_abs.bed.gz",
    package = "FitHiC")
intersfile <- system.file("extdata", "contactCounts/data_5000000.matrix.gz",
    package = "FitHiC")
biasfile <- system.file("extdata",
    "biasPerLocus/data_5000000_iced.matrix.biases.gz", package = "FitHiC")
outdir <- file.path(getwd(), "data_5000000")
FitHiC(fragsfile, intersfile, outdir, biasfile, libname="data_5000000",
    distUpThres=500000000, distLowThres=5000000, visual=TRUE, useHiCPro=TRUE)
```

+-----------------------------------------------------------------+-------------------------------------------------------+
| ![](data_5000000/data_5000000.fithic_pass1.png)                 | ![](data_5000000/data_5000000.spline_pass1.png)       |
+-----------------------------------------------------------------+-------------------------------------------------------+
| ![](data_5000000/data_5000000.spline_pass1.extractOutliers.png) | ![](data_5000000/data_5000000.spline_pass1.qplot.png) |
+-----------------------------------------------------------------+-------------------------------------------------------+

## References
1. Fit-Hi-C original manuscript: Ay et al. Genome Research, 2014 - https://www.ncbi.nlm.nih.gov/pubmed/24501021
2. Fit-Hi-C Python implementation - https://noble.gs.washington.edu/proj/fit-hi-c
3. Budding yeast Hi-C data: Duan et al. Nature, 2010 - https://www.ncbi.nlm.nih.gov/pubmed/20436457
4. Human embryonic stem cell Hi-C data: Dixon et al. Nature, 2012 - https://www.ncbi.nlm.nih.gov/pubmed/22495300
5. Human MCF7 cell line Hi-C data: Barutcu et al. Genome Biology, 2015 - https://www.ncbi.nlm.nih.gov/pubmed/26415882

## Prepare Data
There are two different options for running FitHiC:

1. Use Hi-C pro pipeline;

2. Prepare at least two input files described below:

* __FRAGSFILE__ This file stores the information about midpoints (or start
indices) of the fragments. It should consist of 5 columns: first column stands
for chromosome name; third column stands for the midPoint; fourth column stands
for the hitCount; second column and fifth column will be ignored so you can set
them to 0. HitCount (4th column) is only used for filtering in conjuction with
the "mappabilityThreshold" option. By default setting bins that need to be
filtered to "0" and others to "1" and leaving the mappabilityThreshold option to
its default value of 1 is enough. You do not need to compute hitCount (4th
column) unless you will explicitly filter using a custom threshold on marginal
counts set by the "mappabilityThreshold" option.

```{r fragsfile, echo = FALSE, results = "asis"}
fragsfile <- system.file("extdata",
    "fragmentLists/Dixon_hESC_HindIII_hg18_w40000_chr1.gz", package = "FitHiC")
data <- read.table(gzfile(fragsfile), header=FALSE,
    col.names=c("Chromosome Name", "Column 2", "Mid Point", "Hit Count",
    "Column 5"))
knitr::kable(head(data, n=6L), align = "crrrr", caption="FRAGSFILE")
```

* __INTERSFILE__ This file stores the information about interactions between
fragment pairs. It should consist of 5 columns: first column and third column
stand for the chromosome names of the fragment pair; second column and fourth
column stand for midPoints of the fragment pair; fifth column stands for
contact count between the two bins. Make sure that midpoints in this file match 
those in fragsfile and in biasfile (when normalization is applied). Make sure 
to use RAW contact counts and NOT the normalized counts. Use BIASFILE if 
normalization is to be taken into account (Highly suggested). 

```{r intersfile, echo=FALSE, results="asis"}
intersfile <- system.file("extdata",
    "contactCounts/Dixon_hESC_HindIII_hg18_w40000_chr1.gz", package = "FitHiC")
data <- read.table(gzfile(intersfile), header=FALSE,
    col.names=c("Chromosome1 Name", "Mid Point 1", "Chromosome2 Name",
    "Mid Point 2", "Hit Count"))
knitr::kable(head(data, n=6L), align = "crcrr", caption="INTERSFILE")
```

* __BIASFILE__ FitHiC works with matrix balancing-based normalization methods 
such as (ICE, KR or Vanilla Coverage). These methods provide a bias value per
bin/locus, the vector of which should be centered on 1 so that:  
bias = 1 (expected amount of count/visibility)
bias > 1 (higher than expected count)
bias < 1 (lower than expected count)

```{r biasfile, echo=FALSE, results="asis"}
biasfile <- system.file("extdata",
    "biasPerLocus/Dixon_hESC_HindIII_hg18_w40000_chr1.gz", package = "FitHiC")
data <- read.table(gzfile(biasfile), header=FALSE,
    col.names=c("Chromosome Name", "Mid Point", "Bias"))
knitr::kable(head(data, n=6L), align = "crr", caption="BIASFILE")
```

Besides, __OUTDIR__, the path where the output files will be stored, is also
required to be specified.

## Support

For questions about the use of Fit-Hi-C method, to request pre-processed Hi-C
data and/or additional features and scripts, or to report bugs and provide
feedback please e-mail Ferhat Ay.

[Ferhat Ay](http://www.lji.org/faculty-research/labs/ay)
\<ferhatay at lji dot org\>

Fit-Hi-C R package maintainer Ruyu Tan
\<rut003 at ucsd dot edu\>