---
title: "Introduction to HiCParser"
author:
  - name: Elise Maigné
    affiliation:
    - INRAE, MIAT
    email: elise.maigne@inrae.fr
  - name: Matthias Zytnicki
    affiliation:
    - INRAE, MIAT
    email: matthias.zytnicki@inrae.fr
output:
  BiocStyle::html_document:
    self_contained: yes
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: show
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('HiCParser')`"
vignette: >
  %\VignetteIndexEntry{Introduction to HiCParser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL,
    warning = FALSE
)
```

# Basics

## Required knowledge

`r BiocStyle::Biocpkg("HiCParser")` builds on other packages, in particular
those that implement the infrastructure needed to handle Hi-C data with several
replicates and conditions. It provides parsers for several standard Hi-C data
formats, and imports the data into R as an
`r BiocStyle::Biocpkg("InteractionSet")` object.

## Citing `HiCParser`

We hope that `r BiocStyle::Biocpkg("HiCParser")` will be useful for your
research. Please use the following information to cite the package and the
overall approach. Thank you!

```{r "citation"}
## Citation info
citation("HiCParser")
```

# Start using `HiCParser`

```{r "start", message=FALSE}
library("HiCParser")
```

`HiCParser` can import Hi-C data sets in various formats:

- Cooler `.cool` or `.mcool` files.
- Juicer `.hic` files.
- HiC-Pro `.matrix` and `.bed` files.
- Tabular (`.tsv`, `.csv`, ...) files.
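Each format in the list above is read by its own parsing function, described in
the sections below. As a rough illustration of that mapping, here is a minimal
sketch of a dispatch on the file extension. The helper `guessParser` is
hypothetical and is not part of `HiCParser`; it only returns the name of the
parsing function that the following sections use for each extension.

```r
# Hypothetical helper, NOT part of HiCParser: return the name of the
# HiCParser function that reads a file, based on its extension only.
guessParser <- function(path) {
    # tools::file_ext() returns the text after the last dot of the file name
    extension <- tolower(tools::file_ext(path))
    switch(extension,
        cool = ,
        mcool = "parseCool",
        hic = "parseHiC",
        matrix = ,
        bed = "parseHiCPro",
        tsv = ,
        csv = "parseTabular",
        # Default branch: reached when no extension above matches
        stop("Unrecognised Hi-C file extension: '", extension, "'")
    )
}

guessParser("path/to/condition-1.replicate-1.mcool")
guessParser("path/to/condition-1.replicate-1.hic")
```

The extension is only a convention, so a dispatch like this one is a
convenience at best; calling the relevant parsing function explicitly, as in
the sections below, remains the safer choice.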

## Cooler files

### `.cool` files

To load `.cool` files generated by [Cooler][cooler-documentation] [@cooler]:

```{r coolFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.cool",
    "path/to/condition-1.replicate-2.cool",
    "path/to/condition-1.replicate-3.cool",
    "path/to/condition-2.replicate-1.cool",
    "path/to/condition-2.replicate-2.cool",
    "path/to/condition-2.replicate-3.cool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.cool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates
)
```

### `.mcool` files

To load `.mcool` files generated by [Cooler][cooler-documentation] [@cooler]:

```{r mcoolFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.mcool",
    "path/to/condition-1.replicate-2.mcool",
    "path/to/condition-1.replicate-3.mcool",
    "path/to/condition-2.replicate-1.mcool",
    "path/to/condition-2.replicate-2.mcool",
    "path/to/condition-2.replicate-3.mcool"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.mcool",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# mcool files can store several resolutions.
# We specify the one we need.
binSize <- 5000000

# Instantiation of data set
# The same function "parseCool" is used for cool and mcool files
hic.experiment <- parseCool(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize # Specified for .mcool files.
)
```

## hic files

To load `.hic` files generated by [Juicer][juicer-documentation] [@juicer]:

```{r hicFormat}
# Path to each file
paths <- c(
    "path/to/condition-1.replicate-1.hic",
    "path/to/condition-1.replicate-2.hic",
    "path/to/condition-1.replicate-3.hic",
    "path/to/condition-2.replicate-1.hic",
    "path/to/condition-2.replicate-2.hic",
    "path/to/condition-2.replicate-3.hic"
)

# For the sake of the example, we will use the same file, several times
paths <- rep(
    system.file("extdata",
        "hicsample_21.hic",
        package = "HiCParser"
    ),
    6
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# hic files can store several resolutions.
# We specify the one we need.
binSize <- 5000000

# Instantiation of data set
hic.experiment <- parseHiC(
    paths,
    conditions = conditions,
    replicates = replicates,
    binSize = binSize
)
```

Currently, `HiCParser` supports the hic format up to version 9.

## HiC-Pro files

To load `.matrix` and `.bed` files generated by
[HiC-Pro][hicpro-documentation] [@hicpro]:

```{r hicproFormat}
# Path to each matrix file
matrixPaths <- c(
    "path/to/condition-1.replicate-1.matrix",
    "path/to/condition-1.replicate-2.matrix",
    "path/to/condition-1.replicate-3.matrix",
    "path/to/condition-2.replicate-1.matrix",
    "path/to/condition-2.replicate-2.matrix",
    "path/to/condition-2.replicate-3.matrix"
)

# For the sake of the example, we will use the same file, several times
matrixPaths <- rep(
    system.file("extdata",
        "hicsample_21.matrix",
        package = "HiCParser"
    ),
    6
)

# Path to each bed file
bedPaths <- c(
    "path/to/condition-1.replicate-1.bed",
    "path/to/condition-1.replicate-2.bed",
    "path/to/condition-1.replicate-3.bed",
    "path/to/condition-2.replicate-1.bed",
    "path/to/condition-2.replicate-2.bed",
    "path/to/condition-2.replicate-3.bed"
)

# Alternatively, if the same bed file is used, we can provide it only once
bedPaths <- system.file("extdata",
    "hicsample_21.bed",
    package = "HiCParser"
)

# Condition and replicate of each file. Can be names instead of numbers.
conditions <- c(1, 1, 1, 2, 2, 2)
replicates <- c(1, 2, 3, 1, 2, 3)

# Instantiation of data set
hic.experiment <- parseHiCPro(
    matrixPaths = matrixPaths,
    bedPaths = bedPaths,
    conditions = conditions,
    replicates = replicates
)
```

## Tabular files

A tabular file is a tab-separated multi-replicate sparse matrix with a header:

```
chromosome    position 1    position 2    C1.R1    C1.R2    C1.R3    ...
Y             1500000       7500000       145      184      72       ...
```

The number of interactions between `position 1` and `position 2` of
`chromosome` is reported in each `condition.replicate` column. There is no
limit to the number of conditions and replicates.

To load Hi-C data in this format:

```{r tabFormat}
hic.experiment <- parseTabular(
    system.file("extdata",
        "hicsample_21.tsv",
        package = "HiCParser"
    ),
    sep = "\t"
)
```

# Output: InteractionSet format

The output is an `r BiocStyle::Biocpkg("InteractionSet")` object.
This object can store one or several samples.
Please read the documentation associated with the
`r BiocStyle::Biocpkg("InteractionSet")` package to learn more about this
format.

```{r}
library("HiCParser")
hicFilePath <- system.file("extdata", "hicsample_21.hic", package = "HiCParser")
hic.experiment <- parseHiC(
    paths = rep(hicFilePath, 6),
    binSize = 5000000,
    conditions = rep(seq(2), each = 3),
    replicates = rep(seq(3), 2)
)
hic.experiment
```

The conditions and replicates are reported in the `colData` slot:

```{r}
SummarizedExperiment::colData(hic.experiment)
```

They correspond to the columns of the `assays` matrix (which contains the
interaction values):

```{r}
head(SummarizedExperiment::assay(hic.experiment))
```

The positions of interactions are in the `interactions` slot of the object:

```{r}
InteractionSet::interactions(hic.experiment)
```

## Additional utility functions

The function `mergeInteractionSet` merges `InteractionSet` objects from the
same experiment (for different replicates or conditions).
It merges the data containing the bins of interactions and fills the assays
matrix accordingly, returning an assays matrix with several columns.
The object returned by the function is an `InteractionSet`.

Here is a fictitious example:

```{r}
path <- system.file("extdata", "hicsample_21.cool", package = "HiCParser")
object1 <- parseCool(path, conditions = 1, replicates = 1)
# Creating an object with a different condition
object2 <- parseCool(path, conditions = 2, replicates = 1)
```

The merged object:

```{r}
objectMerged <- mergeInteractionSet(object1, object2)
SummarizedExperiment::colData(objectMerged)
head(SummarizedExperiment::assay(objectMerged))
```

# Reproducibility

This package was developed using `r BiocStyle::Biocpkg("biocthis")`.

`R` session information.

```{r reproduce3, echo=FALSE}
## Session info
library("sessioninfo")
options(width = 120)
session_info()
```

# Bibliography

Lun ATL, Perry M and Ing-Simmons E (2016). Infrastructure for genomic
interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments.
F1000Res. 5, 950.