--- title: "CTCF liftOver" author: - name: Mikhail Dozmorov affiliation: - Virginia Commonwealth University email: mikhail.dozmorov@gmail.com output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r doc_date()`" package: "`r pkg_ver('CTCF')`" vignette: > %\VignetteIndexEntry{Introduction to CTCF} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL, ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html warning = FALSE ) ``` # liftOver of CTCF coordinates As genome assemblies for model organisms continue to improve, CTCF sites for previous genome assemblies become obsolete. Typically, the actual genome sequence changes little, leading to changes in genomic coordinates. The [liftOver](https://genome.ucsc.edu/cgi-bin/hgLiftOver) method allows for conversion of genomic coordinates between genome assemblies. Some carefully curated CTCF sites are available only for older genome assemblies. Examples include the data from [CTCFBSDB](https://insulatordb.uthsc.edu/), available for hg18 and mm8 genome assemblies. To investigate whether liftOver of CTCF sites from older genome assemblies is a viable option, we tested for overlap between CTCF sites directly detected in specific genome assemblies with those lifted over. We detected CTCF sites using the [MA0139.1](https://jaspar.genereg.net/matrix/MA0139.1/) PWM from JASPAR 2022 database in hg18, hg19, hg38, and T2T genome assemblies and converted their genomic coordinates using the corresponding liftOver chains ([download_liftOver.sh](https://github.com/dozmorovlab/CTCF.dev/blob/main/scripts/download_liftOver.sh) and [convert_liftOver.sh](https://github.com/dozmorovlab/CTCF.dev/blob/main/scripts/convert_liftOver.sh) scripts). We observed high Jaccard overlap among CTCF sites detected in the original genome assemblies or lifted over. **Jaccard overlaps among CTCF binding sites detected in the original and liftOver human genome assemblies.** CTCF sites were detected using JASPAR 2022 MA0139.1 PWM. The correlogram was clustered using Euclidean distance and Ward.D clustering . White-red gradient indicate low-to-high Jaccard overlaps. Jaccard values are shown in the corresponding cells. ```{r echo=FALSE} knitr::include_graphics("../man/figures/Figure_liftOverJaccard.png") ``` Our results suggest that liftOver is a viable alternative to obtain CTCF genomic annotations for different genome assemblies. We provide [CTCFBSDB](https://insulatordb.uthsc.edu/) data converted to hg19 and hg38 genome assemblies. Date the vignette was generated. ```{r reproduce1, echo=FALSE} ## Date the vignette was generated Sys.time() ``` `R` session information. ```{r reproduce3, echo=FALSE} ## Session info library("sessioninfo") options(width = 120) session_info() ```