---
title: "crisprBowtie: alignment of gRNA spacer sequences using bowtie"
author: 
- name: Jean-Philippe Fortin
  affiliation: Department of Data Science and Statistical Computing, gRED, 
   Genentech
  email: fortin946@gmail.com
date: "`r Sys.Date()`"
output: 
  BiocStyle::html_document:
    toc_float: true
#    theme: paper
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Introduction to crisprBowtie}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
bibliography: references.bib
---



# Overview of crisprBowtie

`crisprBowtie` provides two main functions to align short DNA sequences to
a reference genome using the short read aligner bowtie [@langmead2009bowtie]
and return the alignments as R objects: `runBowtie` and `runCrisprBowtie`.
It utilizes the Bioconductor package `Rbowtie` to access the Bowtie program
in a platform-independent manner. This means that users do not need to install
Bowtie prior to using `crisprBowtie`. 


The latter function (`runCrisprBowtie`) is specifically designed
to map and annotate CRISPR guide RNA (gRNA) spacer sequences using
CRISPR nuclease objects and CRISPR genomic arithmetics defined in
the Bioconductor package
[crisprBase](https://github.com/crisprVerse/crisprBase).
This enables a fast and accurate on-target and off-target search of 
gRNA spacer sequences for virtually any type of CRISPR nucleases. 
It also provides an off-target search engine for our main gRNA design package [crisprDesign](https://github.com/crisprVerse/crisprDesign) of the 
[crisprVerse](https://github.com/crisprVerse) ecosystem. See the 
`addSpacerAlignments` function in `crisprDesign` for more details. 


# Installation and getting started

## Software requirements

### OS Requirements

This package is supported for macOS, Linux and Windows machines.
Package was developed and tested on R version 4.2.


## Installation from Bioconductor

`crisprBowtie` can be installed from from the Bioconductor devel branch
using the following commands in a fresh R session:

```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(version="devel")
BiocManager::install("crisprBowtie")
```




# Building a bowtie index

To use `runBowtie` or `runCrisprBowtie`, users need to first build a Bowtie
genome index. For a given genome, this step has to be done only once. 
The `Rbowtie` package conveniently provides the function `bowtie_build`
to build a Bowtie index from any custom genome from a FASTA file.

As an example, we build a Bowtie index for a small portion of the human
chromosome 1 (`chr1.fa` file provided in the `crisprBowtie` package) and
save the index file as `myIndex` to a temporary directory:

```{r}
library(Rbowtie)
fasta <- file.path(find.package("crisprBowtie"), "example/chr1.fa")
tempDir <- tempdir()
Rbowtie::bowtie_build(fasta,
                      outdir=tempDir,
                      force=TRUE,
                      prefix="myIndex")
```

To learn how to create a Bowtie index for a complete genome or transcriptome,
please visit our [tutorial page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices).



# Alignment using `runCrisprBowtie`

As an example, we align 6 spacer sequences (of length 20bp) to the
custom genome built above, allowing a maximum of 3 mismatches between the 
spacer and protospacer sequences. 

We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the `CrisprNuclease` object `SpCas9` available through the 
`crisprBase` package. The argument `canonical=FALSE` specifies that 
non-canonical PAM sequences are also considered (NAG and NGA for SpCas9).
The function `getAvailableCrisprNucleases` in `crisprBase` returns a character
vector of available `crisprNuclease` objects found in `crisprBase`. 

```{r, warning=FALSE, message=FALSE}
library(crisprBowtie)
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
spacers <- c("TCCGCGGGCGACAATGGCAT",
             "TGATCCCGCGCTCCCCGATG",
             "CCGGGAGCCGGGGCTGGACG",
             "CCACCCTCAGGTGTGCGGCC",
             "CGGAGGGCTGCAGAAAGCCT",
             "GGTGATGGCGCGGGCCGGGC")
runCrisprBowtie(spacers,
                crisprNuclease=crisprNuclease,
                n_mismatches=3,
                canonical=FALSE,
                bowtie_index=file.path(tempDir, "myIndex"))
```



# Applications beyond CRISPR

The function `runBowtie` is similar to `runCrisprBowtie`,
but does not impose constraints on PAM sequences.
It can be used to search for any short read sequence in a genome.

## Example using RNAi (siRNA design)

Seed-related off-targets caused by mismatch tolerance outside of the
seed region is a well-studied and characterized problem observed in RNA
interference (RNA) experiments. `runBowtie` can be used to map shRNA/siRNA seed
sequences to reference genomes to predict putative off-targets:

```{r, eval=TRUE}
seeds <- c("GTAAAGGT", "AAGGATTG")
runBowtie(seeds,
          n_mismatches=2,
          bowtie_index=file.path(tempDir, "myIndex"))
```




# Reproducibility

```{r}
sessionInfo()
```



# References