CRISPR package demo Bioc2014 Boston
First, load the required packages and specify the input file path. We are going to use a human sequence as input, which is included as a FASTA file in the CRISPRseek package. To perform off-target analysis, we need to load the human BSgenome package. To annotate the target and off-targets, we need to load the human transcript package. Additionally, we need to specify the file containing all restriction enzyme (RE) cut patterns. You can use the RE pattern file in the CRISPRseek package or specify your own RE pattern file. Furthermore, you need to specify the output directory, where all the output files will be written.
library(CRISPRseek)
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
##
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
##
## The following object is masked from 'package:stats':
##
## xtabs
##
## The following objects are masked from 'package:base':
##
## Filter, Find, Map, Position, Reduce, anyDuplicated, append,
## as.data.frame, as.vector, cbind, colnames, do.call,
## duplicated, eval, evalq, get, intersect, is.unsorted, lapply,
## mapply, match, mget, order, paste, pmax, pmax.int, pmin,
## pmin.int, rank, rbind, rep.int, rownames, sapply, setdiff,
## sort, table, tapply, union, unique, unlist
##
## Loading required package: Biostrings
## Loading required package: IRanges
## Loading required package: XVector
## Loading required package: BSgenome
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
## Loading required package: GenomicFeatures
## Loading required package: AnnotationDbi
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
##
## Attaching package: 'AnnotationDbi'
##
## The following object is masked from 'package:BSgenome':
##
## species
outputDir <- file.path(getwd(),"CRISPRseekDemo")
inputFilePath <- system.file('extdata', 'inputseq.fa', package = 'CRISPRseek')
REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package = 'CRISPRseek')
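system.file() returns an empty string when a file or package cannot be found, so it can be worth checking the paths before launching a long analysis. The helper below is a minimal base R sketch; check_input_path is a hypothetical name and is not part of CRISPRseek, and the temporary FASTA file only stands in for the real input.

```r
# Hypothetical helper (not a CRISPRseek function): stop early if a path
# returned by system.file() is empty or does not point to an existing file.
check_input_path <- function(path, label = "input file") {
  if (!is.character(path) || length(path) != 1L || !nzchar(path)) {
    stop(label, " was not found (system.file() returned an empty string)")
  }
  if (!file.exists(path)) {
    stop(label, " does not exist: ", path)
  }
  invisible(path)
}

# Demonstration with a temporary FASTA file standing in for inputseq.fa.
demo_fasta <- tempfile(fileext = ".fa")
writeLines(c(">demo", "ACGTACGTACGT"), demo_fasta)
check_input_path(demo_fasta, "demo FASTA")

# A missing file is reported here instead of failing later in the analysis.
result <- tryCatch(check_input_path("", "RE pattern file"),
                   error = function(e) conditionMessage(e))
print(result)
```

In this demo the same check could be applied to inputFilePath and REpatternFile before calling offTargetAnalysis.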
Here are the commands to learn more about the offTargetAnalysis function, the compare2Sequences function, and the package as a whole:
?offTargetAnalysis
?compare2Sequences
?CRISPRseek
browseVignettes('CRISPRseek')
Scenario 1: Finding paired gRNAs with off-target analysis
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20,
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0,
outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
The maximum number of mismatches (max.mismatch) can be altered. The larger it is, the slower the search runs.
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20,
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 2,
outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
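A rough way to see why a larger max.mismatch slows the search is to count how many distinct sequences lie within k mismatches of a single guide. The sketch below is only a counting argument in base R, assuming a 20-nt guide; it does not describe how CRISPRseek searches internally, and n_within_mismatch is a hypothetical helper name.

```r
# Count the distinct DNA sequences that differ from a guide of length n
# at no more than k positions: sum over i = 0..k of choose(n, i) * 3^i.
# The guide length of 20 is an assumption made for this illustration.
n_within_mismatch <- function(k, n = 20) {
  i <- 0:k
  sum(choose(n, i) * 3^i)
}

candidates <- sapply(0:3, n_within_mismatch)
names(candidates) <- paste0("max.mismatch=", 0:3)
print(candidates)
```

Under these assumptions the number of sequences that count as a hit grows from 1 at max.mismatch = 0 to 32551 at max.mismatch = 3, which gives a feel for how quickly the search space expands.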
Scenario 2: Finding paired gRNAs with restriction enzyme cut site(s) and off-target analysis
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
REpatternFile = REpatternFile, findPairedgRNAOnly = TRUE,
BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20,
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0,
outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
Scenario 3: Finding all gRNAs with off-target analysis
Please note that max.mismatch is set to 3 so that we can view the off-targets.
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = FALSE,
REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20,
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 3,
outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
Scenario 4: Finding gRNAs with restriction enzyme cut site(s) and off-target analysis
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
BSgenomeName = Hsapiens, chromToSearch ="chrX", min.gap = 0, max.gap = 20,
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene, max.mismatch = 0,
outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
Scenario 5: Target and off-target analysis for user-specified gRNAs
Calling the function offTargetAnalysis with findgRNAs = FALSE results in target and off-target searching, scoring and annotating for the input gRNAs. The gRNAs will be annotated with restriction enzyme cut sites for users to review later. However, paired information will not be available.
gRNAFilePath <- system.file('extdata', 'testHsap_GATA1_ex2_gRNA1.fa',
package = 'CRISPRseek')
offTargetAnalysis(inputFilePath = gRNAFilePath,
findgRNAsWithREcutOnly = TRUE, REpatternFile = REpatternFile,
findPairedgRNAOnly = FALSE, findgRNAs = FALSE,
BSgenomeName = Hsapiens, chromToSearch = 'chrX',
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
max.mismatch = 2, outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## >>> Finding all hits in sequences chrX ...
## >>> DONE searching
## Building feature vectors for scoring ...
## Calculating scores ...
## Annotating, filtering and generating reports ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
Scenario 6: Quick gRNA finding without target and off-target analysis
Calling the function offTargetAnalysis with chromToSearch = "" results in a quick gRNA search without on-target and off-target analysis. The parameters findgRNAsWithREcutOnly and findPairedgRNAOnly can be set to indicate whether to search only for gRNAs that overlap restriction enzyme cut sites, and whether to search only for gRNAs in a paired configuration.
offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
REpatternFile = REpatternFile,findPairedgRNAOnly = TRUE,
chromToSearch = "", outputDir = outputDir, overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/
## A DNAStringSet instance of length 2
## width seq names
## [1] 23 TGTCCTCCACACCAGAATCAGGG gRNAf1_Hsap_GATA1...
## [2] 23 CCAGAGCAGGATCCACAAACTGG gRNAr1_Hsap_GATA1...
Scenario 7. Find potential gRNAs preferentially targeting one of two alleles
Below is an example that searches for all gRNAs that target at least one of the alleles. Two files are provided, containing sequences that differ by a single nucleotide polymorphism (SNP). The results are saved in the file scoresFor2InputSequences.xls in the outputDir directory.
inputFile1Path <- system.file("extdata", "rs362331C.fa", package = "CRISPRseek")
inputFile2Path <- system.file("extdata", "rs362331T.fa", package = "CRISPRseek")
seqs <- compare2Sequences(inputFile1Path, inputFile2Path,
outputDir = outputDir , REpatternFile = REpatternFile,
overwrite = TRUE)
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/rs362331C.fa/
## Validating input ...
## Searching for gRNAs ...
## Done. Please check output files in directory /home/ubuntu/CRISPRseekDemo/rs362331T.fa/
## [1] "Scoring ..."
## [1] "Done!"
To preferentially target one allele, select gRNA sequences that have the lowest score for the other allele. Selected gRNAs can then be examined for off-target sequences as described in Scenario 5 (offTargetAnalysis with findgRNAs = FALSE).
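The selection rule above can be sketched on a small toy table. The column names scoreAllele1 and scoreAllele2, the score values, and the min_on_target threshold are all hypothetical and chosen for illustration; they are not taken from the actual compare2Sequences output.

```r
# Toy scores; column names and values are hypothetical, not CRISPRseek output.
scores <- data.frame(
  name = c("gRNA_a", "gRNA_b", "gRNA_c", "gRNA_d"),
  scoreAllele1 = c(100, 100, 100, 40),
  scoreAllele2 = c(95, 12, 60, 100),
  stringsAsFactors = FALSE
)

# Keep gRNAs that score well on allele 1 (threshold is an assumption),
# then order them by ascending score on allele 2, the allele to spare.
min_on_target <- 90
candidates <- scores[scores$scoreAllele1 >= min_on_target, ]
candidates <- candidates[order(candidates$scoreAllele2), ]
candidates$scoreGap <- candidates$scoreAllele1 - candidates$scoreAllele2
print(candidates)
```

In this toy example gRNA_b comes first because it keeps a high score on allele 1 while scoring lowest on allele 2.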
Identify gRNAs that target the following two input sequences equally well with minimized off-target cleavage
MfSerpAEx2 GACGATGGCATCCTCCGTTCCCTGGGGCCTCCTGCTGCTGGCGGGGCTGTGCTGCCTGGCCCCCCGCTCCCTGGCCTCGAGTCCCCTGGGAGCCGCTGTCCAGGACACAGGTGCACCCCACCACGACCATGAGCACCATGAGGAGCCAGCCTGCCACAAGATTGCCCCGAACCTGGCCGACTTCGCCTTCAGCATGTACCGCCAGGTGGCGCATGGGTCCAACACCACCAACATCTTCTTCTCCCCCGTGAGCATCGCGACCGCCTTTGCGTTGCTTTCTCTGGGGGCCAAGGGTGACACTCACTCCGAGATCATGAAGGGCCTTAGGTTCAACCTCACTGAGAGAGCCGAGGGTGAGGTCCACCAAGGCTTCCAGCAACTTCTCCGCACCCTCAACCACCCAGACAACCAGCTGCAGCTGACCACTGGCAATGGTCTCTTCATCGCTGAGGGCATGAAGCTACTGGATAAGTTTTTGGAGGATGTCAAGAACCTGTACCACTCAGAAGCCTTCTCCACCAATTTCGGGGACACCGAAGCAGCCAAGAAACAGATCAACGATTATGTTGAGAAGGGAACCCAAGGGAAAATTGTGGATTTGGTCAAAGACCTTGACAAAGACACAGCTTTCGCTCTGGTGAATTACATTTTCTTTAAAG
HsSerpAEx2 GACAATGCCGTCTTCTGTCTCGTGGGGCATCCTCCTGCTGGCAGGCCTGTGCTGCCTGGTCCCTGTCTCCCTGGCTGAGGATCCCCAGGGAGATGCTGCCCAGAAGACAGATACATCCCACCATGATCAGGATCACCCAACCTTCAACAAGATCACCCCCAACCTGGCTGAGTTCGCCTTCAGCCTATACCGCCAGCTGGCACACCAGTCCAACAGCACCAATATCTTCTTCTCCCCAGTGAGCATCGCTACAGCCTTTGCAATGCTCTCCCTGGGGACCAAGGCTGACACTCACGATGAAATCCTGGAGGGCCTGAATTTCAACCTCACGGAGATTCCGGAGGCTCAGATCCATGAAGGCTTCCAGGAACTCCTCCGTACCCTCAACCAGCCAGACAGCCAGCTCCAGCTGACCACCGGCAATGGCCTGTTCCTCAGCGAGGGCCTGAAGCTAGTGGATAAGTTTTTGGAGGATGTTAAAAAGTTGTACCACTCAGAAGCCTTCACTGTCAACTTCGGGGACACCGAAGAGGCCAAGAAACAGATCAACGATTACGTGGAGAAGGGTACTCAAGGGAAAATTGTGGATTTGGTCAAGGAGCTTGACAGAGACACAGTTTTTGCTCTGGTGAATTACATCTTCTTTAAAG
Constrain the gRNA sequence by setting gRNA.pattern to require or exclude specific features within the target site.
3a. Synthesis of gRNAs in vivo from host U6 promoters is more efficient if the first base is guanine. To maximize the efficiency, what can we set gRNA.pattern to?
3b. Synthesis of gRNAs in vitro using T7 promoters is most efficient when the first two bases are GG. To maximize the efficiency, what can we set gRNA.pattern to?
3c. Five consecutive uracils at any position of a gRNA will affect transcription elongation by RNA polymerase III. To avoid premature termination during gRNA synthesis using the U6 promoter, what can we set gRNA.pattern to?
3d. Some studies have identified sequence features that broadly correlate with lower nuclease cleavage activity, such as uracil in the last 4 positions of the guide sequence. To avoid uracil in these positions, what can we specify for gRNA.pattern?
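Before passing a candidate pattern to gRNA.pattern, it can help to try it on a few sequences with base R regular expressions. The sketch below assumes gRNA.pattern is interpreted as a regular expression (see ?offTargetAnalysis for the accepted syntax); the guide sequences are toy examples written as DNA, so uracil appears as T.

```r
# Toy 20-nt guide sequences written as DNA, so uracil appears as T.
guides <- c(
  g1 = "GGTCCTCCACACCAGAATCA",
  g2 = "GTTTTTCCACACCAGAAGCA",
  g3 = "TGTCCTCCACACCAGAATCA"
)

# Each column tests one sequence feature discussed in 3a to 3d.
checks <- data.frame(
  starts_with_G  = grepl("^G", guides),        # first base is G
  starts_with_GG = grepl("^GG", guides),       # first two bases are GG
  has_five_T_run = grepl("T{5}", guides),      # run of five T (U in RNA)
  no_T_in_last_4 = grepl("[ACG]{4}$", guides), # last four bases contain no T
  row.names = names(guides)
)
print(checks)
```

Note that has_five_T_run detects the feature to be avoided; how an exclusion is best expressed in a single gRNA.pattern value depends on the pattern syntax the function accepts, which is not confirmed here.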
In the examples we went through, we deliberately restricted the off-target search to chromosome X. If we are interested in a genome-wide search, what should we set chromToSearch to?
Find gRNAs in a paired configuration, with a distance apart of between 5 and 15, without performing off-target analysis.
Create a transcriptDB object
It is known that different CRISPR-Cas systems use different PAM sequences. What parameter needs to be reset?
It is known that different CRISPR-Cas systems have different gRNA lengths. What parameter needs to be reset?
Which parameter needs to be reset to 8 if we are interested in finding gRNAs with a restriction enzyme pattern of size 8 or above?
A new penalty matrix has recently been derived. Which parameter needs to be set accordingly?
It has been shown that although the PAM sequence NGG is preferred, the variant NAG is also recognized with lower efficiency. The researcher is interested in performing an off-target search that includes both NGG and NAG variants, while requiring that gRNAs precede NGG. What parameter(s) need to be set to carry out such a search?
Could you think of any other use cases?
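For the exercise on NGG and NAG PAM variants, the difference between a strict and a relaxed PAM requirement can be illustrated with base R regular expressions. This is only a sketch on toy 23-nt sites; the pattern variables pam_strict and pam_relaxed are hypothetical names, and which offTargetAnalysis arguments carry the two requirements is left to ?offTargetAnalysis.

```r
# Toy 23-nt sites (20-nt protospacer followed by a 3-nt PAM); hypothetical data.
sites <- c(
  ngg_site = "TGTCCTCCACACCAGAATCAGGG",
  nag_site = "TGTCCTCCACACCAGAATCAAAG",
  other    = "TGTCCTCCACACCAGAATCAGCT"
)

pam_strict  <- "[ACGT]GG$"     # NGG only, as required for the gRNAs themselves
pam_relaxed <- "[ACGT][AG]G$"  # NGG or NAG, as tolerated at off-target sites

pam_table <- data.frame(
  pam = substring(sites, nchar(sites) - 2),    # last three bases of each site
  strict_NGG = grepl(pam_strict, sites),
  relaxed_NGG_or_NAG = grepl(pam_relaxed, sites),
  row.names = names(sites)
)
print(pam_table)
```

The NAG site passes only the relaxed pattern, which is the behaviour the exercise asks for on the off-target side, while the strict pattern mirrors the requirement placed on the gRNAs.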