There is a nice vignette in snpStats concerning linkage disequilibrium (LD) analysis as supported by software in that package. The purpose of this package is to simplify handling of existing population-level data on LD for the purpose of flexibly defining LD blocks.
The hmld
function imports gzipped tabular data
from hapmap’s repository
.
suppressPackageStartupMessages({
library(ldblock)
library(GenomeInfoDb)
}) dir(system.file("hapmap", package="ldblock"), full=TRUE)
path = hmld(path, poptag="CEU", chrom="chr17") ceu17 =
## importing /tmp/Rtmp0DsUak/Rinst14db5f3579d0e7/ldblock/hapmap/ld_chr17_CEU.txt.gz
## done.
ceu17
## ldstruct for population CEU, chrom chr17
## dimensions: 36621 x 36621 ; statistic is Dprime
## object structure:
## ldmat chrom genome allpos poptag statInUse
## "dsCMatrix" "character" "character" "numeric" "character" "character"
## allrs
## "character"
For some reason knitr/render will not display this image nicely.
library(Matrix)
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
image(ceu17@ldmat[1:400,1:400],
col.reg=heat.colors(120), colorkey=TRUE, useRaster=TRUE)
This ignores physical distance and MAF. The bright stripes are probably due to SNP with low MAF.
We’ll use ceu17
and the gwascat
package to enumerate
SNP that are in LD with GWAS hits.
library(gwascat)
## Warning: replacing previous import 'utils::findMatches' by
## 'S4Vectors::findMatches' when loading 'AnnotationDbi'
## gwascat loaded. Use makeCurrentGwascat() to extract current image.
## from EBI. The data folder of this package has some legacy extracts.
load(system.file("legacy/ebicat37.rda", package="gwascat"))
#seqlevelsStyle(ebicat37) = "NCBI" # noop?
seqlevels(ebicat37) = gsub("chr", "", seqlevels(ebicat37))
ebicat37[ which(as.character(seqnames(ebicat37)) == "17") ] e17 =
Some dbSNP names for GWAS hits on chr17 are
unique(e17$SNPS)
rsh17 =head(rsh17)
## [1] "rs11078895" "rs11891" "rs7501939" "rs9905704" "rs4796793"
## [6] "rs78378222"
We will use expandSnpSet
to obtain names for SNP
that were found in HapMap CEU to have which D′>.9
with any of these hits. These names are added to
the input set.
length(rsh17)
## [1] 530
ldblock::expandSnpSet( rsh17, ldstruct= ceu17, lb=.9 ) exset =
## Warning in ldblock::expandSnpSet(rsh17, ldstruct = ceu17, lb = 0.9): dropping
## 149 rsn not matched in ld matrix
length(exset)
## [1] 13209
all(rsh17 %in% exset)
## [1] TRUE
Not all GWAS SNP are in the HapMap LD resource. You can use your own LD data as long as the format agrees with that of the HapMap distribution.