--- title: "Import GWAS summary statistics from Open GWAS" author: "
Authors: Alan Murphy, Brian Schilder and Nathan Skene
" date: "
Updated: `r format(Sys.Date(), '%b-%d-%Y')`
" csl: nature.csl output: BiocStyle::html_document: vignette: > %\VignetteIndexEntry{OpenGWAS} %\usepackage[utf8]{inputenc} %\VignetteEngine{knitr::rmarkdown} --- ```{r style, echo=FALSE} knitr::opts_chunk$set(tidy = FALSE, message = FALSE) ``` ```{r} library(MungeSumstats) ``` MungeSumstats now offers high throughput query and import functionality to data from the MRC IEU [Open GWAS Project](https://gwas.mrcieu.ac.uk/). This is made possible by the use the [IEU OpwnGWAS R package](https://github.com/MRCIEU/ieugwasr): `ieugwasr`. Before you can use this functionality however, please complete the following steps: # Authenticate Access IEU OpenGWAS API To authenticate, you need to generate a token from the OpenGWAS website. The token behaves like a password, and it will be used to authorise the requests you make to the OpenGWAS API. Here are the steps to generate the token and then have `ieugwasr` automatically use it for your queries: 1. Login to https://api.opengwas.io/profile/ 2. Generate a new token 3. Add `OPENGWAS_JWT=` to your .Renviron file, thi can be edited in R by running `usethis::edit_r_environ()` 4. Restart your R session 5. To check that your token is being recognised, run `ieugwasr::get_opengwas_jwt()`. If it returns a long random string then you are authenticated. 6. To check that your token is working, run `ieugwasr::user()`. It will make a request to the API for your user information using your token. It should return a list with your user information. If it returns an error, then your token is not working. 7. Make sure you have submitted user information to increasse you API limit at https://api.opengwas.io/profile/. # Find GWAS datasets We can search by terms and with other filters like sample size: ```{r,eval=FALSE} #### Search for datasets #### metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson","alzheimer"), min_sample_size = 1000) head(metagwas,3) ids <- (dplyr::arrange(metagwas, nsnp))$id ``` ```{r,echo=FALSE} #### Search for datasets #### #error_dwnld <- # tryCatch( # metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson", # "alzheimer"), # min_sample_size = 1000), # error = function(e) e, # warning = function(w) w # ) #if(exists("metagwas")&&is.data.frame(metagwas)){#if exists downloaded fine # head(metagwas,3) # ids <- (dplyr::arrange(metagwas, nsnp))$id #} #to speed up build just create results metagwas2 <- data.frame( id=c("ieu-a-298","ieu-b-2","ieu-a-297"), trait = rep("Alzheimer's disease",3), group_name = rep("public",3), year=c(2013, 2019, 2013), author=c("Lambert","Kunkle BW","Lambert"), consortium=c("IGAP", "Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium (GERAD/PERADES),", "IGAP" ), sex=rep("Males and Females",3), population=rep("European",3), unit=c("log odds","NA","log odds"), nsnp=c(11633,10528610,7055882), sample_size=c(74046,63926,54162), build=rep("HG19/GRCh37",3), category=c("Disease","Binary","Disease"), subcategory=rep("Psychiatric / neurological",3), ontology=rep("NA",3), mr=rep(1,3), priority=c(1,0,2), pmid=c(24162737,30820047,24162737), sd=rep(NA,3), note=c("Exposure only; Effect allele frequencies are missing; forward(+) strand", "NA","Effect allele frequencies are missing; forward(+) strand"), ncase=c(25580,21982,17008), ncontrol=c(48466,41944,37154), N=c(74046,63926,54162) 
# Find GWAS datasets

We can search by terms and with other filters like sample size:

```{r, eval=FALSE}
#### Search for datasets ####
metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson", "alzheimer"),
                                         min_sample_size = 1000)
head(metagwas, 3)
ids <- (dplyr::arrange(metagwas, nsnp))$id
```

```{r, echo=FALSE}
#### Search for datasets ####
# error_dwnld <-
#   tryCatch(
#     metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson",
#                                                         "alzheimer"),
#                                              min_sample_size = 1000),
#     error = function(e) e,
#     warning = function(w) w
#   )
# if(exists("metagwas") && is.data.frame(metagwas)){ # if it exists, it downloaded fine
#   head(metagwas, 3)
#   ids <- (dplyr::arrange(metagwas, nsnp))$id
# }
# To speed up the build, just create the results
metagwas2 <- data.frame(
  id = c("ieu-a-298", "ieu-b-2", "ieu-a-297"),
  trait = rep("Alzheimer's disease", 3),
  group_name = rep("public", 3),
  year = c(2013, 2019, 2013),
  author = c("Lambert", "Kunkle BW", "Lambert"),
  consortium = c("IGAP",
                 "Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium (GERAD/PERADES),",
                 "IGAP"),
  sex = rep("Males and Females", 3),
  population = rep("European", 3),
  unit = c("log odds", "NA", "log odds"),
  nsnp = c(11633, 10528610, 7055882),
  sample_size = c(74046, 63926, 54162),
  build = rep("HG19/GRCh37", 3),
  category = c("Disease", "Binary", "Disease"),
  subcategory = rep("Psychiatric / neurological", 3),
  ontology = rep("NA", 3),
  mr = rep(1, 3),
  priority = c(1, 0, 2),
  pmid = c(24162737, 30820047, 24162737),
  sd = rep(NA, 3),
  note = c("Exposure only; Effect allele frequencies are missing; forward(+) strand",
           "NA",
           "Effect allele frequencies are missing; forward(+) strand"),
  ncase = c(25580, 21982, 17008),
  ncontrol = c(48466, 41944, 37154),
  N = c(74046, 63926, 54162)
)
print(head(metagwas2, 3))
```

You can also search by ID:

```{r, eval=FALSE}
### By ID and sample size
metagwas <- find_sumstats(
  ids = c("ieu-b-4760", "prot-a-1725", "prot-a-664"),
  min_sample_size = 5000
)
```

# Import full results

You can supply `import_sumstats()` with a list of as many OpenGWAS IDs as you
want, but we'll just give one to save time.

```{r, eval=FALSE}
datasets <- MungeSumstats::import_sumstats(ids = "ieu-a-298",
                                           ref_genome = "GRCH37")
```

```{r, echo=FALSE}
# Don't actually run this, as it takes some time; use the stashed version instead
datasets <- list("ieu-a-298" = paste0(tempdir(), "/ieu-a-298.tsv.gz")) # stashed value
datasets_data <- list("ieu-a-298" = system.file("extdata", "ieu-a-298.tsv.gz",
                                                package = "MungeSumstats"))
```

## Summarise results

By default, `import_sumstats()` returns a named list where the names are the
Open GWAS dataset IDs and the items are the respective paths to the formatted
summary statistics.

```{r}
print(datasets)
```

You can easily turn this into a data.frame as well.

```{r}
results_df <- data.frame(id = names(datasets),
                         path = unlist(datasets))
print(results_df)
```
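
To have a quick look at what the munged output contains, the returned path can
be read back into R. This is a minimal sketch rather than part of the pipeline
above; it assumes `data.table` is installed (plus `R.utils`, which
`data.table::fread()` relies on for gzipped files).

```{r, eval=FALSE}
## Read one of the formatted summary statistics files back in to inspect it
sumstats_dt <- data.table::fread(results_df$path[1])
head(sumstats_dt)
```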
# Import full results (parallel)

*Optional*: Speed up with multi-threaded download via
[axel](https://github.com/axel-download-accelerator/axel).

```{r, eval=FALSE}
datasets <- MungeSumstats::import_sumstats(ids = ids,
                                           vcf_download = TRUE,
                                           download_method = "axel",
                                           nThread = max(2, future::availableCores() - 2))
```

# Further functionality

See the [Getting started vignette](https://neurogenomics.github.io/MungeSumstats/articles/MungeSumstats.html)
for more information on how to use MungeSumstats and its functionality.

# Session Info

```{r}
utils::sessionInfo()
```