---
title: "Complete Guide for Seven Bridges API R Client"
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 4
    number_sections: false
    highlight: haddock
    css: style.css
---

```{r include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

## Introduction

This package is designed to support as many Seven Bridges platforms as possible, including the NCI Cancer Genomics Cloud Pilot developed by Seven Bridges. Make sure you provide the correct API URL for the platform you are using. The currently tested platforms include

- [Cancer Genomics Cloud Platform](http://www.cancergenomicscloud.org/)
- [Seven Bridges US platform on AWS](https://www.sbgenomics.com/)
- [Seven Bridges US platform on Google Cloud](https://gcp.sbgenomics.com/)

To read the complete platform documentation

- [Full doc for Cancer Genomics Cloud Platform](http://docs.cancergenomicscloud.org/docs)
- [Full doc for Seven Bridges US platform](https://docs.sbgenomics.com/display/developerhub/Seven+Bridges+Genomics+Developer+Hub)

Full API documentation for each platform

- [API doc for Cancer Genomics Cloud Platform](http://docs.cancergenomicscloud.org/docs/the-cgc-api)
- [API doc for Seven Bridges US platform](https://docs.sbgenomics.com/display/developerhub/API)

### API R client

The current public V2 API is CWL compatible. The public API URLs are:

- __NCI Cancer Genomics Cloud__: https://cgc-api.sbgenomics.com/v2/
- __Seven Bridges on AWS__: https://api.sbgenomics.com/v2/
- __Seven Bridges on Google__: https://gcp-api.sbgenomics.com/v2/

The `sevenbridges` package only supports the V2 and later API, which supports CWL-compatible projects. It provides a simple interface for easy access and friendly show methods. You will learn them through the tutorials.

__For advanced users__, you can directly use the `httr` package to construct your API calls, or you can use the low-level API call, which is available for all APIs. The most used arguments are "path", "query", and "body".
For example, when you read the API documentation you will see a section called "list all your projects". It tells you to use the method "get" and the path "/projects", so you can simply call

```{r}
library(sevenbridges)
a <- Auth(token = "8c3329a4de664c35bb657499bb2f335c",
          url = "https://api.sbgenomics.com/v2/")
a$api(path = "projects", method = "GET")
```

You can also pass query and body as a list.

With this package, you can simply call

```{r}
a$project()
```

### API principles

Before we continue, there are a couple of things you may want to keep in mind

- Every API call accepts arguments called `offset` and `limit`. `offset` defines where the retrieved items start and `limit` defines how many items you want to get. The defaults are always `offset = 0` and `limit = 100`, which means the __first 100 items__. This applies when you list items or search items by name matching. To force the search or listing to cover all items, use `complete = TRUE` in your call. By default `complete = FALSE`, so search and list operations only cover the given offset and limit.
- A search by ID is unique, so you don't need to worry about offset or limit. __It's always good practice__ to get your item by ID and pass it as input to your task. To find an id, you can look at the URL in the UI, or use the API call that lists items or shows detailed information.
- A search by name pattern is not unique: it returns all matched items, unless you set `exact = TRUE`.

### Installation

The `sevenbridges` package is now available on the Bioconductor release and devel branches.

```{r}
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("sevenbridges")
```

Our API keeps improving, so please also visit our [github homepage](https://github.com/sbg/sevenbridges-r) for the most recent news and the latest version.
__If you don't have devtools__

Installing from GitHub requires the `devtools` package; install it from CRAN if you don't have it

```r
install.packages("devtools")
```

You may get an error and need system dependencies for curl and ssl. For example, on Ubuntu you probably need to run the following first in order to install `devtools` and to build vignettes (you need pandoc)

```
apt-get update
apt-get install libcurl4-gnutls-dev libssl-dev pandoc pandoc-citeproc
```

__If devtools is already installed__

Now install the latest version of `sevenbridges` from GitHub

```r
source("http://bioconductor.org/biocLite.R")
biocLite(c("readr", "BiocStyle"))
library(devtools)
install_github("sbg/sevenbridges-r", build_vignettes = TRUE,
               repos = BiocInstaller::biocinstallRepos(),
               dependencies = TRUE)
```

If you have trouble with pandoc and don't want to install it, set `build_vignettes = FALSE` to skip the vignette build.

## Quickstart

For more details about how to use the API client in R, please see the second section for the complete guide. This section uses a simple example for a quick start.

### Create Auth object

Everything starts from an `Auth` object, so let's set one up. It remembers your auth token and URL, and every action starts from this object. You have three different ways to set up the token.

1. Direct setup via the `Auth` function: explicitly set your token and API URL.
2. A configuration file in your home folder, ".sbg.auth.yml": very easy to manage and implicitly loaded every time you start a new R session.
3. Temporary setup via an option in the R session.

Load the library first

```{r, eval = TRUE, message = FALSE}
library(sevenbridges)
```

This is the most common way to construct your Auth object

```{r}
## direct setup
(a <- Auth(token = "",
           url = "https://cgc-api.sbgenomics.com/v2/"))
## or load default from config file (autoloaded into options)
```

```
== Auth ==
token :
url   : https://cgc-api.sbgenomics.com/v2/
```

Or load it from your configuration file

```{r}
## from platform "us" for user "tengfei"
a <- Auth(platform = "us", username = "tengfei")
```

or update the Auth list from another config file

```{r}
updateAuthList("new_config.yml")
```

### Information about a user

This call returns information about your account.

```{r}
a$user()
```

```
== User ==
href : https://cgc-api.sbgenomics.com/v2/users/tengfei
username : tengfei
email : tengfei.yin@sbgenomics.com
first_name : Tengfei
last_name : Yin
affiliation : Seven Bridges Genomics
country : United States
```

To list user resources: this call returns information about the specified user. Note that currently you can view only your own user information, so this call is equivalent to the call to get your own information.

```{r}
a$user(username = "tengfei")
```

### Rate limit

This call returns information about your current rate limit. This is the number of API calls you can make in one hour.

```{r}
a$rate_limit()
```

```
== Rate Limit ==
limit : 1000
remaining : 993
reset : 1457980957
```

### Show billing information

Every project is associated with a billing group. To check your billing information

```{r}
## check your billing info
a$billing()
a$invoice()
```

For more information, use `breakdown = TRUE`

```{r}
a$billing(id = "your_billing_id", breakdown = TRUE)
```

### Create a project

Create a new project called "api testing" with the billing group id.
```{r}
## get billing group id
bid <- a$billing()$id
## create new project
(p <- a$project_new(name = "api testing", bid, description = "Just a testing"))
```

```
== Project ==
id : tengfei/api-testing
name : api testing
description : Just a testing
billing_group_id :
type : v2
-- Permission --
```

### Get existing project

```{r}
## list first 100
a$project()
## list all
a$project(complete = TRUE)
## return all projects whose name matches "demo"
a$project(name = "demo", complete = TRUE)
## get the project you want by id
p = a$project(id = "tengfei/api-tutorial")
```

### Copy public app into your project

To find out the available public apps, you can

- Browse them online; check out the [tutorial](http://docs.cancergenomicscloud.org/docs/) for the "Find apps" section.
- Use this package to find them, as in the following examples.

```{r}
## search by name matching, complete = TRUE searches all apps, not
## limited by offset or limit.
a$public_app(name = "STAR", complete = TRUE)
## search by id is accurate
a$public_app(id = "admin/sbg-public-data/rna-seq-alignment-star/5")
## you can also get everything
a$public_app(complete = TRUE)
## default limit = 100, offset = 0 which means first 100
a$public_app()
```

Now, from your `Auth` object, you copy an app by its `id` into your `project` id with a new `name`, following this logic.
```{r}
## copy
a$copy_app(id = "admin/sbg-public-data/rna-seq-alignment-star/5",
           project = "tengfei/api-testing", name = "new copy of star")
## check that it was copied
p = a$project(id = "tengfei/api-testing")
## list the apps you have in your project
p$app()
```

The short name is changed to "newcopyofstar"

```
== App ==
id : tengfei/api-testing/newcopyofstar/0
name : RNA-seq Alignment - STAR
project : tengfei/api-testing-2
revision : 0
```

Alternatively you can copy from the app object

```{r}
app = a$public_app(id = "admin/sbg-public-data/rna-seq-alignment-star")
app$copy_to(project = "tengfei/api-testing", name = "copy of star")
```

### Import CWL app and run a task

You can also upload your own CWL JSON file that describes your app to your project.

Note: alternatively you can describe your CWL tool directly in R with this package; please read the other vignette, "Describe CWL Tools/Workflows in R and Execution".

```{r}
## Add a CWL file to your project
f.star = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
app = p$app_add("starlocal", f.star)
(aid <- app$id)
```

You get an app id like this

```
"tengfei/api-testing/starlocal/0"
```

It's composed of

1. __project id__ : tengfei/api-testing
2. __app short name__ : starlocal
3. __revision__ : 0

Alternatively, you can describe tools in R directly. You can learn this later; feel free to go on to the next section

```{r comment='', eval = TRUE}
fl <- system.file("docker", "sevenbridges/rabix/generator.R", package = "sevenbridges")
cat(readLines(fl), sep = '\n')
```

And add it like this

```{r}
## rbx is the object returned by Tool function
app = p$app_add("runif", rbx)
(aid <- app$id)
```

Please read the other tutorial about how to describe tools and flows in R.
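
The three-part app id described above (project id, app short name, revision) can be taken apart with base R when you need the pieces separately. The following is only a minimal sketch: `parse_app_id` is a hypothetical helper written for this illustration, not a function from the `sevenbridges` package, and it assumes the project id itself always has the form "owner/project".

```r
## Hypothetical helper (not part of sevenbridges): split an app id of the
## form "owner/project/app-short-name[/revision]" into its components.
parse_app_id <- function(id) {
    parts <- strsplit(id, "/", fixed = TRUE)[[1]]
    if (length(parts) < 3) {
        stop("expected an id like 'owner/project/app' or 'owner/project/app/revision'")
    }
    list(
        ## the project id is assumed to be the first two components
        project  = paste(parts[1:2], collapse = "/"),
        app      = parts[3],
        ## the revision is optional; NA when the id does not carry one
        revision = if (length(parts) >= 4) as.integer(parts[4]) else NA_integer_
    )
}

parse_app_id("tengfei/api-testing/starlocal/0")
```

An id without a trailing revision, such as "tengfei/api-testing/newcopyofstar", yields `NA` for `revision` in this sketch.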
### Execute a new task

#### Understand your app input

Now assume you already copied the public app "admin/sbg-public-data/rna-seq-alignment-star/5" into your project "tengfei/api-testing", so the app id in your current project is "tengfei/api-testing/newcopyofstar", or that you already have an app to run in your project. To draft a new task, you need to specify

- name of the task
- description
- app id
- inputs of your task: you need to know about the app you are running. For example, the "runif" CWL tool mentioned earlier accepts 4 parameters (number, min, max, seed).

You can always go to the UI to check the app details or task input requirements, but how do you do it in R? To check your input names and types, you need to get an `App` object first.

```{r}
app = a$app(id = "tengfei/api-testing-2/newcopyofstar")
## get input matrix
app$input_matrix()
app$input_matrix(c("id", "label", "type"))
app$input_matrix(c("id", "label", "type"), required = TRUE)
## get required input names and types only
app$get_required()
```

Or load it from a CWL JSON file and convert it into an R object first.

```{r, eval = TRUE}
f1 = system.file("extdata/app", "flow_star.json", package = "sevenbridges")
app = convert_app(f1)
## get input matrix
app$input_matrix()
app$input_matrix(c("id", "label", "type"))
app$input_matrix(c("id", "label", "type"), required = TRUE)
## get required input names and types only
app$get_required()
```

In the returned value, the names are what you use in your task input list. Task inputs need to match the expected data type and name, and required inputs have to be provided.

In the above example, we see two required fields

- fastq: File array as indicated by "File..."
- genomeFastaFiles: single file as indicated by "File"

We also want to provide the gene feature file

- sjdbGTFfile: single file as indicated by "File"

Note that other input types include:

- number, character or integer: you can pass these directly to the input parameters as they are.
- an Enum type: just pass the value to the input parameter
- File: you have to pay attention to this. Some parameters accept only a single file ('File') as input while others take more than one file (File arrays, FilesList or 'File...'). This input requires you to pass a `Files` object (for a single-file input) or a `FilesList` object (for an input that accepts more than one file), or simply a list of `Files` objects. Getting your file by `id`, or by `name` with `exact = TRUE`, is always accurate.

Sounds complicated? Let's just see an example.

#### Get your input files ready

```{r}
fastqs <- c("SRR1039508_1.fastq", "SRR1039508_2.fastq")
## get both files by exact name
fastq_in = p$file(name = fastqs, exact = TRUE)
## get single file
fasta_in = p$file(name = "Homo_sapiens.GRCh38.dna.primary_assembly.fa", exact = TRUE)
## get single file
gtf_in = p$file(name = "Homo_sapiens.GRCh38.84.gtf", exact = TRUE)
```

#### Draft a new task

```{r}
## Add new tasks
taskName = paste0("tengfei_star-alignment ", date())
tsk = p$task_add(name = taskName,
                 description = "star test",
                 app = "tengfei/api-testing-2/newcopyofstar/0",
                 inputs = list(sjdbGTFfile = gtf_in,
                               fastq = fastq_in,
                               genomeFastaFiles = fasta_in))
```

Remember that "fastq" expects a list of files. You can also do something like

```{r}
f1 = p$file(name = "SRR1039508_1.fastq", exact = TRUE)
f2 = p$file(name = "SRR1039508_2.fastq", exact = TRUE)
## combine the 2 exact files
fastq_in = list(f1, f2)
## or if you know you only have 2 files whose names match SRR1039508*.fastq
fastq_in = p$file(name = "SRR1039508*.fastq", complete = TRUE)
```

Use `complete = TRUE` when there are more than 100 items.

#### Draft a batch task

Now let's do a batch with 8 files in 4 groups, batched by the metadata fields sample_id and library_id. Assume each file has these two metadata fields in the system and the files can be evenly grouped into 4. We will then have a single parent task with 4 subtasks.
```{r}
fastqs <- c("SRR1039508_1.fastq", "SRR1039508_2.fastq",
            "SRR1039509_1.fastq", "SRR1039509_2.fastq",
            "SRR1039512_1.fastq", "SRR1039512_2.fastq",
            "SRR1039513_1.fastq", "SRR1039513_2.fastq")
## get all 8 files
fastq_in = p$file(name = fastqs, exact = TRUE)
## can also try to return all SRR*.fastq files
## fastq_in = p$file(name = "SRR*.fastq", complete = TRUE)
tsk = p$task_add(name = taskName,
                 description = "Batch Star Test",
                 app = "tengfei/api-testing-2/newcopyofstar/0",
                 batch = batch(input = "fastq",
                               criteria = c("metadata.sample_id",
                                            "metadata.library_id")),
                 inputs = list(sjdbGTFfile = gtf_in,
                               fastq = fastq_in,
                               genomeFastaFiles = fasta_in))
```

Now you have a draft batch task; please check it out in the UI.

### Run a task

Now run it.

```{r}
## Run your task
tsk$run()
```

Before you run it, you can delete your draft task or update it.

```{r}
## not run
## tsk$delete()
```

After you run a task, you can abort it

```{r}
## Abort your task
tsk$abort()
```

If you want to update your task and then re-run it

```{r}
tsk$getInputs()
## update only the sjdbGTFfile input
tsk$update(inputs = list(sjdbGTFfile = "some new file"))
## double check
tsk$getInputs()
```

### Task monitoring

To monitor the task, you can always call `update` on the task object to check the status.

```{r}
tsk$update()
```

Or, more fun, you can monitor a running task with a hook function, which triggers a function when the status is "completed", "running", etc. Please check the details in the section about task hooks. By default it just shows a message when the task is completed.
```{r}
## Monitor your task (skip this part)
## tsk$monitor()
```

To download all files from a completed task

```{r}
tsk$download("~/Downloads")
```

It is more fun to set a task hook.

- `setTaskHook` connects a function call to a task status. When you run `tsk$monitor(time = 30)`, it checks the task every 30 seconds via an API call for one of the following statuses: "queued", "draft", "running", "completed", "aborted", "failed". It then triggers the function call registered for the returned status.
- `getTaskHook` returns the function call for a specific status.

By default, when you monitor a running task, it only prints the status and exits when the task is completed.

```{r, eval = TRUE}
getTaskHook("completed")
```

If you want to customize the monitor function, there is one requirement

- Your function must return `TRUE` or `FALSE` at the end. `TRUE` (or a non-logical value) means the monitoring will be terminated after the status is matched and the function is executed, for example when the task is completed. `FALSE` means the monitoring will continue with the next iteration of checking, e.g. when the status is "running" and you want to keep tracking.

To set a new function for the status "completed" that downloads all task output files to a local folder once the task completes

```{r}
setTaskHook("completed", function(){
    tsk$download("~/Downloads")
    return(TRUE)
})
tsk$monitor()
```

## User-friendly API

This is where the package tries to help: it provides a user-friendly interface that we suggest our users use, so you don't have to combine several `api()` calls and refer to the API documentation all the time to finish a simple task.

### Authentication

#### Set up default token for different platforms

You can create a file called '.sbg.auth.yml' in your home folder and maintain multiple accounts for a list of platforms, including private or public ones.
```
us:
  url: https://api.sbgenomics.com/v2/
  user:
    tengfei:
      token: fake_token
    yintengfei:
      token: fake_token
cgc:
  url: https://cgc-api.sbgenomics.com/v2/
  user:
    tengfei:
      token: fake_token
gcp:
  url: https://gcp-api.sbgenomics.com/v2/
  user:
    tengfei:
      token: fake_token
```

When you load the sevenbridges package, it first tries to parse your token configuration file into an options list.

```{r}
## Create Auth object from config file
a <- Auth(username = "yintengfei", platform = "us")
## show all
getToken()
## show all pre-set user tokens for a platform
getToken("cgc")
## show individual token for a user
getToken(platform = "cgc", username = "tengfei")
```

Note: when you edit your .sbg.auth.yml, you have to reload the package.

#### Create Auth object directly

First things first: you need to construct an Auth object. Everything begins with this object. It stores

- The authentication token
- The API URL
- The platform (US platform, Cancer Genomics Cloud etc.); this is optional and is translated into the API URL.

The logic is like this

1. If you didn't pass a url or token, we assume you are loading from the config file.
2. If no platform or user is provided, the first item in your token config file is used. __This is not recommended; at least provide the platform/username pair.__

```{r}
library(sevenbridges)
## direct setup
a <- Auth(token = "1c0e6e202b544030870ccc147092c257",
          url = "https://cgc-api.sbgenomics.com/v2/")
```

By default it points to the Cancer Genomics Cloud platform, unless you specify

- API URL (more flexible)
- or Platform (currently supports 'cgc', 'us', 'gcp')

Note: when you construct the Auth object, make sure you input the correct platform or API URL for your authentication token. On Seven Bridges related platforms, you can always find the token under your account settings, in the developer tab.
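
The fallback logic described above (use the given platform and username, otherwise fall back to the first entry) can be pictured with a plain nested list standing in for a parsed configuration file. This is only a sketch of the idea under stated assumptions: `cfg` and `resolve_auth` are hypothetical names made up for this illustration, the tokens are fake, and the package's own parsing of '.sbg.auth.yml' may differ in detail.

```r
## Hypothetical stand-in for a parsed '.sbg.auth.yml' (fake tokens)
cfg <- list(
    us = list(url  = "https://api.sbgenomics.com/v2/",
              user = list(tengfei    = list(token = "fake_token_1"),
                          yintengfei = list(token = "fake_token_2"))),
    cgc = list(url  = "https://cgc-api.sbgenomics.com/v2/",
               user = list(tengfei = list(token = "fake_token_3")))
)

## Hypothetical helper: pick url and token for a platform/username pair,
## falling back to the first platform and the first user when omitted.
resolve_auth <- function(cfg, platform = NULL, username = NULL) {
    if (is.null(platform)) platform <- names(cfg)[1]
    entry <- cfg[[platform]]
    if (is.null(entry)) stop("unknown platform: ", platform)
    if (is.null(username)) username <- names(entry$user)[1]
    token <- entry$user[[username]]$token
    if (is.null(token)) stop("no token for user '", username, "' on '", platform, "'")
    list(platform = platform, username = username, url = entry$url, token = token)
}

## explicit pair (recommended)
resolve_auth(cfg, platform = "cgc", username = "tengfei")
## implicit: first platform, first user (not recommended)
resolve_auth(cfg)
```

Passing both `platform` and `username` keeps the result independent of the order of entries in the file, which is why the explicit form is the safer habit.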
For the tutorial about how to get your authentication token, please check

- [Get token from US platform](https://docs.sbgenomics.com/display/developerhub/Authentication+Token)
- [Get token from Cancer Genomics Cloud Platform](http://docs.cancergenomicscloud.org/docs/the-cgc-api)

### List All API calls

If you don't pass any parameters to `api()` from the Auth object, it lists all API calls. Any parameters you provide are passed on to the `api()` function, but you don't need to input the token and URL again, because the Auth object already knows that information. This call from the Auth object also checks the response.

```{r}
a$api()
```

### Offset and Limit and Search

`offset` specifies where the listing starts, and `limit` specifies how many items you want to show from there (max: 100). Because there could be thousands of files and apps, the offset and limit are by default set to 0 and 100 respectively.

```{r}
getOption("sevenbridges")$offset
getOption("sevenbridges")$limit
```

Please pay attention to this

- Search by `id` is the most accurate and fastest for any item like Project, App, Task, File.
- Search by name only searches across the currently pulled content, so use `complete = TRUE` if you want to search across everything; this might be slow.

For example, to list all public apps, use the `visibility` argument, but make sure you pass `complete = TRUE` to show every single item. This argument generally works for items like "App", "Project", "Task", "File" etc.

```{r}
## first, search by id is fast
x <- a$app(visibility = "public", id = "djordje_klisic/public-apps-by-seven-bridges/sbg-ucsc-b37-bed-converter/0")
## show 100 items from public
x <- a$app(visibility = "public")
length(x) ## 100
x <- a$app(visibility = "public", complete = TRUE)
length(x) ## 211 by March, 2016
## this returns nothing, because it's not in the first 100
a$app(visibility = "public", name = "bed converter")
## this returns an app, because it pulls all apps and then searches.
a$app(visibility = "public", name = "bed converter", complete = TRUE)
```

### Rate Limits

This call returns information about your current rate limit. This is the number of API calls you can make in one hour.

```{r}
a$rate_limit()
```

### Users

This call returns a list of the resources, such as projects, billing groups, and organizations, that are accessible to you.

If you are not an administrator, this call will only return a successful response if {username} is replaced with your own username. If you are an administrator, you can replace {username} with the username of any CGC user to return information on their resources.

_Case sensitivity_: Don't forget to capitalize your username in the same way as you set it when you registered on the CGC.

If you don't provide a username, your own user information will be shown.

```{r}
## return your information
a$user()
## return the information of user 'tengfei'
a$user("tengfei")
```

### Billing Group and Invoices

#### For billing

If no id is provided, this call returns a list of paths used to access billing information via the API. Otherwise, this call lists all your billing groups, including groups that are pending or have been disabled.

If `breakdown = TRUE`, this call returns a breakdown of spending per project for the billing group specified by billing_group. For each project that the billing group is associated with, information is shown on the tasks run, including their initiating user (the runner), start and end times, and cost.

```{r}
## return a BillingList object
(b <- a$billing())
a$billing(id = b$id, breakdown = TRUE)
```

#### For invoices

If no id is provided, this call returns a list of invoices, with information about each, including whether or not the invoice is pending and the billing period it covers.
The call returns information about all your available invoices, unless you use the query parameter bg_id to specify the ID of a particular billing group, in which case it will return the invoice incurred by that billing group only.

If an id is provided, this call retrieves information about the selected invoice, including the costs for analysis and storage, and the invoice period.

```{r}
a$invoice()
a$invoice(id = "fake_id")
```

Note (TODO): Invoice is not an object yet; it currently just returns a list.

### Project Operation

A project is the basic unit for organizing different entities: files, tasks, apps, etc. So lots of actions start from this `Project` object.

#### List All Projects

This call returns a list of all projects you are a member of. Each project's project_id and URL on the CGC will be returned.

```{r}
a$project()
```

If you want to list the projects owned by and accessible to a particular user, specify the `owner` argument. Each project's ID and URL will be returned.

```{r}
a$project(owner = "tengfei")
a$project(owner = "yintengfei")
```

To get details about project(s), use `detail = TRUE`

```{r}
a$project(detail = TRUE)
```

#### Partial Match Project Name

For a friendlier interface and more convenient search, we support partial name matching in this interface. The first argument of the call is "name"; you can provide part of the name and we do the search for you automatically.
```{r}
## return projects whose name matches "hello"
a$project("hello")
```

#### Create a New Project

To create a new project, you need to specify

- name (required)
- billing_group_id (required)
- description (optional)
- tags (optional): this has to be a list(). Only if you are a "TCGA" user can you create a TCGA project by passing the tags list("tcga")
- type (optional): by default, we create a CWL project, "v2"

```{r}
a$project_new("api_testing_tcga", b$id, description = "Test for API")
```

#### Create a new project with TCGA controlled data on CGC

You just need to pass a "tags" list with the value "tcga"

```{r}
a$project_new("controlled_project", b$id,
              description = "Test for API", tags = list("tcga"))
```

#### Delete a Project

Next we delete what we created for testing. Only a *single* project can be deleted by calling `$delete()` for now, so please pay attention to the object returned from `a$project()`: if you are using partial matching by name, it will sometimes return a list. If you want to operate on a list of objects, we provide some batch functions; please read the relevant section.

```{r}
## remove it, not run
a$project("api_testing")$delete()
## check
## will delete all projects matching the name
delete(a$project("api_testing_donnot_delete_me"))
```

#### Update/Edit a Project

You can update information about an existing project, including

- name
- description
- billing_group

```{r}
a$project(id = "tengfei/helloworld")
a$project(id = "tengfei/helloworld")$update(name = "Hello World Update",
                                            description = "Update description")
```

#### Project Member

##### List members

This call returns a list of the members of the specified project. For each member, the response lists:

- The member's username on the CGC
- The member's permissions in the project specified

```{r}
a$project(id = "tengfei/demo-project")$member()
```

##### Add a member

This call adds a new user to a specified project. It can only be successfully made by a user who has admin permissions in the project.
Requests to add a project member must include the key permissions. However, if you do not include a value for some permission, it will be set to false by default.

Set permissions by passing the copy, write, execute, admin and read arguments.

Note: read is implicit and set by default; you cannot be a project member without having read permission.

```{r}
m <- a$project(id = "tengfei/demo-project")$member_add(username = "yintengfei")
```

##### Update a member

This call edits a user's permissions in a specified project. It can only be successfully made by a user who has admin permissions in the project.

```{r}
m <- a$project(id = "tengfei/demo-project")$member(username = "yintengfei")
m$update(copy = TRUE)
```

```
== Member ==
username : yintengfei
-- Permission --
read : TRUE
write : FALSE
copy_permission : TRUE
execute : FALSE
admin : FALSE
```

##### Delete a member

To delete an existing member, just call the `delete()` action on the `Member` object.

```{r}
m$delete()
## confirm
a$project(id = "tengfei/demo-project")$member()
```

#### List all Files

To list all files belonging to a project, simply use

```{r}
p <- a$project(id = "tengfei/demo-project")
p$file()
```

### Files and Metadata and Tags

#### List all files

A better way to list the files in a project

```{r}
## first 100 files, default offset = 0, limit = 100
p$file()
## list all files
p$file(complete = TRUE)
```

Alternatively, get files from the Auth object.

```{r}
a$file(project = p$id)
a$file(name = "omni", project = p$id, detail = TRUE)
```

#### Search and filter file(s)

##### Rule of thumb

You can easily search by partial name, exact name, or id for many items like projects, apps, and files. Fortunately, we have more powerful filters for files: you can search by metadata, tags and original task id.

1. __First rule: understand the scope__: we know we have the defaults `limit = 100` and `offset = 0` for the first 100 items, unless you use the argument `complete = TRUE`.
Things are different when it comes to files: when you search by exact name (with `exact = TRUE`), metadata, or tags, it searches ALL files, so you don't have to use `complete = TRUE` at all. The project id is required, or you can simply call from a Project object for file searching, as the examples show.

2. __Second rule: understand the operation__: is it AND or OR? You can read more docs [here](http://docs.sevenbridges.com/docs/list-files-primary-method).

   "When filtering on any resource, including the same field several times with different filtering criteria results in an implicit OR operation for that field and the different criteria. When filtering by different specified fields, an implicit AND is performed between those criteria."

   So which are different fields? Metadata, origin.task and tags are different fields. __Note__ that different metadata fields are treated as different fields as well, so an __AND__ operation applies across different metadata fields.

So for a quick example, this gives me all files in project "tengfei/demo" that have metadata sample_id "Sample1" __OR__ "Sample2", __AND__ whose library id is "EXAMPLE", __AND__ whose tags include either "hello" __OR__ "world".

```{r}
p = a$project(id = "tengfei/demo")
p$file(metadata = list(sample_id = "Sample1",
                       sample_id = "Sample2",
                       library_id = "EXAMPLE"),
       tag = c("hello", "world"))
```

##### Search by names and id

There are two ways to return the exact file you want: one is by `id`, the other is by exact `name`.

For advanced users: when using name with `exact = TRUE`, it directly uses our public API call to match the exact name and returns the object with a single query. When you turn off `exact`, it first pulls all files (or the files defined by `limit` and `offset`), then matches by name, which could be slow.

To get a file id, you can check the URL of the single file detail page, or hover the mouse over the file to check the browser link info in the bottom corner.
```{r}
## return the single object whose id is "some_file_id"
p$file(id = "some_file_id")
## return single object named a.fastq
p$file(name = "a.fastq", exact = TRUE)
## public file search using Auth object
a$public_file(name = "ucsc.hg19.fasta.fai", exact = TRUE)
a$public_file(id = "578cf94a507c17681a3117e8")
```

To get more than one object: the arguments `id` and `name` both accept vectors, so you can pass more than one id or name.

```{r}
## get two files
p$file(name = c("test1.fastq", "test2.fastq"), exact = TRUE)
## get two files from public files using shorthand
a$public_file(name = c("ucsc.hg19.fasta.fai", "ucsc.hg19.fasta"), exact = TRUE)
```

If you don't use `exact`, the name is treated as a grep pattern, so partial matches count

```{r}
## match the patterns against the first 100 files
p$file(name = c("gz", "fastq"))
## get all matched files from the project
p$file(name = c("gz", "fastq"), complete = TRUE)
## get all files matching ucsc
a$public_file(name = "ucsc.hg19", complete = TRUE)
```

##### Search by metadata

Still the same example: just remember that different metadata fields are combined with an __AND__ operation and values of the same metadata field with an __OR__ operation.

So, list all files in project "tengfei/demo" that have sample_id "Sample1" __OR__ "Sample2", __AND__ that have another metadata field, library_id, set to "EXAMPLE".

```{r}
p = a$project(id = "tengfei/demo")
p$file(metadata = list(sample_id = "Sample1",
                       sample_id = "Sample2",
                       library_id = "EXAMPLE"))
```

##### Search by tags

Tags are more flexible than metadata: you can filter on multiple tags and it's an __OR__ operation. This example shows how to return all files with tag "s1" __OR__ "s2".

```{r}
p = a$project(id = "tengfei/demo")
p$file(tag = c("s1", "s2"))
```

##### Search by original task id

You can also get all files from a task. There are two ways to do it: you can start from the "Task" object, as you might imagine, or you can use the filter criteria.

```{r}
## list all output files from a task id
a$task(id = "53020538-6936-422f-80de-02fa65ae4b39")$file()
## OR
p = a$project(id = "tengfei/demo")
p$file(origin.task = "53020538-6936-422f-80de-02fa65ae4b39")
```

Pick your favourite way.

#### Copy a file or group of files

This call copies the specified file to a new project. Files retain their metadata when copied, but may be assigned new names in their target project.

Note that Controlled Data files may not be copied to Open Data projects. To make this call, you should have copy permission within the project you are copying from.

Let's try to copy a file from the CGC public files. Its id, which you can read from the URL, is "561e1b33e4b0aa6ec48167d7".

You must provide

- `id`: file id, or a list/vector of file ids.
- `project`: project id.
- `name`: optional; if omitted, the original name is used.

```{r}
## 1000G_omni2.5.b37.vcf
fid <- "561e1b33e4b0aa6ec48167d7"
fid2 <- "561e1b33e4b0aa6ec48167d3"
pid <- a$project("demo")$id
a$copyFile(c(fid, fid2), project = pid)
a$project(id = pid)$file()
```

NOTE: to copy a group of files, you need the `Auth$copyFile()` interface. The ids of the copied files in your project will be different from the public ids.

Alternatively, you can copy a __single file__ like this

```{r}
a$project("hello")$file(id = fid)$copyTo(pid)
```

#### Delete file(s)

Note: the `$delete()` method currently works on a single file only, so make sure your `file` call returns a single file, not a file list.

```{r}
a$project("demo")$file()[[1]]$delete()
## confirm the deletion
a$project("demo")$file()
```

You can also delete a group of files, or a `FilesList` object, with the `delete()` function. __Be careful__ with this function!

```{r}
## return 5 files
a$project("demo")$file("phase1")
## delete all of them
delete(a$project("demo")$file("phase1"))
a$project("demo")$file("phase1")
```

#### Download files

To get the download information, which is basically a URL, use

```{r}
a$project("demo")$file()[[1]]$download_url()
```

To download directly from R, call `download` on a single File object.

```{r}
fid <- a$project("demo")$file()[[1]]$id
a$project("demo")$file(id = fid)$download("~/Downloads/")
```

There is also a `download` function for `FilesList` objects to save you time

```{r}
fls <- a$project("demo")$file()
download(fls, "~/Downloads/")
```

To download all files from a project:

```{r}
a$project("demo")$download("~/Downloads")
```

#### Upload files

Seven Bridges platforms provide several ways to import data

- command line uploader
- graphical UI uploader
- from FTP, HTTP, etc. directly from the interface
- API uploader that you can call directly with the sevenbridges package

The API client uploader works like this: simply call the `project$upload` function to upload a file, a file list, or a folder recursively.

```{r}
a <- Auth(username = "tengfei", platform = "cgc")
fl <- system.file("extdata", "sample1.fastq", package = "sevenbridges")

(p <- a$project(id = "tengfei/quickstart"))

## by default load .meta for the file
p$upload(fl, overwrite = TRUE)
## pass metadata
p$upload(fl, overwrite = TRUE,
         metadata = list(library_id = "testid2", platform = "Illumina x11"))
## rename
p$upload(fl, overwrite = TRUE, name = "sample_new_name.fastq",
         metadata = list(library_id = "new_id"))
```

Upload a folder

```{r}
dir.ext <- system.file("extdata", package = "sevenbridges")
list.files(dir.ext)
p$upload(dir.ext, overwrite = TRUE)
```

Upload a file list

```{r}
dir.ext <- system.file("extdata", package = "sevenbridges")
## enable full name
fls <- list.files(dir.ext, recursive = TRUE, full.names = TRUE)
p$upload(fls, overwrite = TRUE)
p$upload("~/Documents/Data/sbgtest/1000G_phase1.snps.high_confidence.b37.vcf")
```

#### Update a file

You can call the `update()` function from a Files object. The following things can be updated

- name
- metadata (list): this overwrites all metadata for the file, so please provide the full list. For more flexible operations, see the next section about Metadata.

If no parameters are provided, the call just fetches the details for the same file and updates the object itself.

```{r}
(fl <- a$project(id = "tengfei/demo-project")$file(name = "sample.fastq"))
```

```
== File ==
id : 56c7916ae4b03b56a7d7
name : sample.fastq
project : tengfei/demo-project
```

Show metadata

```{r}
## show metadata
fl$meta()
```

Update meta

```{r}
fl$update(name = "sample.fastq",
          metadata = list(new_item1 = "item1",
                          new_item2 = "item2",
                          file_extension = "fastq"))
## check it out
fl$meta()
```

#### Metadata Operation

A full list of metadata fields and their permissible values on the CGC is available on the page [TCGA Metadata](http://docs.cancergenomicscloud.org/v1.0/docs/tcga-metadata-on-the-cgc).

Note that the file name is not the same as its ID.
The ID is a hexadecimal string, automatically assigned to a file in a project. The file's name is a human-readable string. For more information, please see the API overview.

To get the metadata for a file, call `meta()`.

```{r}
## meta() pulls the latest information via the API
fl$meta()
## the metadata field holds the previously saved copy
fl$metadata
```

Although the CGC defines a metadata schema, which is visible in the UI of the platform, you can pass any free-form metadata for a file. It is not visible in the UI, but it is stored with the data. Only the values you specify are stored with the file. To set metadata, call `set_meta()` from a Files object.

__Important__:

- By default, the `set_meta` call does not overwrite the existing metadata fields, unless you pass the `overwrite = TRUE` argument

```{r}
fl$set_meta(new_item3 = "item3")
fl
## with overwrite = TRUE, the rest of the metadata is removed
fl$set_meta(new_item4 = "item4", overwrite = TRUE)
fl
```

Let's keep playing with metadata. If you are interested in the default schema shown in the UI, you can use the `Metadata()` constructor to check the details of each field. Simply call the function named after the metadata field, and it will show the description and enumerated items. Please pay attention to the `suggested_values` field.

```{r}
## check which schema we have
Metadata()$show(full = TRUE)
## check details for each, play with it
platform()
paired_end()
quality_scale()
```

As you can see, some fields have suggested values. To construct metadata, we encourage you to use `Metadata()` directly: pass the metadata into the call and it will do the validation.

```{r}
Metadata(platform = "Affymetrix SNP Array 6.0",
         paired_end = 1,
         quality_scale = "sanger",
         new_item = "new test")
```

#### Tag for file(s)

Tags are different from metadata: they are more convenient and flexible, and they are visible in the file list UI on the platform. To add a new tag, use the `obj$add_tag()` method of a `File` object with a single tag string or a list/vector of tags.
To completely rewrite the tags of a file, use the `obj$set_tag()` method. By default, the argument `overwrite = TRUE` wipes the existing tags and resets them; when it is set to `FALSE`, the call is basically an `add_tag` call. You can also filter files by tags.

Let's get a file called "sample.bam" and play with its tags.

```{r}
p <- a$project(id = "tengfei/s3tutorial")
fl = p$file("sample.bam", exact = TRUE)
## show tags for single file
fl$tag()
## add new tags
fl$add_tag("new tag")
## equivalent to
fl$set_tag("new tag 2", overwrite = FALSE)
## set tags to overwrite existing
x = list("this", "is", 1234)
fl$set_tag(x)
## filter by tags
p$file(tag = c("1234", "new"))
p$file(tag = list("1234", "new"))
p$file(tag = "1234")
```

We also provide convenient methods for `FilesList`. Let's add tag "s1" to a group of files matching "Sample1" and tag "s2" to a group of files matching "Sample2", then try to get files by tag filter.

```{r}
## work on a group of files
## add tag "s2" to a group of files named with "Sample2" in it
fl2 = p$file("Sample2")
add_tag(fl2, "s2")
## add tag "s1" to a group of files named with "Sample1" in it
fl1 = p$file("Sample1")
add_tag(fl1, "s1")
## filter by tag s1 or s2
p$file(tag = "s1")
p$file(tag = "s2")
## get files tagged with s1 or s2
p$file(tag = list("s2", "s1"))
```

To do the same on the UI, please check out the tutorial [here](http://docs.sevenbridges.com/v1.0/docs/tag-your-files)

### App

From now on we are going to have fun with Apps, using the CWL (Common Workflow Language) based approach. CWL is becoming more and more popular, and it is designed for reproducible pipeline description and execution. All Seven Bridges platforms support CWL natively in the cloud. This section introduces how to work with apps via the API from inside R.

#### List all apps

This call lists all the apps available to you.

```{r}
a$app()
## or show details
a$app(detail = TRUE)
```

To search by name, pass a pattern to the `name` argument, or provide a unique `id`.

```{r}
## pattern match
a$app(name = "STAR")
## unique id
aid <- a$app()[[1]]$id
aid
a$app(id = aid)
## get a specific revision from an app
a$app(id = aid, revision = 0)
```

To list all apps belonging to one project, use the `project` argument

```{r}
## my favorite, always
a$project("demo")$app()

## or alternatively
pid <- a$project("demo")$id
a$app(project = pid)
```

To list all public apps, use the `visibility` argument

```{r}
## show 100 items from public
x = a$app(visibility = "public")
length(x)
x = a$app(visibility = "public", complete = TRUE)
length(x)
x = a$app(project = "tengfei/helloworld", complete = TRUE)
length(x)
a$app(visibility = "public", limit = 5, offset = 150)
```

To search for an app across all published apps (this may take a while)

```{r}
a$app("STAR", visibility = "public", complete = TRUE)
```

#### Copy an App

This call copies the specified app to the specified project. The app should be one in a project that you can access; this could be an app that has been uploaded to the CGC by a project member, or a publicly available app that has been copied to the project.

It takes two arguments

- project: project id (character)
- name: optional, to rename your app

```{r}
aid <- a$public_app()[[1]]$id
a$copy_app(aid, project = pid, name = "copy-rename-test")
## check it is copied
a$app(project = pid)
```

Or you can copy directly from an app object

```{r}
app = a$public_app(id = "admin/sbg-public-data/rna-seq-alignment-star")
app$copy_to(project = "tengfei/api-testing",
            name = "copy of star")
```

#### Get CWL from an App

This call returns information about the specified app, as raw CWL. The call differs from the call to GET details of an app by returning a JSON object that is the CWL.

The app should be one in a project that you can access; this could be an app that has been uploaded to the CGC by a project member, or a publicly available app that has been copied to the project.

To get a specific revision, pass the `revision` argument.

```{r}
ap <- a$app(visibility = "public")[[1]]
a$project("demo")$app("index")
## get a specific revision
a$project("demo")$app("index", revision = 0)
```

TODO: convert it to a CWL object

#### Add CWL as an APP

Use the `app_add` function call from a `Project` object. Two parameters are required

- short_name: a short id for your app, alphanumeric characters only, no spaces; this is not the name field.
- filename: your JSON file for the CWL.

```{r}
cwl.fl <- system.file("extdata", "bam_index.json", package = "sevenbridges")
a$project("demo")$app_add(short_name = "new_bam_index_app", filename = cwl.fl)
a$project("demo")$app_add(short_name = "new_bam_index_app", revision = 2, filename = cwl.fl)
```

Note: providing an existing short_name adds a new revision.

#### Directly Describe CWL in R

This is fun and is introduced in another vignette.

### Task Operation

#### List tasks

This call returns a list of tasks that you can access. You can filter by status.

```{r}
## all tasks
a$task()
## filter
a$task(status = "completed")
a$task(status = "running")
```

To list all tasks in a project

```{r}
## better way
a$project("demo")$task()

## alternatively
pid <- a$project("demo")$id
a$task(project = pid)
```

To list all tasks with details, just pass `detail = TRUE`.

```{r}
p$task(id = "your task id here", detail = TRUE)
p$task(detail = TRUE)
```

To list the child tasks of a batch task, use the `parent` parameter and pass the batch parent task id.

```{r}
p = a$project(id = "tengfei/demo")
p$task(id = "2e1ebed1-c53e-4373-870d-4732acacbbbb")
p$task(parent = "2e1ebed1-c53e-4373-870d-4732acacbbbb")
p$task(parent = "2e1ebed1-c53e-4373-870d-4732acacbbbb", status = "completed")
p$task(parent = "2e1ebed1-c53e-4373-870d-4732acacbbbb", status = "draft")
```

#### Create a draft task

To create a draft, call the `task_add` function from a Project object.
You need to pass the following arguments

- name: name for this task
- description: description for this task
- app: id of an app you have access to
- inputs: inputs list for this task

```{r}
## push an app first
fl.runif <- system.file("extdata", "runif.json", package = "sbgr")
a$project("demo")$app_add("runif_draft", fl.runif)
runif_id <- "tengfei/demo-project/runif_draft"
## create a draft task
a$project("demo")$task_add(name = "Draft runif 3",
                           description = "Description for runif 3",
                           app = runif_id,
                           inputs = list(min = 1, max = 10))
## confirm
a$project("demo")$task(status = "draft")
```

#### Modify a task

Call the `update` function from a Task object. You can update

- name
- description
- inputs list (only the items you provide are updated)

```{r}
## get the single task you want to update
tsk <- a$project("demo")$task("Draft runif 3")
tsk
tsk$update(name = "Draft runif update",
           description = "draft 2",
           inputs = list(max = 100))
## alternative way to check all inputs
tsk$getInputs()
```

#### Run a task

This call runs (executes) the specified task. Only tasks whose status is "DRAFT" may be run.

```{r}
tsk$run()
## calling update without arguments just returns the latest information
tsk$update()
```

#### Monitor a running task and set function hook

To monitor a running task, call `monitor` from a task object

- the first argument sets the time interval for checking the status
- the remaining arguments may be used by the hook function

```{r}
tsk$monitor()
```

You can get and set the default hook function for each task status. Currently, failed and completed tasks break the monitoring.

Note: a hook function has to return `TRUE` (break monitoring) or `FALSE` (continue monitoring) at the end.

```{r}
getTaskHook("completed")
getTaskHook("draft")
setTaskHook("draft", function(){message("never happens"); return(TRUE)})
getTaskHook("draft")
```

#### Abort a running task

This call aborts the specified task. Only tasks whose status is "RUNNING" may be aborted.

```{r}
## abort
tsk$abort()
## check
tsk$update()
```

#### Delete a task

Note that you can only delete draft tasks, not running tasks.

```{r}
tsklst <- a$task(status = "draft")
## delete a single task
tsklst[[1]]$delete()
## confirm
a$task(status = "draft")
## delete a list of tasks
delete(tsklst)
```

#### Download all files from a completed task

```{r}
tsk$download("~/Downloads")
```

#### Run task in batch mode

To run a task in batch mode (check `?batch` for more details), here is a mock run

```{r}
## batch by items
(tsk <- p$task_add(name = "RNA DE report new batch 2",
                   description = "RNA DE analysis report",
                   app = rna.app$id,
                   batch = batch(input = "bamfiles"),
                   inputs = list(bamfiles = bamfiles.in,
                                 design = design.in,
                                 gtffile = gtf.in)))

## batch by metadata, input files have to have the metadata fields specified
(tsk <- p$task_add(name = "RNA DE report new batch 3",
                   description = "RNA DE analysis report",
                   app = rna.app$id,
                   batch = batch(input = "fastq",
                                 c("metadata.sample_id", "metadata.library_id")),
                   inputs = list(bamfiles = bamfiles.in,
                                 design = design.in,
                                 gtffile = gtf.in)))
```

### Volume

#### Create a volume

```{r}
a = Auth(username = "tengfei", platform = "us")

## replace the placeholder credentials with your own
a$add_volume(name = "tutorial_volume",
             type = "s3",
             bucket = "tengfei-demo",
             prefix = "",
             access_key_id = "YOUR_AWS_ACCESS_KEY_ID",
             secret_access_key = "YOUR_AWS_SECRET_ACCESS_KEY",
             sse_algorithm = "AES256",
             access_mode = "RW")
```

#### List and search all volumes

```{r}
## list all volumes
a$volume()
## get unique volume by id
a$volume(id = "tengfei/tengfei_demo")
## partial search by name
a$volume(name = "demo")
```

#### Get volume detail

```{r}
v = a$volume()
v[[1]]$detail()
```

#### Delete volume

```{r}
a$volume(id = "tengfei/tengfei_demo")$delete()
```

#### Import file from volume to project

This call imports a file from a volume, such as an S3 bucket, into your project.

```{r}
v = a$volume(id = "tengfei/tutorial_volume")
res = v$import(location = "A-RNA-File.bam.bai",
               project = "tengfei/s3tutorial",
               name = "new.bam.bai",
               overwrite = TRUE)

## get job status update
## state will be "COMPLETED" when it's finished, otherwise "PENDING"
v$get_import_job(res$id)
v
```

#### Export file from project to volume

__Important__:

- The file selected for export must not be a public file or an alias. Aliases are objects stored in your cloud storage bucket which have been made available on the Platform. The volume you are exporting to must be configured for read-write access. To do this, set the access_mode parameter to RW when creating or modifying a volume.
- If this call is successful, the original project file will become an alias to the newly exported object on the volume. The source file will be deleted from the Platform and, if no more copies of this file exist, it will no longer count towards your total storage price on the Platform.

When testing, please upload your file to a project first.

```{r}
res = v$export(file = "579fb1c9e4b08370afe7903a",
               volume = "tengfei/tutorial_volume",
               location = "", ## when "" use old name
               sse_algorithm = "AES256")
## get job status update
## state will be "COMPLETED" when it's finished, otherwise "PENDING"
v$get_export_job(res$id)
v
```

### Public Files and Apps

How do you get public files and apps, so that you can look up their ids and copy them to your project? This package provides two easy function calls from the Authentication object. Once you have found what you want, you can use its id for further operations such as copying it to a project.

#### Public files

```{r}
## list first 100 files
a$public_file()
## list by offset and limit
a$public_file(offset = 100, limit = 100)
## simply list everything!
a$public_file(complete = TRUE)
## get exact file by id
a$public_file(id = "5772b6f0507c175267448700")
## get exact file by name with exact = TRUE
a$public_file(name = "G20479.HCC1143.2.converted.pe_1_1Mreads.fastq", exact = TRUE)
## with exact = FALSE (the default), search by name pattern
a$public_file(name = "fastq")
a$public_file(name = "G20479.HCC1143.2.converted.pe_1_1Mreads.fastq")
```

The public files are actually hosted in the project called "admin/sbg-public-data", so you can also use the `file` API to get the files you need.

#### Public apps

For public apps we have similar API calls.

```{r}
## list the first 100 apps
a$public_app()
## list by offset and limit
a$public_app(offset = 100, limit = 50)
## search by id
a$public_app(id = "admin/sbg-public-data/control-freec-8-1/12")
## search by name in ALL apps
a$public_app(name = "STAR", complete = TRUE)
## search by name with exact match
a$public_app(name = "Control-FREEC", exact = TRUE, complete = TRUE)
```

### Get Raw Response from httr

In the easy API, we return an object that contains the raw response from httr as a field. You can either call `response()` on that object or just get the field out of it.

### Batch operation on project/files/tasks

Right now, users have to use `lapply` to do those operations themselves; it is a simple implementation. In this package, we implement `delete` and `download` for some objects, such as tasks, projects, and files.
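
As a rough illustration of the `lapply` pattern mentioned above, here is a minimal sketch that runs without any API access. The objects are hypothetical stand-ins (plain lists with an `id` field and a `delete()` closure), and `make_mock` and `safe_delete` are names made up for this example, not part of the package. With real objects, the function passed to `lapply` would call the corresponding method on each element instead.

```{r}
## Hypothetical stand-ins for platform objects: plain lists with an `id`
## field and a `delete()` closure. Nothing here calls the API.
make_mock <- function(id) {
  list(id = id, delete = function() paste("deleted", id))
}
objs <- lapply(c("t1", "t2", "t3"), make_mock)

## apply the same operation to every element and collect the results
res <- lapply(objs, function(x) x$delete())

## wrap each call in tryCatch() so one failing item does not stop the batch
safe_delete <- function(x) {
  tryCatch(x$delete(),
           error = function(e) paste("failed:", conditionMessage(e)))
}
## make the second mock fail on purpose
objs[[2]]$delete <- function() stop("not allowed")
res_safe <- lapply(objs, safe_delete)
```

Wrapping each call in `tryCatch()` is a design choice for long batches: a single item that cannot be processed is recorded as a message, and the remaining items are still handled.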

## Cheatsheet

Quick cheat sheet (in progress)

```{r}
## Authentication
getToken()
a <- Auth(token = token)
a <- Auth(token = token,
          url = "https://cgc-api.sbgenomics.com/v2/")
a <- Auth(platform = "us", username = "tengfei")

## list API
a$api()

## Rate limits
a$rate_limit()

## Users
a$user()
a$user("tengfei")

## billing
a$billing()
a$billing(id = , breakdown = TRUE)
a$invoice()
a$invoice(id = "fake_id")

## Project
### create new project
a$project_new(name = , billing_group_id = , description = )
### list all projects owned by you
a$project()
a$project(owner = "yintengfei")
### match by name or id
p <- a$project(name = , id = , exact = TRUE)
### delete
p$delete()
### update
p$update(name = , description = )
### members
p$member()
p$member_add(username = )
p$member(username = )$update(write = , copy = , execute = )
p$member(username = )$delete()

## file
### list all files in this project
p$file()
### list all public files
a$file(visibility = "public")
### copy
a$copyFile(c(fid, fid2), project = pid)
### delete
p$file(id = fid)$delete()
### download
p$file()[[1]]$download_url()
p$file(id = fid)$download("~/Downloads/")
### download all
download(p$file())
### update a file
fl$update(name = , metadata = list(a = ,b = , ...))
### meta
fl$meta()
fl$set_meta()
fl$set_meta(..., overwrite = TRUE)

## App
a$app()
### apps in a project
p$app()
p$app(name, id, revision = )
a$copy_app(aid, project = pid, name = )
### add
p$app_add(short_name = , filename =)

## Task
a$task()
a$task(name = , id = )
a$task(status = )

p$task()
p$task(name = , id = )
p$task(status = )

tsk <- p$task(name = , id = )
tsk$update()
tsk$abort()
tsk$run()
tsk$download()
tsk$delete()
tsk$getInputs()
tsk$monitor()

getTaskHook()
setTaskHook(status = , fun =)
```