This page is the main information point for Bioconductor's participation in the Google Summer of Code this year (2013).
Each selected student (mentee) will be paid US$5000 to work on a Bioconductor project for three months during the summer.
Students should look at the list of projects below and see whether any of them are of interest. Email the project mentors to express your interest, and describe any relevant prior experience.
Students with ideas for Bioconductor projects not listed below are encouraged to email those ideas to any of the mentors listed below.
Students will submit project applications directly to Google.
Google will award a certain number of student slots to the Bioconductor project.
The Bioconductor administrators and mentors will rank projects in order of importance to Bioconductor, and the top-ranked projects will be funded.
Selected students will be expected to subscribe to the bioconductor and bioc-devel mailing lists.
There is a timeline posted at Google explaining how this works. Students are encouraged to read it and make sure that they can commit to it. There is also a FAQ for questions that are not addressed here.
Background/Motivation: As very large genomic data sets become more and more common, computational biologists are spending an inordinate amount of time transforming data from the format of the original resource into a format amenable to computation in their programming language of choice. The R / Bioconductor community needs programmatic access to cloud-based experimental data resources that can be readily incorporated into their own workflows.
AnnotationHub and its supporting packages are primed to support such a project. AnnotationHub provides infrastructure to make well-curated resources available to R software clients, but it needs a web interface for adding user-supplied resources, including transforming the data into formats that R clients can use directly.
Work with us to create a web accessible interface that does the following:
Familiarity with R and with AJAX Web 2.0 programming.
Using the R packages Rook and rmongodb, write a web form that asks the user for their name and age and stores this information in a MongoDB collection. A second page asks the user for an age and then displays all records for people of that age or older. Send this to us as an R package, so that all we have to do (once we've installed MongoDB) is install the package and run:
library(testPkg)
run()
...and we'll see the desired functionality.
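One way to structure this task is to keep the request-handling logic in plain R functions and call them from a Rook app. The sketch below is a minimal, hypothetical version of that logic: all function names are illustrative, and an in-memory data.frame stands in for the MongoDB collection so that it runs without a database. In a real submission, addRecord() and olderThan() would presumably be replaced by rmongodb calls, with the filter expressed as the MongoDB query {age: {$gte: minAge}}.

```r
## Hypothetical helpers for the Rook/rmongodb qualification task.
## ASSUMPTION: an in-memory data.frame stands in for the MongoDB collection.

# Escape characters that would otherwise be interpreted as HTML markup.
htmlEscape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Page 1: the form that collects a name and an age.
formPage <- function(action = "add") {
  paste0('<form method="POST" action="', action, '">',
         'Name: <input type="text" name="name"/> ',
         'Age: <input type="number" name="age"/> ',
         '<input type="submit" value="Save"/></form>')
}

# Store one submitted record; form fields arrive as strings, so coerce age.
addRecord <- function(store, name, age) {
  rbind(store, data.frame(name = name, age = as.integer(age),
                          stringsAsFactors = FALSE))
}

# Page 2 logic: all records with age >= minAge
# (what the MongoDB query {age: {$gte: minAge}} would select).
olderThan <- function(store, minAge) {
  store[store$age >= minAge, , drop = FALSE]
}

# Render the matching records as an HTML table.
resultsTable <- function(records) {
  rows <- sprintf("<tr><td>%s</td><td>%d</td></tr>",
                  htmlEscape(records$name), records$age)
  paste0("<table><tr><th>Name</th><th>Age</th></tr>",
         paste(rows, collapse = ""), "</table>")
}
```

Inside a Rook app, the submitted values would typically be read from Rook's Request object and the HTML strings written through a Response object; check the Rook documentation for the exact calls before relying on them.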
Dan Tenenbaum dtenenba@fhcrc.org, Marc Carlson mcarlson@fhcrc.org
Martin Morgan mtmorgan@fhcrc.org
The Shiny package makes it easy to create interactive web graphics from R objects. Bioconductor packages have many objects that represent biological data or results, and for each of these objects there is a typical set of visualizations that helps users explore their data. Normally, these visuals are replotted several times, with parameters tweaked each time, until the image conveys a specific insight. This project pairs these standard Bioconductor objects with more user-friendly Shiny visualizations via new display() methods.
You will also explore the following topics:
Create display() methods for each of the four object types mentioned above. These methods should use Shiny to display a multi-tabbed view of each object, where each tab offers an alternate view of the object. Each method should allow the caller to view a table of annotations in one tab and to visualize the data using either a heat map or a Gviz plot (as appropriate) in another tab.
Write the R code for these four very similar methods in a way that avoids repeating yourself.
Integrate these methods into the appropriate Bioconductor packages, along with proper documentation. Which packages are appropriate will be determined as the project develops.
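A minimal sketch of one way to avoid repetition across the display() methods, assuming S4 dispatch: every method delegates to a single shared helper that defines the tab layout, and each method supplies only its class-specific plot. The classes ToyMatrixData and ToyRangeData, the helper .buildTabs(), and the tab titles are hypothetical stand-ins for the real Bioconductor classes. To keep the sketch runnable without Shiny, the helper returns a named list of tab-rendering functions rather than Shiny UI objects.

```r
library(methods)

## Hypothetical stand-ins for the Bioconductor classes targeted by display().
setClass("ToyMatrixData",
         representation(values = "matrix", annotations = "data.frame"))
setClass("ToyRangeData",
         representation(starts = "integer", annotations = "data.frame"))

setGeneric("display", function(object, ...) standardGeneric("display"))

## Shared helper: the tab layout is defined once, here.
## Returns a named list of functions, one per tab; a Shiny version could
## build its tab panels from the same pieces.
.buildTabs <- function(object, plotTitle, plotFun) {
  tabs <- list(Annotations = function() object@annotations)
  tabs[[plotTitle]] <- function() plotFun(object)
  tabs
}

## Each method only supplies what differs: the plot title and plot function.
setMethod("display", "ToyMatrixData", function(object, ...) {
  .buildTabs(object, "Heat map", function(o) stats::heatmap(o@values))
})

setMethod("display", "ToyRangeData", function(object, ...) {
  # A real method would draw a Gviz track here; a base plot stands in.
  .buildTabs(object, "Track plot", function(o) graphics::plot(o@starts))
})
```

With this structure, supporting another class means writing one short method, and changing the layout means editing only .buildTabs().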
Familiarity with R. An understanding of basic computational biology, or a willingness and ability to learn about it as needed.
Find the example on the manual page for Biobase::ExpressionSet and run it. Then plot the relevant data contained in the generated ExpressionSet object as a heatmap, save the output, and send me the plot. BONUS: make sure that all the labels in your plot are fully visible.
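A hedged sketch of the plotting and saving step in base R. It assumes the expression values have already been extracted from the ExpressionSet as a numeric matrix (Biobase's exprs() accessor does this); the helper name saveExpressionHeatmap() and the margin and font-size values are illustrative choices, not requirements. Widening heatmap()'s margins and shrinking the label size are the usual levers for keeping long row and column labels from being clipped.

```r
## Hypothetical helper: draw a heatmap of a numeric matrix and save it as a PDF.
## Larger margins (in lines of text) leave room for long sample/feature names.
saveExpressionHeatmap <- function(x, file, labelMargin = 10) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) >= 2, ncol(x) >= 2)
  grDevices::pdf(file, width = 8, height = 10)
  on.exit(grDevices::dev.off())  # close the device even if plotting fails
  stats::heatmap(x,
                 margins = c(labelMargin, labelMargin),  # column, row label space
                 cexRow = 0.6, cexCol = 0.8)
  invisible(file)
}
```

With Biobase installed, one possible call uses the bundled sample.ExpressionSet data set as a stand-in for the manual-page example: `library(Biobase); data(sample.ExpressionSet); saveExpressionHeatmap(exprs(sample.ExpressionSet)[1:50, ], "heatmap.pdf")`.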
Marc Carlson mcarlson@fhcrc.org, Michael Lawrence lawrence.michael@gene.com
Martin Morgan mtmorgan@fhcrc.org
High-throughput sequencing generates data sets consisting of hundreds of millions of sequence reads per sample. As with any large data set, timely processing depends on parallel computing. The Bioconductor project has developed the BiocParallel package, an abstraction around several parallel implementations in R. The API is tailored to typical use cases in biological data analysis and integrates with existing Bioconductor data structures. Another package, BatchJobs, executes R functions as scheduled cluster jobs through an abstraction that has been implemented for several popular schedulers, including LSF, PBS and SGE.
As sequencing data pipelines are typically executed on managed clusters, there is a need for BiocParallel to interact with cluster schedulers. We aim to add a new backend to BiocParallel that delegates to BatchJobs for this interaction.
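The backend idea can be illustrated with a small, self-contained S4 sketch: the class of a parameter object selects how a list of tasks is evaluated, so adding a backend means adding one class and one method. Everything here (ToyParam, its subclasses, and toyLapply()) is hypothetical and only mimics the shape of BiocParallel's bplapply(X, FUN, BPPARAM = ...) interface; it is not BiocParallel's implementation. A BatchJobs-backed class would presumably implement its method in terms of BatchJobs' registry workflow (makeRegistry(), batchMap(), submitJobs(), waitForJobs(), loadResults()); that part is omitted because it needs a configured scheduler.

```r
library(methods)
library(parallel)

## Hypothetical backend classes: the class of the parameter object
## decides how toyLapply() evaluates FUN over X.
setClass("ToyParam", representation("VIRTUAL", workers = "integer"))
setClass("ToySerialParam", contains = "ToyParam",
         prototype = prototype(workers = 1L))
setClass("ToyMulticoreParam", contains = "ToyParam",
         prototype = prototype(workers = 2L))

## Dispatch only on BPPARAM, mirroring a bplapply()-style signature.
setGeneric("toyLapply",
           function(X, FUN, ..., BPPARAM) standardGeneric("toyLapply"),
           signature = "BPPARAM")

## Serial backend: plain lapply().
setMethod("toyLapply", "ToySerialParam", function(X, FUN, ..., BPPARAM) {
  lapply(X, FUN, ...)
})

## Forking backend via parallel::mclapply(); forking is unavailable on
## Windows, so fall back to a single core there.
setMethod("toyLapply", "ToyMulticoreParam", function(X, FUN, ..., BPPARAM) {
  cores <- if (.Platform$OS.type == "windows") 1L else BPPARAM@workers
  mclapply(X, FUN, ..., mc.cores = cores)
})
```

Under this design, a scheduler-backed backend is one more subclass of ToyParam plus one toyLapply() method, and calling code that passes BPPARAM does not change.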