\name{geneMouse}
\Rdversion{1.1}
\alias{geneMouse}
\title{UCSC Gene Predictions for mm9}
\description{
  Gene coordinates and annotations for \emph{M. musculus} from UCSC.
  Coordinates are relative to the mm9 build and are given in nucleotides
  from the 5' end of the positive (\code{"+"}) strand. They are always
  \emph{one-based}, that is, the coordinate of the first (or leftmost)
  nucleotide in the strand is 1.

  Each \dQuote{gene}, or row in the dataset, corresponds to a unique
  combination of transcript (TSS, TES and exons) and coding sequence
  (start and end).
}
\usage{geneMouse()}
\value{
  A data frame with 49409 observations on the following 12 variables.
  \enumerate{
    \item \code{name}: The name of the gene.
    \item \code{chrom}: The name of the chromosome the gene is located on.
    \item \code{strand}: The strand the gene is coded on, \code{"+"} or
      \code{"-"}.
    \item \code{txStart}: Transcription start site.
    \item \code{txEnd}: Transcription stop site.
    \item \code{cdsStart}: Start position of the coding sequence.
    \item \code{cdsEnd}: End position of the coding sequence.
    \item \code{exonCount}: The number of exons.
    \item \code{exonStarts}: A comma-separated list of the exon start
      positions.
    \item \code{exonEnds}: A comma-separated list of the exon stop
      positions.
    \item \code{proteinID}: An ID for the protein produced; missing values
      are coded as \code{NA}.
    \item \code{alignID}: Unique identifier of each gene and RNA alignment
      pair, apparently redundant with \code{name}.
  }
}
\note{
  Because all coordinates are relative to the positive strand, for genes
  coded on the negative strand \code{txStart} is really the transcription
  end (and \code{txEnd} the transcription start). The same applies to the
  coding regions (\code{cdsStart} and \code{cdsEnd}).
}
\source{
  This table was obtained by downloading the following database file from
  UCSC (on Sep 28, 2009):
  \url{http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/knownGene.txt.gz}
  and by translating the start coordinates found in the file from
  zero-based to one-based.
  The \file{knownGene.txt.gz} file is a database dump containing the UCSC
  track called \dQuote{UCSC Genes} and described here:
  \url{http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=knownGene}

  See \url{http://genome.ucsc.edu/cgi-bin/hgTables} and

  Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D.
  The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46.

  All the annotations in this package are freely available for public
  use, except for the Swiss-Prot/UniProt data in the knownGene table,
  which has the following terms of use:

  \preformatted{
UniProt copyright (c) 2002 - 2004 UniProt consortium

For non-commercial use all databases and documents in the UniProt FTP
directory may be copied and redistributed freely, without advance
permission, provided that this copyright statement is reproduced with
each copy.

For commercial use all databases and documents in the UniProt FTP
directory, except the files

ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.dat.gz

and

ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz

may be copied and redistributed freely, without advance permission,
provided that this copyright statement is reproduced with each copy.

More information for commercial users can be found in:
http://www.expasy.org/announce/sp_98.html

From January 1, 2005, all databases and documents in the UniProt FTP
directory may be copied and redistributed freely by all entities,
without advance permission, provided that this copyright statement is
reproduced with each copy.
  }
}
\examples{
genes <- geneMouse()
str(genes)
transcripts(genes)
}