Class UniprotProxySequenceReader<C extends Compound>

    • Constructor Summary

      Constructors 
      Constructor Description
      UniprotProxySequenceReader​(java.lang.String accession, CompoundSet<C> compoundSet)
      The UniProt id is used to retrieve the UniProt XML which is then parsed as a DOM object so we know everything about the protein.
      UniprotProxySequenceReader​(org.w3c.dom.Document document, CompoundSet<C> compoundSet)
      The xml is passed in as a DOM object so we know everything about the protein.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      int countCompounds​(C... compounds)
      Returns the number of times we found a compound in the Sequence
      boolean equals​(java.lang.Object o)  
      AccessionID getAccession()
      Returns the AccessionID this location is currently bound with
      java.util.ArrayList<AccessionID> getAccessions()
      Pull UniProt accessions associated with this sequence
      java.util.ArrayList<java.lang.String> getAliases()
      Pull UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that gene and protein aliases are supported via separate methods.
      java.util.List<C> getAsList()
      Returns the Sequence as a List of compounds
      C getCompoundAt​(int position)
      Returns the Compound at the given biological index
      CompoundSet<C> getCompoundSet()
      Gets the compound set used to back this Sequence
      java.util.LinkedHashMap<java.lang.String,​java.util.ArrayList<DBReferenceInfo>> getDatabaseReferences()
      The UniProt mappings to other database identifiers for this sequence
      java.util.ArrayList<java.lang.String> getGeneAliases()
      Pull UniProt gene aliases associated with this sequence
      java.lang.String getGeneName()
      Get the gene name associated with this sequence.
      int getIndexOf​(C compound)
      Scans through the Sequence looking for the first occurrence of the given compound
      SequenceView<C> getInverse()
      Returns the inverse of the current Sequence.
      java.util.ArrayList<java.lang.String> getKeyWords()
      Pull UniProt keywords, a mixed bag of terms associated with this sequence
      int getLastIndexOf​(C compound)
      Scans through the Sequence looking for the last occurrence of the given compound
      int getLength()
      The sequence length
      java.lang.String getOrganismName()
      Get the organism name assigned to this sequence
      java.util.ArrayList<java.lang.String> getProteinAliases()
      Pull UniProt protein aliases associated with this sequence
      java.lang.String getSequenceAsString()
      Returns the String representation of the Sequence
      java.lang.String getSequenceAsString​(java.lang.Integer bioBegin, java.lang.Integer bioEnd, Strand strand)  
      SequenceView<C> getSubSequence​(java.lang.Integer bioBegin, java.lang.Integer bioEnd)
      Returns the portion of the sequence between the given begin and end positions.
      static java.lang.String getUniprotbaseURL()
      The UniProt base URL currently in use, configurable to deal with caching issues.
      static java.lang.String getUniprotDirectoryCache()
      Local directory used to cache downloaded UniProt XML
      int hashCode()  
      java.util.Iterator<C> iterator()  
      static void main​(java.lang.String[] args)  
      static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString​(java.lang.String xml, CompoundSet<C> compoundSet)
      The XML passed in is parsed as a DOM object so we know everything about the protein.
      void setCompoundSet​(CompoundSet<C> compoundSet)  
      void setContents​(java.lang.String sequence)
      Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain whitespace in the sequence.
      static void setUniprotbaseURL​(java.lang.String aUniprotbaseURL)  
      static void setUniprotDirectoryCache​(java.lang.String aUniprotDirectoryCache)  
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Field Detail

      • UP_AC_PATTERN

        public static final java.util.regex.Pattern UP_AC_PATTERN
      • DEFAULT_UNIPROT_BASE_URL

        public static final java.lang.String DEFAULT_UNIPROT_BASE_URL
        See Also:
        Constant Field Values
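        This page does not list the value of UP_AC_PATTERN. As an illustration only, the sketch below validates accessions against the accession format published by UniProt. The class name UniprotAccessionSketch, the constant ACCESSION, and the helper looksLikeAccession are hypothetical, and the expression is an assumption that may differ from the one stored in the field.

```java
import java.util.regex.Pattern;

final class UniprotAccessionSketch {
    // Assumption: the accession format published by UniProt.
    // The actual value of UP_AC_PATTERN is not shown on this page and may differ.
    static final Pattern ACCESSION = Pattern.compile(
            "[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}");

    // Returns true only when the whole string has the shape of an accession.
    static boolean looksLikeAccession(String candidate) {
        return candidate != null && ACCESSION.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAccession("P12345"));     // 6-character accession
        System.out.println(looksLikeAccession("A0A022YWF9")); // 10-character accession
        System.out.println(looksLikeAccession("not-an-id"));
    }
}
```

        A check of this kind is cheap to run before attempting a network lookup with the String-based constructor.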
    • Constructor Detail

      • UniprotProxySequenceReader

        public UniprotProxySequenceReader​(java.lang.String accession,
                                          CompoundSet<C> compoundSet)
                                   throws CompoundNotFoundException,
                                          java.io.IOException
        The UniProt ID is used to retrieve the UniProt XML, which is then parsed as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown, for example because of a bad UniProt ID or a network error.
        Parameters:
        accession - the UniProt ID of the entry to retrieve
        compoundSet - the compound set used to back the sequence
        Throws:
        CompoundNotFoundException
        java.io.IOException - if problems while reading the UniProt XML
      • UniprotProxySequenceReader

        public UniprotProxySequenceReader​(org.w3c.dom.Document document,
                                          CompoundSet<C> compoundSet)
                                   throws CompoundNotFoundException
        The XML is passed in as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown, for example because of a bad UniProt ID.
        Parameters:
        document - the UniProt XML as a DOM document
        compoundSet - the compound set used to back the sequence
        Throws:
        CompoundNotFoundException
    • Method Detail

      • parseUniprotXMLString

        public static <C extends Compound> UniprotProxySequenceReader<C> parseUniprotXMLString​(java.lang.String xml,
                                                                                               CompoundSet<C> compoundSet)
        The XML passed in is parsed as a DOM object so we know everything about the protein. If an error occurs, an exception is thrown, for example because of a bad UniProt ID.
        Parameters:
        xml - the UniProt XML as a string
        compoundSet - the compound set used to back the sequence
        Returns:
        UniprotProxySequenceReader
        Throws:
        java.lang.Exception
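        The step of turning an XML string into a DOM object can be sketched with the JDK alone. This is a minimal illustration, not the implementation of parseUniprotXMLString: the class XmlStringToDomSketch, its parse helper, and the tiny XML fragment are hypothetical, and the fragment omits the namespace and most elements of a real UniProt entry. Checked exceptions are wrapped in an unchecked one here purely to keep the sketch short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlStringToDomSketch {
    // Parses an XML string into an org.w3c.dom.Document using the JDK parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Simplification for the sketch: report malformed XML as an unchecked exception.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified stand-in for a UniProt XML record.
        String xml = "<uniprot><entry><accession>P12345</accession></entry></uniprot>";
        Document document = parse(xml);
        System.out.println(document.getDocumentElement().getNodeName());
    }
}
```

        A Document obtained this way is the kind of object the Document-based constructor accepts.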
      • setContents

        public void setContents​(java.lang.String sequence)
                         throws CompoundNotFoundException
        Once the sequence is retrieved, set the contents and make sure everything is valid. Some UniProt records contain whitespace in the sequence, which must be stripped out so that setContents doesn't fail.
        Specified by:
        setContents in interface SequenceReader<C extends Compound>
        Parameters:
        sequence - the sequence string, which may contain whitespace
        Throws:
        CompoundNotFoundException
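        The whitespace stripping described for setContents can be illustrated as follows. This is a sketch of the idea only, assuming the sequence is handled as a plain String; the class SequenceWhitespaceSketch and the helper stripWhitespace are hypothetical and are not part of this class.

```java
final class SequenceWhitespaceSketch {
    // Removes spaces, tabs, and line breaks so that only residue letters remain.
    static String stripWhitespace(String rawSequence) {
        return rawSequence.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // A sequence block as it might appear in a record, broken into groups and lines.
        String raw = "MKTAY IAKQR\n  QISFV KSHFS\n";
        System.out.println(stripWhitespace(raw)); // prints MKTAYIAKQRQISFVKSHFS
    }
}
```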
      • getLength

        public int getLength()
        The sequence length
        Specified by:
        getLength in interface Sequence<C extends Compound>
        Returns:
        the sequence length
      • getCompoundAt

        public C getCompoundAt​(int position)
        Description copied from interface: Sequence
        Returns the Compound at the given biological index
        Specified by:
        getCompoundAt in interface Sequence<C extends Compound>
        Parameters:
        position - the biological index of the compound to return
        Returns:
        the Compound at the given position
      • getIndexOf

        public int getIndexOf​(C compound)
        Description copied from interface: Sequence
        Scans through the Sequence looking for the first occurrence of the given compound
        Specified by:
        getIndexOf in interface Sequence<C extends Compound>
        Parameters:
        compound - the compound to look for
        Returns:
        the index of the first occurrence of the compound
      • getLastIndexOf

        public int getLastIndexOf​(C compound)
        Description copied from interface: Sequence
        Scans through the Sequence looking for the last occurrence of the given compound
        Specified by:
        getLastIndexOf in interface Sequence<C extends Compound>
        Parameters:
        compound - the compound to look for
        Returns:
        the index of the last occurrence of the compound
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
        Returns:
      • getSequenceAsString

        public java.lang.String getSequenceAsString()
        Description copied from interface: Sequence
        Returns the String representation of the Sequence
        Specified by:
        getSequenceAsString in interface Sequence<C extends Compound>
        Returns:
        the String representation of the Sequence
      • getAsList

        public java.util.List<C> getAsList()
        Description copied from interface: Sequence
        Returns the Sequence as a List of compounds
        Specified by:
        getAsList in interface Sequence<C extends Compound>
        Returns:
        the Sequence as a List of compounds
      • equals

        public boolean equals​(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • getInverse

        public SequenceView<C> getInverse()
        Description copied from interface: Sequence
        Returns the inverse of the current Sequence. This means reversing the Sequence and, optionally, complementing it.
        Specified by:
        getInverse in interface Sequence<C extends Compound>
        Returns:
        a view of the inverse of the Sequence
      • getSequenceAsString

        public java.lang.String getSequenceAsString​(java.lang.Integer bioBegin,
                                                    java.lang.Integer bioEnd,
                                                    Strand strand)
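        Parameters:
        bioBegin - the biological begin position
        bioEnd - the biological end position
        strand - the strand to read
        Returns:
        the String representation of the requested region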
      • getSubSequence

        public SequenceView<C> getSubSequence​(java.lang.Integer bioBegin,
                                              java.lang.Integer bioEnd)
        Description copied from interface: Sequence
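        Returns the portion of the sequence between the given begin and end positions. Positions are indexed from 1.
        Specified by:
        getSubSequence in interface Sequence<C extends Compound>
        Parameters:
        bioBegin - the biological begin position, indexed from 1
        bioEnd - the biological end position, indexed from 1
        Returns:
        a view of the requested portion of the sequence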
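        Because positions are indexed from 1, they differ from Java's 0-based String offsets. The sketch below shows the conversion on a plain String. It assumes that bioEnd is inclusive, which this page does not state, and the class BioIndexSketch and its subSequence helper are hypothetical stand-ins rather than the SequenceView returned by getSubSequence.

```java
final class BioIndexSketch {
    // Extracts residues bioBegin..bioEnd, counting from 1.
    // Assumption: bioEnd is inclusive.
    static String subSequence(String sequence, int bioBegin, int bioEnd) {
        if (bioBegin < 1 || bioEnd > sequence.length() || bioBegin > bioEnd) {
            throw new IndexOutOfBoundsException(
                    "Invalid range " + bioBegin + ".." + bioEnd
                            + " for length " + sequence.length());
        }
        // Shift the 1-based begin down by one; the inclusive 1-based end
        // equals Java's exclusive 0-based end.
        return sequence.substring(bioBegin - 1, bioEnd);
    }

    public static void main(String[] args) {
        System.out.println(subSequence("MKTAYIAK", 2, 4)); // prints KTA
    }
}
```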
      • iterator

        public java.util.Iterator<C> iterator()
        Specified by:
        iterator in interface java.lang.Iterable<C extends Compound>
        Returns:
      • getAccessions

        public java.util.ArrayList<AccessionID> getAccessions()
                                                       throws javax.xml.xpath.XPathExpressionException
        Pull UniProt accessions associated with this sequence
        Returns:
        the accessions associated with this sequence
        Throws:
        javax.xml.xpath.XPathExpressionException
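        The XPathExpressionException in the signature indicates that values are pulled from the XML with XPath. A minimal JDK-only sketch of that technique follows. It is not the implementation of getAccessions: the class AccessionXPathSketch, the XPath expression, and the simplified XML (no namespace, few elements) are assumptions, and the sketch returns plain strings rather than AccessionID objects. The checked exception is wrapped to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AccessionXPathSketch {
    // Collects the text of every accession element in document order.
    static List<String> accessions(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Assumption: a namespace-free layout of uniprot/entry/accession.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/uniprot/entry/accession",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (XPathExpressionException e) {
            // Simplification for the sketch: rethrow as unchecked.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<uniprot><entry>"
                + "<accession>P12345</accession>"
                + "<accession>Q9Y6K9</accession>"
                + "</entry></uniprot>";
        System.out.println(accessions(xml)); // prints [P12345, Q9Y6K9]
    }
}
```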
      • getAliases

        public java.util.ArrayList<java.lang.String> getAliases()
                                                         throws javax.xml.xpath.XPathExpressionException
        Pull UniProt protein aliases associated with this sequence. Provided for backwards compatibility now that gene and protein aliases are supported via separate methods.
        Returns:
        the protein aliases associated with this sequence
        Throws:
        javax.xml.xpath.XPathExpressionException
      • getProteinAliases

        public java.util.ArrayList<java.lang.String> getProteinAliases()
                                                                throws javax.xml.xpath.XPathExpressionException
        Pull UniProt protein aliases associated with this sequence
        Returns:
        the protein aliases associated with this sequence
        Throws:
        javax.xml.xpath.XPathExpressionException
      • getGeneAliases

        public java.util.ArrayList<java.lang.String> getGeneAliases()
                                                             throws javax.xml.xpath.XPathExpressionException
        Pull UniProt gene aliases associated with this sequence
        Returns:
        the gene aliases associated with this sequence
        Throws:
        javax.xml.xpath.XPathExpressionException
      • countCompounds

        public int countCompounds​(C... compounds)
        Description copied from interface: Sequence
        Returns the number of times we found a compound in the Sequence
        Specified by:
        countCompounds in interface Sequence<C extends Compound>
        Parameters:
        compounds - the compounds to count
        Returns:
        the number of times the given compounds were found in the Sequence
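        The counting behaviour can be illustrated on a plain String with a varargs parameter, mirroring the C... compounds signature. This sketch assumes that the result is the total number of positions holding any of the given compounds; the class CompoundCountSketch and its char-based helper are hypothetical simplifications of the generic method.

```java
final class CompoundCountSketch {
    // Counts positions whose residue equals any of the given compounds.
    static int countCompounds(String sequence, char... compounds) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char residue = sequence.charAt(i);
            for (char compound : compounds) {
                if (residue == compound) {
                    count++;
                    break; // count each position at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // K at positions 2 and 8, A at positions 4 and 7.
        System.out.println(countCompounds("MKTAYIAKQR", 'A', 'K')); // prints 4
    }
}
```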
      • getUniprotbaseURL

        public static java.lang.String getUniprotbaseURL()
        The UniProt base URL currently in use, configurable to deal with caching issues. www.uniprot.org is load balanced, but pir.uniprot.org can be accessed directly.
        Returns:
        the uniprotbaseURL
      • setUniprotbaseURL

        public static void setUniprotbaseURL​(java.lang.String aUniprotbaseURL)
        Parameters:
        aUniprotbaseURL - the uniprotbaseURL to set
      • getUniprotDirectoryCache

        public static java.lang.String getUniprotDirectoryCache()
        Local directory used to cache downloaded UniProt XML
        Returns:
        the uniprotDirectoryCache
      • setUniprotDirectoryCache

        public static void setUniprotDirectoryCache​(java.lang.String aUniprotDirectoryCache)
        Parameters:
        aUniprotDirectoryCache - the uniprotDirectoryCache to set
      • main

        public static void main​(java.lang.String[] args)
      • getGeneName

        public java.lang.String getGeneName()
        Get the gene name associated with this sequence.
        Returns:
        the gene name
      • getOrganismName

        public java.lang.String getOrganismName()
        Get the organism name assigned to this sequence
        Returns:
        the organism name
      • getKeyWords

        public java.util.ArrayList<java.lang.String> getKeyWords()
        Pull UniProt keywords, a mixed bag of terms associated with this sequence
        Specified by:
        getKeyWords in interface FeaturesKeyWordInterface
        Returns:
        the keywords associated with this sequence