Class FastaStructureParser


  • public class FastaStructureParser
    extends Object
    Reads a protein sequence from a fasta file and attempts to match it to a 3D structure. Any gaps ('-') in the fasta file are preserved as null atoms in the output, allowing structural alignments to be read from fasta files.
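
The gap handling described above can be illustrated with a small, library-free sketch. It is not the BioJava implementation: the class `GapMappingSketch`, the method `mapWithGaps`, and the use of plain strings as residue labels are all assumptions made for illustration. The idea is that every column of the gapped sequence gets a slot in the output, and gap columns are left as null.

```java
import java.util.Arrays;

public class GapMappingSketch {
    /**
     * Maps each column of a gapped sequence to a residue label.
     * Gap columns ('-') are left as null, so the output has one entry
     * per column of the alignment.
     */
    static String[] mapWithGaps(String gapped, String[] residueLabels) {
        String[] mapped = new String[gapped.length()];
        int next = 0; // index of the next unconsumed residue label
        for (int col = 0; col < gapped.length(); col++) {
            if (gapped.charAt(col) == '-') {
                mapped[col] = null; // gap column: no residue here
            } else if (next < residueLabels.length) {
                mapped[col] = residueLabels[next++];
            }
            // Columns beyond the available labels also stay null.
        }
        return mapped;
    }

    public static void main(String[] args) {
        String[] labels = {"10", "11", "12"}; // hypothetical residue numbers
        System.out.println(Arrays.toString(mapWithGaps("A-CD", labels)));
        // prints [10, null, 11, 12]
    }
}
```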

    Structures are loaded from an AtomCache. For this to work, the accession for each protein should be parsed from the fasta header line into a form understood by AtomCache.getStructure(String).
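
As a rough illustration of the header-parsing step, the sketch below pulls an accession out of a header line. It assumes the accession is the first whitespace-delimited token after the '>' character; real files may use other layouts, and the header shown is hypothetical. The class `AccessionSketch` and method `accessionFromHeader` are illustrative names only. In practice this logic would live in the headerParser passed to the constructor.

```java
public class AccessionSketch {
    /**
     * Returns the first whitespace-delimited token of a FASTA header,
     * with the leading '>' removed if present.
     */
    static String accessionFromHeader(String headerLine) {
        String header = headerLine.startsWith(">")
                ? headerLine.substring(1)
                : headerLine;
        header = header.trim();
        int end = 0;
        while (end < header.length()
                && !Character.isWhitespace(header.charAt(end))) {
            end++;
        }
        return header.substring(0, end);
    }

    public static void main(String[] args) {
        // Hypothetical header: accession followed by a free-text description.
        System.out.println(accessionFromHeader(">4HHB.A hemoglobin alpha chain"));
        // prints 4HHB.A
    }
}
```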

    Lowercase letters are sometimes used to specify unaligned residues. This information can be preserved by using a CasePreservingSequenceCreator, which allows the case of residues to be accessed through the AbstractSequence.getUserCollection() method.
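
The idea of preserving case can be sketched without the library as follows. This is not how CasePreservingSequenceCreator is implemented; `CaseMaskSketch` and `lowercaseMask` are hypothetical names, and a boolean array is just one possible way to hold the information. The sketch records which positions were lowercase before the sequence is normalized to uppercase.

```java
import java.util.Arrays;
import java.util.Locale;

public class CaseMaskSketch {
    /** Returns true at each position whose residue letter was lowercase. */
    static boolean[] lowercaseMask(String residues) {
        boolean[] mask = new boolean[residues.length()];
        for (int i = 0; i < residues.length(); i++) {
            mask[i] = Character.isLowerCase(residues.charAt(i));
        }
        return mask;
    }

    public static void main(String[] args) {
        String raw = "MKtaVL"; // lowercase 't' and 'a' mark unaligned residues
        boolean[] unaligned = lowercaseMask(raw);
        String normalized = raw.toUpperCase(Locale.ROOT);
        System.out.println(normalized + " " + Arrays.toString(unaligned));
        // prints MKTAVL [false, false, true, true, false, false]
    }
}
```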

    Author:
    Spencer Bliven
    • Constructor Summary

      Constructors 
      Constructor Description
      FastaStructureParser​(File file, org.biojava.nbio.core.sequence.io.template.SequenceHeaderParserInterface<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> headerParser, org.biojava.nbio.core.sequence.io.template.SequenceCreatorInterface<org.biojava.nbio.core.sequence.compound.AminoAcidCompound> sequenceCreator, AtomCache cache)  
      FastaStructureParser​(InputStream is, org.biojava.nbio.core.sequence.io.template.SequenceHeaderParserInterface<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> headerParser, org.biojava.nbio.core.sequence.io.template.SequenceCreatorInterface<org.biojava.nbio.core.sequence.compound.AminoAcidCompound> sequenceCreator, AtomCache cache)  
      FastaStructureParser​(org.biojava.nbio.core.sequence.io.FastaReader<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> reader, AtomCache cache)  
    • Constructor Detail

      • FastaStructureParser

        public FastaStructureParser​(InputStream is,
                                    org.biojava.nbio.core.sequence.io.template.SequenceHeaderParserInterface<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> headerParser,
                                    org.biojava.nbio.core.sequence.io.template.SequenceCreatorInterface<org.biojava.nbio.core.sequence.compound.AminoAcidCompound> sequenceCreator,
                                    AtomCache cache)
      • FastaStructureParser

        public FastaStructureParser​(File file,
                                    org.biojava.nbio.core.sequence.io.template.SequenceHeaderParserInterface<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> headerParser,
                                    org.biojava.nbio.core.sequence.io.template.SequenceCreatorInterface<org.biojava.nbio.core.sequence.compound.AminoAcidCompound> sequenceCreator,
                                    AtomCache cache)
                             throws FileNotFoundException
        Throws:
        FileNotFoundException
      • FastaStructureParser

        public FastaStructureParser​(org.biojava.nbio.core.sequence.io.FastaReader<org.biojava.nbio.core.sequence.ProteinSequence,​org.biojava.nbio.core.sequence.compound.AminoAcidCompound> reader,
                                    AtomCache cache)
    • Method Detail

      • getSequences

        public org.biojava.nbio.core.sequence.ProteinSequence[] getSequences()
        Gets the protein sequences read from the Fasta file. Returns null if process() has not been called.
        Returns:
        An array of ProteinSequences from parsing the fasta file, or null if process() hasn't been called.
      • getStructures

        public Structure[] getStructures()
        Gets the protein structures mapped from the Fasta file. Returns null if process() has not been called.
        Returns:
        An array of Structures, one for each protein in the fasta file, or null if process() hasn't been called.
      • getResidues

        public ResidueNumber[][] getResidues()
        For each residue in the fasta file, returns the ResidueNumber in the corresponding structure. If a residue cannot be found in the structure, its entry is null. This can happen if the residue was not included in the PDB file (e.g. disordered residues), if the fasta sequence does not match the PDB sequence, or if errors occur during the matching process.
        Returns:
        A 2D array of ResidueNumbers, or null if process() hasn't been called.
        See Also:
        StructureSequenceMatcher.matchSequenceToStructure(ProteinSequence, Structure)
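
Because both the returned array and its individual entries may be null, callers need two levels of null checks. The sketch below shows one way to count matched residues per sequence. It uses `Object[][]` as a stand-in for `ResidueNumber[][]` so that it runs without the library; `MatchCountSketch` and `countMatched` are hypothetical names.

```java
public class MatchCountSketch {
    /**
     * Counts non-null entries in each row.
     * Returns null if the input is null, mirroring the case where
     * process() has not been called.
     */
    static int[] countMatched(Object[][] residues) {
        if (residues == null) {
            return null;
        }
        int[] counts = new int[residues.length];
        for (int i = 0; i < residues.length; i++) {
            if (residues[i] == null) {
                continue; // defensive: treat a missing row as zero matches
            }
            for (Object residue : residues[i]) {
                if (residue != null) {
                    counts[i]++; // residue was found in the structure
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Object[][] residues = {
            {"1", null, "3"}, // second residue unmatched
            {null, null}      // nothing matched
        };
        int[] counts = countMatched(residues);
        System.out.println(counts[0] + " " + counts[1]); // prints 2 0
    }
}
```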
      • getAccessions

        public String[] getAccessions()
        Gets the protein accessions mapped from the Fasta file. Returns null if process() has not been called.
        Returns:
        An array of accession strings, one for each protein in the fasta file, or null if process() hasn't been called.