Package dna

Class DNA


  • public class DNA
    extends Object
    • Constructor Detail

      • DNA

        public DNA()
        Constructor used for FASTA file processing.
      • DNA

        public DNA(String dnaString)
        Constructor that takes a DNA string, converts each character into a Base, and inserts the bases into a list.
        Parameters:
        dnaString - DNA sequence
      • DNA

        public DNA(int n)
        Constructor that generates a random DNA sequence.
        Parameters:
        n - length of the random sequence to be generated
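        A minimal sketch of random sequence generation, assuming that n is the sequence length and that bases are drawn uniformly from A, C, G and T. The class RandomDnaSketch and its method are hypothetical stand-ins working on plain strings; they are not part of this package.

```java
import java.util.Random;

// Hypothetical stand-in; not the DNA class documented here.
final class RandomDnaSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Returns a string of n bases drawn uniformly at random (assumed distribution). */
    static String randomSequence(int n, Random rng) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(BASES[rng.nextInt(BASES.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fixed seed makes the run repeatable.
        System.out.println(randomSequence(20, new Random(42)));
    }
}
```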
      • DNA

        public DNA(File file)
        Constructor that takes a file as an argument and stores it in the instance.
        Parameters:
        file - FASTA file
    • Method Detail

      • buildIndexFile

        public void buildIndexFile(int k)
                            throws Exception
        Calculates the k-mer at every index of the DNA sequence and inserts each one into the database.
        Parameters:
        k - length of the k-mer
        Throws:
        Exception - if an error occurs on the database side
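        The "k-mer at every index" step can be pictured as a sliding window over the sequence. The sketch below only enumerates the windows; the database insert is left out because its schema is not documented here. KmerWindowSketch is a hypothetical name and the sequence is assumed to be a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; the real method writes to a database instead of a list.
final class KmerWindowSketch {
    /** Returns the k-mer starting at every valid index, in index order. */
    static List<String> kmers(String sequence, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> out = new ArrayList<>();
        // The last window starts at length - k, so there are length - k + 1 windows.
        for (int i = 0; i + k <= sequence.length(); i++) {
            out.add(sequence.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(kmers("ACGTAC", 3)); // prints [ACG, CGT, GTA, TAC]
    }
}
```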
      • clearTable

        public void clearTable(int k)
                        throws Exception
        Clears the table.
        Parameters:
        k - length of the k-mer
        Throws:
        Exception - if the table cannot be cleared
      • viewDB

        public void viewDB(int k)
                    throws Exception
        Views the database.
        Parameters:
        k - length of the k-mer
        Throws:
        Exception - if the database does not exist
      • getIndexDB

        public List<Long> getIndexDB(DNA kmer)
                              throws Exception
        Looks up in the database the indices whose DNA hash value matches the hash value of the k-mer.
        Parameters:
        kmer - The sequence of the k-mer
        Returns:
        the list with target indices
        Throws:
        Exception - if an error occurs while reading the input file or the database
      • getIndex

        public List<Integer> getIndex(DNA kmer)
        Finds every occurrence of the k-mer in the sequence by brute-force search.
        Parameters:
        kmer - sequence of the k-mer
        Returns:
        list of start indices in the sequence at which the k-mer matches
      • getIndexFile

        public List<Long> getIndexFile(DNA kmer)
        Parameters:
        kmer - sequence of the k-mer
        Returns:
        list of indices in the sequence at which a k-mer matches the given k-mer
      • getIndexRange

        public List<Integer> getIndexRange(DNA kmer,
                                           int start,
                                           int end)
        Uses brute-force search to find the indices in the sequence, between start and end, at which the k-mer occurs.
        Parameters:
        kmer - sequence of the k-mer
        start - start index of the sequence
        end - end index of the sequence
        Returns:
        the list with target indices
      • isSame

        public boolean isSame(DNA kmer,
                              int index)
        Compares the k-mer, character by character, with the part of the sequence that starts at the given index.
        Parameters:
        kmer - the sequence of the actual k-mer
        index - start index in the DNA sequence
        Returns:
        true if the k-mer matches the sequence at the given index
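        A minimal sketch of how the brute-force search (getIndex, getIndexRange) can be built on a comparison like isSame. It works on plain strings, and it treats end as exclusive, which is an assumption; the documentation does not say whether end is inclusive. BruteForceSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in operating on strings rather than DNA objects.
final class BruteForceSketch {
    /** True if kmer occurs in sequence starting exactly at index. */
    static boolean isSame(String sequence, String kmer, int index) {
        if (index < 0 || index + kmer.length() > sequence.length()) {
            return false; // the window would fall outside the sequence
        }
        for (int j = 0; j < kmer.length(); j++) {
            if (sequence.charAt(index + j) != kmer.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    /** Start indices in [start, end) at which kmer occurs (end exclusive by assumption). */
    static List<Integer> getIndexRange(String sequence, String kmer, int start, int end) {
        List<Integer> hits = new ArrayList<>();
        for (int i = Math.max(start, 0); i < end; i++) {
            if (isSame(sequence, kmer, i)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(getIndexRange("ACGTACGA", "ACG", 0, 8)); // prints [0, 4]
    }
}
```

        Each candidate position costs up to k character comparisons, so the whole scan is O(n * k) in the worst case.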
      • getIndexHash

        public List<Integer> getIndexHash(DNA kmer)
        Calculates the hash value of each k-mer in the sequence using the Rabin-Karp algorithm and compares it with the hash value of the given k-mer.
        Parameters:
        kmer - sequence of the actual k-mer
        Returns:
        list of indices in the sequence at which the k-mer hash value matches that of the given k-mer
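        A sketch of Rabin-Karp search with a polynomial rolling hash, written against plain strings. The base (4), the modulus (1,000,000,007) and the explicit re-check on a hash hit are choices made for this illustration, not details taken from the class above. RabinKarpSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; parameters of the hash are illustrative assumptions.
final class RabinKarpSketch {
    private static final long MOD = 1_000_000_007L;
    private static final long BASE = 4;

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a DNA base: " + c);
        }
    }

    static List<Integer> search(String seq, String kmer) {
        int n = seq.length();
        int k = kmer.length();
        List<Integer> hits = new ArrayList<>();
        if (k == 0 || k > n) {
            return hits;
        }
        long high = 1; // BASE^(k-1) mod MOD, the weight of the outgoing base
        for (int i = 1; i < k; i++) {
            high = high * BASE % MOD;
        }
        long target = 0;
        long window = 0;
        for (int i = 0; i < k; i++) {
            target = (target * BASE + code(kmer.charAt(i))) % MOD;
            window = (window * BASE + code(seq.charAt(i))) % MOD;
        }
        for (int i = 0; ; i++) {
            // Equal hashes can collide, so confirm with a direct comparison.
            if (window == target && seq.regionMatches(i, kmer, 0, k)) {
                hits.add(i);
            }
            if (i + k >= n) {
                break;
            }
            // Roll: drop seq[i], shift, then append seq[i + k].
            window = (window - code(seq.charAt(i)) * high % MOD + MOD) % MOD;
            window = (window * BASE + code(seq.charAt(i + k))) % MOD;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("ACGTACGA", "ACG")); // prints [0, 4]
    }
}
```

        Updating the hash takes constant time per position, so the scan is O(n) plus the cost of confirming hits.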
      • getIndexBit

        public List<Integer> getIndexBit(DNA kmer)
        Calculates hash values of sub-sequences using the Rabin-Karp algorithm and bit operations.
        Parameters:
        kmer - sequence of the actual k-mer
        Returns:
        list of indices in the DNA sequence at which a sub-sequence matches the k-mer
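        One common way to combine a rolling hash with bit operations is to pack each base into 2 bits, so that a k-mer of up to 32 bases fits exactly in a long. The sketch below assumes that encoding (A=0, C=1, G=2, T=3); whether getIndexBit uses the same scheme is not stated in this documentation. BitKmerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; the 2-bit encoding is an assumption for illustration.
final class BitKmerSketch {
    static long code(char c) {
        int v = "ACGT".indexOf(c);
        if (v < 0) {
            throw new IllegalArgumentException("not a DNA base: " + c);
        }
        return v;
    }

    static List<Integer> search(String seq, String kmer) {
        int n = seq.length();
        int k = kmer.length();
        if (k < 1 || k > 32) {
            throw new IllegalArgumentException("k must be in 1..32");
        }
        List<Integer> hits = new ArrayList<>();
        if (k > n) {
            return hits;
        }
        // Mask that keeps only the lowest 2k bits of the window.
        long mask = (k == 32) ? -1L : (1L << (2 * k)) - 1;
        long target = 0;
        for (int i = 0; i < k; i++) {
            target = (target << 2) | code(kmer.charAt(i));
        }
        long window = 0;
        for (int i = 0; i < n; i++) {
            // Shift in the new base; the mask drops the base that left the window.
            window = ((window << 2) | code(seq.charAt(i))) & mask;
            if (i >= k - 1 && window == target) {
                hits.add(i - k + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("ACGTACGA", "ACG")); // prints [0, 4]
    }
}
```

        Because the packing is exact for k up to 32, equal values imply equal k-mers and no second comparison is needed in this sketch.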
      • buildIndex

        public Map<Long,List<Integer>> buildIndex(int k)
        Stores DNA hash values and their corresponding indices in a hash table.
        Parameters:
        k - length of the k-mer
        Returns:
        hash table that maps each DNA hash value to the list of indices at which it occurs
      • buildIndex

        public void buildIndex(int start,
                               int end,
                               int k,
                               Map<Long,List<Integer>> map)
        Builds a hash table from DNA hash values and their indices. If a DNA hash value is already in the map, the current index is added to its existing list. Otherwise, the hash value is inserted into the map together with a new list that contains the current index.
        Parameters:
        start - start index
        end - end index
        k - length of the k-mer
        map - hash table with DNA hash values and their corresponding indices
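        The "append to the existing list, otherwise insert a new list" rule maps directly onto Map.computeIfAbsent. The sketch below assumes a 2-bit packed hash and an exclusive end index, and works on a plain string; IndexBuildSketch and encode are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; the hash function is an assumption for illustration.
final class IndexBuildSketch {
    /** Packs seq[from, from + k) into a long, 2 bits per base (k at most 32). */
    static long encode(String seq, int from, int k) {
        long h = 0;
        for (int i = from; i < from + k; i++) {
            int code = "ACGT".indexOf(seq.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not a DNA base: " + seq.charAt(i));
            }
            h = (h << 2) | code;
        }
        return h;
    }

    /** Indexes the k-mers starting in [start, end) into map. */
    static void buildIndex(String seq, int start, int end, int k, Map<Long, List<Integer>> map) {
        for (int i = start; i < end && i + k <= seq.length(); i++) {
            // Creates the list on first sight of a hash, then appends either way.
            map.computeIfAbsent(encode(seq, i, k), key -> new ArrayList<>()).add(i);
        }
    }

    static Map<Long, List<Integer>> buildIndex(String seq, int k) {
        Map<Long, List<Integer>> map = new HashMap<>();
        buildIndex(seq, 0, seq.length(), k, map);
        return map;
    }

    public static void main(String[] args) {
        // "ACG" packs to 0b000110 = 6 and occurs at indices 0 and 4.
        System.out.println(buildIndex("ACGTACG", 3).get(6L)); // prints [0, 4]
    }
}
```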
      • buildIndexFast

        public Map<Long,List<Integer>> buildIndexFast(int k)
        Builds the hash table using multiple threads.
        Parameters:
        k - length of the k-mer
        Returns:
        map from each hash value to the list of indices at which it occurs
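        One possible way to parallelize the build is to split the start positions into contiguous chunks, index each chunk in its own task, and merge the partial maps in chunk order. This is a sketch under that assumption; the actual threading strategy of buildIndexFast is not documented. ParallelIndexSketch, encode and the threads parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in; chunking and merging are illustrative choices.
final class ParallelIndexSketch {
    static long encode(String seq, int from, int k) {
        long h = 0;
        for (int i = from; i < from + k; i++) {
            int code = "ACGT".indexOf(seq.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not a DNA base: " + seq.charAt(i));
            }
            h = (h << 2) | code;
        }
        return h;
    }

    static Map<Long, List<Integer>> buildIndexFast(String seq, int k, int threads) {
        int positions = Math.max(seq.length() - k + 1, 0);
        int chunk = Math.max((positions + threads - 1) / threads, 1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Long, List<Integer>>>> parts = new ArrayList<>();
            for (int s = 0; s < positions; s += chunk) {
                final int start = s;
                final int end = Math.min(s + chunk, positions);
                // Each task fills a private map, so no locking is needed.
                parts.add(pool.submit(() -> {
                    Map<Long, List<Integer>> local = new HashMap<>();
                    for (int i = start; i < end; i++) {
                        local.computeIfAbsent(encode(seq, i, k), key -> new ArrayList<>()).add(i);
                    }
                    return local;
                }));
            }
            // Merging in chunk order keeps every index list sorted.
            Map<Long, List<Integer>> merged = new HashMap<>();
            for (Future<Map<Long, List<Integer>>> part : parts) {
                part.get().forEach((hash, idx) ->
                        merged.computeIfAbsent(hash, key -> new ArrayList<>()).addAll(idx));
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildIndexFast("ACGTACG", 3, 2).get(6L)); // prints [0, 4]
    }
}
```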
      • findIndexFast

        public List<Integer> findIndexFast(Map<Long,List<Integer>> map,
                                           DNA kmer)
        Uses a stream to compare the k-mers of the DNA sequence with the given k-mer and collects the indices at which they match.
        Parameters:
        map - the hash table with DNA hash values and their corresponding lists of indices
        kmer - the sequence of the actual k-mer
        Returns:
        the list of indices at which a k-mer of the DNA sequence matches the given k-mer
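        A sketch of a stream-based lookup: fetch the candidate indices stored under the k-mer's hash, then keep only those where the sequence really contains the k-mer. The extra seq and kmerHash parameters are assumptions needed to make the example standalone; the documented method takes only the map and a DNA object. StreamLookupSketch is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in; signature differs from the documented method.
final class StreamLookupSketch {
    static List<Integer> findIndexFast(Map<Long, List<Integer>> map,
                                       String seq,
                                       String kmer,
                                       long kmerHash) {
        return map.getOrDefault(kmerHash, List.of()).stream()
                // regionMatches returns false for out-of-range indices, so stale entries drop out.
                .filter(i -> seq.regionMatches(i, kmer, 0, kmer.length()))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Index 9 is a deliberately wrong entry that the filter removes.
        Map<Long, List<Integer>> map = Map.of(6L, List.of(0, 4, 9));
        System.out.println(findIndexFast(map, "ACGTACG", "ACG", 6L)); // prints [0, 4]
    }
}
```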
      • getSize

        public int getSize()
        Returns:
        the number of bases in the sequence
      • findIndex

        public List<Integer> findIndex(Map<Long,List<Integer>> map,
                                       DNA kmer)
        Searches the map for the start indices at which the k-mer occurs.
        Parameters:
        map - map from each hash value to its list of indices
        kmer - target k-mer
        Returns:
        list of start indices
      • getIndexFast

        public List<Integer> getIndexFast(DNA kmer)
        Uses multiple threads to find the indices in the DNA sequence at which the given k-mer occurs.
        Parameters:
        kmer - the sequence of the k-mer
        Returns:
        the list of matching indices
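        A minimal sketch of a multi-threaded search using a parallel IntStream over the start positions. Using the common fork-join pool through parallel() is an assumption; getIndexFast may manage its threads differently. ParallelSearchSketch is a hypothetical name and the sequence is a plain string.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical stand-in; the threading mechanism is an illustrative choice.
final class ParallelSearchSketch {
    static List<Integer> getIndexFast(String seq, String kmer) {
        int k = kmer.length();
        if (k == 0) {
            throw new IllegalArgumentException("kmer must not be empty");
        }
        int positions = Math.max(seq.length() - k + 1, 0);
        return IntStream.range(0, positions)
                .parallel()
                // Each start position is tested independently of the others.
                .filter(i -> seq.regionMatches(i, kmer, 0, k))
                .boxed()
                // The range is ordered, so the collected list stays in index order.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(getIndexFast("ACGTACGA", "ACG")); // prints [0, 4]
    }
}
```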