Class AtomDataRDD


  • public class AtomDataRDD
    extends java.lang.Object
    A wrapper around JavaRDD and Dataset of atoms.
    Author:
    Anthony Bradley
    • Constructor Summary

      Constructors 
      Constructor Description
      AtomDataRDD(org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> atomRdd)
      Construct from a JavaRDD
      AtomDataRDD(org.apache.spark.sql.Dataset<org.biojava.nbio.structure.Atom> atomDataset)
      Construct from a Dataset
    • Method Summary

      Modifier and Type Method Description
      void cacheData()
      Cache the data so that it can be reused across multiple processing steps.
      java.util.Map<java.lang.String,java.lang.Long> countByAtomName()
      Count the number of times each atom name appears.
      java.util.Map<java.lang.String,java.lang.Long> countByElement()
      Count the number of times each element appears.
      java.util.Map<java.lang.String,java.lang.Long> countByGroupAtomName()
      Count the number of times each group and atom name combination appears.
      org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> getRdd()
      Get the underlying JavaRDD for this AtomDataRDD.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • AtomDataRDD

        public AtomDataRDD(org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> atomRdd)
        Construct from a JavaRDD
        Parameters:
        atomRdd - the input JavaRDD
      • AtomDataRDD

        public AtomDataRDD(org.apache.spark.sql.Dataset<org.biojava.nbio.structure.Atom> atomDataset)
        Construct from a Dataset
        Parameters:
        atomDataset - the input Dataset
    • Method Detail

      • getRdd

        public org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> getRdd()
        Get the underlying JavaRDD for this AtomDataRDD.
        Returns:
        the underlying JavaRDD for this AtomDataRDD
      • cacheData

        public void cacheData()
        Cache the data so that it can be reused across multiple processing steps.
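        Spark evaluates an RDD lazily and, unless it is cached, recomputes it for every action that uses it. The plain-Java sketch below illustrates that idea only: it does not use Spark, and CountingSource and cached are hypothetical names invented for the illustration, not part of AtomDataRDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CacheSketch {

    /** Hypothetical expensive source, standing in for an uncached, lazily evaluated dataset. */
    public static final class CountingSource implements Supplier<List<String>> {
        private final AtomicInteger loads = new AtomicInteger();

        @Override
        public List<String> get() {
            loads.incrementAndGet(); // record every real (re)computation
            return Arrays.asList("N", "CA", "C", "O");
        }

        public int loadCount() {
            return loads.get();
        }
    }

    /** Wraps a supplier so the first result is kept and reused by later calls. */
    public static <T> Supplier<T> cached(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    value = source.get();
                    loaded = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        // Without caching, each use triggers a fresh computation.
        CountingSource raw = new CountingSource();
        raw.get();
        raw.get();
        System.out.println("uncached loads: " + raw.loadCount());

        // With caching, the second use reuses the stored result.
        CountingSource source = new CountingSource();
        Supplier<List<String>> kept = cached(source);
        kept.get();
        kept.get();
        System.out.println("cached loads: " + source.loadCount());
    }
}
```

        In the same spirit, calling cacheData() before running several of the count methods on one AtomDataRDD would be expected to avoid repeating the upstream work, assuming it delegates to Spark's own caching.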
      • countByElement

        public java.util.Map<java.lang.String,java.lang.Long> countByElement()
        Count the number of times each element appears.
        Returns:
        the map of element names (e.g. Ca for Calcium) and the number of times they appear in the RDD
      • countByAtomName

        public java.util.Map<java.lang.String,java.lang.Long> countByAtomName()
        Count the number of times each atom name appears.
        Returns:
        the map of atom names (e.g. CA for C-alpha) and the number of times they appear in the RDD
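        The difference between counting by element and counting by atom name is easy to miss, because a C-alpha atom and a calcium ion can both carry the atom name CA. The plain-Java sketch below illustrates the two groupings on a small in-memory list. It is not the library's implementation: SimpleAtom is a hypothetical stand-in for org.biojava.nbio.structure.Atom, and a List stands in for the RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class AtomCountSketch {

    /** Hypothetical stand-in for an atom: only the two fields needed here. */
    public static final class SimpleAtom {
        final String name;     // atom name, e.g. "CA" for C-alpha
        final String element;  // element symbol, e.g. "Ca" for calcium

        public SimpleAtom(String name, String element) {
            this.name = name;
            this.element = element;
        }
    }

    /** Counts how often each key occurs; a TreeMap keeps the keys sorted. */
    private static Map<String, Long> countBy(List<SimpleAtom> atoms,
                                             Function<SimpleAtom, String> key) {
        return atoms.stream()
                .collect(Collectors.groupingBy(key, TreeMap::new, Collectors.counting()));
    }

    public static Map<String, Long> countByElement(List<SimpleAtom> atoms) {
        return countBy(atoms, a -> a.element);
    }

    public static Map<String, Long> countByAtomName(List<SimpleAtom> atoms) {
        return countBy(atoms, a -> a.name);
    }

    /** Illustrative data: a backbone fragment plus one calcium ion. */
    public static List<SimpleAtom> sampleAtoms() {
        return Arrays.asList(
                new SimpleAtom("N", "N"),
                new SimpleAtom("CA", "C"),    // C-alpha is a carbon atom
                new SimpleAtom("C", "C"),
                new SimpleAtom("O", "O"),
                new SimpleAtom("CA", "Ca"));  // calcium ion, same atom name
    }

    public static void main(String[] args) {
        List<SimpleAtom> atoms = sampleAtoms();
        System.out.println("by element:   " + countByElement(atoms));
        System.out.println("by atom name: " + countByAtomName(atoms));
    }
}
```

        In this sample the two CA atoms fall into one bucket when grouped by atom name but into separate buckets (C and Ca) when grouped by element.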
      • countByGroupAtomName

        public java.util.Map<java.lang.String,java.lang.Long> countByGroupAtomName()
        Count the number of times each group and atom name combination appears.
        Returns:
        the map of group and atom name combinations and the number of times they appear in the RDD
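        Counting by a combined key can be sketched in plain Java as follows. This is an illustration only, under stated assumptions: SimpleAtom is a hypothetical stand-in for the BioJava Atom, a List stands in for the RDD, and the underscore-joined key format (for example ALA_CA) is chosen for the example. The key format actually produced by countByGroupAtomName is not documented on this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupAtomNameCountSketch {

    /** Hypothetical stand-in for an atom together with the name of its parent group. */
    public static final class SimpleAtom {
        final String groupName; // residue or ligand name, e.g. "ALA"
        final String atomName;  // atom name within the group, e.g. "CA"

        public SimpleAtom(String groupName, String atomName) {
            this.groupName = groupName;
            this.atomName = atomName;
        }
    }

    /** Counts each (group name, atom name) pair; the "_" separator is an assumption. */
    public static Map<String, Long> countByGroupAtomName(List<SimpleAtom> atoms) {
        return atoms.stream()
                .collect(Collectors.groupingBy(
                        a -> a.groupName + "_" + a.atomName,
                        TreeMap::new,
                        Collectors.counting()));
    }

    /** Illustrative data: two alanines (one with a side-chain atom) and a glycine. */
    public static List<SimpleAtom> sampleAtoms() {
        return Arrays.asList(
                new SimpleAtom("ALA", "CA"),
                new SimpleAtom("ALA", "CB"),
                new SimpleAtom("GLY", "CA"),
                new SimpleAtom("ALA", "CA"));
    }

    public static void main(String[] args) {
        System.out.println(countByGroupAtomName(sampleAtoms()));
    }
}
```

        Keying on the pair rather than on the atom name alone keeps the C-alpha atoms of different residue types in separate buckets.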