Package org.biojava.spark.data
Class AtomDataRDD
- java.lang.Object
  - org.biojava.spark.data.AtomDataRDD
public class AtomDataRDD extends java.lang.Object
A wrapper around JavaRDD and Dataset of atoms.
- Author:
  - Anthony Bradley
Constructor Summary
- AtomDataRDD(org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> atomRdd): Construct from a JavaRDD.
- AtomDataRDD(org.apache.spark.sql.Dataset<org.biojava.nbio.structure.Atom> atomDataset): Construct from a Dataset.
Method Summary
- void cacheData(): Cache the data for reuse across multiple operations.
- java.util.Map<java.lang.String,java.lang.Long> countByAtomName(): Count the number of times each atom name appears.
- java.util.Map<java.lang.String,java.lang.Long> countByElement(): Count the number of times each element appears.
- java.util.Map<java.lang.String,java.lang.Long> countByGroupAtomName(): Count the number of times each combination of group name and atom name appears.
- org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> getRdd(): Get the underlying JavaRDD for this AtomDataRDD.
Constructor Detail
AtomDataRDD
public AtomDataRDD(org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> atomRdd)
Construct from a JavaRDD.
- Parameters:
  - atomRdd - the input JavaRDD
AtomDataRDD
public AtomDataRDD(org.apache.spark.sql.Dataset<org.biojava.nbio.structure.Atom> atomDataset)
Construct from a Dataset.
- Parameters:
  - atomDataset - the input Dataset
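Because both constructors produce the same wrapper, one plausible design is to normalize either input into a single internal representation, which `getRdd()` then exposes. The plain-Java sketch below illustrates that pattern without Spark: `AtomNameBag` is a hypothetical stand-in, a `List` plays the role of the `JavaRDD`, and a `Stream` plays the role of the `Dataset`. Spark's `Dataset.toJavaRDD()` performs this kind of conversion, but whether `AtomDataRDD` uses it internally is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for AtomDataRDD: plain Java collections replace the Spark types.
final class AtomNameBag {
    // Single internal representation, playing the role of the JavaRDD returned by getRdd().
    private final List<String> atomNames;

    // Counterpart of AtomDataRDD(JavaRDD<Atom>): store an immutable copy of the input.
    AtomNameBag(List<String> atomNames) {
        this.atomNames = List.copyOf(atomNames);
    }

    // Counterpart of AtomDataRDD(Dataset<Atom>): convert the other representation first.
    AtomNameBag(Stream<String> atomNames) {
        this(atomNames.collect(Collectors.toList()));
    }

    // Counterpart of getRdd(): expose the underlying collection.
    List<String> getAtomNames() {
        return atomNames;
    }
}
```

Keeping one canonical representation means every other method only has to be written once, against that representation.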
Method Detail
getRdd
public org.apache.spark.api.java.JavaRDD<org.biojava.nbio.structure.Atom> getRdd()
Get the underlying JavaRDD for this AtomDataRDD.
- Returns:
  - the underlying JavaRDD for this AtomDataRDD
cacheData
public void cacheData()
Cache the data so that multiple subsequent operations can reuse it instead of recomputing it.
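Spark evaluates RDDs lazily, so each action, such as each of the count methods below, recomputes the data from its source unless it has been cached (for example with `JavaRDD.cache()`). The sketch below is a plain-Java analogy for that behavior, not the implementation of `cacheData()`: the hypothetical `CachedSource` recomputes its value on every call until `cacheData()` is invoked, after which the first result is kept and reused.

```java
import java.util.function.Supplier;

// Hypothetical analogy for caching a lazily evaluated dataset; not thread-safe.
final class CachedSource<T> implements Supplier<T> {
    private final Supplier<T> loader; // the "expensive" computation, e.g. reading structures
    private boolean cacheEnabled;     // set by cacheData()
    private boolean loaded;           // whether a cached value is available
    private T value;

    CachedSource(Supplier<T> loader) {
        this.loader = loader;
    }

    // Counterpart of cacheData(): only marks the data; nothing is computed yet.
    void cacheData() {
        cacheEnabled = true;
    }

    @Override
    public T get() {
        if (!cacheEnabled) {
            return loader.get();  // uncached: recompute on every call
        }
        if (!loaded) {
            value = loader.get(); // first call after cacheData(): compute and keep
            loaded = true;
        }
        return value;
    }
}
```

As with Spark's `cache()`, marking the data is itself lazy here: the value is only computed and stored on the next `get()` call.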
countByElement
public java.util.Map<java.lang.String,java.lang.Long> countByElement()
Count the number of times each element appears.
- Returns:
  - the map of element symbols (e.g. Ca for calcium) to the number of times each appears in the RDD
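Counting elements amounts to mapping each atom to its element symbol and counting identical values, which in Spark is what `JavaRDD.countByValue()` returns as a `Map<T, Long>`. The sketch below shows the same grouping-and-counting step on a local list with `Collectors.groupingBy` and `Collectors.counting()`; `SimpleAtom` and `ElementCounter` are hypothetical, and the exact Spark pipeline behind `countByElement()` is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical minimal atom holding only the fields this sketch needs.
final class SimpleAtom {
    final String atomName; // e.g. "CA" for a C-alpha
    final String element;  // element symbol, e.g. "C" for carbon, "Ca" for calcium

    SimpleAtom(String atomName, String element) {
        this.atomName = atomName;
        this.element = element;
    }
}

final class ElementCounter {
    // Map each atom to its element symbol and count identical symbols:
    // the local equivalent of rdd.map(...).countByValue() in Spark.
    static Map<String, Long> countByElement(List<SimpleAtom> atoms) {
        return atoms.stream()
                .collect(Collectors.groupingBy(a -> a.element, Collectors.counting()));
    }
}
```

Because `countByValue()` returns the whole map to the driver, this style of count suits keys with few distinct values, such as chemical elements.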
countByAtomName
public java.util.Map<java.lang.String,java.lang.Long> countByAtomName()
Count the number of times each atom name appears.
- Returns:
  - the map of atom names (e.g. CA for C-alpha) to the number of times each appears in the RDD
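Atom names and element symbols are different keys, and atom names are not unique across group types: the C-alpha of an amino acid and a calcium ion are both named `CA` in PDB files. The hypothetical sketch below counts the same three atoms both ways to show the difference; `AtomNameCounter` and its nested `Atom` class are illustrative stand-ins for the BioJava types, not part of this API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AtomNameCounter {
    // Hypothetical minimal atom carrying only what this sketch needs.
    static final class Atom {
        final String name;    // PDB atom name, e.g. "CA"
        final String element; // element symbol, e.g. "C" or "Ca"

        Atom(String name, String element) {
            this.name = name;
            this.element = element;
        }
    }

    // Count atoms by an arbitrary key (a local stand-in for Spark's countByValue()).
    static Map<String, Long> countBy(List<Atom> atoms, Function<Atom, String> key) {
        return atoms.stream().collect(Collectors.groupingBy(key, Collectors.counting()));
    }

    // Counterpart of countByAtomName(): the key is the atom name only.
    static Map<String, Long> countByAtomName(List<Atom> atoms) {
        return countBy(atoms, a -> a.name);
    }
}
```

With a C-alpha, a calcium ion, and a nitrogen, counting by atom name gives `CA` twice, while counting by element keeps `C` and `Ca` apart.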
countByGroupAtomName
public java.util.Map<java.lang.String,java.lang.Long> countByGroupAtomName()
Count the number of times each unique combination of group name and atom name appears in this AtomDataRDD.
- Returns:
  - the map of group and atom name combinations to the number of times each appears
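Combining the group name with the atom name separates atoms that share a name, such as the `CA` of an alanine and the `CA` of a calcium ion. The sketch below builds a composite key per atom and counts the keys; the `_` separator and the `GroupAtomNameCounter` class are assumptions of this sketch and not necessarily the key format produced by `countByGroupAtomName()`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupAtomNameCounter {
    // Hypothetical minimal atom that also records the name of its parent group (residue).
    static final class Atom {
        final String groupName; // e.g. "ALA", or "CA" for a calcium ion
        final String atomName;  // e.g. "CA"

        Atom(String groupName, String atomName) {
            this.groupName = groupName;
            this.atomName = atomName;
        }
    }

    // Build a composite key; the "_" separator is an assumption of this sketch.
    static String key(Atom a) {
        return a.groupName + "_" + a.atomName;
    }

    // Counterpart of countByGroupAtomName(): count each group/atom name combination.
    static Map<String, Long> countByGroupAtomName(List<Atom> atoms) {
        return atoms.stream()
                .collect(Collectors.groupingBy(GroupAtomNameCounter::key, Collectors.counting()));
    }
}
```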