Package org.biojava.spark.data
Class GroupDataRDD
java.lang.Object
    org.biojava.spark.data.GroupDataRDD
public class GroupDataRDD extends java.lang.Object
An RDD holding Group-level data.
Author:
- Anthony Bradley
Constructor Summary
Constructors
GroupDataRDD(org.apache.spark.api.java.JavaPairRDD<java.lang.String,org.biojava.nbio.structure.Group> groupRdd)
    Constructor of the RDD from a JavaPairRDD of Group.
Method Summary
Modifier and Type, Method, Description
void cacheData()
    Cache the data - for multi-processing.
java.util.Map<java.lang.String,java.lang.Long> countByGroupName()
    Count the number of times each group name appears.
AtomDataRDD getAtoms()
    Get the atoms from the groups.
org.apache.spark.api.java.JavaPairRDD<java.lang.String,org.biojava.nbio.structure.Group> getGroupRdd()
    Get the JavaPairRDD of Group data.
Method Detail
cacheData
public void cacheData()
Cache the data - for multi-processing.
getGroupRdd
public org.apache.spark.api.java.JavaPairRDD<java.lang.String,org.biojava.nbio.structure.Group> getGroupRdd()
Get the JavaPairRDD of Group data.
Returns:
- the JavaPairRDD of Group data
countByGroupName
public java.util.Map<java.lang.String,java.lang.Long> countByGroupName()
Count the number of times each group name appears.
Returns:
- the map of group names (e.g. LYS for Lysine) and the number of times they appear in the RDD
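The shape of the returned map can be illustrated without Spark. The following is a minimal plain-Java sketch, not the library's implementation: it assumes the group names have already been collected into a list, and the class GroupNameCountSketch and the helper countByName are hypothetical names introduced only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java stand-in: no Spark or BioJava types are used here.
final class GroupNameCountSketch {

    // Hypothetical helper: tallies how often each group name occurs.
    // The result type mirrors the Map<String, Long> documented above.
    static Map<String, Long> countByName(List<String> groupNames) {
        return groupNames.stream()
                // A TreeMap keeps the keys sorted so the printed order is stable.
                .collect(Collectors.groupingBy(name -> name, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> names = List.of("LYS", "ALA", "LYS", "GLY", "ALA", "LYS");
        System.out.println(countByName(names)); // prints {ALA=2, GLY=1, LYS=3}
    }
}
```

With a real GroupDataRDD the call would simply be countByGroupName() on the instance. The sketch only shows what a map of three-letter group names to occurrence counts looks like.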
getAtoms
public AtomDataRDD getAtoms()
Get the atoms from the groups.
Returns:
- the atoms for all the groups
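This page does not say how the atoms are extracted. One plausible reading of "the atoms for all the groups" is a flattening step, in which each group contributes its atoms to a single collection. The plain-Java sketch below illustrates that idea under this assumption. GroupAtomsSketch, GroupStub and atomNames are hypothetical stand-ins, not BioJava or biojava-spark types, and atoms are reduced to their names for brevity.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java stand-in: no Spark or BioJava types are used here.
final class GroupAtomsSketch {

    // Hypothetical stand-in for a group: a residue name plus its atom names.
    static final class GroupStub {
        final String pdbName;
        final List<String> atomNames;

        GroupStub(String pdbName, List<String> atomNames) {
            this.pdbName = pdbName;
            this.atomNames = atomNames;
        }
    }

    // Flattens the per-group atom lists into one list, preserving group order.
    static List<String> atomNames(List<GroupStub> groups) {
        return groups.stream()
                .flatMap(group -> group.atomNames.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<GroupStub> groups = List.of(
                new GroupStub("GLY", List.of("N", "CA", "C", "O")),
                new GroupStub("ALA", List.of("N", "CA", "C", "O", "CB")));
        System.out.println(atomNames(groups).size()); // prints 9
    }
}
```

The result is a flat collection with one entry per atom rather than one entry per group, which is consistent with the AtomDataRDD return type documented above.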