package graphx

ALPHA COMPONENT. GraphX is a graph processing framework built on top of Spark.

Package Members

  1. package impl
  2. package lib

    Various analytics functions for graphs.

  3. package util

    Collections of utilities used by graphx.

Type Members

  1. case class Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0, attr: ED = null.asInstanceOf[ED]) extends Serializable with Product

    A single directed edge consisting of a source id, target id, and the data associated with the edge.

    ED

    type of the edge attribute

    srcId

    The vertex id of the source vertex

    dstId

    The vertex id of the target vertex

    attr

    The attribute associated with the edge
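
    As a minimal sketch of the fields above, the following builds one edge and queries it. The vertex IDs and the "follows" attribute are hypothetical values chosen for illustration; only the graphx library needs to be on the classpath, no SparkContext.

```scala
import org.apache.spark.graphx.{Edge, EdgeDirection, VertexId}

object EdgeSketch {
  // Hypothetical edge: vertex 1 "follows" vertex 2.
  val follows: Edge[String] = Edge(1L, 2L, "follows")

  // Given one endpoint, otherVertexId returns the opposite endpoint.
  val other: VertexId = follows.otherVertexId(1L)

  // relativeDirection reports whether the edge leaves or enters the given vertex.
  val dir: EdgeDirection = follows.relativeDirection(1L)

  def main(args: Array[String]): Unit =
    println(s"src=${follows.srcId} dst=${follows.dstId} attr=${follows.attr} other=$other dir=$dir")
}
```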

  2. abstract class EdgeContext[VD, ED, A] extends AnyRef

    Represents an edge along with its neighboring vertices and allows sending messages along the edge. Used in Graph#aggregateMessages.
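
    A small sketch of how an EdgeContext is typically used inside Graph#aggregateMessages: each edge sends the message 1 to its destination, and the messages are summed per vertex to give in-degrees. The object name, the helper names, and the three-vertex sample graph are assumptions made for this illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, EdgeContext, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

object InDegreeSketch {
  // Count incoming edges per vertex by sending 1 along every edge.
  def inDegrees(graph: Graph[String, Int]): VertexRDD[Int] =
    graph.aggregateMessages[Int](
      (ctx: EdgeContext[String, Int, Int]) => ctx.sendToDst(1), // one message per edge
      _ + _                                                      // combine messages per vertex
    )

  // Hypothetical sample graph: 1 -> 2, 1 -> 3, 2 -> 3.
  def sampleGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
    Graph(vertices, edges)
  }
}
```

    Vertices that receive no message (vertex 1 in the sample graph) do not appear in the returned VertexRDD.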

  3. class EdgeDirection extends Serializable

    The direction of a directed edge relative to a vertex.

  4. abstract class EdgeRDD[ED] extends RDD[Edge[ED]]

    EdgeRDD[ED] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance. It may additionally store the vertex attributes associated with each edge to provide the triplet view. Shipping of the vertex attributes is managed by impl.ReplicatedVertexView.

  5. class EdgeTriplet[VD, ED] extends Edge[ED]

    An edge triplet represents an edge along with the vertex attributes of its neighboring vertices.

    VD

    the type of the vertex attribute.

    ED

    the type of the edge attribute

  6. abstract class Graph[VD, ED] extends Serializable

    The Graph abstractly represents a graph with arbitrary objects associated with vertices and edges. The graph provides basic operations to access and manipulate the data associated with vertices and edges as well as the underlying structure. Like Spark RDDs, the graph is a functional data-structure in which mutating operations return new graphs.

    VD

    the vertex attribute type

    ED

    the edge attribute type

    Note

    GraphOps contains additional convenience operations and graph algorithms.
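
    To illustrate the functional style described above, here is a hedged sketch that builds a tiny graph and transforms its vertex attributes. The vertex names, the "follows" attribute, and the "unknown" default are hypothetical; the point is that mapVertices returns a new graph and leaves the original untouched.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

object GraphSketch {
  def build(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows")))
    // The third argument is the attribute given to vertices that appear only in edges.
    Graph(vertices, edges, "unknown")
  }

  // Returns a new graph with upper-cased names; `graph` itself is unchanged.
  def upperCased(graph: Graph[String, String]): Graph[String, String] =
    graph.mapVertices((_, name) => name.toUpperCase)
}
```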

  7. class GraphOps[VD, ED] extends Serializable

    Contains additional functionality for Graph. All operations are expressed in terms of the efficient GraphX API. This class is implicitly constructed for each Graph object.

    VD

    the vertex attribute type

    ED

    the edge attribute type

  8. type PartitionID = Int

    Integer identifier of a graph partition. Must be less than 2^30.

  9. trait PartitionStrategy extends Serializable

    Represents the way edges are assigned to edge partitions based on their source and destination vertex IDs.
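
    As a rough illustration, a strategy can be queried directly through getPartition. The sketch below uses the built-in EdgePartition1D, which assigns an edge using only its source vertex ID, so two edges with the same source land in the same partition. The vertex IDs and the partition count of 4 are arbitrary values picked for the example.

```scala
import org.apache.spark.graphx.{PartitionID, PartitionStrategy}

object PartitionSketch {
  val numParts: PartitionID = 4

  // Both edges start at vertex 7, so EdgePartition1D colocates them.
  val p1: PartitionID = PartitionStrategy.EdgePartition1D.getPartition(7L, 1L, numParts)
  val p2: PartitionID = PartitionStrategy.EdgePartition1D.getPartition(7L, 99L, numParts)

  def main(args: Array[String]): Unit =
    println(s"edge 7->1 in partition $p1, edge 7->99 in partition $p2")
}
```

    In practice a strategy is usually applied to a whole graph with graph.partitionBy(PartitionStrategy.EdgePartition1D) rather than called edge by edge.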

  10. class TripletFields extends Serializable

    Represents a subset of the fields of an EdgeTriplet or EdgeContext. This allows the system to populate only those fields for efficiency.
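
    A sketch of where TripletFields is passed, assuming a graph with Double vertex attributes: the send function reads only the source attribute, so TripletFields.Src tells the system that the destination attribute and the edge attribute are not needed. The object and method names are hypothetical.

```scala
import org.apache.spark.graphx.{Graph, TripletFields, VertexRDD}

object TripletFieldsSketch {
  // For each vertex, sum the attributes of the vertices that point to it.
  // Only ctx.srcAttr is read, so TripletFields.Src is sufficient.
  def sumOfSourceAttrs(graph: Graph[Double, Int]): VertexRDD[Double] =
    graph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr),
      _ + _,
      TripletFields.Src
    )
}
```

    Declaring fewer fields than the send function actually reads would be a bug, so the declared subset should always cover every field accessed on the context.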

  11. type VertexId = Long

    A 64-bit vertex identifier that uniquely identifies a vertex within a graph. It does not need to follow any ordering or any constraints other than uniqueness.

  12. abstract class VertexRDD[VD] extends RDD[(VertexId, VD)]

    Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex preserve the index. To construct a VertexRDD, use the VertexRDD object.

    Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.

    VD

    the vertex attribute associated with each vertex in the set.

    Example:
    1. Construct a VertexRDD from a plain RDD:

      import org.apache.spark.graphx.{VertexId, VertexRDD}
      import org.apache.spark.rdd.RDD

      // Construct an initial vertex set
      val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
      val vset = VertexRDD(someData)
      // If someData may contain duplicate vertex IDs, merge them with reduceFunc
      // before building the index
      val vset2 = VertexRDD(someData.reduceByKey(reduceFunc))
      // Use the VertexRDD to index another dataset
      val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
      val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
      // vset2 and vset3 share an index, so joining them is very fast
      val vset4: VertexRDD[(SomeType, Option[OtherType])] =
        vset2.leftJoin(vset3) { (vid, a, bOpt) => (a, bOpt) }

Value Members

  1. object Edge extends Serializable
  2. object EdgeContext
  3. object EdgeDirection extends Serializable

    A set of EdgeDirections.

  4. object EdgeRDD extends Serializable
  5. object Graph extends Serializable

    The Graph object contains a collection of routines used to construct graphs from RDDs.

  6. object GraphLoader extends Logging

    Provides utilities for loading Graphs from files.
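
    A minimal sketch of loading a graph with GraphLoader.edgeListFile, assuming a whitespace-separated edge list with one "srcId dstId" pair per line and comment lines starting with #. The temporary file and its contents are made up for the example; a real job would point at an existing path.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Graph, GraphLoader}

object LoaderSketch {
  // Writes a hypothetical three-edge cycle to a temp file, then loads it.
  def load(sc: SparkContext): Graph[Int, Int] = {
    val path = Files.createTempFile("edges", ".txt")
    val contents = "# src dst\n1 2\n2 3\n3 1\n"
    Files.write(path, contents.getBytes(StandardCharsets.UTF_8))
    // Vertex and edge attributes of the loaded graph are Ints.
    GraphLoader.edgeListFile(sc, path.toString)
  }
}
```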

  7. object GraphXUtils
  8. object PartitionStrategy extends Serializable

    Collection of built-in PartitionStrategy implementations.

  9. object Pregel extends Logging

    Implements a Pregel-like bulk-synchronous message-passing API.

    Unlike the original Pregel API, the GraphX Pregel API factors the sendMessage computation over edges, enables the message sending computation to read both vertex attributes, and constrains messages to the graph structure. These changes allow for substantially more efficient distributed execution while also exposing greater flexibility for graph-based computation.

    Example:
    1. We can use the Pregel abstraction to implement PageRank:

      val pagerankGraph: Graph[Double, Double] = graph
        // Associate the degree with each vertex
        .outerJoinVertices(graph.outDegrees) {
          (vid, vdata, deg) => deg.getOrElse(0)
        }
        // Set the weight on the edges based on the degree
        .mapTriplets(e => 1.0 / e.srcAttr)
        // Set the vertex attributes to the initial pagerank values
        .mapVertices((id, attr) => 1.0)
      
      // Illustrative values; resetProb is the random-reset probability
      val resetProb = 0.15
      val numIter = 10

      def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
        resetProb + (1.0 - resetProb) * msgSum
      // The send function takes only the edge triplet
      def sendMessage(edge: EdgeTriplet[Double, Double]): Iterator[(VertexId, Double)] =
        Iterator((edge.dstId, edge.srcAttr * edge.attr))
      def messageCombiner(a: Double, b: Double): Double = a + b
      val initialMessage = 0.0
      // Execute Pregel for a fixed number of iterations.
      Pregel(pagerankGraph, initialMessage, numIter)(
        vertexProgram, sendMessage, messageCombiner)

  10. object VertexRDD extends Serializable

    The VertexRDD singleton is used to construct VertexRDDs.
