RDDs vs DataFrames

Spark provides two core abstractions:

  • RDD (Resilient Distributed Dataset) – a low-level, immutable, partitioned collection of objects, manipulated with functional transformations (map, filter, reduce)
  • DataFrame – a higher-level, schema-aware collection organized into named columns, similar to a SQL table

RDD Example

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3)) // distribute a local collection
rdd.map(_ * 2).collect()                               // Array(2, 4, 6)

DataFrame Example

import spark.implicits._ // enables the toDF conversion on local Seqs
val df = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
df.show()                // prints a two-row table with columns id and name

Prefer DataFrames in most cases: their declarative operations are optimized by the Catalyst query planner and benefit from Tungsten's efficient memory layout, whereas RDD transformations are opaque functions that Spark cannot optimize. Drop down to RDDs only when you need fine-grained control over partitioning or are working with unstructured data.
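When you do need both APIs, the two abstractions convert into each other: `df.rdd` exposes the underlying RDD of Rows, and `toDF` lifts an RDD of tuples or case classes into a DataFrame. A minimal sketch, assuming Spark is on the classpath (the app name and local master are illustrative choices):

```scala
import org.apache.spark.sql.SparkSession

object RddDfInterop extends App {
  // `local[*]` runs Spark in-process using all cores; fine for experiments.
  val spark = SparkSession.builder()
    .appName("rdd-df-interop")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // DataFrame -> RDD: .rdd yields an RDD[Row]
  val df   = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
  val rows = df.rdd

  // RDD -> DataFrame: toDF works on RDDs of tuples or case classes
  val rdd = spark.sparkContext.parallelize(Seq((3, "Carol")))
  val df2 = rdd.toDF("id", "name")

  df.union(df2).show() // three rows: Alice, Bob, Carol
  spark.stop()
}
```

Round-tripping through `.rdd` loses the query-optimization benefits, so keep the RDD detour as short as possible.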
