RDDs vs DataFrames
Spark provides two core abstractions:
- RDD (Resilient Distributed Dataset) – a low-level, fault-tolerant, distributed collection of objects, manipulated with functional transformations
- DataFrame – a high-level abstraction over structured data, similar to a SQL table, with queries optimized by Catalyst
RDD Example
```scala
// Create an RDD from a local collection, then double each element
val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3))
rdd.map(_ * 2).collect()  // Array(2, 4, 6)
```
DataFrame Example
```scala
import spark.implicits._  // enables toDF on local collections

// Build a two-column DataFrame from a sequence of tuples
val df = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
df.show()
```
Prefer DataFrames for most workloads: the Catalyst optimizer and Tungsten execution engine typically make them faster than hand-written RDD code, and the columnar API is more expressive. Reach for RDDs only when you need fine-grained control over low-level transformations.
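The two APIs also interoperate: an RDD can be lifted into a DataFrame, and a DataFrame exposes its underlying RDD of `Row`s. A minimal sketch, assuming a running `SparkSession` named `spark` as in the examples above:

```scala
import spark.implicits._

// RDD -> DataFrame: name the single column
val numbersDf = spark.sparkContext.parallelize(Seq(1, 2, 3)).toDF("n")

// DataFrame -> RDD[Row]: drop back down for low-level control
val rowsRdd = numbersDf.rdd
rowsRdd.map(_.getInt(0)).collect()  // Array(1, 2, 3)
```

Converting in either direction is cheap to express, but note that `df.rdd` gives up Catalyst optimization for everything downstream of the conversion.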