RDDs vs DataFrames

Spark provides two core abstractions:

  • RDD (Resilient Distributed Dataset) – a low-level, immutable, partitioned collection of objects, manipulated with functional transformations (map, filter, reduce)
  • DataFrame – a higher-level, schema-aware collection organized into named columns, similar to a SQL table

RDD Example

val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3)) // distribute a local collection
rdd.map(_ * 2).collect()                               // Array(2, 4, 6)

DataFrame Example

import spark.implicits._ // enables the toDF conversion on local Seqs
val df = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
df.show()                // prints a two-row table with columns id and name

Prefer DataFrames in most cases: their declarative operations are optimized by the Catalyst query planner and benefit from Tungsten's efficient memory layout, whereas RDD transformations are opaque functions that Spark cannot optimize. Drop down to RDDs only when you need fine-grained control over partitioning or are working with unstructured data.
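When you do need both APIs, the two abstractions convert into each other: `df.rdd` exposes the underlying RDD of Rows, and `toDF` lifts an RDD of tuples or case classes into a DataFrame. A minimal sketch, assuming Spark is on the classpath (the app name and local master are illustrative choices):

```scala
import org.apache.spark.sql.SparkSession

object RddDfInterop extends App {
  // `local[*]` runs Spark in-process using all cores; fine for experiments.
  val spark = SparkSession.builder()
    .appName("rdd-df-interop")
    .master("local[*]")
    .getOrCreate()
  import spark.implicits._

  // DataFrame -> RDD: .rdd yields an RDD[Row]
  val df   = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
  val rows = df.rdd

  // RDD -> DataFrame: toDF works on RDDs of tuples or case classes
  val rdd = spark.sparkContext.parallelize(Seq((3, "Carol")))
  val df2 = rdd.toDF("id", "name")

  df.union(df2).show() // three rows: Alice, Bob, Carol
  spark.stop()
}
```

Round-tripping through `.rdd` loses the query-optimization benefits, so keep the RDD detour as short as possible.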
