Data Processing with Spark

DataFrames provide a fluent, composable API for processing structured data at scale: each transformation returns a new DataFrame, so operations chain naturally.
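Before running the snippets below, you need a SparkSession and a DataFrame to work with. A minimal sketch, assuming a local session and a small hand-built dataset (the names `spark` and `df` here are illustrative, as are the sample rows):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for experimentation
val spark = SparkSession.builder()
  .appName("data-processing")
  .master("local[*]")
  .getOrCreate()

// Needed for the $"col" column syntax and toDF
import spark.implicits._

// Small sample DataFrame with id and name columns (made-up data)
val df = Seq((1, "alice"), (2, "bob"), (3, "bob")).toDF("id", "name")
```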

Filtering and Aggregation

import spark.implicits._  // enables the $"col" column syntax
df.filter($"id" > 1).groupBy("name").count().show()
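Beyond `count()`, `groupBy` accepts arbitrary aggregations via `agg`. A sketch, assuming the same `df` with `id` and `name` columns as above:

```scala
import org.apache.spark.sql.functions._

// Keep rows with id > 1, then compute two aggregates per name
df.filter($"id" > 1)
  .groupBy("name")
  .agg(count("*").as("rows"), max("id").as("max_id"))
  .show()
```

`agg` takes any number of column expressions, so several aggregates are computed in a single pass over the grouped data.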

Reading from CSV

// "header" treats the first line as column names; without a schema,
// every column is read as StringType
val csvDf = spark.read.option("header", "true").csv("data.csv")
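To get typed columns instead of all strings, you can supply an explicit schema. A sketch (the schema here matches the hypothetical `id`/`name` sample data, not any real `data.csv`):

```scala
import org.apache.spark.sql.types._

// An explicit schema gives typed columns and avoids the extra pass
// over the file that inferSchema would trigger
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val typedDf = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("data.csv")
```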

Writing Output

// "overwrite" replaces any existing data at the output path
csvDf.write.mode("overwrite").parquet("output")
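For larger datasets, writes are often partitioned by a column so later reads can skip irrelevant files. A sketch, assuming a `name` column and a hypothetical output path:

```scala
// Lays out output as name=<value>/ subdirectories; queries that
// filter on name can then prune entire partitions
csvDf.write
  .mode("overwrite")
  .partitionBy("name")
  .parquet("output_partitioned")
```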

Spark's DataFrameReader and DataFrameWriter support many formats out of the box, including CSV, JSON, Parquet, and ORC, with more available through external packages.
