Data Processing with Spark

DataFrames provide a fluent, composable API for processing structured data at scale: each transformation returns a new DataFrame, so operations chain naturally.
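Before running the snippets below, you need a SparkSession and a DataFrame to work with. A minimal sketch, assuming a local session and a small hand-built dataset (the names `spark` and `df` here are illustrative, as are the sample rows):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for experimentation
val spark = SparkSession.builder()
  .appName("data-processing")
  .master("local[*]")
  .getOrCreate()

// Needed for the $"col" column syntax and toDF
import spark.implicits._

// Small sample DataFrame with id and name columns (made-up data)
val df = Seq((1, "alice"), (2, "bob"), (3, "bob")).toDF("id", "name")
```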

Filtering and Aggregation

import spark.implicits._  // enables the $"col" column syntax
df.filter($"id" > 1).groupBy("name").count().show()
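Beyond `count()`, `groupBy` accepts arbitrary aggregations via `agg`. A sketch, assuming the same `df` with `id` and `name` columns as above:

```scala
import org.apache.spark.sql.functions._

// Keep rows with id > 1, then compute two aggregates per name
df.filter($"id" > 1)
  .groupBy("name")
  .agg(count("*").as("rows"), max("id").as("max_id"))
  .show()
```

`agg` takes any number of column expressions, so several aggregates are computed in a single pass over the grouped data.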

Reading from CSV

// "header" treats the first line as column names; without a schema,
// every column is read as StringType
val csvDf = spark.read.option("header", "true").csv("data.csv")
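To get typed columns instead of all strings, you can supply an explicit schema. A sketch (the schema here matches the hypothetical `id`/`name` sample data, not any real `data.csv`):

```scala
import org.apache.spark.sql.types._

// An explicit schema gives typed columns and avoids the extra pass
// over the file that inferSchema would trigger
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val typedDf = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("data.csv")
```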

Writing Output

// "overwrite" replaces any existing data at the output path
csvDf.write.mode("overwrite").parquet("output")
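For larger datasets, writes are often partitioned by a column so later reads can skip irrelevant files. A sketch, assuming a `name` column and a hypothetical output path:

```scala
// Lays out output as name=<value>/ subdirectories; queries that
// filter on name can then prune entire partitions
csvDf.write
  .mode("overwrite")
  .partitionBy("name")
  .parquet("output_partitioned")
```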

Spark's DataFrameReader and DataFrameWriter support many formats out of the box, including CSV, JSON, Parquet, and ORC, with more available through external packages.
