Delta Lake in Apache Spark - Basics

Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Lake offers:
  • ACID Transactions
  • Scalable Metadata Handling
  • Time Travel (data versioning)
  • Open Format
  • Unified Batch and Streaming Source & Sink
  • Schema Enforcement
  • Schema Evolution
  • Updates and Deletes
Start Spark Shell with Delta
Just add the io.delta:delta-core_2.12:0.1.0 package and you are ready to use Delta:
spark-shell --packages io.delta:delta-core_2.12:0.1.0
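If you are building a standalone application instead of using the shell, the same artifact can be declared as a library dependency. A minimal build.sbt sketch (the Spark version here is an assumption; pick one compatible with your Delta release):

// build.sbt: minimal sketch, not a complete build definition
// Spark version is an assumption; use one compatible with your Delta release
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.2" % "provided"
libraryDependencies += "io.delta" %% "delta-core" % "0.1.0"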

Save as Delta format
To save data in Delta format, you just have to specify format("delta") when writing your DataFrame:
dataframe.write.format("delta").save("/path/to/delta")
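For example, a small end-to-end write might look like this (a sketch assuming Spark was started with the Delta package on the classpath; the sample columns are made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-write-example")
  .getOrCreate()

import spark.implicits._

// Hypothetical sample data, just for illustration
val events = Seq(
  ("2019-01-01", "e1", "click", "payload-1"),
  ("2019-01-02", "e2", "view", "payload-2")
).toDF("date", "eventId", "eventType", "data")

// mode("overwrite") replaces any existing data at the path
events.write
  .format("delta")
  .mode("overwrite")
  .save("/path/to/delta")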

Read Delta format
To read a Delta table, you just have to specify format("delta") when loading:
spark.read.format("delta").load("/path/to/delta")
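The result is an ordinary DataFrame, so the usual operations apply (assuming the hypothetical eventType column from the write example above):

// Assumes an existing SparkSession named spark
val df = spark.read.format("delta").load("/path/to/delta")
df.printSchema()
df.filter("eventType = 'click'").show()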

Create Delta Table
To create a Delta table:
CREATE TABLE table_name (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING DELTA
PARTITIONED BY (date)
LOCATION '/path/to/delta/table_name'
Or
CREATE TABLE table_name
USING DELTA
AS SELECT * FROM parquet.`/path/of/parquet_file/`
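Both statements can also be issued programmatically through spark.sql (a sketch; whether Delta DDL is supported in SQL depends on your Spark and Delta versions):

// Assumes an existing SparkSession named spark
spark.sql("""
  CREATE TABLE table_name (
    date DATE,
    eventId STRING,
    eventType STRING,
    data STRING)
  USING DELTA
  PARTITIONED BY (date)
  LOCATION '/path/to/delta/table_name'
""")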

Read a Delta table
You can access data in Delta tables either by specifying the path (/path/to/delta/table_name) or the table name (table_name):
SELECT * FROM table_name
or
SELECT * FROM delta.`/path/to/delta/table_name`
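The Scala equivalents would be (a sketch assuming an existing SparkSession named spark):

// By table name
spark.table("table_name").show()

// By path, without any metastore registration
spark.read.format("delta").load("/path/to/delta/table_name").show()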

Display table history
To view the history of a table, use the DESCRIBE HISTORY statement, which provides provenance information, including the table version, operation, user, and so on, for each write to a table.
DESCRIBE HISTORY table_name
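The same history is available programmatically through io.delta.tables.DeltaTable (a sketch; this API was added in a later Delta release than the 0.1.0 package shown above):

import io.delta.tables.DeltaTable

// Assumes an existing SparkSession named spark
val deltaTable = DeltaTable.forPath(spark, "/path/to/delta/table_name")
deltaTable.history().show(truncate = false)  // full history, newest first
deltaTable.history(1).show()                 // only the most recent operation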

Query an earlier version of the table (time travel) 
To query an older version of a table, specify a version or timestamp in a SELECT statement.
For example, to query version 0 of the table, use:
SELECT * FROM table_name VERSION AS OF 0
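The DataFrame reader supports the same thing through the versionAsOf and timestampAsOf options (a sketch using the path from the earlier examples):

// Assumes an existing SparkSession named spark

// Pin to a specific version
val v0 = spark.read
  .format("delta")
  .option("versionAsOf", "0")
  .load("/path/to/delta/table_name")

// Or pin to a point in time
val asOfDate = spark.read
  .format("delta")
  .option("timestampAsOf", "2019-01-01")
  .load("/path/to/delta/table_name")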

Follow here if you are interested in developing a Modern Data Warehouse solution using Delta Lake.
