Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
Delta Lake offers:
- ACID Transactions
- Scalable Metadata Handling
- Time Travel (data versioning)
- Open Format
- Unified Batch and Streaming Source & Sink
- Schema Enforcement
- Schema Evolution
- Updates and Deletes
Start Spark Shell with Delta
Add the io.delta:delta-core_2.12:0.1.0 package when launching the shell and you are ready to use Delta:
spark-shell --packages io.delta:delta-core_2.12:0.1.0
Save as Delta format
To save a DataFrame in Delta format, add format("delta") to the write call:
dataframe.write.format("delta").save("/path/to/delta")
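The write call above can be tried end to end with a throwaway DataFrame. The following is a minimal sketch meant to be pasted into the spark-shell session started earlier; the sample rows, the application name, and the temporary directory are assumptions made for illustration and do not come from this post.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created;
// master("local[*]") only matters when no session exists yet.
val spark = SparkSession.builder().appName("delta-write-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample rows, invented for this sketch.
val events = Seq(
  ("e1", "click", "payload-1"),
  ("e2", "view", "payload-2")
).toDF("eventId", "eventType", "data")

// Hypothetical location: a path that does not exist yet, so the write starts a new table.
val deltaPath = Files.createTempDirectory("delta-write-sketch").resolve("events").toString

// The only Delta-specific part of the write is format("delta").
events.write.format("delta").save(deltaPath)

// A Delta table is a directory of Parquet data files plus a _delta_log
// subdirectory that holds the transaction log.
val logDir = new java.io.File(deltaPath, "_delta_log")
println(s"transaction log present: ${logDir.isDirectory}")
```

Writing to a fresh path keeps the sketch repeatable. To write into an existing table, a save mode such as mode("append") or mode("overwrite") would be added before save().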
Read Delta format
To read a Delta table, add format("delta") to the read call and point load() at the table directory:
spark.read.format("delta").load("/path/to/delta")
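A short round trip makes the read call concrete. This is a sketch for the spark-shell session, under the assumption that a small table is written first so there is something to load; the path and the data are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder().appName("delta-read-sketch").master("local[*]").getOrCreate()

// Hypothetical location; a five-row table is written here first.
val deltaPath = Files.createTempDirectory("delta-read-sketch").resolve("numbers").toString
spark.range(0, 5).write.format("delta").save(deltaPath)

// load() takes the table's root directory, not an individual data file.
val numbers = spark.read.format("delta").load(deltaPath)

numbers.printSchema()
println(s"row count: ${numbers.count()}") // 5 rows: the ids 0 through 4
```

The result is an ordinary DataFrame, so filters, joins, and aggregations work on it like on any other source.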
Create Delta Table
To create a Delta table:
CREATE TABLE table_name(
date DATE,
eventId STRING,
eventType STRING,
data STRING)
USING DELTA
PARTITIONED BY (date)
LOCATION '/path/to/delta/table_name'
Or
CREATE TABLE table_name
USING delta
AS SELECT *
FROM parquet.`/path/of/parquet_file/`
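Depending on the Delta Lake and Spark versions in use, the SQL statements above may not be available in a plain spark-shell session. As a hedged alternative, the sketch below gets a similar result with the DataFrame API: it reads a Parquet directory and rewrites it as a Delta table partitioned by date. The sample rows and temporary paths are hypothetical, and unlike CREATE TABLE this does not register a table name in the metastore.

```scala
import java.nio.file.Files
import java.sql.Date
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder().appName("delta-convert-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical working directory with one Parquet input and one Delta output.
val workDir = Files.createTempDirectory("delta-convert-sketch")
val parquetPath = workDir.resolve("parquet_events").toString
val deltaPath = workDir.resolve("delta_events").toString

// Hypothetical rows with the same columns as the CREATE TABLE statement above.
val sample = Seq(
  (Date.valueOf("2019-01-01"), "e1", "click", "payload-1"),
  (Date.valueOf("2019-01-02"), "e2", "view", "payload-2")
).toDF("date", "eventId", "eventType", "data")
sample.write.parquet(parquetPath)

// Rough counterpart of CREATE TABLE ... USING delta AS SELECT * FROM parquet,
// with partitionBy("date") standing in for PARTITIONED BY (date).
spark.read.parquet(parquetPath).write.format("delta").partitionBy("date").save(deltaPath)

val converted = spark.read.format("delta").load(deltaPath)
converted.show()
```

The Parquet source is left untouched; the Delta table is a separate copy with its own transaction log.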
Read a Delta table
You can access data in Delta tables either by specifying the path (/path/to/delta/table_name) or the table name (table_name):
SELECT * FROM table_name
or
SELECT * FROM delta.`/path/to/delta/table_name`
Display table history
To view the history of a table, use the DESCRIBE HISTORY statement, which provides provenance information, including the table version, operation, user, and so on, for each write to a table.
DESCRIBE HISTORY table_name
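The same history can also be read from Scala. The sketch below assumes a Delta Lake release that ships the io.delta.tables.DeltaTable API, which is newer than the 0.1.0 package used above; newer releases may also expect the session to be configured with Delta's SQL extension and catalog, so check the documentation for the version in use. The path and data are hypothetical.

```scala
import java.nio.file.Files
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder().appName("delta-history-sketch").master("local[*]").getOrCreate()

// Hypothetical location; two writes produce two table versions.
val deltaPath = Files.createTempDirectory("delta-history-sketch").resolve("numbers").toString
spark.range(0, 5).write.format("delta").save(deltaPath)                 // first commit: version 0
spark.range(5, 10).write.format("delta").mode("append").save(deltaPath) // second commit: version 1

// history() returns a DataFrame with one row per commit to the table.
val history = DeltaTable.forPath(spark, deltaPath).history()
history.select("version", "timestamp", "operation").show(truncate = false)
```

Each commit appears as one row, so the two writes above show up as versions 0 and 1.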
Query an earlier version of the table (time travel)
To query an older version of a table, specify a version or timestamp in a SELECT statement.
For example, to query version 0 from the history above, use:
SELECT * FROM table_name VERSION AS OF 0
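Time travel is also reachable from the DataFrame reader through the versionAsOf option, which can help where the SQL syntax above is not supported by the installed versions. The following sketch writes two versions of a hypothetical table to a temporary path and reads both back; the data and path are assumptions for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder().appName("delta-time-travel-sketch").master("local[*]").getOrCreate()

// Hypothetical location: a fresh path, so the first write is version 0.
val deltaPath = Files.createTempDirectory("delta-time-travel-sketch").resolve("numbers").toString
spark.range(0, 5).write.format("delta").save(deltaPath)                    // version 0: 5 rows
spark.range(0, 10).write.format("delta").mode("overwrite").save(deltaPath) // version 1: 10 rows

// Without an option, the reader returns the latest version.
val latest = spark.read.format("delta").load(deltaPath)

// versionAsOf pins the read to an earlier commit.
val versionZero = spark.read.format("delta").option("versionAsOf", "0").load(deltaPath)

println(s"latest rows: ${latest.count()}")         // 10
println(s"version 0 rows: ${versionZero.count()}") // 5
```

The overwrite does not erase version 0 from the transaction log, which is why the older snapshot stays readable.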
Follow here if you are interested in developing a Modern Data Warehouse solution using Delta Lake.