Exam 203 back-end services Spark Delta Lake

From MillerSql.com
Revision as of 18:07, 17 November 2024 by NeilM (talk | contribs)

Spark Delta Lake.

Delta Lake is an open-source storage layer on top of Spark that adds relational database capabilities (tables, transactions, and schema enforcement) to data lake storage.

By using Delta Lake, you can implement a data lakehouse architecture in Spark.

Delta Lake supports:

  1. CRUD (create, read, update, and delete) operations
  2. ACID transactions: atomicity (transactions complete as a single unit of work), consistency (transactions leave the database in a consistent state), isolation (in-process transactions can't interfere with one another), and durability (when a transaction completes, the changes it made are persisted)
  3. Data versioning and time travel
  4. Streaming as well as batch data. Through the Spark Structured Streaming API, a Delta table can act as a streaming source or sink.
  5. Underlying table data is stored in Parquet files only (not CSV), alongside a transaction log that records each change to the table.
  6. Delta tables can be queried from the serverless SQL pool in Synapse Studio.
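
To make item 3 (data versioning and time travel) more concrete, here is a minimal sketch in plain Python of the general idea behind it: every change is appended to a log as a numbered commit, and an older version is read by replaying the log only up to that commit. This is a toy model under stated assumptions, not the Delta Lake API. The class ToyDeltaTable, its methods commit and files_at, and the file names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model of a versioned table. Hypothetical; NOT the Delta Lake API."""

    # Each commit is a list of (operation, file_name) actions,
    # where operation is "add" or "remove".
    log: list = field(default_factory=list)

    def commit(self, actions):
        """Append one commit as a single unit and return its version number."""
        self.log.append(list(actions))
        return len(self.log) - 1

    def files_at(self, version=None):
        """Replay commits 0..version and return the data files that are live."""
        if version is None:
            version = len(self.log) - 1  # default to the latest version
        live = set()
        for actions in self.log[: version + 1]:
            for operation, name in actions:
                if operation == "add":
                    live.add(name)
                elif operation == "remove":
                    live.discard(name)
        return sorted(live)


table = ToyDeltaTable()
v0 = table.commit([("add", "part-0000.parquet")])
v1 = table.commit([("add", "part-0001.parquet")])
# An update rewrites a file: the old file is removed and the new one
# added within the same commit, so readers never see a half-applied change.
v2 = table.commit([("remove", "part-0000.parquet"), ("add", "part-0002.parquet")])

print(table.files_at())    # files in the latest version
print(table.files_at(v0))  # "time travel" back to the first version
```

Because old commits are never modified, reading version 0 still returns the original file even after the later update. In this toy model the removed file is only dropped from the live set, which is why earlier versions stay readable.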