One of the cool feature of cloud is easy to adapt cutting edge technologies. A modern data warehouse(MDW) lets you bring together all your data at any scale easily, and means you can get insights through analytical dashboards, operational reports or advanced analytics for all your users. From Microsoft Modern data Warehouse architecture diagram at here you can divide it into 4 logical group as Ingest, Store, Prep/train and last as Model & serve.
As the above diagram shows, the MDW enables you to converge relational and non-relational, or structured and unstructured data into a single hub. Basically there are 3 main steps in this design.
- Data lands in the data lake (ADLS Gen 2) from multiple sources using ADF.
- Azure Databricks transforms the data into different layer (ADLS Gen2) and also move the data into Azure Synapse DW using Databricks notebooks
- AAS models are being refreshed on top of Azure Synapse data and then fed to reporting platforms.
At
the time of writing this blog ADF does not support the ingestion into delta
format, so it is better we ingest into parquet and convert into delta format.
Why
to use Databrciks Delta ?
- Easy, Simple, Fast Development
- Scalable
- Reliable
- SQL like All logic CDC, Evolving Schema
- Stability
- Easy Support and Troubleshooting
- Merge Schema
- Support ACID
- Schema enforcement (Data Integrity)
- Time Travel
- History of records
- Rollback of data
Sample
whole CDC in Delta Lake, in Spark Scala this can convert into thousands line of
code.
The
cleansed data is loaded into Azure Synapse to combine with your
existing data and make it all readily available for analysis through visual
tools like Power BI. Other important Azure resource in this design it to use
Azure key vault for Secrets Versioning,.
Note
Azure data warehouse is now known as Azure synapse.
You
can combine now Modern Data Warehouse solution with Advanced Analytics as shown
in below and store the data to Cosmos DB. This enables you to build highly
responsive and highly available applications worldwide.
One suggestion in adopting above design is to avoid multiple unnecessary resource hops. Why because an application that relies on multiple services has more potential failure points and composite SLA will be lower than the lowest take example if in your pipeline the SLA of 2 services is 99.95 and 99.99 then you have composite SLA of 99.94 only
To learn more follow How to secure Azure Synapse and AAS ? and How to Refresh AAS using Logic App ?
Post a Comment
Post a Comment
Thanks for your comment !
I will review your this and will respond you as soon as possible.