Design Modern Data Warehouse using Azure Synapse and Azure Databrciks

One of the cool feature of cloud is easy to adapt cutting edge technologies. A modern data warehouse(MDW) lets you bring together all your data at any scale easily, and means you can get insights through analytical dashboards, operational reports or advanced analytics for all your users. From Microsoft Modern data Warehouse architecture diagram at here you can divide it into 4 logical group as Ingest, Store, Prep/train and last as Model & serve. 
As the above diagram shows, the MDW enables you to converge relational and non-relational, or structured and unstructured data into a single hub. Basically there are 3 main steps in this design.
  • Data lands in the data lake (ADLS Gen 2) from multiple sources using ADF.
  • Azure Databricks transforms the data into different layer (ADLS Gen2) and also move the data into Azure Synapse DW using Databricks notebooks
  • AAS models are being refreshed on top of Azure Synapse data and then fed to reporting platforms.
 At the time of writing this blog ADF does not support the ingestion into delta format, so it is better we ingest into parquet and convert into delta format.
 Why to use Databrciks Delta ? 
  • Easy, Simple, Fast Development
  • Scalable
  • Reliable
  • SQL like All logic CDC, Evolving Schema
  • Stability
  • Easy Support and Troubleshooting
  • Merge Schema
  • Support ACID
  • Schema enforcement (Data Integrity)
  • Time Travel
    • History of records
    • Rollback of data
 Sample whole CDC in Delta Lake, in Spark Scala this can convert into thousands line of code.
 
 The cleansed data is loaded into Azure Synapse to combine with your existing data and make it all readily available for analysis through visual tools like Power BI. Other important Azure resource in this design it to use Azure key vault for Secrets Versioning,.
 Note Azure data warehouse is now known as Azure synapse.
 You can combine now Modern Data Warehouse solution with Advanced Analytics as shown in below and store the data to Cosmos DB. This enables you to build highly responsive and highly available applications worldwide.


One suggestion in adopting above design is to avoid multiple unnecessary resource hops. Why because an application that relies on multiple services has more potential failure points and composite SLA will be lower than the lowest take example if in your pipeline the SLA of 2 services is 99.95 and 99.99 then you have composite SLA of 99.94 only

Post a Comment

Thanks for your comment !
I will review your this and will respond you as soon as possible.