anjijava16/Databricks_fs

# Databricks_fs

✳️ Layers in the Databricks Lakehouse Architecture ✳️

✳️ Landing Layer (Native Format)
✅  This layer is optional and depends on the source systems and data.
✅  Landing is simply a container in the data lake that stores raw source data.
✅  This layer is where data lands from the data source before it is processed into the Delta layers.
✅  Data from different external systems is ingested into the data lake in its native format.
✅  Landing holds source system data as native files (CSV, JSON, XML, Parquet, ...).
✅  Landing data can be structured, semi-structured, or unstructured.
✅  Landing data arrives from different sources through batch or streaming processes.
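
As a minimal sketch of how native landing files might be grouped before loading, the plain-Python snippet below classifies files by extension. The extension-to-shape mapping, the function names, and the example paths are all assumptions made for illustration; they are not part of any Databricks API.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to data shape; adjust to your sources.
FORMAT_KIND = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
}

def classify_landing_file(path: str) -> str:
    """Return the assumed data shape of a landing file based on its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    # Anything outside the mapping (images, PDFs, free text) is treated as unstructured.
    return FORMAT_KIND.get(suffix, "unstructured")

def group_landing_files(paths):
    """Group landing file paths by their assumed data shape."""
    groups = {}
    for p in paths:
        groups.setdefault(classify_landing_file(p), []).append(p)
    return groups

# Hypothetical landing paths, one per source system.
landing_groups = group_landing_files([
    "landing/crm/customers.CSV",
    "landing/web/events.json",
    "landing/docs/contract.pdf",
])
```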

✳️ Bronze Layer (Delta Format)
✅  Source data is converted and loaded in Delta format.
✅  New data is appended to the Delta tables every day.
✅  Bronze tables are partitioned by updated_date/load_Date for better performance.
✅  Data from different external source systems is managed in the Bronze layer.
✅  The table structures in this layer correspond to the source system table structures "as-is".
✅  Bronze tables have additional metadata columns that capture the load date/time, process ID, etc.
✅  The focus in this layer is quick Change Data Capture and the ability to provide a historical archive of the source (cold storage).
✅  Bronze can be used for reload scenarios in the future.
✅  All historical data is managed here with audit columns.
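
The append-only load with audit columns can be sketched in plain Python, independent of any engine. This is only an illustration under stated assumptions: the column names `load_ts` and `process_id`, the function names, and the dictionary standing in for a partitioned table are hypothetical. On Databricks the same idea would normally be expressed as a Delta table append.

```python
from datetime import datetime, timezone

def to_bronze(source_rows, process_id, loaded_at=None):
    """Copy source rows as-is and add audit columns (hypothetical names)."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    return [
        {
            **row,                                       # source columns kept unchanged
            "load_ts": loaded_at.isoformat(),            # load date/time
            "load_Date": loaded_at.date().isoformat(),   # partition column
            "process_id": process_id,                    # which job loaded the row
        }
        for row in source_rows
    ]

def append_to_bronze(bronze_table, new_rows):
    """Append-only load: bronze_table maps a load_Date partition to its rows."""
    for row in new_rows:
        bronze_table.setdefault(row["load_Date"], []).append(row)
    return bronze_table

# Example: one daily batch from a hypothetical source system.
bronze = {}
batch = to_bronze(
    [{"id": 1, "name": "a"}],
    process_id="job-001",
    loaded_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
append_to_bronze(bronze, batch)
```

Because rows are only ever appended and carry their load date, an earlier state of the source can be replayed from this layer.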


✳️ Silver Layer (Delta Format)
✅  Uses Delta Lake tables (with SQL table names)
✅  Preserves grain of original data (no aggregation)
✅  Eliminates duplicate records
✅  Production schema enforced
✅  Data quality checks passed
✅  Corrupt data quarantined
✅  Data stored to support production workloads
✅  Optimized for long-term retention and ad-hoc queries
✅  Validates data quality and schema
✅  Enriches and transforms data
✅  Optimizes data layout and storage for downstream queries
✅  Provides a single source of truth for analytics
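
The Silver rules above (schema enforcement, duplicate elimination, quarantine of corrupt rows, no aggregation) can be illustrated with a small plain-Python sketch. The schema, the `id` key, and the function names are assumptions chosen for the example. A real pipeline would typically rely on Delta Lake schema enforcement and engine-level deduplication instead.

```python
# Hypothetical production schema: column name -> required Python type.
SILVER_SCHEMA = {"id": int, "name": str, "amount": float}

def conforms(row):
    """A row passes when every schema column is present with the expected type."""
    return all(
        col in row and isinstance(row[col], typ)
        for col, typ in SILVER_SCHEMA.items()
    )

def to_silver(bronze_rows, key="id"):
    """Split rows into clean, de-duplicated Silver rows and quarantined rows."""
    silver, quarantine, seen_keys = [], [], set()
    for row in bronze_rows:
        if not conforms(row):
            quarantine.append(row)   # corrupt rows are set aside, not dropped
            continue
        if row[key] in seen_keys:
            continue                 # duplicate of a row that was already kept
        seen_keys.add(row[key])
        # Keep only the schema columns; the grain stays one row per source record.
        silver.append({col: row[col] for col in SILVER_SCHEMA})
    return silver, quarantine

rows = [
    {"id": 1, "name": "a", "amount": 10.0, "load_Date": "2024-01-01"},
    {"id": 1, "name": "a", "amount": 10.0, "load_Date": "2024-01-02"},    # duplicate
    {"id": 2, "name": "b", "amount": "oops", "load_Date": "2024-01-02"},  # bad type
]
silver, quarantine = to_silver(rows)
```

This sketch keeps the first occurrence of each key. Keeping the latest version instead is an equally valid choice and depends on the source system.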


✳️ Gold Layer (Delta Format)
✅  Validated, business-level tables
✅  The Gold layer of the lakehouse is typically organized into consumption-ready, "project-specific" databases.
✅  The Gold layer is for reporting and uses more de-normalized and read-optimized data models with fewer joins.
✅  The final layer of data transformations and data quality rules is applied here.
✅  The final presentation layer of a project consists of business-oriented data models.
✅  Many Kimball-style star schema data models and Inmon-style data marts fit in the Gold layer of the lakehouse.
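
To show what a de-normalized, read-optimized table might look like, the sketch below joins a fact table to a dimension once and stores the aggregated result, so reports need no further joins. The `orders` and `customers` tables, their columns, and the function name are hypothetical examples and do not describe a prescribed model.

```python
from collections import defaultdict

def build_sales_by_region(orders, customers):
    """Join fact rows to the customer dimension once, then aggregate per region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for order in orders:
        # Orders without a matching customer are reported under "UNKNOWN".
        region = region_of.get(order["customer_id"], "UNKNOWN")
        totals[region]["order_count"] += 1
        totals[region]["total_amount"] += order["amount"]
    # One flat row per region, sorted for stable output.
    return [{"region": r, **metrics} for r, metrics in sorted(totals.items())]

customers = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 7.5},
]
sales_by_region = build_sales_by_region(orders, customers)
```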

✳️ Benefits of multiple layers
✅ Simple data model
✅ Easy to understand and implement
✅ Enables incremental ETL
✅ Can recreate your tables from raw data at any time
✅ ACID transactions, time travel
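
Incremental ETL and full rebuilds can both be expressed with a simple watermark, as in the plain-Python sketch below. It assumes the raw layer is partitioned by an ISO-formatted load date, which sorts correctly as a string. The function name and the dictionary layout are hypothetical. ACID transactions and time travel are Delta Lake features and are not simulated here.

```python
def incremental_batch(partitions, last_processed):
    """Return rows from partitions newer than the watermark, plus the new watermark.

    Passing last_processed=None selects every partition, which is the
    "recreate from raw data" case.
    """
    new_dates = sorted(
        d for d in partitions
        if last_processed is None or d > last_processed
    )
    rows = [row for d in new_dates for row in partitions[d]]
    watermark = new_dates[-1] if new_dates else last_processed
    return rows, watermark

raw = {
    "2024-01-01": [{"id": 1}],
    "2024-01-02": [{"id": 2}],
    "2024-01-03": [{"id": 3}],
}
new_rows, watermark = incremental_batch(raw, "2024-01-01")   # only the two newer days
all_rows, _ = incremental_batch(raw, None)                   # full rebuild
```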
