anjijava16/myweekend_work

myweekend_work

References

  1. https://blog.min.io/streaming-data-lakes-hudi-minio/
  2. https://github.com/aws-samples/emr-on-eks-hudi-iceberg-delta/blob/main/hudi/hudi_scd_script.py
  3. https://aws.amazon.com/blogs/big-data/get-a-quick-start-with-apache-hudi-apache-iceberg-and-delta-lake-with-amazon-emr-on-eks/
  4. https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/
  5. https://iomete.com/blog/cheat-sheet-for-apache-iceberg
  6. https://bigdataboutique.com/blog/introduction-to-apache-hudi-c83367
  7. https://www.tencentcloud.com/document/product/1026/35587
  8. https://github.com/vasveena/DemoNotes
  9. https://github.com/ev2900/Iceberg_EMR_Athena/tree/main
  10. https://medium.com/@parveen.jindal/having-your-cake-and-eating-it-too-how-vizio-built-a-next-generation-data-platform-to-enable-bi-4fc42c539543
  11. https://medium.com/@florent.moiny/databricks-cost-reduction-cheat-sheet-126be465e09

Weekend work

  1. Create an EMR cluster (release emr-6.10.0)
  2. Connect to the primary node over SSH
  3. Check that all JARs required by the table formats are present
  4. Understand the Iceberg file layout
  5. Understand the Hudi file layout
  6. Understand the Delta Lake file layout
  7. Create a MySQL database (Retail_DB)
  8. Copy the Retail_DB tables from MySQL to S3 as Parquet files
  9. Use the S3 Parquet files as the source
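Steps 1 to 3 can be scripted with the AWS CLI. The sketch below only builds the command lines (it does not call AWS). The cluster name, key pair, log bucket, instance type and count are hypothetical placeholders, and the JAR paths are where EMR 6.x usually installs Hudi, Iceberg and Delta Lake; treat them as assumptions and confirm them on your own cluster in step 3. The `iceberg-defaults` and `delta-defaults` classifications enable the bundled Iceberg and Delta Lake support; Livy is added only for the Livy experiment listed further down.

```python
import json
import shlex

# Release requested in step 1.
RELEASE_LABEL = "emr-6.10.0"

# EMR configuration classifications that switch on the bundled Iceberg and
# Delta Lake support. Hudi is available with Spark on EMR without one.
TABLE_FORMAT_CLASSIFICATIONS = [
    {"Classification": "iceberg-defaults", "Properties": {"iceberg.enabled": "true"}},
    {"Classification": "delta-defaults", "Properties": {"delta.enabled": "true"}},
]

# Assumed JAR locations on an EMR 6.x primary node; confirm them in step 3.
EXPECTED_JARS = [
    "/usr/lib/hudi/hudi-spark-bundle.jar",
    "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
    "/usr/share/aws/delta/lib/delta-core.jar",
]


def create_cluster_command(name, key_name, log_uri,
                           instance_type="m5.xlarge", instance_count=3):
    """Step 1: build an `aws emr create-cluster` argument list."""
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", RELEASE_LABEL,
        # Livy is included for the Livy experiment listed later.
        "--applications", "Name=Hadoop", "Name=Spark", "Name=Livy",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
        "--ec2-attributes", f"KeyName={key_name}",
        "--log-uri", log_uri,
        "--configurations", json.dumps(TABLE_FORMAT_CLASSIFICATIONS),
    ]


def ssh_command(cluster_id, key_pair_file):
    """Step 2: open an SSH session on the primary node via the AWS CLI."""
    return ["aws", "emr", "ssh", "--cluster-id", cluster_id,
            "--key-pair-file", key_pair_file]


def check_jars_command():
    """Step 3: shell command to run on the primary node to list the JARs."""
    return "ls -l " + " ".join(shlex.quote(p) for p in EXPECTED_JARS)


if __name__ == "__main__":
    # Hypothetical names; print the command instead of running it.
    print(shlex.join(create_cluster_command(
        "weekend-work", "my-key-pair", "s3://my-bucket/emr-logs/")))
```

Returning argument lists rather than running them keeps the sketch side-effect free; pass a list to `subprocess.run(cmd, check=True)` once the placeholders are filled in.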
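For steps 4 to 6, the quickest way to tell the three layouts apart is the metadata each format keeps next to its data files: Hudi keeps a timeline under `.hoodie/`, Delta Lake keeps a JSON transaction log under `_delta_log/`, and Iceberg keeps `*.metadata.json` files plus Avro manifests under `metadata/`. The sketch below is a hypothetical helper that applies those rules to a list of object keys relative to a table's root (for example from `aws s3 ls --recursive`); it is a heuristic, not a validator.

```python
from pathlib import PurePosixPath


def detect_table_format(keys):
    """Guess a table's format from object keys relative to the table root.

    All three formats store data in Parquet by default; what tells them apart
    is the metadata each one keeps next to the data files.
    """
    parts = [PurePosixPath(k).parts for k in keys]
    # Hudi: timeline (commit/clean/rollback instants, hoodie.properties) in .hoodie/
    if any(".hoodie" in p for p in parts):
        return "hudi"
    # Delta Lake: transaction log of zero-padded JSON commits in _delta_log/
    if any("_delta_log" in p for p in parts):
        return "delta"
    # Iceberg: *.metadata.json files (plus Avro manifest lists/manifests) in metadata/
    if any(p and "metadata" in p[:-1] and p[-1].endswith(".metadata.json")
           for p in parts):
        return "iceberg"
    # No table metadata: plain Parquet files such as the step 8 export
    if keys and all(k.endswith(".parquet") for k in keys):
        return "parquet"
    return "unknown"


if __name__ == "__main__":
    print(detect_table_format(["_delta_log/00000000000000000000.json",
                               "part-00000.snappy.parquet"]))
```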

Flow

  1. MySQL ---> (read with Spark) ---> write to S3 as Parquet files
  2. Read the S3 files ---> write as a Hudi table
  3. Read the S3 files ---> write as an Iceberg table
  4. Read the S3 files ---> write as a Hudi table
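Flow step 1 reads each MySQL table through Spark's JDBC source and writes it to S3 as Parquet. The sketch below only assembles the read options and target paths. The host, credentials, bucket, `raw/` prefix and the table list (the common sample `retail_db` schema) are all hypothetical; it also assumes MySQL Connector/J 8.x (driver class `com.mysql.cj.jdbc.Driver`) has been put on the Spark classpath, which EMR does not do by default.

```python
# Hypothetical table list: the usual sample "retail_db" schema. Adjust to
# whatever tables you actually create in Retail_DB.
RETAIL_DB_TABLES = ["categories", "customers", "departments",
                    "order_items", "orders", "products"]


def jdbc_options(host, database, table, user, password, port=3306):
    """Options for spark.read.format("jdbc") against MySQL (Connector/J 8.x)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def parquet_target(bucket, database, table):
    """Hypothetical S3 layout: one Parquet folder per source table."""
    return f"s3://{bucket}/raw/{database.lower()}/{table}/"


def export_plan(host, database, user, password, bucket):
    """Pair every table's JDBC read options with its S3 Parquet destination."""
    return [(jdbc_options(host, database, t, user, password),
             parquet_target(bucket, database, t))
            for t in RETAIL_DB_TABLES]


# On the cluster (requires a SparkSession `spark` and the MySQL JDBC driver
# on the classpath), each pair would be used roughly like this:
#
#   for read_opts, target in export_plan(...):
#       df = spark.read.format("jdbc").options(**read_opts).load()
#       df.write.mode("overwrite").parquet(target)
```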
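The remaining flow steps read the S3 Parquet files and write them in a table format, and each format needs its own Spark session settings. The sketch below collects the commonly documented settings per format and the core Hudi write options for an upsert into a Copy-on-Write table. The Glue-backed Iceberg catalog name `glue_catalog`, the warehouse path and the example table/field names are hypothetical; verify the settings against the Hudi, Iceberg and Delta versions that ship with emr-6.10.0.

```python
def session_conf(table_format, warehouse="s3://my-bucket/iceberg/"):
    """Spark session settings each table format needs (assumed defaults)."""
    if table_format == "hudi":
        return {
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
            "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
        }
    if table_format == "iceberg":
        # "glue_catalog" is a hypothetical catalog name backed by AWS Glue.
        cat = "spark.sql.catalog.glue_catalog"
        return {
            "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            cat: "org.apache.iceberg.spark.SparkCatalog",
            f"{cat}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
            f"{cat}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
            f"{cat}.warehouse": warehouse,
        }
    if table_format == "delta":
        return {
            "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
            "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        }
    raise ValueError(f"unsupported table format: {table_format}")


def spark_submit_conf_args(conf):
    """Turn a settings dict into spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Core Hudi datasource write options for an upsert into a Copy-on-Write table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
    }


# Writes from the S3 Parquet source `df` (sketch, run on the cluster):
#   df.write.format("hudi").options(**hudi_write_options(...)).mode("append").save("s3://my-bucket/hudi/orders/")
#   df.writeTo("glue_catalog.retail_db.orders").using("iceberg").createOrReplace()
#   df.write.format("delta").mode("overwrite").save("s3://my-bucket/delta/orders/")
```

Keeping one format per Spark session is deliberate: the Hudi and Delta settings both replace `spark.sql.catalog.spark_catalog`, so combining them in one session would make one of them win.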

Cleanup

  1. Terminate the EMR cluster
  2. Delete the EMR logs in S3
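Both cleanup steps map onto AWS CLI calls. The sketch below builds them in order; the cluster ID and log URI are hypothetical, and it assumes the cluster was created with a `--log-uri`, under which EMR writes each cluster's logs in a folder named after the cluster ID. Deleting only that folder avoids wiping the whole bucket.

```python
def cleanup_commands(cluster_id, log_uri):
    """Return the AWS CLI calls for the two cleanup steps, in order."""
    if not log_uri.startswith("s3://"):
        raise ValueError("log_uri must be an s3:// URI")
    # EMR writes each cluster's logs under <log-uri>/<cluster-id>/
    cluster_logs = f"{log_uri.rstrip('/')}/{cluster_id}/"
    return [
        # Step 1: terminate the cluster and block until it is gone.
        ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id],
        ["aws", "emr", "wait", "cluster-terminated", "--cluster-id", cluster_id],
        # Step 2: delete only this cluster's log folder, not the whole bucket.
        ["aws", "s3", "rm", cluster_logs, "--recursive"],
    ]


if __name__ == "__main__":
    import shlex
    # Hypothetical cluster ID and log bucket.
    for cmd in cleanup_commands("j-EXAMPLE12345", "s3://my-bucket/emr-logs/"):
        print(shlex.join(cmd))
```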

Next Steps

  1. Create a FastAPI microservice ---> read a MySQL table over JDBC ---> write to S3 ---> write to S3 in each of the different table formats

Experiment with the Livy server on EMR.
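A first Livy experiment could be submitting one of the flow scripts as a batch through Livy's REST API (`POST /batches`, default port 8998) and polling `GET /batches/{id}/state`. The sketch below builds the request with the standard library without sending it. The primary-node hostname and the script path are hypothetical, and it assumes Livy was installed on the cluster and that the caller can reach port 8998 (for example from the primary node or through an SSH tunnel).

```python
import json
import urllib.request

LIVY_PORT = 8998  # Livy's default REST port


def livy_batch_request(master_dns, script_path, args=None, conf=None):
    """Build (but do not send) a POST /batches request that runs a PySpark script."""
    body = {"file": script_path, "args": args or [], "conf": conf or {}}
    return urllib.request.Request(
        f"http://{master_dns}:{LIVY_PORT}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def batch_state_url(master_dns, batch_id):
    """URL to poll for a batch's state (e.g. "starting", "running", "success")."""
    return f"http://{master_dns}:{LIVY_PORT}/batches/{batch_id}/state"


if __name__ == "__main__":
    # Hypothetical host and script; send only from a machine that can reach port 8998.
    req = livy_batch_request("ip-10-0-0-1.ec2.internal", "s3://my-bucket/jobs/to_hudi.py")
    print(req.get_method(), req.full_url)
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["id"])
```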
