- https://blog.min.io/streaming-data-lakes-hudi-minio/
- https://github.com/aws-samples/emr-on-eks-hudi-iceberg-delta/blob/main/hudi/hudi_scd_script.py
- https://aws.amazon.com/blogs/big-data/get-a-quick-start-with-apache-hudi-apache-iceberg-and-delta-lake-with-amazon-emr-on-eks/
- https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/
- https://iomete.com/blog/cheat-sheet-for-apache-iceberg
- https://bigdataboutique.com/blog/introduction-to-apache-hudi-c83367
- https://www.tencentcloud.com/document/product/1026/35587
- https://github.com/vasveena/DemoNotes
- https://github.com/ev2900/Iceberg_EMR_Athena/tree/main
- https://medium.com/@parveen.jindal/having-your-cake-and-eating-it-too-how-vizio-built-a-next-generation-data-platform-to-enable-bi-4fc42c539543
- https://medium.com/@florent.moiny/databricks-cost-reduction-cheat-sheet-126be465e09
- Create an EMR cluster (release should be 6.10)
- Connect to SSH
- Check that all required table-format JARs (Hudi, Iceberg, Delta Lake) are on the cluster
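A minimal stdlib sketch of the JAR check; the directories below are the usual locations on EMR 6.x nodes, but they are assumptions here and should be verified against the actual release:

```python
import os

# Assumed locations of the table-format bundles on an EMR 6.x node
# (exact paths and bundle versions vary by EMR release -- verify on the cluster).
JAR_DIRS = {
    "hudi": "/usr/lib/hudi",
    "iceberg": "/usr/share/aws/iceberg/lib",
    "delta": "/usr/share/aws/delta/lib",
}

def check_jars(dirs=JAR_DIRS):
    """Return {format: list of JARs found, or None if the dir is missing}."""
    report = {}
    for fmt, path in dirs.items():
        if os.path.isdir(path):
            report[fmt] = [f for f in os.listdir(path) if f.endswith(".jar")]
        else:
            report[fmt] = None  # directory absent on this machine
    return report

if __name__ == "__main__":
    for fmt, jars in check_jars().items():
        print(fmt, "->", jars if jars is not None else "NOT FOUND")
```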
- Understand Iceberg Files
- Understand Hudi Files
- Understand Delta Lake Files
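The three formats differ mainly in where they keep table metadata; a simplified stdlib summary of the on-disk layout to look for when browsing the S3 prefix (key directories only, not an exhaustive spec):

```python
# Simplified on-disk layout of each table format (key directories only).
LAYOUTS = {
    "hudi": {
        "metadata": ".hoodie/",   # timeline: commits, deltacommits, ...
        "data": "partition dirs with Parquet (+ log files for MoR tables)",
    },
    "iceberg": {
        "metadata": "metadata/",  # metadata.json, manifest lists, manifests
        "data": "data/",          # Parquet/ORC/Avro data files
    },
    "delta": {
        "metadata": "_delta_log/",  # ordered JSON commits + Parquet checkpoints
        "data": "Parquet files in the table root / partition dirs",
    },
}

def metadata_dir(fmt):
    """Where to look first when inspecting a table of the given format."""
    return LAYOUTS[fmt]["metadata"]

if __name__ == "__main__":
    for fmt in LAYOUTS:
        print(fmt, "metadata lives in", metadata_dir(fmt))
```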
- Create a MySQL DB (Retail_DB)
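A hypothetical DDL for the demo database; the notes do not specify a schema, so the table and columns below are illustrative only:

```python
# Hypothetical DDL for the demo database -- the orders table and its columns
# are invented for illustration; the original notes only name Retail_DB.
RETAIL_DB_DDL = """
CREATE DATABASE IF NOT EXISTS Retail_DB;
USE Retail_DB;
CREATE TABLE IF NOT EXISTS orders (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    amount      DECIMAL(10, 2),
    order_ts    TIMESTAMP
);
"""

if __name__ == "__main__":
    print(RETAIL_DB_DDL)
```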
- Copy the MySQL DB to S3 as Parquet files
- Source: S3 Parquet
- MySQL ---> (Spark read via JDBC) ---> Write as Parquet files to S3
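A PySpark sketch of the MySQL-to-S3 step; the host, credentials, table, and bucket names are placeholders, and the MySQL JDBC driver must be on the Spark classpath:

```python
def jdbc_options(host, db, table, user, password):
    """Assemble Spark JDBC options for a MySQL source (placeholder values)."""
    return {
        "url": f"jdbc:mysql://{host}:3306/{db}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }

if __name__ == "__main__":
    # Requires a Spark runtime (e.g. run on the EMR primary node).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mysql-to-s3").getOrCreate()
    opts = jdbc_options("mysql-host", "Retail_DB", "orders", "user", "secret")
    df = spark.read.format("jdbc").options(**opts).load()
    # Land the table as Parquet in S3 (placeholder bucket/prefix).
    df.write.mode("overwrite").parquet("s3://my-bucket/raw/orders/")
```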
- S3 read Parquet files ---> Write as Hudi
- S3 read Parquet files ---> Write as Iceberg
- S3 read Parquet files ---> Write as Delta Lake
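The three writes differ only in the format string and its options; a hedged sketch below (table names and S3 paths are placeholders, and the Iceberg write assumes a Glue catalog named glue_catalog is already configured on the Spark session):

```python
# Per-format writer settings; verify option keys against the table-format
# versions bundled with the EMR release in use.
def hudi_options(table, record_key, precombine_key):
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_key,
        "hoodie.datasource.write.operation": "upsert",
    }

def write_all_formats(df, base="s3://my-bucket/lake"):
    # Hudi: path-based write with the options above.
    (df.write.format("hudi")
       .options(**hudi_options("orders", "order_id", "order_ts"))
       .mode("overwrite")
       .save(f"{base}/hudi/orders"))
    # Iceberg: catalog-table write (catalog configured on the Spark session).
    df.writeTo("glue_catalog.retail.orders").createOrReplace()
    # Delta Lake: path-based write.
    df.write.format("delta").mode("overwrite").save(f"{base}/delta/orders")

if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-to-formats").getOrCreate()
    df = spark.read.parquet("s3://my-bucket/raw/orders/")
    write_all_formats(df)
```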
- Terminate EMR Cluster
- Delete the EMR logs from S3
- Create a FastAPI microservice ---> Read the MySQL table via JDBC ---> Write to S3 ---> Write to S3 in the different table formats
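A sketch of the microservice wiring; the endpoint name and step list are hypothetical, and in practice the endpoint would submit the Spark pipeline above (e.g. via EMR Steps or Livy) rather than run it inline:

```python
# Pipeline metadata kept separate from the web layer so it is importable
# without FastAPI installed.
PIPELINE_STEPS = [
    "read_mysql_jdbc",
    "write_s3_parquet",
    "write_table_formats",
]

def create_app():
    # FastAPI imported lazily; the step list above is usable without it.
    from fastapi import FastAPI

    app = FastAPI(title="retail-lake-ingest")

    @app.post("/ingest/{table}")
    def ingest(table: str):
        # Placeholder: submit the Spark job for `table` here (e.g. via
        # EMR Steps or Livy); return the planned steps for visibility.
        return {"table": table, "steps": PIPELINE_STEPS}

    return app

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(create_app(), host="0.0.0.0", port=8000)
```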