- Graduate School, Leavey School of Business
- Department of Information Systems & Analytics
- Course MSIS 2627: Big Data Modeling & Analytics
- Big-Data-MapReduce Course @ Santa Clara University
- Class meeting dates:
- Start: March 28, 2022
- End: June 9, 2022
- Class hours:
- Tuesday 7:35 PM - 9:10 PM PST (online, via Zoom)
- Thursday 7:35 PM - 9:10 PM PST (online, via Zoom)
- Instructor: Mahmoud Parsian
- Classroom: Lucas Hall 210
- Office: 216AA, 2nd Floor, Lucas Hall (not used due to COVID-19)
- Office Hours: Monday 2:00 PM PST (or by appointment)
- Office hours etiquette: if you plan to attend an office hour, please send me an email
1. PySpark Algorithms Book by Mahmoud Parsian
2. Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer
3. Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman
- 1. A Very Brief Introduction to MapReduce by Diana MacLean
- 2. Introduction to MapReduce by Mahmoud Parsian
- Apache Spark Site
- Apache Spark Download (use version 3.2.1)
The main focus of this class is to cover the following concepts:
- Concepts of Big Data
- Distributed File Systems
- Distributed Computing
- Distributed and Parallel Algorithms
- MapReduce Paradigm
- MapReduce Algorithms
- Scale-out Architectures (using Hadoop, Spark, PySpark)
- Apache Spark
- Use Spark, PySpark, and Python to teach MapReduce and distributed computing
- How to use SQL on NoSQL data
- Amazon Athena
- Amazon Athena, S3, Data Partitioning
