Welcome to the Medical X-Ray Imaging: Pneumonia Detection repository! 🎉
This project is a collaborative initiative brought to you by SuperDataScience, a thriving community dedicated to advancing the fields of data science, machine learning, and AI. We are excited to have you join us in this journey of learning, experimentation, and growth.
This project involves building a convolutional neural network (CNN) to classify medical X-ray images and detect pneumonia. Targeted at beginner to intermediate-level data scientists, the project will focus on leveraging deep learning techniques to develop a robust classification model. The final model will be deployed using Streamlit, providing a user-friendly interface for real-time predictions.
- Use the publicly available dataset of X-ray images for model training.
- Perform data preprocessing, including resizing, normalization, and augmentation, to prepare the images for training.
Link to dataset: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
- Build a convolutional neural network (CNN) using deep learning frameworks such as TensorFlow or PyTorch.
- Train and evaluate the model to classify X-ray images as normal or pneumonia.
- Develop a Streamlit application to allow users to upload X-ray images and receive a prediction.
- Include visualization of prediction confidence and model explanation (e.g., Grad-CAM).
- Dataset Handling: Pandas, NumPy.
- Deep Learning Frameworks: TensorFlow or PyTorch.
- Image Processing: OpenCV, Pillow.
- Model Deployment: Streamlit.
- Python 3.8+
- Libraries:
tensorflow, torch (the pip package name for PyTorch), pandas, numpy, opencv-python, pillow, streamlit, matplotlib.
- Set up the GitHub repo and project folders.
- Set up a virtual environment and install the required libraries.
- Download the chest X-ray dataset from a trusted source (e.g., Kaggle).
- Explore and preprocess the dataset:
- Resize images to a uniform size.
- Normalize pixel values for faster model convergence.
- Perform data augmentation to improve model generalization.
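The three preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration, not the project's actual pipeline: the function names, the 224×224 target size, and the brightness-jitter augmentation are all assumptions, and the nearest-neighbour resize is a simple stand-in for what `cv2.resize` or Pillow's `Image.resize` would normally do.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size: tuple) -> np.ndarray:
    """Resize an (H, W) or (H, W, C) array to `size` = (new_h, new_w).

    Nearest-neighbour sampling, used here only as a dependency-free
    stand-in for an OpenCV or Pillow resize.
    """
    h, w = image.shape[:2]
    new_h, new_w = size
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return image[rows[:, None], cols]


def normalize(image: np.ndarray) -> np.ndarray:
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0


def preprocess(image: np.ndarray, size: tuple = (224, 224)) -> np.ndarray:
    """Resize, then normalize. The 224x224 default is an assumed target size."""
    return normalize(resize_nearest(image, size))


def jitter_brightness(image: np.ndarray, rng: np.random.Generator,
                      max_change: float = 0.1) -> np.ndarray:
    """Augmentation example: scale brightness by a random factor near 1.

    Expects a normalized image; the result is clipped back to [0, 1].
    """
    factor = float(rng.uniform(1.0 - max_change, 1.0 + max_change))
    return np.clip(image * factor, 0.0, 1.0)
```

Augmentation would typically be applied to training images only, so that validation and test scores reflect unmodified X-rays.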
- Design a CNN architecture tailored for image classification.
- Train the model on the dataset with proper validation.
- Evaluate the model's performance using metrics like accuracy, precision, recall, and F1-score.
- Fine-tune the model for optimal performance.
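To make the evaluation metrics above concrete, here is a small pure-Python sketch that computes them from true and predicted labels. It assumes labels are encoded as 1 for pneumonia (the positive class) and 0 for normal; that encoding and the function name are illustrative choices, not something the project prescribes.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = pneumonia)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pneumonia, caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pneumonia, missed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Recall deserves particular attention in this setting, because a false negative corresponds to a missed pneumonia case. Accuracy alone can also be misleading when the two classes are not equally represented.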
- Build a Streamlit app to:
- Allow users to upload X-ray images.
- Display the model's predictions (Normal or Pneumonia).
- Provide additional insights using Grad-CAM visualizations for explainability.
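As one possible way to turn a raw model output into the label and confidence the app displays, consider the helper below. It assumes the model returns a single probability for the pneumonia class (for example, from a sigmoid output); the function name and the 0.5 threshold are hypothetical and would need tuning against validation data.

```python
from typing import Tuple


def describe_prediction(pneumonia_probability: float,
                        threshold: float = 0.5) -> Tuple[str, float]:
    """Map a pneumonia probability to (label, confidence in that label).

    Assumes a single-output binary classifier; the threshold is an
    illustrative default, not a recommended operating point.
    """
    if not 0.0 <= pneumonia_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if pneumonia_probability >= threshold:
        return "Pneumonia", pneumonia_probability
    # Confidence in "Normal" is the complement of the pneumonia probability.
    return "Normal", 1.0 - pneumonia_probability
```

In a Streamlit app, the uploaded file would typically come from `st.file_uploader`, and the returned label and confidence could be shown with `st.write`.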
| Phase | Task | Duration |
|---|---|---|
| Phase 1: Setup | Set up GitHub repo and project folder | Week 1 |
| Phase 2: Dataset | Acquire and preprocess data | Week 2 |
| Phase 3: Model Development | Design, train, and evaluate CNN | Week 3 |
| Phase 4: Model Deployment | Build and deploy Streamlit app | Week 4 |
Follow these steps to set up the project locally:
To work on your own copy of this project:
- Navigate to the SDS GitHub repository for this project.
- Click the Fork button in the top-right corner of the repository page.
- This will create a copy of the repository under your GitHub account.
After forking the repository:
- Open a terminal on your local machine.
- Clone your forked repository by running:
git clone https://github.com/<your-username>/<repository-name>.git
- Navigate to the project directory:
cd <repository-name>
Set up a virtual environment to isolate project dependencies.
- Run the following command in the terminal to create a virtual environment:
python3 -m venv .venv
- Activate the virtual environment:
- On macOS/Linux:
source .venv/bin/activate
- On Windows:
.venv\Scripts\activate
- Verify that the virtual environment is active: the shell prompt should show (.venv).
Install the required libraries for the project.
- Run the following command in the terminal to install dependencies from the requirements.txt file:
pip install -r requirements.txt
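After installation, a quick check like the following can confirm that the libraries are importable inside the active virtual environment. It is only a sketch: the module lists are assumptions based on the libraries named in this README, and note that some import names differ from their pip names (`cv2` for opencv-python, `PIL` for pillow, `torch` for PyTorch).

```python
import importlib.util
from typing import Iterable, List

# Import names assumed from the libraries listed in this README.
REQUIRED_MODULES = ["pandas", "numpy", "cv2", "PIL", "streamlit", "matplotlib"]
# The project allows either framework, so only one of these is needed.
FRAMEWORK_MODULES = ["tensorflow", "torch"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    if len(missing_modules(FRAMEWORK_MODULES)) == len(FRAMEWORK_MODULES):
        print("No deep learning framework found (tensorflow or torch).")
    if not missing:
        print("All required modules were found.")
```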
Once the setup is complete, you can start building your project.