This project, a collaboration with Kavya Malhotra, implements advanced object localization and detection techniques using Convolutional Neural Networks (CNNs) in PyTorch. We work with an augmented version of the MNIST dataset to tackle two main challenges:
- Object Localization: Classifying a single object in an image and predicting its bounding box.
- Object Detection: Identifying and classifying multiple objects in an image, with bounding boxes for each.
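Localization combines two objectives: deciding whether and which digit is present, and regressing its box. The following is a minimal pure-Python sketch of such a composite loss, assuming a YOLO-style layout. It is not necessarily the loss used in the notebooks. The prediction tuple `(obj_logit, box, class_logits)`, the target tuple `(p_c, box, label)` and the function names are all hypothetical. The box and class terms only count when a digit is present (`p_c == 1`).

```python
import math

def bce_with_logit(logit, target):
    # Numerically stable binary cross-entropy computed on a raw (pre-sigmoid) logit
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def cross_entropy(logits, label):
    # -log softmax(logits)[label], using the log-sum-exp trick for stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def localization_loss(pred, target):
    """Hypothetical composite loss for one image.

    pred:   (obj_logit, [x, y, w, h], class_logits)  # assumed model output layout
    target: (p_c, [x, y, w, h], label)               # p_c = 1 if a digit is present
    """
    obj_logit, box_pred, class_logits = pred
    p_c, box_true, label = target
    # Objectness term is always active
    loss = bce_with_logit(obj_logit, p_c)
    if p_c == 1:
        # Box regression (squared error) and classification only when a digit exists
        loss += sum((bp - bt) ** 2 for bp, bt in zip(box_pred, box_true))
        loss += cross_entropy(class_logits, label)
    return loss
```

In PyTorch the same terms map onto `nn.BCEWithLogitsLoss`, `nn.MSELoss` and `nn.CrossEntropyLoss`, with the last two masked by `p_c`. How the terms are weighted against each other is a design choice. This sketch weights them equally.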
Key features:
- Custom CNN architectures optimized for object localization and detection
- Specialized loss functions for accurate bounding box prediction
- Performance evaluation using accuracy and Intersection over Union (IoU)
- Visualization tools comparing predicted and ground-truth bounding boxes
- Grid-based approach for multi-object detection
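IoU is the area where the predicted and ground-truth boxes overlap, divided by the area of their union. It is 1.0 for a perfect match and 0.0 for disjoint boxes. Below is a minimal sketch for two axis-aligned boxes, assuming corner format `(x_min, y_min, x_max, y_max)`. The repository's own box convention may differ. In PyTorch, `torchvision.ops.box_iou` computes the same quantity for batches of boxes in this corner format.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max).

    Returns 0.0 when the boxes do not overlap (or the union is degenerate).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region, clamped at zero
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2 × 2 boxes offset by one pixel in each direction overlap in a 1 × 1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.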
We use an augmented version of MNIST with the following modifications:
- Image dimensions: 48 x 60 pixels
- Randomly positioned, rotated, and resized digits
- Added background noise for increased complexity
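Digits appear at random positions and sizes, so box targets are commonly expressed relative to the image size rather than in raw pixels. The following sketch converts between pixel corner coordinates and normalized center format `(x_c, y_c, w, h)` in [0, 1]. It assumes that 48 × 60 means 48 rows (height) by 60 columns (width). Both the function names and that orientation are assumptions, not taken from the repository.

```python
IMG_H, IMG_W = 48, 60  # assumption: height 48 px, width 60 px

def to_normalized_center(x_min, y_min, x_max, y_max, img_w=IMG_W, img_h=IMG_H):
    # Pixel corners -> box center and size as fractions of the image dimensions
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return x_c, y_c, w, h

def to_pixel_corners(x_c, y_c, w, h, img_w=IMG_W, img_h=IMG_H):
    # Inverse transform: normalized center format -> pixel corners
    x_min = (x_c - w / 2) * img_w
    y_min = (y_c - h / 2) * img_h
    x_max = (x_c + w / 2) * img_w
    y_max = (y_c + h / 2) * img_h
    return x_min, y_min, x_max, y_max
```

A box spanning pixels (15, 12) to (45, 36) is centered in a 60 × 48 image and covers half of each dimension, so it normalizes to (0.5, 0.5, 0.5, 0.5).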
The project requires:
- Python 3.7+
- PyTorch 1.8+
- torchvision
- NumPy
- Matplotlib
- Jupyter Notebook
To install and run the project:
- Clone the repository:
  git clone git@github.com:KhalilIbrahimm/DeepLearning-PyTorch-ObjectDetection.git
- Set up a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows, use venv\Scripts\activate
- Install the required dependencies:
  pip install -r requirements.txt
- Launch Jupyter Notebook:
  jupyter notebook
- Open and run the notebooks in the notebooks/ directory.
Our best models achieved the following results.
Object localization:
- Accuracy: 91.63%
- IoU: 0.4808
- Mean Performance: 0.6985
Object detection:
- Accuracy: 33.46%
- IoU: 0.7322
- Mean Performance: 0.5334
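Both reported Mean Performance figures match, to within rounding, the unweighted average of accuracy (as a fraction) and IoU. The following sketch assumes that definition. The function name is hypothetical.

```python
def mean_performance(accuracy, iou):
    # Assumption: Mean Performance = (accuracy + IoU) / 2, with accuracy as a fraction
    return (accuracy + iou) / 2

# Reported (accuracy, IoU) pairs from the results above
results = {"localization": (0.9163, 0.4808), "detection": (0.3346, 0.7322)}
for task, (acc, iou_score) in results.items():
    print(f"{task}: {mean_performance(acc, iou_score):.5f}")
```

Under this definition, the two tasks have similar combined scores for opposite reasons. Localization is carried by accuracy, while detection is carried by IoU.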
Note: Object detection accuracy is lower because of class imbalance in the grid-based approach: empty grid cells far outnumber cells that contain a digit. The comparatively high IoU (0.7322) indicates that bounding boxes are predicted well, even though the accuracy metric is low.
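To see why empty cells dominate, consider a minimal sketch of grid-based target assignment. The image is divided into a grid, and each digit is assigned to the cell containing its box center. The 2 × 3 grid, the 60 × 48 image orientation and the function names are hypothetical choices for illustration. The notebooks may use a different grid size.

```python
def assign_to_grid(centers, img_w=60, img_h=48, rows=2, cols=3):
    """Map each object's pixel-space center (x, y) to the cell that contains it.

    Returns a rows x cols occupancy grid: 1 if a digit's center falls in the cell, else 0.
    Grid size and image orientation are assumptions for illustration.
    """
    cell_w, cell_h = img_w / cols, img_h / rows
    grid = [[0] * cols for _ in range(rows)]
    for x, y in centers:
        # Clamp so centers on the right/bottom border land in the last cell
        col = min(int(x // cell_w), cols - 1)
        row = min(int(y // cell_h), rows - 1)
        grid[row][col] = 1
    return grid

def empty_fraction(grid):
    # Share of cells whose target is "no object"
    cells = [c for row in grid for c in row]
    return cells.count(0) / len(cells)

# Two digits in one image leave four of the six cells empty
grid = assign_to_grid([(10, 10), (50, 40)])
print(grid, empty_fraction(grid))
```

With only a few digits per image, most cell targets are "no object", so the model sees far more empty examples than digit examples during training.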
Possible future work:
- Implement more advanced architectures such as YOLO or SSD
- Experiment with different data augmentation techniques
- Extend the model to work with more complex, real-world datasets
- Optimize for real-time detection on video streams
Contributions are welcome! Please feel free to submit a Pull Request.
Khalil Ibrahim - GitHub
Project Link: https://github.com/KhalilIbrahimm/DeepLearning-PyTorch-ObjectDetection