This project implements a deep learning pipeline to detect 68 facial landmarks using the 300W dataset. It leverages PyTorch and a modified ResNet-18 model for precise landmark regression on grayscale facial images.
🔧 Tools & Frameworks Used PyTorch – model building, training, and evaluation
Torchvision – pretrained ResNet18 and data transformations
OpenCV, PIL, imutils – image preprocessing and augmentation
XML Parsing – to extract annotations from 300W XML files
Matplotlib – visual comparison of predicted vs. actual landmarks
📦 Features Custom Dataset class to parse XML annotations
Data augmentation: cropping, rotation, resizing, and color jitter
Landmark normalization and denormalization
Model checkpointing with best validation loss
Visualization of predictions vs. ground-truth landmarks
📈 Results Uses MSELoss to regress 68 (x, y) landmark coordinates
Trained and validated with dynamic plotting for comparison
Final model saved as face_landmarks.pth
💡 Future Improvements Add facial bounding box detector for end-to-end inference
Explore lightweight models (MobileNetV2, EfficientNet)
Train on color images for robustness to lighting variation