This repository contains the code and analyses for the project in Introduction to Intelligent Systems (02461) at DTU, January 2025.
Authors:
- Valdemar Stamm Kristensen (s244742)
- Frederik Lysholm Jønsson (s245362)
- William Hoffmann Hyldig (s245176)
Study line: Artificial Intelligence and Data
This project explores the use of Convolutional Neural Networks (CNNs) to detect pneumonia in chest X-ray images and compares the model's performance to that of two medical doctors.
The motivation lies in pneumonia's status as a major global health challenge, particularly among children, where fast and reliable diagnosis is crucial. CNNs offer a way to assist doctors in handling the large volume of X-ray data in clinical settings.
- Dataset: Kaggle chest X-ray dataset with 5,116 images (73% pneumonia, 27% normal).
- Preprocessing: Resizing (244×244), normalization, and tensor conversion.
- Model Architecture: Custom CNN inspired by VGG-16, reduced in complexity to prevent overfitting.
- Training Distributions: Compared original (73/27) vs. balanced (50/50) data splits.
- Evaluation: Compared CNN predictions against two non-specialist medical doctors reviewing 100 X-rays each.
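
The balanced (50/50) training distribution can be illustrated with a small sketch. It assumes the balancing is done by randomly undersampling the majority class, which fits the note that the 50/50 model required discarding data, but the exact procedure used in the notebook is not shown here. The function name `undersample_to_balance` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def undersample_to_balance(labels, seed=0):
    """Return sorted indices that keep as many samples per class as the rarest class has."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    smallest = min(len(indices) for indices in by_class.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    kept = []
    for indices in by_class.values():
        # Randomly keep `smallest` samples of this class and drop the rest.
        kept.extend(rng.sample(indices, smallest))
    return sorted(kept)


# Toy labels with a 73/27 ratio (illustrative only, not the real dataset).
labels = ["pneumonia"] * 73 + ["normal"] * 27
balanced = undersample_to_balance(labels)
print(len(balanced))  # 54: 27 per class, so 46 pneumonia samples are discarded
```

The cost of this approach is visible in the toy example: balancing a 73/27 set this way throws away almost half of the available samples.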
- CNN Performance: Achieved 96.08% ± 1.57 test accuracy.
- Medical Doctors: Achieved 67–72% ± ~9 accuracy.
- Observations:
  - 73/27 model achieved highest overall accuracy but showed bias toward pneumonia.
  - 50/50 model was more stable across distributions but required discarding data.
  - Overfitting was observed after epoch 7, which was identified as the optimal stopping point.
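
One common way to pick a stopping epoch is to take the epoch with the lowest validation loss. The sketch below assumes that criterion, which may differ from how the stopping point was identified in the notebook. The helper `best_epoch` is hypothetical, and the loss values are placeholders, not measurements from this project.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Placeholder values for illustration only: the loss falls, then rises as overfitting sets in.
example_val_losses = [0.60, 0.45, 0.36, 0.30, 0.26, 0.24, 0.22, 0.25, 0.29, 0.33]
print(best_epoch(example_val_losses))  # 7
```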
These results indicate that a CNN can substantially outperform non-specialist medical doctors at classifying pneumonia from chest X-rays on this dataset.
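
To give a feel for the reported margins, the sketch below computes a 95% normal-approximation (Wald) margin of error for an accuracy estimate. That these are the intervals behind the ± values is an assumption; the README does not state the method. The sample size of 100 for the doctors comes from the evaluation setup above, and `accuracy_margin` is a hypothetical helper.

```python
import math


def accuracy_margin(accuracy, n_samples, z=1.96):
    """Half-width of a normal-approximation confidence interval for an accuracy in [0, 1]."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


# Doctors' accuracy range on 100 X-rays each; z = 1.96 corresponds to a 95% interval.
for acc in (0.67, 0.72):
    print(f"{acc:.0%} +/- {accuracy_margin(acc, 100):.1%}")
```

Under this assumption both margins come out at roughly 9 percentage points, in line with the ± ~9 reported for the doctors. With only 100 images per doctor the interval is wide, which is worth keeping in mind when comparing against the CNN.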
- `CNN-model.ipynb`: Full Jupyter Notebook with preprocessing, model design, training, evaluation, and visualizations.
- `CNN-model.py`: Same code as `CNN-model.ipynb` but in a Python file.
- Dataset limitations: Current data originates from one hospital, which limits generalizability.
- Bias risks: Model may perform differently across age, gender, and imaging conditions.
- Future improvements:
  - Larger and more diverse datasets across multiple hospitals.
  - Use of data augmentation (flipping, rotation, contrast adjustments).
  - Certainty scores and explainable AI (XAI) to improve clinical trust and usability.
- Kaggle Pneumonia Dataset: Mooney, P. (2018). Chest X-Ray Images (Pneumonia).
- Sharma, A. (2024). Pneumonia Detection using VGG16 Transfer Learning.
- UNICEF (2021). Pneumonia statistics on child mortality.
- Additional references listed in the project report.