Abstract
This study evaluates the performance and efficiency of four deep learning models (VGG-16, ResNet-50, Inception-V3, and DenseNet-121) in detecting pneumonia from chest X-rays, addressing the clinical need for models that balance diagnostic accuracy with computational efficiency. Methods: A dataset of 5,234 chest X-rays (3,875 pneumonia, 1,341 normal) was augmented with rotation, flipping, and zooming to mitigate class imbalance. Each model was trained for 40 epochs on an NVIDIA RTX 2060 GPU and evaluated on accuracy, F1 score, sensitivity, specificity, and precision, as well as on computational cost (training time, memory usage). Differences between models were tested for statistical significance with paired t-tests (p < 0.05). Results: DenseNet-121 achieved the highest accuracy (95.2% ± 0.8%), the highest F1 score (95.1% ± 0.7%), and the highest throughput (400 images/s), with a small memory footprint (33 MB). ResNet-50 and Inception-V3 showed moderate performance, while VGG-16 showed signs of overfitting. Conclusions: DenseNet-121 outperformed the other models in both accuracy and processing speed, a combination that is essential for real-time clinical use. However, the small validation set and the limited diversity of the study population are important limitations. Further validation on larger, multi-institutional datasets is needed to confirm the model's stability and its generalizability across clinical settings, and future work should also address the ethical considerations of AI-driven diagnostics.