Diffusion-Based Data Augmentation Methodology for Improved Performance in Ocular Disease Diagnosis Using Retinography Images

Deep learning models, now integral to many contemporary technologies, generally learn better from larger datasets. Traditional data augmentation techniques can generate new data, but they have limitations, especially in fields such as ocular disease diagnosis. In response, alternative augmentation approaches, including generative AI, have emerged. In this study, we used a diffusion-based model (Stable Diffusion) with the Ocular Disease Intelligent Recognition (ODIR) dataset to synthesize retinography images that faithfully reproduce the retinal vascular structures crucial for detecting eye diseases. Our goal was to augment retinography images for ocular disease diagnosis with diffusion-based models, optimizing the outputs of the fine-tuned Stable Diffusion model so that the generated data closely resemble real-world images. This approach improved the performance of the classification models, and diffusion-based augmentation outperformed traditional methods, achieving precision ranging from 76.2% to 85% and recall ranging from 75% to 86% in the 5-class case. Beyond improving performance, we showed that adding synthetic data, combined with t-SNE-based data reduction, effectively addressed dataset imbalance. In the 7-class case, adding synthetic data increased precision by 3.4% and recall by 12.8%. Synthesizing data for underrepresented classes produced a balanced dataset from which the model could learn every class. More broadly, this approach shows that synthetic data can overcome the limitations of traditional augmentation methods in sensitive medical domains such as ocular disease diagnosis, supporting accurate classification. The code for this study will be shared on GitHub for anyone interested: https://github.com/miralab-ai/generative-data-augmentation.
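The abstract does not spell out how t-SNE was used for data reduction, so the following is only a minimal sketch of one possible balancing scheme, not the authors' implementation. It assumes a single per-class target count, assumes that images of an over-represented class have already been embedded in 2-D (for example with t-SNE), and assumes the images to keep are chosen by greedy farthest-point sampling so that the retained set stays spread out in the embedding. The function names `plan_balancing` and `farthest_point_subset`, the class labels, and the counts are hypothetical and purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def plan_balancing(class_counts: Dict[str, int], target: int) -> Dict[str, int]:
    """Hypothetical helper: per-class adjustment towards a common target.

    Positive value -> number of synthetic images to generate for that class.
    Negative value -> number of real images to remove (data reduction).
    Zero           -> class is already at the target.
    """
    return {label: target - count for label, count in class_counts.items()}


def farthest_point_subset(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical helper: greedily pick k well-spread indices from 2-D points.

    `points` is assumed to be a 2-D embedding of one class's images
    (e.g. t-SNE coordinates). Selection starts at index 0 so the result is
    deterministic; each step adds the point farthest from everything chosen so far.
    """
    n = len(points)
    if k <= 0:
        return []
    if k >= n:
        return list(range(n))
    chosen = [0]
    chosen_set = {0}
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        # Only consider points not yet chosen (guards against duplicate points).
        candidates = [i for i in range(n) if i not in chosen_set]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


if __name__ == "__main__":
    # Illustrative counts only; these are not figures from the ODIR dataset.
    print(plan_balancing({"normal": 10, "cataract": 3}, target=5))
    # {'normal': -5, 'cataract': 2}
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0), (0.0, 10.0)]
    print(farthest_point_subset(pts, 3))
    # [0, 3, 4]
```

Farthest-point sampling is used here only because it is simple and deterministic; random or cluster-based subsampling in the embedding would be equally plausible readings of "t-SNE-based data reduction".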

Keywords

Image classification, Data augmentation, Diffusion-based models, t-SNE, Medical image synthesis, Dataset imbalance

WoS Q

Q3

Scopus Q

Q2

OpenCitations Citation Count

N/A

Source

International Journal of Machine Learning and Cybernetics

Volume

16

Start Page

3843

End Page

3864

URI

https://doi.org/10.1007/s13042-024-02485-w
https://hdl.handle.net/11147/15200

Collections

WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

PlumX Metrics

Citations

Scopus : 5

Captures

Mendeley Readers : 11

SCOPUS™ Citations

5

checked on Jun 15, 2026

Web of Science™ Citations

4

checked on Jun 15, 2026

Page Views

98

checked on Jun 15, 2026

Downloads

2

checked on Jun 15, 2026

OpenAlex FWCI

4.91444291
