Vision-Language Model Approach for Few-Shot Learning of Attention Deficit Hyperactivity Disorder Using EEG Connectivity-Based Featured Images


Abstract

Traditional medical diagnosis approaches have predominantly relied on single-modality analysis, limiting clinicians to interpreting isolated data streams such as images or time series. The integration of vision-language models (VLMs) into neurophysiological analysis represents a paradigm shift toward multimodal diagnostic frameworks, enabling clinicians to interact with diagnostic models through diverse modalities, including text, audio, and visual inputs. This multimodal interaction capability extends beyond conventional label-based classification, offering clinicians flexibility in diagnostic reasoning and decision-making. Building on this foundation, this study explores the application of VLMs to electroencephalography (EEG)-based attention deficit hyperactivity disorder (ADHD) classification, addressing a gap in neurophysiological diagnostics. The proposed framework performs VLM-based few-shot ADHD classification by converting raw EEG data into EEG connectivity-based featured images compatible with the image encoder of contrastive language-image pre-training (CLIP). The adapter-based CLIP approaches (Tip-Adapter and Tip-Adapter-F) for few-shot learning improve CLIP's zero-shot classification performance, achieving 78.73% accuracy with 1 shot and 98.30% accuracy with 128 shots using the RN50x16 backbone. Experiments investigate the effects of prompt engineering, CLIP backbone architectures, patient-based classification, and combinations of EEG connectivity features. A comparative analysis across two datasets evaluates the approach on different data sources. By adapting pre-trained VLMs to neurophysiological data, this technique demonstrates the potential of multimodal diagnostic frameworks that enable flexible clinician-model interactions beyond conventional label-based classification systems.
The approach achieves effective ADHD classification with minimal training data while establishing foundations for applying VLMs in clinical neuroscience, where diverse modality interactions through text, visual, and audio inputs can enhance diagnostic workflows. The code is publicly available on GitHub to facilitate further research in the field: https://github.com/miralab-ai/vlm-few-shot-eeg.
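The training-free Tip-Adapter variant mentioned in the abstract can be sketched as follows: few-shot image embeddings form a key-value cache (keys are support features, values are one-hot labels), and test predictions blend the cache's affinity-weighted label votes with CLIP's zero-shot text-prompt logits. This is a minimal NumPy illustration under assumed settings — the class count, shot count, embedding dimension, and the random features standing in for real CLIP embeddings of EEG connectivity images are all hypothetical, and the hyperparameters `alpha` and `beta` are placeholder values, not the paper's tuned ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    """L2-normalize feature vectors along the last axis, as CLIP does."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical setup: 2 classes (ADHD / control), 16 shots per class,
# 64-dim embeddings (real CLIP backbones use 512+ dims).
n_classes, shots, dim = 2, 16, 64

# Few-shot cache: keys = embeddings of support EEG-connectivity images,
# values = their one-hot labels. Random features stand in for CLIP outputs.
keys = l2norm(rng.normal(size=(n_classes * shots, dim)))
values = np.eye(n_classes)[np.repeat(np.arange(n_classes), shots)]

# Zero-shot classifier: CLIP text embeddings of the class prompts.
text_weights = l2norm(rng.normal(size=(n_classes, dim)))

def tip_adapter_logits(feat, alpha=1.0, beta=5.5):
    """Training-free Tip-Adapter: blend cache-model logits with zero-shot logits."""
    feat = l2norm(feat)
    # Affinity between query and cached keys (cosine distance -> exp kernel).
    affinity = np.exp(-beta * (1.0 - feat @ keys.T))
    cache_logits = affinity @ values          # affinity-weighted label votes
    clip_logits = feat @ text_weights.T       # zero-shot CLIP logits
    return alpha * cache_logits + clip_logits

query = rng.normal(size=(4, dim))             # 4 hypothetical test images
logits = tip_adapter_logits(query)
print(logits.shape)                           # (4, n_classes)
```

Tip-Adapter-F would additionally fine-tune the cache keys as a learnable linear layer; the training-free form above uses them as-is, which is what makes 1-shot operation possible without gradient updates.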

Keywords

Vision Language Models, Few-Shot Learning, Electroencephalography, Attention Deficit Hyperactivity Disorder, Connectivity-Based Features


Volume: 6

Issue: 4

PlumX Metrics

Citations (Scopus): 0

Captures (Mendeley Readers): 4

OpenAlex FWCI: 0.0
