Vision-Language Model Approach for Few-Shot Learning of Attention Deficit Hyperactivity Disorder Using EEG Connectivity-Based Featured Images
Abstract
Traditional medical diagnosis approaches have predominantly relied on single-modality analysis, limiting clinicians to interpreting isolated data streams such as images or time series. The integration of vision-language models (VLMs) into neurophysiological analysis represents a paradigm shift toward multimodal diagnostic frameworks, enabling clinicians to interact with diagnostic models through diverse modalities such as text, audio, and visual inputs. This multimodal interaction capability extends beyond conventional label-based classification, offering clinicians flexibility in diagnostic reasoning and decision-making. Building on this foundation, this study explores the application of VLMs to electroencephalography (EEG)-based attention deficit hyperactivity disorder (ADHD) classification, addressing a gap in neurophysiological diagnostics. The proposed framework performs VLM-based few-shot ADHD classification by converting raw EEG data into EEG connectivity-based featured images compatible with the image encoder of contrastive language-image pre-training (CLIP). An adapter-based CLIP approach (Tip-Adapter and Tip-Adapter-F) for few-shot learning improves CLIP's zero-shot classification performance, achieving 78.73% accuracy with 1 shot and 98.30% accuracy with 128 shots using the RN50x16 backbone. Experiments investigate the effects of prompt engineering, CLIP backbone architectures, patient-based classification, and combinations of EEG connectivity features. Comparative analysis across two datasets evaluates the approach on different data sources. By adapting pre-trained VLMs to neurophysiological data, this technique demonstrates the potential of multimodal diagnostic frameworks that enable flexible clinician-model interactions beyond conventional label-based classification systems.
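The conversion from raw EEG to a CLIP-compatible image can be illustrated with a minimal sketch. The abstract does not specify which connectivity features are used, so Pearson correlation between channels is assumed here purely for illustration; the function name, the 224-pixel resolution, and the grayscale-to-RGB mapping are likewise assumptions, not the paper's actual pipeline.

```python
import numpy as np

def eeg_to_connectivity_image(eeg, size=224):
    """Turn a multichannel EEG segment into a connectivity 'image' (sketch).

    eeg: (n_channels, n_samples) array. Pearson correlation between channels
    is one simple connectivity feature; the paper's exact features may differ.
    """
    conn = np.corrcoef(eeg)                            # (n_ch, n_ch), values in [-1, 1]
    img = ((conn + 1.0) / 2.0 * 255).astype(np.uint8)  # map correlations to 0..255
    # Nearest-neighbour upsample to the encoder's input resolution
    reps = size // img.shape[0] + 1
    img = np.kron(img, np.ones((reps, reps), dtype=np.uint8))[:size, :size]
    return np.stack([img] * 3, axis=-1)                # (size, size, 3) RGB for CLIP
```

The resulting three-channel array can then be normalized and fed to a pre-trained CLIP image encoder like any natural image.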
The approach achieves effective ADHD classification with minimal training data while establishing foundations for applying VLMs in clinical neuroscience, where diverse modality interactions through text, visual, and audio inputs can enhance diagnostic workflows. The code is publicly available on GitHub to facilitate further research in the field: https://github.com/miralab-ai/vlm-few-shot-eeg.
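The Tip-Adapter mechanism referenced in the abstract can be sketched as follows: a training-free cache of few-shot image features and one-hot labels is blended with CLIP's zero-shot logits. This is a minimal numpy illustration of the general Tip-Adapter idea, not the authors' implementation; the function name and the `alpha`/`beta` values are assumptions.

```python
import numpy as np

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_weights,
                       alpha=1.0, beta=5.5):
    """Tip-Adapter-style few-shot logits (sketch).

    test_feat:    (d,) L2-normalized image feature from CLIP's encoder
    cache_keys:   (N*K, d) L2-normalized features of the few-shot images
    cache_values: (N*K, C) one-hot labels of the cached images
    clip_weights: (d, C) L2-normalized text embeddings of the class prompts
    """
    # Zero-shot CLIP logits: cosine similarity with the class text embeddings
    clip_logits = 100.0 * test_feat @ clip_weights
    # Cache-model affinities: sharpened similarity to the stored few-shot keys
    affinity = np.exp(-beta * (1.0 - test_feat @ cache_keys.T))
    cache_logits = affinity @ cache_values
    # Blend the training-free cache with the zero-shot prior
    return clip_logits + alpha * cache_logits
```

Tip-Adapter-F additionally fine-tunes the cache keys by gradient descent, which is what distinguishes it from the training-free variant above.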
Keywords
Vision Language Models, Few-Shot Learning, Electroencephalography, Attention Deficit Hyperactivity Disorder, Connectivity-Based Features
Volume: 6
Issue: 4