GCRIS

Now showing 1 - 2 of 2

Citation - Scopus: 2
Turkmednli: a Turkish Medical Natural Language Inference Dataset Through Large Language Model Based Translation
(Peerj inc, 2025) Ogul, Iskender Ulgen; Soygazi, Fatih; Bostanoglu, Belgin Ergenc
Natural language inference (NLI) is a subfield of natural language processing (NLP) that aims to identify the contextual relationship between premise and hypothesis sentences. While high-resource languages like English benefit from robust and rich NLI datasets, creating similar datasets for low-resource languages is challenging due to the cost and complexity of manual annotation. Although translation of existing datasets offers a practical solution, direct translation of domain-specific datasets presents unique challenges, particularly in handling abbreviations, metric conversions, and cultural alignment. This study introduces a pipeline for translating a medical NLI dataset into Turkish, which is a low-resource language. Our approach employs fine-tuning the Llama-3.1 model with selected samples from the Medical Abbreviation dataset (MeDAL) to extract and resolve medical abbreviations. Consequently, NLI pairs are refined with extracted abbreviations and subjected to metric correction. Later, the processed sentences are then translated using Facebook's No Language Left Behind (NLLB) translation model. To ensure quality, we conducted comprehensive evaluations using both machine learning models and medical expert review. Our results show that BERTurk achieved 75.17% accuracy on TurkMedNLI test data and 76.30% on the normalized test set, while BioBERTurk demonstrated comparable performance with 75.59% accuracy on test data and 72.29% on the normalized dataset. Medical experts further validated the translations through manual assessment of sampled sentences. This work demonstrates the effectiveness of large language models in adapting domain-specific datasets for low-resource languages, establishing a foundation for future research in multilingual biomedical NLP.
Citation - WoS: 1
Citation - Scopus: 1
Bdac: Boundary-Driven Approximations of K-Cliques
(Mdpi, 2024) Calmaz, Busra; Bostanoglu, Belgin Ergenc
Clique counts are crucial in applications like detecting communities in social networks and recurring patterns in bioinformatics. Counting k-cliques-a fully connected subgraph with k nodes, where each node has a direct, mutual, and symmetric relationship with every other node-becomes computationally challenging for larger k due to combinatorial explosion, especially in large, dense graphs. Existing exact methods have difficulties beyond k = 10, especially on large datasets, while sampling-based approaches often involve trade-offs in terms of accuracy, resource utilization, and efficiency. This difficulty becomes more pronounced in dense graphs as the number of potential k-cliques grows exponentially. We present Boundary-driven approximations of k-cliques (BDAC), a novel algorithm that approximates k-clique counts without using recursive procedures or sampling methods. BDAC offers both lower and upper bounds for k-cliques at local (per-vertex) and global levels, making it ideal for large, dense graphs. Unlike other approaches, BDAC's complexity remains unaffected by the value of k. We demonstrate its effectiveness by comparing it with leading algorithms across various datasets, focusing on k values ranging from 8 to 50.

Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Browse

Filters

Settings

Sort By

Results per page

Search Results