WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7150

Search Results

Now showing 1 - 5 of 5
  • Article
    Citation - Scopus: 2
    TurkMedNLI: a Turkish Medical Natural Language Inference Dataset Through Large Language Model Based Translation
    (Peerj inc, 2025) Ogul, Iskender Ulgen; Soygazi, Fatih; Bostanoglu, Belgin Ergenc
    Natural language inference (NLI) is a subfield of natural language processing (NLP) that aims to identify the contextual relationship between premise and hypothesis sentences. While high-resource languages like English benefit from robust and rich NLI datasets, creating similar datasets for low-resource languages is challenging due to the cost and complexity of manual annotation. Although translating existing datasets offers a practical solution, direct translation of domain-specific datasets presents unique challenges, particularly in handling abbreviations, metric conversions, and cultural alignment. This study introduces a pipeline for translating a medical NLI dataset into Turkish, a low-resource language. Our approach fine-tunes the Llama-3.1 model with selected samples from the Medical Abbreviation dataset (MeDAL) to extract and resolve medical abbreviations. The NLI pairs are then refined with the extracted abbreviations and subjected to metric correction. Finally, the processed sentences are translated using Facebook's No Language Left Behind (NLLB) translation model. To ensure quality, we conducted comprehensive evaluations using both machine learning models and medical expert review. Our results show that BERTurk achieved 75.17% accuracy on TurkMedNLI test data and 76.30% on the normalized test set, while BioBERTurk demonstrated comparable performance with 75.59% accuracy on test data and 72.29% on the normalized dataset. Medical experts further validated the translations through manual assessment of sampled sentences. This work demonstrates the effectiveness of large language models in adapting domain-specific datasets for low-resource languages, establishing a foundation for future research in multilingual biomedical NLP.
  • Article
    k-Clique counting on large scale-graphs: a survey
    (Peerj inc, 2024) Calmaz, Busra; Bostanoglu, Belgin Ergenc
    Clique counting is a crucial task in graph mining, as the count of cliques provides insights across various domains, such as social and biological network analysis, community detection, recommendation systems, and fraud detection. Counting cliques is algorithmically challenging due to combinatorial explosion, especially for large datasets and larger clique sizes. There are comprehensive surveys and reviews on algorithms for counting subgraphs and triangles (three-cliques), but there is a notable lack of reviews addressing k-clique counting algorithms for k > 3. This paper addresses this gap by reviewing clique counting algorithms designed to overcome this challenge. It provides a systematic analysis and comparison of exact and approximation techniques, highlighting their advantages, disadvantages, and suitability for different contexts. It also presents a taxonomy of clique counting methodologies, covering approximate and exact methods and parallelization strategies. The paper aims to enhance understanding of this specific domain and guide future research on k-clique counting in large-scale graphs.
  • Review
    A Qualitative Survey on Community Detection Attack Algorithms
    (Mdpi, 2024) Tekin, Leyla; Bostanoglu, Belgin Ergenc
    Community detection enables the discovery of more connected segments of complex networks. This capability is essential for effective network analysis. However, it raises a growing concern about the disclosure of user privacy, since sensitive information may be over-mined by community detection algorithms. To address this issue, the problem of community detection attacks has emerged: subtly perturbing the network structure so that the performance of community detection algorithms deteriorates. Three scales of this problem have been identified in the literature to achieve different levels of concealment, namely target node, target community, and global attack. A broad range of community detection attack algorithms has been proposed, utilizing various approaches to tackle the distinct requirements associated with each attack scale. However, existing surveys of the field usually concentrate on studies focusing on target community attacks. To be self-contained, this survey starts with an overview of the community detection algorithms that the attacks target, along with the performance measures employed to evaluate the effectiveness of community detection attacks. The core of the survey is a systematic analysis of the algorithms proposed across all three scales of community detection attacks to provide a comprehensive overview. The survey wraps up with a detailed discussion of the research opportunities in the field. Overall, the main objective of the survey is to provide both a starting point and a deep-dive reference for scientists.
  • Article
    Citation - WoS: 1
    Dynamic Frequent Subgraph Mining Algorithms Over Evolving Graphs: a Survey
    (Peerj inc, 2024) Bostanoglu, Belgin Ergenc; Abuzayed, Nourhan
    Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications of modern data science. Some FSM algorithms aim to find all frequent subgraphs, whereas others focus on discovering frequent subgraphs approximately. On the other hand, modern applications employ evolving graphs where the increments are small graphs or streams of nodes and edges. In such cases, the FSM task becomes more challenging due to growing data size and the complexity of the base algorithms. Recently, frequent subgraph mining algorithms designed for dynamic graph data have appeared. However, there is no comparative review of dynamic subgraph mining algorithms focusing on the discovery of frequent subgraphs over evolving graph data. This article focuses on the characteristics of dynamic frequent subgraph mining algorithms over evolving graphs. We first introduce and compare dynamic frequent subgraph mining algorithms, highlighting attributes such as increment type, graph type, graph representation, internal data structure, algorithmic approach, programming approach, base algorithm, and output type. Secondly, we introduce and compare the approximate frequent subgraph mining algorithms for dynamic graphs with additional attributes such as their sampling strategy, data in the sample, statistical guarantees on the sample, and their main objective. Finally, we highlight research opportunities in this specific domain from our perspective. Overall, we aim to introduce the research area of frequent subgraph mining over evolving graphs with the hope that this can serve as a reference and inspiration for researchers in the field.
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    BDAC: Boundary-Driven Approximations of K-Cliques
    (Mdpi, 2024) Calmaz, Busra; Bostanoglu, Belgin Ergenc
    Clique counts are crucial in applications like detecting communities in social networks and recurring patterns in bioinformatics. Counting k-cliques (fully connected subgraphs with k nodes, where each node has a direct, mutual, and symmetric relationship with every other node) becomes computationally challenging for larger k due to combinatorial explosion, especially in large, dense graphs. Existing exact methods have difficulties beyond k = 10, especially on large datasets, while sampling-based approaches often involve trade-offs in terms of accuracy, resource utilization, and efficiency. This difficulty becomes more pronounced in dense graphs as the number of potential k-cliques grows exponentially. We present Boundary-driven approximations of k-cliques (BDAC), a novel algorithm that approximates k-clique counts without using recursive procedures or sampling methods. BDAC offers both lower and upper bounds for k-cliques at local (per-vertex) and global levels, making it ideal for large, dense graphs. Unlike other approaches, BDAC's complexity remains unaffected by the value of k. We demonstrate its effectiveness by comparing it with leading algorithms across various datasets, focusing on k values ranging from 8 to 50.
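
To make the k-clique counting problem discussed in the abstracts above concrete, the following is a minimal sketch of an exact, recursive counter over a degree-ordered orientation of the graph. It is not BDAC (which, per its abstract, avoids recursion and sampling) and is not taken from any of the listed papers; the function name `count_k_cliques` and the edge-list input format are assumptions made for illustration. It shows the kind of recursive baseline whose cost grows quickly with k.

```python
from itertools import combinations


def count_k_cliques(edges, k):
    """Count k-cliques exactly in an undirected simple graph.

    Illustrative baseline only (hypothetical helper, not BDAC):
    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    if k == 1:
        return len(adj)  # vertices that appear in at least one edge

    # Rank vertices by (degree, label) and keep only edges pointing to a
    # higher-ranked vertex, so every clique is enumerated exactly once.
    order = sorted(adj, key=lambda v: (len(adj[v]), repr(v)))
    rank = {v: i for i, v in enumerate(order)}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    def extend(candidates, remaining):
        # `candidates` are adjacent to every vertex chosen so far.
        if remaining == 0:
            return 1
        return sum(extend(candidates & out[v], remaining - 1)
                   for v in candidates)

    return sum(extend(out[v], k - 1) for v in adj)


if __name__ == "__main__":
    # Complete graph on 5 vertices: C(5, 3) = 10 triangles.
    k5 = list(combinations(range(5), 2))
    print(count_k_cliques(k5, 3))  # 10
```

The recursion depth equals k and the number of explored branches can grow combinatorially with k in dense graphs, which is the difficulty the abstracts describe for exact methods.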