Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/11147/7148
9 results
Search Results
Conference Object
Reframing Software Log Summarisation as Multi-Label Classification With Encoder-Decoder Transformer Model
(Institute of Electrical and Electronics Engineers Inc., 2025) Türkzeybek, F.Z.; Inan, E.
As software systems become more advanced and capable of meeting sophisticated demands, they also become more complex. Consequently, software system logs, which are the most effective tool programmers have for understanding system diagnostics and taking appropriate action, become as complicated as the systems that generate them. To address this issue, software system log summarisation processes the logs generated by complex systems and extracts or summarizes their meaning in a more readable, less complex format. Recent improvements in natural language processing, brought about by transformers that evolved into large language models, offer substantial capabilities that can be applied to log summarisation tasks. In this study, we explore this capability using a transformer-based model to summarize complex software system logs. The experimental results demonstrate that the fine-tuned T5-Small model improves on the average ROUGE-1 and ROUGE-L scores of the BART-Large and Pegasus-Large models by approximately 8.46% and 15.37%, respectively. Thus, the average improvement of the fine-tuned T5-Small over the fine-tuned BART-Large and Pegasus-Large models is approximately 11.92% in terms of R1 and RL scores, at lower computational cost. © 2025 IEEE.

Article
Toward Reliable Annotation in Low-Resource NLP: A Mixture of Agents Framework and Multi-LLM Benchmarking
(IEEE-Inst Electrical Electronics Engineers Inc, 2025) Onan, Aytug; Nasution, Arbi Haza; Celikten, Tugba
This paper introduces the Mixture-of-Agents (MoA) framework, a structured approach for improving the reliability of large language model (LLM)-based text annotation in low-resource NLP contexts.
MoA employs coordinated agent interactions to enhance agreement, interpretability, and robustness without manual supervision. Evaluations on Turkish classification benchmarks demonstrate that MoA achieves up to 10-point improvements in macro-F1 over single-model baselines and significantly increases inter-agent consistency. Additionally, three novel reliability metrics, Conflict Rate (CR), Ambiguity Resolution Success Rate (ARSR), and Refinement Correction Rate (RCR), are proposed to quantify annotation stability and correction dynamics. The results indicate that multi-agent coordination can substantially improve label quality, offering a scalable pathway toward trustworthy annotation in low-resource and cross-domain applications. The framework is language-agnostic and adaptable to other low-resource contexts beyond Turkish, including morphologically rich or typologically diverse languages such as Indonesian, Urdu, and Swahili. These findings highlight the scalability of MoA as a generalizable solution for multilingual and cross-domain annotation.

Conference Object
Adapting Language Models to Sentiment Analysis for Automatically Translated and Labelled Turkish News Texts
(Institute of Electrical and Electronics Engineers Inc., 2025) Serficeli, S.C.; Udunman, B.; Inan, E.
The proliferation of news sources makes it difficult to track current events and social events in real time. In order to interpret social events in this context quickly and effectively, it is important to translate news texts provided in different natural languages into Turkish and to perform sentiment analysis on them. The aim of this study is to translate multilingual news texts into Turkish and perform sentiment analysis on these texts. The generated labels were compared and the data that were given the same label by all models were separated as automatically labelled data.
This automatic labelling process ensured that the data for which different models produced consistent results were reliably labelled. When the results were evaluated, an F1 score of 0.946 was achieved for sentiment analysis using the automatic labelling mechanism for texts translated into Turkish. © 2025 IEEE.

Conference Object
A Semantic Search Engine for Turkish and English Research Resources
(Institute of Electrical and Electronics Engineers Inc., 2025) Karabacak, O.; Inan, E.
Research resources are growing in volume at an exponential rate across disciplines and languages. This exponential increase has created a pressing need for intelligent search systems that can help researchers efficiently access relevant academic material. To meet this need, this study introduces a bilingual semantic search engine designed to retrieve academic articles written in both Turkish and English. The primary goal is to improve the accuracy and relevance of academic information retrieval by using modern Natural Language Processing techniques. Instead of relying on traditional keyword-based search methods, the system leverages transformer-based sentence embedding models. To capture semantic meaning more effectively, MiniLM-L6v2, paraphrase-multilingual-MiniLM-L12-v2 and multilingual-e5-base models were chosen for their multilingual capabilities and sentence-level embedding performance. To assess the quality of search results, Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (nDCG) were used. These metrics were calculated for each model across both language groups. Evaluation results show that the multilingual-e5-base model consistently outperformed the other models in both MAP and nDCG scores, demonstrating superior semantic understanding and multilingual alignment. The system also features a simple and responsive Streamlit-based interface that allows for real-time querying and result display.
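The two ranking metrics named in the abstract above, MAP and nDCG, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names, the binary relevance flags, and the graded gains are assumptions made for illustration, and average precision is normalised here by the number of relevant items that appear in the ranked list (which assumes every relevant item was retrieved).

```python
import math
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """Average precision for one ranked result list.

    `relevant[i]` says whether the result at rank i+1 is relevant.
    Assumption: all relevant items appear somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(average_precision(run) for run in runs) / len(runs)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Under these assumptions, a ranking that already lists results in descending order of gain scores an nDCG of 1.0, and moving relevant results further down the list lowers both metrics.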
© 2025 IEEE.

Article
Automating Software Size Measurement from Python Code Using Language Models
(Springer, 2025) Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kuuk, Murat; Demirors, Onur
Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.

Article
Citation - WoS: 1; Citation - Scopus: 1
Automating Software Size Measurement With Language Models: Insights From Industrial Case Studies
(Elsevier Science Inc, 2026) Unlu, Huseyin; Tenekeci, Samet; Kennouche, Dhia Eddine; Demirors, Onur
Objective software size measurement is critical for accurate effort estimation, yet many organizations avoid it due to high costs, required expertise, and time-consuming manual effort. This often leads to vague predictions, poor planning, and project overruns.
To address this challenge, we investigate the use of pre-trained language models (BERT and SE-BERT) to automate size measurement based on textual requirements using COSMIC and MicroM methods. We constructed one heterogeneous dataset and two industrial datasets, each manually measured by experienced analysts. Models were evaluated in three settings: (i) generic model evaluation, where the models are trained and tested on heterogeneous data, (ii) internal evaluation, where the models are trained and tested on organization-specific data, and (iii) external evaluation, where generic models are tested on organization-specific data. Results show that organization-specific models significantly outperform generic models, indicating that aligning training data with the target organization's requirement style is critical for accuracy. SE-BERT, a domain-adapted variant of BERT, improves performance, particularly in low-resource settings. These findings highlight the practical potential of tailoring training data for broader adoption and cost-effective software size measurement in industrial contexts.

Conference Object
Citation - WoS: 3; Citation - Scopus: 5
Predicting Software Functional Size Using Natural Language Processing: an Exploratory Case Study
(IEEE, 2024) Unlu, Huseyin; Tenekeci, Samet; Ciftci, Can; Oral, Ibrahim Baran; Atalay, Tunahan; Hacaloglu, Tuna; Demirors, Onur
Software Size Measurement (SSM) plays an essential role in software project management as it enables the acquisition of software size, which is the primary input for development effort and schedule estimation. However, many small and medium-sized companies cannot perform objective SSM and Software Effort Estimation (SEE) due to the lack of resources and an expert workforce. This results in inadequate estimates and projects exceeding the planned time and budget. Therefore, organizations need to perform objective SSM and SEE using minimal resources without an expert workforce.
In this research, we conducted an exploratory case study to predict the functional size of software project requirements using state-of-the-art large language models (LLMs). To this end, we fine-tuned BERT and BERT_SE with a set of user stories and their respective functional size in COSMIC Function Points (CFP). We gathered the user stories included in different project requirement documents. In total size prediction, we achieved 72.8% accuracy with BERT and 74.4% accuracy with BERT_SE. In data movement-based size prediction, we achieved 87.5% average accuracy with BERT and 88.1% average accuracy with BERT_SE. Although we use relatively small datasets in model training, these results are promising and hold significant value as they demonstrate the practical utility of language models in SSM.

Article
Citation - Scopus: 2
Turkmednli: a Turkish Medical Natural Language Inference Dataset Through Large Language Model Based Translation
(Peerj inc, 2025) Ogul, Iskender Ulgen; Soygazi, Fatih; Bostanoglu, Belgin Ergenc
Natural language inference (NLI) is a subfield of natural language processing (NLP) that aims to identify the contextual relationship between premise and hypothesis sentences. While high-resource languages like English benefit from robust and rich NLI datasets, creating similar datasets for low-resource languages is challenging due to the cost and complexity of manual annotation. Although translation of existing datasets offers a practical solution, direct translation of domain-specific datasets presents unique challenges, particularly in handling abbreviations, metric conversions, and cultural alignment. This study introduces a pipeline for translating a medical NLI dataset into Turkish, which is a low-resource language. Our approach fine-tunes the Llama-3.1 model with selected samples from the Medical Abbreviation dataset (MeDAL) to extract and resolve medical abbreviations.
Consequently, NLI pairs are refined with extracted abbreviations and subjected to metric correction. The processed sentences are then translated using Facebook's No Language Left Behind (NLLB) translation model. To ensure quality, we conducted comprehensive evaluations using both machine learning models and medical expert review. Our results show that BERTurk achieved 75.17% accuracy on TurkMedNLI test data and 76.30% on the normalized test set, while BioBERTurk demonstrated comparable performance with 75.59% accuracy on test data and 72.29% on the normalized dataset. Medical experts further validated the translations through manual assessment of sampled sentences. This work demonstrates the effectiveness of large language models in adapting domain-specific datasets for low-resource languages, establishing a foundation for future research in multilingual biomedical NLP.

Conference Object
Citation - Scopus: 3
Predicting Software Size and Effort From Code Using Natural Language Processing
(CEUR-WS, 2024) Tenekeci, S.; Demirörs, Onur; Ünlü, H.; Dikenelli, E.; Selçuk, U.; Kılınç Soylu, G.; Demirörs, O.
Software Size Measurement (SSM) holds a crucial role in software project management by facilitating the acquisition of software size, which serves as the primary input for development effort and schedule estimation. However, many small and medium-sized companies encounter challenges in conducting objective SSM and Software Effort Estimation (SEE) due to resource constraints and a lack of expert workforce. This often leads to inaccurate estimates and projects exceeding planned time and budget. Hence, organizations need to perform objective SSM and SEE with minimal resources and without relying on an expert workforce. In this research, we introduce two exploratory case studies aimed at predicting the functional size (COSMIC and Event-based size) and effort of software projects from the code using a deep-learning-based NLP model: CodeBERT.
For this purpose, we collected and annotated two datasets consisting of 4800 Python and 1100 C# functions. Then, we trained a classification model to predict COSMIC data movements (entry, exit, read, write) and four regression models to predict Event-based size (interaction, communication, process) and effort. Despite utilizing a relatively small dataset for model training, we achieved promising results with an 84.5% accuracy for the COSMIC size, 0.13 normalized mean absolute error (NMAE) for the Event-based size, and 0.18 NMAE for the effort. These findings are particularly insightful as they demonstrate the practical utility of language models in SSM and SEE. © 2024 Copyright for this paper by its authors.
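
As a rough illustration of the NMAE figures reported in the abstract above, the sketch below normalises the mean absolute error by the mean of the actual values. The abstract does not state which normalisation the authors used (the range of the actual values is another common choice), so this choice, the function name, and the argument names are assumptions.

```python
from typing import Sequence


def nmae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error divided by the mean of the actual values.

    Assumption: mean-based normalisation; the source does not specify one.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Plain mean absolute error over all paired observations.
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ValueError("mean of actual values is zero; NMAE is undefined")
    return mae / abs(mean_actual)
```

Under this definition, an NMAE of 0.13 would mean that the typical absolute error is about 13% of the average actual value.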
