Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

Search Results

Now showing 1 - 7 of 7
  • Conference Object
    Reframing Software Log Summarisation as Multi-Label Classification With Encoder-Decoder Transformer Model
    (Institute of Electrical and Electronics Engineers Inc., 2025) Türkzeybek, F.Z.; Inan, E.
    As software systems become more advanced and capable of meeting sophisticated demands, they also become more complex. Consequently, software system logs, which are the most effective tool programmers have for understanding system diagnostics and taking appropriate action, become as complicated as the systems that generate them. To address this issue, software system log summarisation processes the logs generated by complex systems and extracts or summarizes their meaning in a more readable, less complex format. Recent improvements in natural language processing, brought about by transformers that evolved into large language models, offer substantial capabilities that can be applied to log summarisation tasks. In this study, we explore this capability using a transformer-based model to summarize complex software system logs. The experimental results demonstrate that the fine-tuned T5-Small model improves the average ROUGE-1 and ROUGE-L scores of the BART-Large and Pegasus-Large models by approximately 8.46% and 15.37%, respectively. Thus, the average improvement of the fine-tuned T5-Small over the fine-tuned BART-Large and Pegasus-Large models is approximately 11.92% in terms of R1 and RL scores, at a lower computational cost. © 2025 IEEE.
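
The averaged figure quoted in the abstract follows directly from the two stated improvement percentages; a minimal sketch of the arithmetic (variable names are ours, not the paper's):

```python
# Sketch of the averaging behind the quoted ~11.92% figure. The two
# improvement percentages come from the abstract; the pairing with the
# two baselines is our reading of the text.
gain_over_bart = 8.46      # % improvement of fine-tuned T5-Small
gain_over_pegasus = 15.37  # % improvement of fine-tuned T5-Small

average_gain = (gain_over_bart + gain_over_pegasus) / 2
print(average_gain)  # ≈ 11.92
```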
  • Article
    Toward Reliable Annotation in Low-Resource NLP: A Mixture of Agents Framework and Multi-LLM Benchmarking
    (IEEE-Inst Electrical Electronics Engineers Inc, 2025) Onan, Aytug; Nasution, Arbi Haza; Celikten, Tugba
    This paper introduces the Mixture-of-Agents (MoA) framework, a structured approach for improving the reliability of large language model (LLM)-based text annotation in low-resource NLP contexts. MoA employs coordinated agent interactions to enhance agreement, interpretability, and robustness without manual supervision. Evaluations on Turkish classification benchmarks demonstrate that MoA achieves up to 10-point improvements in macro-F1 over single-model baselines and significantly increases inter-agent consistency. Additionally, three novel reliability metrics, Conflict Rate (CR), Ambiguity Resolution Success Rate (ARSR), and Refinement Correction Rate (RCR), are proposed to quantify annotation stability and correction dynamics. The results indicate that multi-agent coordination can substantially improve label quality, offering a scalable pathway toward trustworthy annotation in low-resource and cross-domain applications. The framework is language-agnostic and adaptable to other low-resource contexts beyond Turkish, including morphologically rich or typologically diverse languages such as Indonesian, Urdu, and Swahili. These findings highlight the scalability of MoA as a generalizable solution for multilingual and cross-domain annotation.
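
The paper's exact metric definitions are not restated in the abstract; one plausible minimal reading of Conflict Rate, the fraction of items on which the agents' labels are not unanimous, can be sketched as follows (the definition, function name, and sample labels are illustrative assumptions):

```python
# Hedged sketch of one plausible reading of Conflict Rate (CR): the
# fraction of items whose per-agent labels are not unanimous. The
# paper's exact definition may differ.
def conflict_rate(annotations):
    """annotations: list of per-item label lists, one label per agent."""
    conflicted = sum(1 for labels in annotations if len(set(labels)) > 1)
    return conflicted / len(annotations)

votes = [
    ["pos", "pos", "pos"],  # unanimous
    ["pos", "neg", "pos"],  # conflict
    ["neu", "neu", "neu"],  # unanimous
    ["neg", "neu", "pos"],  # conflict
]
print(conflict_rate(votes))  # 0.5
```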
  • Conference Object
    Adapting Language Models to Sentiment Analysis for Automatically Translated and Labelled Turkish News Texts
    (Institute of Electrical and Electronics Engineers Inc., 2025) Serficeli, S.C.; Udunman, B.; Inan, E.
    The proliferation of news sources makes it difficult to track current and social events in real time. To interpret social events quickly and effectively in this context, it is important to translate news texts published in different natural languages into Turkish and to perform sentiment analysis on them. The aim of this study is to translate multilingual news texts into Turkish and perform sentiment analysis on these texts. Sentiment labels for the translated texts were produced by multiple models; the generated labels were then compared, and the data that were given the same label by all models were kept as automatically labelled data. This automatic labelling process ensured that data for which different models produced consistent results were reliably labelled. When the results were evaluated, an F1 score of 0.946 was achieved for sentiment analysis using the automatic labelling mechanism on texts translated into Turkish. © 2025 IEEE.
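
The agreement-based labelling step described above, keeping only the texts that every model labels identically, can be sketched minimally (function name, label set, and sample data are illustrative assumptions):

```python
# Hedged sketch of agreement-based auto-labelling: a text is kept only
# if all models assign it the same label. Model outputs are illustrative.
def auto_label(texts, predictions_per_model):
    """predictions_per_model: one list of labels per model, aligned with texts."""
    labelled = []
    for i, text in enumerate(texts):
        labels = {preds[i] for preds in predictions_per_model}
        if len(labels) == 1:  # all models agree on this text
            labelled.append((text, labels.pop()))
    return labelled

texts = ["haber 1", "haber 2", "haber 3"]
model_a = ["positive", "negative", "neutral"]
model_b = ["positive", "neutral", "neutral"]
print(auto_label(texts, [model_a, model_b]))
# [('haber 1', 'positive'), ('haber 3', 'neutral')]
```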
  • Conference Object
    A Semantic Search Engine for Turkish and English Research Resources
    (Institute of Electrical and Electronics Engineers Inc., 2025) Karabacak, O.; Inan, E.
    Research resources are growing in volume at an exponential rate across disciplines and languages. This exponential increase has created a pressing need for intelligent search systems that can help researchers efficiently access relevant academic material. To overcome this issue, this study introduces a bilingual semantic search engine designed to retrieve academic articles written in both Turkish and English. The primary goal is to improve the accuracy and relevance of academic information retrieval by using modern Natural Language Processing techniques. Instead of relying on traditional keyword-based search methods, the system leverages transformer-based sentence embedding models. To capture semantic meaning more effectively, MiniLM-L6v2, paraphrase-multilingual-MiniLM-L12-v2 and multilingual-e5-base models were chosen for their multilingual capabilities and sentence-level embedding performance. To assess the quality of search results, Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (nDCG) were used. These metrics were calculated for each model across both language groups. Evaluation results show that the multilingual-e5-base model consistently outperformed the other models in both MAP and nDCG scores, demonstrating superior semantic understanding and multilingual alignment. The system also features a simple and responsive Streamlit-based interface that allows for real-time querying and result display. © 2025 IEEE.
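
The nDCG metric used to compare the three embedding models can be sketched from its standard definition (the relevance judgements below are illustrative; only the usual DCG/nDCG formula is assumed, not the paper's exact evaluation setup):

```python
import math

# Hedged sketch of the standard DCG/nDCG computation used for ranking
# evaluation. Relevance values for the top-ranked results are illustrative.
def dcg(relevances):
    # log2(rank + 2) because ranks are 0-indexed here
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Binary relevance of the top-4 results returned for one query.
ranked = [1, 0, 1, 0]
print(round(ndcg(ranked), 2))  # 0.92
```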
  • Article
    Automating Software Size Measurement from Python Code Using Language Models
    (Springer, 2025) Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kuuk, Murat; Demirors, Onur
    Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.
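
The link between classified data movements and a COSMIC size rests on the COSMIC rule that each data movement (Entry, eXit, Read, Write) contributes one COSMIC Function Point (CFP); a minimal sketch, with hypothetical predicted labels standing in for the model's output:

```python
# Hedged sketch: converting per-function data-movement predictions into a
# COSMIC size. Each recognised movement counts as 1 CFP per the COSMIC
# method; the predicted labels below are illustrative, not model output.
def cosmic_size(predicted_movements):
    """predicted_movements: movement labels predicted for one functional process."""
    valid = {"Entry", "Exit", "Read", "Write"}
    return sum(1 for m in predicted_movements if m in valid)

movements = ["Entry", "Read", "Write", "Exit"]  # e.g. classifier predictions
print(cosmic_size(movements))  # 4 CFP
```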
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    Automating Software Size Measurement With Language Models: Insights From Industrial Case Studies
    (Elsevier Science Inc, 2026) Unlu, Huseyin; Tenekeci, Samet; Kennouche, Dhia Eddine; Demirors, Onur
    Objective software size measurement is critical for accurate effort estimation, yet many organizations avoid it due to high costs, required expertise, and time-consuming manual effort. This often leads to vague predictions, poor planning, and project overruns. To address this challenge, we investigate the use of pre-trained language models - BERT and SE-BERT - to automate size measurement based on textual requirements using COSMIC and MicroM methods. We constructed one heterogeneous dataset and two industrial datasets, each manually measured by experienced analysts. Models were evaluated in three settings: (i) generic model evaluation, where the models are trained and tested on heterogeneous data, (ii) internal evaluation, where the models are trained and tested on organization-specific data, and (iii) external evaluation, where generic models were tested on organization-specific data. Results show that organization-specific models significantly outperform generic models, indicating that aligning training data with the target organization's requirement style is critical for accuracy. SE-BERT, a domain-adapted variant of BERT, improves performance, particularly in low-resource settings. These findings highlight the practical potential of tailoring training data for broader adoption and cost-effective software size measurement in industrial contexts.
  • Conference Object
    Citation - Scopus: 3
    Predicting Software Size and Effort From Code Using Natural Language Processing
    (CEUR-WS, 2024) Tenekeci, S.; Ünlü, H.; Dikenelli, E.; Selçuk, U.; Kılınç Soylu, G.; Demirörs, O.
    Software Size Measurement (SSM) holds a crucial role in software project management by facilitating the acquisition of software size, which serves as the primary input for development effort and schedule estimation. However, many small and medium-sized companies encounter challenges in conducting objective SSM and Software Effort Estimation (SEE) due to resource constraints and a lack of expert workforce. This often leads to inaccurate estimates and projects exceeding planned time and budget. Hence, organizations need to perform objective SSM and SEE with minimal resources and without relying on an expert workforce. In this research, we introduce two exploratory case studies aimed at predicting the functional size (COSMIC and Event-based size) and effort of software projects from the code using a deep-learning-based NLP model: CodeBERT. For this purpose, we collected and annotated two datasets consisting of 4800 Python and 1100 C# functions. Then, we trained a classification model to predict COSMIC data movements (entry, exit, read, write) and four regression models to predict Event-based size (interaction, communication, process) and effort. Despite utilizing a relatively small dataset for model training, we achieved promising results with an 84.5% accuracy for the COSMIC size, 0.13 normalized mean absolute error (NMAE) for the Event-based size, and 0.18 NMAE for the effort. These findings are particularly insightful as they demonstrate the practical utility of language models in SSM and SEE. © 2024 Copyright for this paper by its authors.
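
The NMAE figures above depend on how the error is normalized; a minimal sketch using one common choice, MAE divided by the range of the true values (the paper's exact normalization and the sample sizes below are assumptions):

```python
# Hedged sketch of normalized mean absolute error (NMAE). Normalizations
# vary; dividing MAE by the range of the true values is one common choice
# and is assumed here. Sample values are illustrative.
def nmae(y_true, y_pred):
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    return mae / (max(y_true) - min(y_true))

true_sizes = [10.0, 20.0, 30.0, 40.0]  # hypothetical measured sizes
pred_sizes = [12.0, 18.0, 33.0, 37.0]  # hypothetical model predictions
print(round(nmae(true_sizes, pred_sizes), 3))  # 0.083
```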