Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

Automating Software Size Measurement From Python Code Using Language Models (Vol 33, 19, 2026)
(Springer, 2025) Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kucuk, Murat; Demirors, Onur
Automating Software Size Measurement from Python Code Using Language Models
(Springer, 2025) Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kucuk, Murat; Demirors, Onur
Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.
Citation - WoS: 1
Citation - Scopus: 1
Automating Software Size Measurement With Language Models: Insights From Industrial Case Studies
(Elsevier Science Inc, 2026) Unlu, Huseyin; Tenekeci, Samet; Kennouche, Dhia Eddine; Demirors, Onur
Objective software size measurement is critical for accurate effort estimation, yet many organizations avoid it due to high costs, required expertise, and time-consuming manual effort. This often leads to vague predictions, poor planning, and project overruns. To address this challenge, we investigate the use of pre-trained language models - BERT and SE-BERT - to automate size measurement based on textual requirements using COSMIC and MicroM methods. We constructed one heterogeneous dataset and two industrial datasets, each manually measured by experienced analysts. Models were evaluated in three settings: (i) generic model evaluation, where the models are trained and tested on heterogeneous data, (ii) internal evaluation, where the models are trained and tested on organization-specific data, and (iii) external evaluation, where generic models are tested on organization-specific data. Results show that organization-specific models significantly outperform generic models, indicating that aligning training data with the target organization's requirement style is critical for accuracy. SE-BERT, a domain-adapted variant of BERT, improves performance, particularly in low-resource settings. These findings highlight the practical potential of tailoring training data for broader adoption and cost-effective software size measurement in industrial contexts.
Citation - WoS: 3
Citation - Scopus: 5
Predicting Software Functional Size Using Natural Language Processing: an Exploratory Case Study
(IEEE, 2024) Unlu, Huseyin; Tenekeci, Samet; Ciftci, Can; Oral, Ibrahim Baran; Atalay, Tunahan; Hacaloglu, Tuna; Demirors, Onur
Software Size Measurement (SSM) plays an essential role in software project management as it enables the acquisition of software size, which is the primary input for development effort and schedule estimation. However, many small and medium-sized companies cannot perform objective SSM and Software Effort Estimation (SEE) due to the lack of resources and an expert workforce. This results in inadequate estimates and projects exceeding the planned time and budget. Therefore, organizations need to perform objective SSM and SEE using minimal resources without an expert workforce. In this research, we conducted an exploratory case study to predict the functional size of software project requirements using state-of-the-art large language models (LLMs). For this aim, we fine-tuned BERT and BERT_SE with a set of user stories and their respective functional size in COSMIC Function Points (CFP). We gathered the user stories included in different project requirement documents. In total size prediction, we achieved 72.8% accuracy with BERT and 74.4% accuracy with BERT_SE. In data movement-based size prediction, we achieved 87.5% average accuracy with BERT and 88.1% average accuracy with BERT_SE. Although we use relatively small datasets in model training, these results are promising and hold significant value as they demonstrate the practical utility of language models in SSM.
Citation - WoS: 4
Citation - Scopus: 4
Integrative Biological Network Analysis To Identify Shared Genes in Metabolic Disorders
(Institute of Electrical and Electronics Engineers, 2022) Tenekeci, Samet; Işık, Zerrin
Identification of common molecular mechanisms in interrelated diseases is essential for better prognoses and targeted therapies. However, the complexity of metabolic pathways makes it difficult to discover common disease genes underlying metabolic disorders, and doing so requires more sophisticated bioinformatics models that combine different types of biological data and computational methods. Accordingly, we built an integrative network analysis model to identify shared disease genes in metabolic syndrome (MS), type 2 diabetes (T2D), and coronary artery disease (CAD). We constructed weighted gene co-expression networks by combining gene expression, protein-protein interaction, and gene ontology data from multiple sources. For 90 different configurations of disease networks, we detected the significant modules by using MCL, SPICi, and Linkcomm graph clustering algorithms. We also performed a comparative evaluation on disease modules to determine the best method providing the highest biological validity. By overlapping the disease modules, we identified 22 shared genes for MS-CAD and T2D-CAD. Moreover, 19 of these genes were directly or indirectly associated with the relevant diseases in previous medical studies. This study not only demonstrates the performance of different biological data sources and computational methods in disease-gene discovery, but also offers potential insights into common genetic mechanisms of metabolic disorders.
Citation - WoS: 4
Citation - Scopus: 12
Event Oriented Vs Object Oriented Analysis for Microservice Architecture: an Exploratory Case Study
(Institute of Electrical and Electronics Engineers, 2021) Ünlü, Hüseyin; Tenekeci, Samet; Yıldız, Ali; Demirörs, Onur
The rapidly developing internet infrastructure together with the advances in software technology has enabled the development of cloud-based modern web applications that are much more responsive, flexible, and reliable compared to traditional monolithic applications. Such modern applications require new software design paradigms and architectures. Microservice-based architecture (MSbA), which aims to create small, isolated, loosely-coupled applications that work in cohesion, is becoming widespread as one of these approaches. MSbA allows the developed applications to be deployed and maintained separately, as well as scaled on demand. However, there is no de facto method for the analysis and design of systems for these architectures. In this paper, we compared the usefulness of the object-oriented (OO) and event-oriented (EO) approaches for analyzing and designing MS-based systems. More specifically, we performed an exploratory case study to analyze, design, and implement a software application dealing with the 'application and evaluation process of graduate students at IzTech'. This paper discusses the results of this case study. We observe that the EO approaches have significant advantages with respect to the OO approaches.
Citation - WoS: 1
Citation - Scopus: 1
Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content
(World Scientific Publishing, 2021) Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma
In the field of software engineering, practitioners' share in the constructed knowledge cannot be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target Stack Overflow (SO) and Stack Exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors' mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are utilized as classifiers and compared to SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification in the SO design patterns tag and 71.6% accuracy in the SE software engineering category. Superior performance in SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence, which supports that the system's predicted reputation labels match the quality of provided answers.
