An Alternative Software Benchmarking Dataset: Effort Estimation With Machine Learning

dc.contributor.author Yurum, Ozan Rasit
dc.contributor.author Unlu, Huseyin
dc.contributor.author Demirors, Onur
dc.date.accessioned 2025-08-27T16:41:00Z
dc.date.available 2025-08-27T16:41:00Z
dc.date.issued 2026
dc.description.abstract Effort estimation plays a vital role in software project planning, as accurate estimates of required human resources are essential for success. Traditional estimation models often depend on historical size and effort data, yet organizations frequently struggle to access reliable effort records. Public benchmarking datasets such as ISBSG offer useful data but may lack coverage or involve licensing fees. To address this issue, we previously introduced a free, extendable benchmarking dataset that integrates functional size and effort data extracted from 18 studies. In this study, we examine the effectiveness of our dataset for predictive effort estimation and compare it with the widely used ISBSG dataset. Our analysis includes 337 records from our dataset and 732 ISBSG projects, focusing on those with COSMIC size data. We first developed and compared models using linear regression and nine machine learning algorithms: Bayesian Ridge, Ridge Regression, Decision Tree, Random Forest, XGBoost, LightGBM, k-Nearest Neighbors, Multi-Layer Perceptron, and Support Vector Regression. We then selected the best-performing models and applied them to an unseen evaluation dataset to assess their generalization performance. The results show that machine learning performance varies with the evaluation method and dataset characteristics. Despite having fewer records, our dataset enabled more accurate predictions than ISBSG in most cases, highlighting its potential for effort estimation. This study demonstrates the viability of our dataset for building predictive models and supports the use of machine learning to improve estimation accuracy. Expanding this dataset could offer a valuable, open-access resource for organizations seeking effective and low-cost estimation solutions. en_US
dc.description.sponsorship Scientific and Technological Research Council of Turkey (TUBITAK) ARDEB 1001 [121E389] en_US
dc.description.sponsorship This research is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) ARDEB 1001 program [Project number: 121E389]. We would like to express our gratitude to all the scientists who contributed to creating this benchmarking dataset by openly sharing the data from their research. en_US
dc.identifier.doi 10.1016/j.jss.2025.112591
dc.identifier.issn 0164-1212
dc.identifier.issn 1873-1228
dc.identifier.scopus 2-s2.0-105012595395
dc.identifier.uri https://doi.org/10.1016/j.jss.2025.112591
dc.language.iso en en_US
dc.publisher Elsevier Science Inc en_US
dc.relation.ispartof Journal of Systems and Software en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Size Measurement en_US
dc.subject Effort Estimation en_US
dc.subject COSMIC en_US
dc.subject ISBSG en_US
dc.subject Machine Learning en_US
dc.subject Dataset en_US
dc.subject Benchmarking en_US
dc.title An Alternative Software Benchmarking Dataset: Effort Estimation With Machine Learning en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.wosid Demirors, Onur/R-7023-2016
gdc.author.wosid Yurum, Ozan Rasit/Izd-9887-2023
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology en_US
gdc.description.departmenttemp [Yurum, Ozan Rasit] Izmir Bakircay Univ, Dept Comp Engn, TR-35665 Izmir, Turkiye; [Unlu, Huseyin; Demirors, Onur] Izmir Inst Technol, Dept Comp Engn, TR-35430 Izmir, Turkiye en_US
gdc.description.publicationcategory Article - International Peer-Reviewed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.volume 231 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q1
gdc.identifier.openalex W4412980653
gdc.identifier.wos WOS:001547330200001
gdc.index.type WoS
gdc.index.type Scopus
gdc.openalex.collaboration National
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.36
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 4
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.wos.citedcount 0
relation.isAuthorOfPublication.latestForDiscovery 478fdf31-7c73-4f1a-94a4-2775adf0cec4
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4003-8abe-a4dfe192da5e
