An Alternative Software Benchmarking Dataset: Effort Estimation With Machine Learning

Yurum, Ozan Rasit; Unlu, Huseyin; Demirors, Onur

doi:10.1016/j.jss.2025.112591

dc.contributor.author	Yurum, Ozan Rasit
dc.contributor.author	Unlu, Huseyin
dc.contributor.author	Demirors, Onur
dc.contributor.other	01. Izmir Institute of Technology
dc.contributor.other	03. Faculty of Engineering
dc.contributor.other	03.04. Department of Computer Engineering
dc.date.accessioned	2025-08-27T16:41:00Z
dc.date.available	2025-08-27T16:41:00Z
dc.date.issued	2026
dc.description.abstract	Effort estimation plays a vital role in software project planning, as accurate estimates of required human resources are essential for success. Traditional estimation models often depend on historical size and effort data, yet organizations frequently struggle to access reliable effort records. Public benchmarking datasets like ISBSG offer useful data but may lack coverage or involve licensing fees. To address this issue, we previously introduced a free, extendable benchmarking dataset that integrates functional size and effort data extracted from 18 studies. In this study, we examine the effectiveness of our dataset for predictive effort estimation and compare it with the widely used ISBSG dataset. Our analysis includes 337 records from our dataset and 732 ISBSG projects, focusing on those with COSMIC size data. We first developed and compared models using linear regression and nine machine learning algorithms: Bayesian Ridge, Ridge Regression, Decision Tree, Random Forest, XGBoost, LightGBM, k-Nearest Neighbors, Multi-Layer Perceptron, and Support Vector Regression. Then, we selected the best-performing models and applied them to an unseen evaluation dataset to assess their generalization performance. The results show that machine learning performance varies based on evaluation method and dataset characteristics. Despite having fewer records, our dataset enabled more accurate predictions than ISBSG in most cases, highlighting its potential for effort estimation. This study demonstrates the viability of our dataset for building predictive models and supports the use of machine learning in improving estimation accuracy. Expanding this dataset could offer a valuable, open-access resource for organizations seeking effective and low-cost estimation solutions.	en_US
dc.description.sponsorship	Scientific and Technological Research Council of Turkey (TUBITAK) ARDEB 1001 [121E389]	en_US
dc.description.sponsorship	This research is supported by The Scientific and Technological Research Council of Turkey (TUBITAK) ARDEB 1001 [Project number: 121E389] program. We would like to express our gratitude to all the scientists who contributed to creating this benchmarking dataset by openly sharing the data from their research.	en_US
dc.identifier.doi	10.1016/j.jss.2025.112591
dc.identifier.issn	0164-1212
dc.identifier.issn	1873-1228
dc.identifier.scopus	2-s2.0-105012595395
dc.identifier.uri	https://doi.org/10.1016/j.jss.2025.112591
dc.language.iso	en	en_US
dc.publisher	Elsevier Science Inc	en_US
dc.relation.ispartof	Journal of Systems and Software	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Size Measurement	en_US
dc.subject	Effort Estimation	en_US
dc.subject	COSMIC	en_US
dc.subject	ISBSG	en_US
dc.subject	Machine Learning	en_US
dc.subject	Dataset	en_US
dc.subject	Benchmarking	en_US
dc.title	An Alternative Software Benchmarking Dataset: Effort Estimation With Machine Learning	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
gdc.author.wosid	Demirors, Onur/R-7023-2016
gdc.author.wosid	Yurum, Ozan Rasit/Izd-9887-2023
gdc.coar.type	text::journal::journal article
gdc.collaboration.industrial	false
gdc.description.department	İzmir Institute of Technology	en_US
gdc.description.departmenttemp	[Yurum, Ozan Rasit] Izmir Bakircay Univ, Dept Comp Engn, TR-35665 Izmir, Turkiye; [Unlu, Huseyin; Demirors, Onur] Izmir Inst Technol, Dept Comp Engn, TR-35430 Izmir, Turkiye	en_US
gdc.description.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
gdc.description.scopusquality	Q1
gdc.description.volume	231	en_US
gdc.description.woscitationindex	Science Citation Index Expanded
gdc.description.wosquality	Q1
gdc.identifier.openalex	W4412980653
gdc.identifier.wos	WOS:001547330200001
gdc.index.type	WoS
gdc.index.type	Scopus
gdc.openalex.collaboration	National
gdc.openalex.fwci	0.0
gdc.openalex.normalizedpercentile	0.36
gdc.openalex.toppercent	TOP 10%
gdc.opencitations.count	0
gdc.plumx.mendeley	4
gdc.plumx.scopuscites	0
gdc.scopus.citedcount	0
gdc.wos.citedcount	0
relation.isAuthorOfPublication	478fdf31-7c73-4f1a-94a4-2775adf0cec4
relation.isAuthorOfPublication.latestForDiscovery	478fdf31-7c73-4f1a-94a4-2775adf0cec4
relation.isOrgUnitOfPublication	9af2b05f-28ac-4003-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4004-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4014-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication.latestForDiscovery	9af2b05f-28ac-4003-8abe-a4dfe192da5e

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection