An Alternative Software Benchmarking Dataset: Effort Estimation With Machine Learning

Demirörs, Onur; Unlu, Huseyin

doi:10.1016/j.jss.2025.112591

Date

2026

Authors

Demirörs, Onur

Unlu, Huseyin

Publisher

Elsevier Science Inc

Abstract

Effort estimation plays a vital role in software project planning, as accurate estimates of required human resources are essential for success. Traditional estimation models often depend on historical size and effort data, yet organizations frequently struggle to access reliable effort records. Public benchmarking datasets such as ISBSG offer useful data but may lack coverage or involve licensing fees. To address this issue, we previously introduced a free, extendable benchmarking dataset that integrates functional size and effort data extracted from 18 studies. In this study, we examine the effectiveness of our dataset for predictive effort estimation and compare it with the widely used ISBSG dataset. Our analysis includes 337 records from our dataset and 732 ISBSG projects, focusing on those with COSMIC size data. We first developed and compared models using linear regression and nine machine learning algorithms: Bayesian Ridge, Ridge Regression, Decision Tree, Random Forest, XGBoost, LightGBM, k-Nearest Neighbors, Multi-Layer Perceptron, and Support Vector Regression. Then, we selected the best-performing models and applied them to an unseen evaluation dataset to assess their generalization performance. The results show that machine learning performance varies with the evaluation method and dataset characteristics. Despite having fewer records, our dataset enabled more accurate predictions than ISBSG in most cases, highlighting its potential for effort estimation. This study demonstrates the viability of our dataset for building predictive models and supports the use of machine learning in improving estimation accuracy. Expanding this dataset could offer a valuable, open-access resource for organizations seeking effective and low-cost estimation solutions.
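
The general workflow the abstract describes (fit a model on historical size and effort records, then score it on records it has not seen) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the size-to-effort relation is made up, only the linear regression baseline is shown, and the error measures (MAE and MMRE) are common choices in effort estimation rather than measures confirmed by this record. It does not reproduce the authors' dataset, models, or results.

```python
import random
import statistics


def fit_linear(sizes, efforts):
    """Ordinary least squares fit of effort = intercept + slope * size."""
    mean_x = statistics.fmean(sizes)
    mean_y = statistics.fmean(efforts)
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, efforts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted effort."""
    return statistics.fmean([abs(a - p) for a, p in zip(actual, predicted)])


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return statistics.fmean([abs(a - p) / a for a, p in zip(actual, predicted)])


# Synthetic records (hypothetical): functional size and effort in person-hours.
rng = random.Random(42)
projects = []
for _ in range(200):
    size = rng.uniform(20, 500)                 # made-up functional size
    effort = 50 + 8 * size + rng.gauss(0, 100)  # made-up size-effort relation
    projects.append((size, max(effort, 1.0)))   # keep effort strictly positive

# Hold out records the model never sees during fitting.
rng.shuffle(projects)
train, held_out = projects[:150], projects[150:]

intercept, slope = fit_linear([s for s, _ in train], [e for _, e in train])

actual = [e for _, e in held_out]
predictions = [intercept + slope * s for s, _ in held_out]

mae = mean_absolute_error(actual, predictions)
mmre_value = mmre(actual, predictions)
print(f"slope={slope:.2f} intercept={intercept:.1f} MAE={mae:.1f} MMRE={mmre_value:.3f}")
```

Comparing several algorithms would follow the same pattern: fit each candidate on the training records, score each on the held-out records with the same measures, and keep the best performer.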

Keywords

Size Measurement, Effort Estimation, COSMIC, ISBSG, Machine Learning, Dataset, Benchmarking

WoS Q

Q1

Scopus Q

Q1

OpenCitations Citation Count

N/A

Source

Journal of Systems and Software

Volume

231

URI

https://doi.org/10.1016/j.jss.2025.112591

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection

