Automating Software Size Measurement With Language Models: Insights From Industrial Case Studies
| dc.contributor.author | Unlu, Huseyin | |
| dc.contributor.author | Tenekeci, Samet | |
| dc.contributor.author | Kennouche, Dhia Eddine | |
| dc.contributor.author | Demirors, Onur | |
| dc.date.accessioned | 2025-10-25T17:44:53Z | |
| dc.date.available | 2025-10-25T17:44:53Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | Objective software size measurement is critical for accurate effort estimation, yet many organizations avoid it because of its high cost, the expertise it requires, and the time-consuming manual effort involved. This often leads to vague predictions, poor planning, and project overruns. To address this challenge, we investigate the use of two pre-trained language models, BERT and SE-BERT, to automate size measurement from textual requirements using the COSMIC and MicroM methods. We constructed one heterogeneous dataset and two industrial datasets, each manually measured by experienced analysts. Models were evaluated in three settings: (i) generic model evaluation, where the models were trained and tested on heterogeneous data; (ii) internal evaluation, where the models were trained and tested on organization-specific data; and (iii) external evaluation, where generic models were tested on organization-specific data. Results show that organization-specific models significantly outperform generic models, indicating that aligning training data with the target organization's requirement style is critical for accuracy. SE-BERT, a domain-adapted variant of BERT, improves performance, particularly in low-resource settings. These findings highlight the practical potential of tailoring training data to support broader adoption of cost-effective software size measurement in industrial contexts. | en_US |
| dc.identifier.doi | 10.1016/j.jss.2025.112638 | |
| dc.identifier.issn | 0164-1212 | |
| dc.identifier.issn | 1873-1228 | |
| dc.identifier.scopus | 2-s2.0-105018172007 | |
| dc.identifier.uri | https://doi.org/10.1016/j.jss.2025.112638 | |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier Science Inc | en_US |
| dc.relation.ispartof | Journal of Systems and Software | en_US |
| dc.rights | info:eu-repo/semantics/closedAccess | en_US |
| dc.subject | Software Size Measurement | en_US |
| dc.subject | COSMIC | en_US |
| dc.subject | MicroM | en_US |
| dc.subject | Natural Language Processing | en_US |
| dc.subject | NLP | en_US |
| dc.subject | BERT | en_US |
| dc.subject | Case Study | en_US |
| dc.title | Automating Software Size Measurement With Language Models: Insights From Industrial Case Studies | |
| dc.type | Article | en_US |
| dspace.entity.type | Publication | |
| gdc.author.institutional | Demirörs, Onur | |
| gdc.author.wosid | Unlu, Huseyin/Ovz-3608-2025 | |
| gdc.author.wosid | Demirors, Onur/R-7023-2016 | |
| gdc.author.wosid | Tenekeci, Samet/Aar-7906-2021 | |
| gdc.coar.type | text::journal::journal article | |
| gdc.collaboration.industrial | false | |
| gdc.description.department | İzmir Institute of Technology | en_US |
| gdc.description.departmenttemp | [Unlu, Huseyin; Tenekeci, Samet; Kennouche, Dhia Eddine; Demirors, Onur] Izmir Inst Technol, Dept Comp Engn, Gulbahce Campus, TR-35430 Izmir, Turkiye | en_US |
| gdc.description.publicationcategory | Article - International Refereed Journal - Institutional Faculty Member | en_US |
| gdc.description.scopusquality | N/A | |
| gdc.description.volume | 231 | en_US |
| gdc.description.woscitationindex | Science Citation Index Expanded | |
| gdc.description.wosquality | Q1 | |
| gdc.identifier.openalex | W4414938745 | |
| gdc.identifier.wos | WOS:001613165600001 | |
| gdc.index.type | WoS | |
| gdc.index.type | Scopus | |
| gdc.openalex.collaboration | National | |
| gdc.openalex.fwci | 0.0 | |
| gdc.openalex.normalizedpercentile | 0.5 | |
| gdc.openalex.toppercent | TOP 10% | |
| gdc.opencitations.count | 0 | |
| gdc.plumx.mendeley | 2 | |
| gdc.plumx.scopuscites | 1 | |
| gdc.scopus.citedcount | 1 | |
| gdc.wos.citedcount | 1 | |
| relation.isAuthorOfPublication.latestForDiscovery | 478fdf31-7c73-4f1a-94a4-2775adf0cec4 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 9af2b05f-28ac-4003-8abe-a4dfe192da5e | |
