Automating Software Size Measurement from Python Code Using Language Models

dc.contributor.author Tenekeci, Samet
dc.contributor.author Unlu, Huseyin
dc.contributor.author Gul, Bedir Arda
dc.contributor.author Keles, Damla
dc.contributor.author Kuuk, Murat
dc.contributor.author Demirors, Onur
dc.date.accessioned 2025-11-25T15:10:08Z
dc.date.available 2025-11-25T15:10:08Z
dc.date.issued 2025
dc.description.abstract Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable. en_US
dc.identifier.doi 10.1007/s10515-025-00571-z
dc.identifier.issn 0928-8910
dc.identifier.issn 1573-7535
dc.identifier.scopus 2-s2.0-105019179221
dc.identifier.uri https://doi.org/10.1007/s10515-025-00571-z
dc.identifier.uri https://hdl.handle.net/11147/18640
dc.language.iso en en_US
dc.publisher Springer en_US
dc.relation.ispartof Automated Software Engineering en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Software Size Measurement en_US
dc.subject COSMIC en_US
dc.subject MicroM en_US
dc.subject Natural Language Processing en_US
dc.subject CodeBERT en_US
dc.subject BERT en_US
dc.title Automating Software Size Measurement from Python Code Using Language Models
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 57340107000
gdc.author.scopusid 57521977500
gdc.author.scopusid 60046500900
gdc.author.scopusid 60046501000
gdc.author.scopusid 60147729900
gdc.author.scopusid 55949165100
gdc.author.wosid Unlu, Huseyin/Ovz-3608-2025
gdc.author.wosid Tenekeci, Samet/Aar-7906-2021
gdc.author.wosid Demirors, Onur/R-7023-2016
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology en_US
gdc.description.departmenttemp [Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kuuk, Murat; Demirors, Onur] Izmir Inst Technol, Dept Comp Engn, Gulbahce Campus, TR-35430 Izmir, Turkiye en_US
gdc.description.issue 1 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q2
gdc.description.volume 33 en_US
gdc.description.woscitationindex Science Citation Index Expanded
gdc.description.wosquality Q2
gdc.identifier.openalex W4415323052
gdc.identifier.wos WOS:001595869700006
gdc.index.type WoS
gdc.index.type Scopus
gdc.openalex.collaboration National
gdc.openalex.fwci 0.0
gdc.openalex.normalizedpercentile 0.54
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.wos.citedcount 0
relation.isAuthorOfPublication.latestForDiscovery ac9e5966-0436-4d1b-ad4a-c94f332f3224
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4003-8abe-a4dfe192da5e
