Automating Software Size Measurement from Python Code Using Language Models
Abstract
Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.
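As a rough illustration of what the abstract describes, the sketch below shows how per-function data-movement labels could be turned into a COSMIC size and compared against manual annotations. In COSMIC, each data movement (Entry, Exit, Read, Write) contributes 1 COSMIC Function Point (CFP). Everything else here is an assumption made for illustration: the `DataMovements` class, the `exact_match_accuracy` helper, the exact-match scoring rule, and the sample values are hypothetical and are not taken from the study, whose label scheme and accuracy definition are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMovements:
    """Hypothetical container: counts of the four COSMIC data movement types for one function."""
    entry: int = 0
    exit: int = 0
    read: int = 0
    write: int = 0

    def cfp(self) -> int:
        # COSMIC assigns 1 CFP to each data movement, so size is the plain sum.
        return self.entry + self.exit + self.read + self.write


def exact_match_accuracy(predicted, annotated):
    """Fraction of functions whose predicted counts equal the annotated counts.

    Assumption: accuracy is scored per function on an exact-match basis.
    The study may define accuracy differently.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not annotated:
        raise ValueError("at least one function is required")
    hits = sum(p == a for p, a in zip(predicted, annotated))
    return hits / len(annotated)


# Made-up example values, not data from the study.
annotated = [
    DataMovements(entry=1, exit=1),
    DataMovements(entry=1, read=1, exit=1),
    DataMovements(write=1),
]
predicted = [
    DataMovements(entry=1, exit=1),
    DataMovements(entry=1, read=1, exit=1),
    DataMovements(entry=1, write=1),  # one spurious Entry
]

annotated_size = sum(m.cfp() for m in annotated)
predicted_size = sum(m.cfp() for m in predicted)
accuracy = exact_match_accuracy(predicted, annotated)
```

In this made-up example the annotated size is 6 CFP, the predicted size is 7 CFP, and two of the three functions match exactly. In practice the `predicted` list would come from a classifier such as a fine-tuned CodeBERT model rather than being written by hand.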
Keywords
Software Size Measurement, COSMIC, MicroM, Natural Language Processing, CodeBERT, BERT
Volume
33
Issue
1