Automating Software Size Measurement from Python Code Using Language Models

Tenekeci, Samet; Unlu, Huseyin; Gul, Bedir Arda; Keles, Damla; Kuuk, Murat; Demirors, Onur

doi:10.1007/s10515-025-00571-z

Date

2025

Publisher

Springer

Abstract

Software size is a key input for project planning, effort estimation, and productivity analysis. While pre-trained language models have shown promise in deriving functional size from natural-language requirements, measuring size directly from source code remains under-explored. Yet, code-based size measurement is critical in modern workflows where requirement documents are often incomplete or unavailable, especially in Agile development environments. This exploratory study investigates the use of CodeBERT, a pre-trained bimodal transformer model, for measuring software size directly from Python source code according to two measurement methods: COSMIC Function Points and MicroM. We construct two curated datasets from the Python subset of the CodeSearchNet corpus, and manually annotate each function with its corresponding size. Our experimental results show that CodeBERT can successfully measure COSMIC data movements with up to 91.4% accuracy and generalize to the functional, architectural, and algorithmic event types defined in MicroM, reaching up to 81.5% accuracy. These findings highlight the potential of code-based language models for automated functional size measurement when requirement artifacts are absent or unreliable.
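As an illustration of the COSMIC side of the abstract: in COSMIC, functional size is the count of data movements (Entry, Exit, Read, Write), each contributing 1 COSMIC Function Point (CFP). The sketch below only shows that aggregation step. It assumes per-function movement labels are already available, for example from manual annotation or a classifier; the `cosmic_size` helper and the sample labels are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter

# The four COSMIC data movement types; each movement counts as 1 CFP.
MOVEMENT_TYPES = ("Entry", "Exit", "Read", "Write")


def cosmic_size(movements):
    """Return (total CFP, per-type counts) for a list of movement labels.

    Hypothetical helper for illustration only.
    """
    counts = Counter(movements)
    unknown = set(counts) - set(MOVEMENT_TYPES)
    if unknown:
        raise ValueError(f"unknown movement types: {sorted(unknown)}")
    per_type = {t: counts.get(t, 0) for t in MOVEMENT_TYPES}
    return sum(per_type.values()), per_type


# Hypothetical labels for a single Python function (assumed, not real data).
predicted = ["Entry", "Read", "Read", "Write", "Exit"]
total, per_type = cosmic_size(predicted)
print(f"size = {total} CFP, breakdown = {per_type}")
```

Keeping the per-type breakdown alongside the total makes it possible to compare predicted and annotated sizes for each movement type separately.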

Keywords

Software Size Measurement, COSMIC, MicroM, Natural Language Processing, CodeBERT, BERT

WoS Q

Q2

Scopus Q

Q2

Source

Automated Software Engineering

Volume

33

Issue

1

URI

https://doi.org/10.1007/s10515-025-00571-z
https://hdl.handle.net/11147/18640

Collections

WoS Indexed Publications Collection
Scopus Indexed Publications Collection
