Predicting Software Size and Effort From Code Using Natural Language Processing

dc.contributor.author Tenekeci, S.
dc.contributor.author Ünlü, H.
dc.contributor.author Dikenelli, E.
dc.contributor.author Selçuk, U.
dc.contributor.author Kılınç Soylu, G.
dc.contributor.author Demirörs, O.
dc.date.accessioned 2024-10-25T23:27:22Z
dc.date.available 2024-10-25T23:27:22Z
dc.date.issued 2024
dc.description.abstract Software Size Measurement (SSM) plays a crucial role in software project management because software size is the primary input for estimating development effort and schedule. However, many small and medium-sized companies struggle to perform objective SSM and Software Effort Estimation (SEE) because of resource constraints and a lack of expert staff. This often leads to inaccurate estimates and to projects that exceed their planned time and budget. Hence, organizations need to perform objective SSM and SEE with minimal resources and without relying on an expert workforce. In this research, we present two exploratory case studies that predict the functional size (COSMIC and Event-based size) and effort of software projects from code using a deep-learning-based NLP model, CodeBERT. For this purpose, we collected and annotated two datasets consisting of 4800 Python and 1100 C# functions. We then trained a classification model to predict COSMIC data movements (entry, exit, read, write) and four regression models to predict the Event-based size components (interaction, communication, process) and effort. Despite the relatively small training dataset, we achieved promising results: 84.5% accuracy for COSMIC size, a normalized mean absolute error (NMAE) of 0.13 for Event-based size, and an NMAE of 0.18 for effort. These findings are insightful because they demonstrate the practical utility of language models in SSM and SEE. © 2024 Copyright for this paper by its authors. en_US
dc.identifier.issn 1613-0073
dc.identifier.scopus 2-s2.0-85212684670
dc.identifier.uri https://hdl.handle.net/11147/14915
dc.language.iso en en_US
dc.publisher CEUR-WS en_US
dc.relation.ispartof CEUR Workshop Proceedings -- Joint of the 33rd International Workshop on Software Measurement and the 18th International Conference on Software Process and Product Measurement, IWSM-MENSURA 2024 -- 30 September 2024 through 4 October 2024 -- Montreal -- 204467 en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Artificial Intelligence en_US
dc.subject Effort Estimation en_US
dc.subject Natural Language Processing en_US
dc.subject Software Size Measurement en_US
dc.title Predicting Software Size and Effort From Code Using Natural Language Processing en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.scopusid 57340107000
gdc.author.scopusid 57521977500
gdc.author.scopusid 59481631600
gdc.author.scopusid 59481946500
gdc.author.scopusid 55811008000
gdc.author.scopusid 55949165100
gdc.coar.access metadata only access
gdc.coar.type text::conference output
gdc.description.department İzmir Institute of Technology en_US
gdc.description.departmenttemp Tenekeci S., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Ünlü H., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Dikenelli E., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Selçuk U., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey; Kılınç Soylu G., İzmir University of Economics, Balçova, İzmir, 35330, Turkey; Demirörs O., İzmir Institute of Technology, Gülbahçe, İzmir, 35430, Turkey en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality Q4
gdc.description.volume 3852 en_US
gdc.index.type Scopus
gdc.scopus.citedcount 3
relation.isAuthorOfPublication.latestForDiscovery 478fdf31-7c73-4f1a-94a4-2775adf0cec4
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4014-8abe-a4dfe192da5e
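The abstract reports COSMIC size through predicted data movements (entry, exit, read, write) and reports Event-based size and effort errors as NMAE. The sketch below illustrates both quantities under stated assumptions and is not the authors' implementation. It assumes a hypothetical list of predicted data-movement labels per function and relies on the COSMIC rule that each data movement counts as 1 COSMIC Function Point (CFP). The function names `cosmic_size` and `nmae` are hypothetical. NMAE is computed here as the MAE divided by the range of the actual values, which is one common convention; the record does not state which normalization the paper uses.

```python
from collections import Counter
from typing import Iterable, Sequence

# COSMIC data movement types named in the abstract.
DATA_MOVEMENTS = ("entry", "exit", "read", "write")


def cosmic_size(predicted_movements: Iterable[str]) -> int:
    """Sum predicted data movements into COSMIC Function Points (CFP).

    Each COSMIC data movement contributes 1 CFP, so the functional size
    is the total count of Entry, Exit, Read and Write movements.
    Hypothetical helper: assumes one label per predicted movement.
    """
    counts = Counter(m.lower() for m in predicted_movements)
    unknown = set(counts) - set(DATA_MOVEMENTS)
    if unknown:
        raise ValueError(f"unknown data movement labels: {sorted(unknown)}")
    return sum(counts[m] for m in DATA_MOVEMENTS)


def nmae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error normalized by the range of the actual values.

    ASSUMPTION: range normalization is one common NMAE convention; the
    paper's exact normalization is not given in this record.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("actual values have zero range")
    return mae / spread
```

For example, `cosmic_size(["entry", "read", "exit", "exit"])` yields 4 CFP. `nmae([2, 4, 6, 10], [3, 4, 5, 10])` yields 0.0625, because the MAE of 0.5 is divided by the range of 8. Normalizing by the range keeps the error comparable across targets with different scales, such as size components and effort hours.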
