Automating Modern Code Review Processes With Code Similarity Measurement

dc.contributor.author Kartal,Y.
dc.contributor.author Akdeniz,E.K.
dc.contributor.author Özkan,K.
dc.date.accessioned 2024-06-19T14:29:41Z
dc.date.available 2024-06-19T14:29:41Z
dc.date.issued 2024
dc.description.abstract Context: Modern code review is a critical component of software development processes, as it ensures security, detects errors early, and improves code quality. However, manual reviews can be time-consuming and unreliable. Automated code review can address these issues. Although deep-learning methods have been used to recommend code review comments, they are expensive to train and employ. Instead, information retrieval (IR)-based methods for automatic code review are showing promising results in efficiency, effectiveness, and flexibility. Objective: Our main objective is to determine the combination of vectorization method and similarity measure that gives the best results in automatic code review, thereby improving the performance of IR-based methods. Method: Specifically, we investigate vectorization methods (Word2Vec, Doc2Vec, Code2Vec, and Transformer) that differ from those used in previous research (TF-IDF and Bag-of-Words), together with similarity measures (Cosine, Euclidean, and Manhattan), to capture the semantic similarities between code texts. We evaluate the performance of these methods using standard metrics, such as BLEU, METEOR, and ROUGE-L, and include the run-time of the models in our results. Results: Our results demonstrate that the Transformer model outperforms the state-of-the-art method in all standard metrics and similarity measurements, achieving a 19.1% improvement in providing exact matches and a 6.2% improvement in recommending reviews closer to human reviews. Conclusion: Our findings suggest that the Transformer model is a highly effective and efficient approach for recommending code review comments that closely resemble those written by humans, providing valuable insight for developing more efficient and effective automated code review systems. © 2024 Elsevier B.V. en_US
dc.identifier.doi 10.1016/j.infsof.2024.107490
dc.identifier.issn 0950-5849
dc.identifier.scopus 2-s2.0-85193900630
dc.identifier.uri https://doi.org/10.1016/j.infsof.2024.107490
dc.identifier.uri https://hdl.handle.net/11147/14571
dc.language.iso en en_US
dc.publisher Elsevier B.V. en_US
dc.relation.ispartof Information and Software Technology en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Code similarity en_US
dc.subject Information retrieval en_US
dc.subject Modern code review en_US
dc.subject Vectorization en_US
dc.title Automating Modern Code Review Processes With Code Similarity Measurement en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.scopusid 24490853600
gdc.author.scopusid 58635463000
gdc.author.scopusid 15081108900
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access metadata only access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department Izmir Institute of Technology en_US
gdc.description.departmenttemp Kartal Y., Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey; Akdeniz E.K., Computer Engineering, Izmir Institute of Technology, Izmir, Turkey; Özkan K., Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.volume 173 en_US
gdc.description.wosquality Q1
gdc.identifier.openalex W4397008809
gdc.identifier.wos WOS:001245336700001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 3.0
gdc.oaire.influence 2.8237714E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.594452E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 7.63825132
gdc.openalex.normalizedpercentile 0.95
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 25
gdc.plumx.scopuscites 4
gdc.scopus.citedcount 4
gdc.wos.citedcount 4
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4003-8abe-a4dfe192da5e
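
The retrieval idea summarized in the abstract (vectorize code, compare vectors with a Cosine, Euclidean, or Manhattan measure, and reuse the review comment of the closest known snippet) can be illustrated with a minimal sketch. This is not the authors' implementation: the token-count vectorizer is only a stand-in for the learned embeddings named in the abstract (Word2Vec, Doc2Vec, Code2Vec, Transformer), and the function names, corpus, and query are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z_]\w*")

def vectorize(code, vocab):
    """Toy token-count vector; an assumed stand-in for a learned embedding."""
    counts = Counter(TOKEN.findall(code))
    return [float(counts[tok]) for tok in vocab]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

DISTANCES = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}

def recommend_review(query, corpus, measure="cosine"):
    """Return the review comment attached to the closest snippet in corpus.

    corpus is a list of (code, review_comment) pairs.
    """
    vocab = sorted(
        {tok for code, _ in corpus for tok in TOKEN.findall(code)}
        | set(TOKEN.findall(query))
    )
    dist = DISTANCES[measure]
    q = vectorize(query, vocab)
    _, review = min(corpus, key=lambda pair: dist(q, vectorize(pair[0], vocab)))
    return review

# Hypothetical (code, review) pairs, for illustration only.
corpus = [
    ("for i in range(len(items)): print(items[i])",
     "Iterate over the list directly instead of indexing."),
    ("f = open(path); data = f.read()",
     "Use a with-statement so the file is closed."),
    ("if x == None: return",
     "Compare to None with 'is', not '=='."),
]
query = "f = open(name); data = f.read()"

for measure in DISTANCES:
    print(measure, "->", recommend_review(query, corpus, measure))
```

Swapping `vectorize` for an embedding model changes only how the vectors are produced; the nearest-neighbour lookup and the three distance functions stay the same, which is the kind of vectorizer-by-measure comparison the abstract describes.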
