Automating Modern Code Review Processes With Code Similarity Measurement

Kartal,Y.; Akdeniz,E.K.; Özkan,K.

doi:10.1016/j.infsof.2024.107490

dc.contributor.author	Kartal,Y.
dc.contributor.author	Akdeniz,E.K.
dc.contributor.author	Özkan,K.
dc.contributor.other	01. Izmir Institute of Technology
dc.date.accessioned	2024-06-19T14:29:41Z
dc.date.available	2024-06-19T14:29:41Z
dc.date.issued	2024
dc.description.abstract	Context: Modern code review is a critical component of software development processes, as it ensures security, detects errors early, and improves code quality. However, manual reviews can be time-consuming and unreliable. Automated code review can address these issues. Although deep-learning methods have been used to recommend code review comments, they are expensive to train and employ. Instead, information retrieval (IR)-based methods for automatic code review are showing promising results in efficiency, effectiveness, and flexibility. Objective: Our main objective is to determine which combination of vectorization method and similarity measure gives the best results in automatic code review, thereby improving the performance of IR-based methods. Method: Specifically, we investigate vectorization methods (Word2Vec, Doc2Vec, Code2Vec, and Transformer) that differ from those used in previous research (TF-IDF and Bag-of-Words), together with similarity measures (Cosine, Euclidean, and Manhattan), to capture the semantic similarities between code texts. We evaluate the performance of these methods using standard metrics, such as BLEU, METEOR, and ROUGE-L, and include the run-time of the models in our results. Results: Our results demonstrate that the Transformer model outperforms the state-of-the-art method in all standard metrics and similarity measurements, achieving a 19.1% improvement in providing exact matches and a 6.2% improvement in recommending reviews closer to human reviews. Conclusion: Our findings suggest that the Transformer model is a highly effective and efficient approach for recommending code review comments that closely resemble those written by humans, providing valuable insight for developing more efficient and effective automated code review systems. © 2024 Elsevier B.V.	en_US
dc.identifier.doi	10.1016/j.infsof.2024.107490
dc.identifier.issn	0950-5849
dc.identifier.scopus	2-s2.0-85193900630
dc.identifier.uri	https://doi.org/10.1016/j.infsof.2024.107490
dc.identifier.uri	https://hdl.handle.net/11147/14571
dc.language.iso	en	en_US
dc.publisher	Elsevier B.V.	en_US
dc.relation.ispartof	Information and Software Technology	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Code similarity	en_US
dc.subject	Information retrieval	en_US
dc.subject	Modern code review	en_US
dc.subject	Vectorization	en_US
dc.title	Automating Modern Code Review Processes With Code Similarity Measurement	en_US
dc.type	Article	en_US
dspace.entity.type	Publication
gdc.author.scopusid	24490853600
gdc.author.scopusid	58635463000
gdc.author.scopusid	15081108900
gdc.bip.impulseclass	C5
gdc.bip.influenceclass	C5
gdc.bip.popularityclass	C5
gdc.coar.access	metadata only access
gdc.coar.type	text::journal::journal article
gdc.collaboration.industrial	false
gdc.description.department	Izmir Institute of Technology	en_US
gdc.description.departmenttemp	Kartal Y., Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey; Akdeniz E.K., Computer Engineering, Izmir Institute of Technology, Izmir, Turkey; Özkan K., Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey	en_US
gdc.description.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
gdc.description.scopusquality	N/A
gdc.description.volume	173	en_US
gdc.description.wosquality	Q1
gdc.identifier.openalex	W4397008809
gdc.identifier.wos	WOS:001245336700001
gdc.index.type	WoS
gdc.index.type	Scopus
gdc.oaire.diamondjournal	false
gdc.oaire.impulse	3.0
gdc.oaire.influence	2.8237714E-9
gdc.oaire.isgreen	false
gdc.oaire.popularity	2.594452E-9
gdc.oaire.publicfunded	false
gdc.openalex.collaboration	National
gdc.openalex.fwci	7.63825132
gdc.openalex.normalizedpercentile	0.95
gdc.openalex.toppercent	TOP 10%
gdc.opencitations.count	0
gdc.plumx.mendeley	25
gdc.plumx.scopuscites	4
gdc.scopus.citedcount	4
gdc.wos.citedcount	4
relation.isOrgUnitOfPublication	9af2b05f-28ac-4003-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication.latestForDiscovery	9af2b05f-28ac-4003-8abe-a4dfe192da5e

Collections

Scopus Indexed Publications Collection
WoS Indexed Publications Collection
