A cosine similarity-based labeling technique for vulnerability type detection using source codes

ÖZTÜRK, Mahmut Sami

dc.creator	ÖZTÜRK, Mahmut Sami
dc.date	2024-11-01T00:00:00Z
dc.date.accessioned	2025-02-25T10:16:47Z
dc.date.available	2025-02-25T10:16:47Z
dc.identifier	09a611b0-5f28-45ca-81ef-a65290760880
dc.identifier	10.1016/j.cose.2024.104059
dc.identifier	https://avesis.sdu.edu.tr/publication/details/09a611b0-5f28-45ca-81ef-a65290760880/oai
dc.identifier.uri	http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/98845
dc.description	Vulnerability detection is of great importance in making software systems reliable. Although existing methods achieve remarkable success in vulnerability detection, they have two main disadvantages: (1) They remove irrelevant information from source code, which has a high noise ratio, and then apply deep learning methods that reach high accuracy in experiments. However, deep learning-based detection methods require large-scale datasets, which makes vulnerability detection computationally difficult for small-scale software systems. (2) The majority of studies perform feature selection by processing vulnerability commits; despite considerable effort, few works detect vulnerabilities directly from source code. To solve these two problems, this study proposes a novel labeling and vulnerability detection algorithm. The algorithm first processes source code with the help of a keyword vulnerability matrix. A final encoded matrix is then generated with word2vec, and the labeling vector is combined with the source code matrix to produce a trainable dataset for a generalized linear model (GLM). Unlike preceding studies, our method performs vulnerability detection using source code, without requiring vulnerability commits. In addition, similar studies generally target only one programming language. In contrast, our study develops vulnerability keywords for three programming languages (C#, Java, and C++) and creates the related labeling vectors from the keyword matrix. The proposed method outperformed the baseline approaches on most of the experimental datasets, with an area under the curve (AUC) above 90%. Further, there is a 7.7% margin on average between our method and the alternatives in Recall, Precision, and F1-score across five types of vulnerabilities.
dc.language	eng
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	A cosine similarity-based labeling technique for vulnerability type detection using source codes
dc.type	info:eu-repo/semantics/article
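
The abstract above describes labeling source code by comparing it against a keyword vulnerability matrix. The following is only a minimal sketch of what a cosine similarity-based labeling step could look like, not the authors' implementation. The keyword lists, the `tokenize` helper, and the vulnerability type names are all hypothetical, since the record does not give the actual keyword matrix. The sketch also uses raw token counts in place of word2vec embeddings and omits the GLM training stage.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists per vulnerability type (assumption: the
# paper's real keyword matrix is not reproduced in this record).
KEYWORDS = {
    "sql_injection": ["select", "execute", "query", "statement"],
    "buffer_overflow": ["strcpy", "memcpy", "gets", "char"],
}


def tokenize(code):
    """Split source text into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_]+", code.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_snippet(code):
    """Return (label, scores): the best-matching type, or "none" if no keyword overlaps."""
    counts = Counter(tokenize(code))
    scores = {name: cosine(counts, Counter(words)) for name, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else "none"), scores


if __name__ == "__main__":
    label, scores = label_snippet("char buf[8]; strcpy(buf, input);")
    print(label, scores)
```

In a fuller pipeline, the resulting labels would presumably form the labeling vector that is joined to the encoded source code matrix before a classifier is fitted; that part is left out here because the record gives no details on it.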

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Akademik Veri Yönetim Sistemi