DSpace Repository

A cosine similarity-based labeling technique for vulnerability type detection using source codes


dc.creator ÖZTÜRK, Mahmut Sami
dc.date 2024-11-01T00:00:00Z
dc.date.accessioned 2025-02-25T10:16:47Z
dc.date.available 2025-02-25T10:16:47Z
dc.identifier 09a611b0-5f28-45ca-81ef-a65290760880
dc.identifier 10.1016/j.cose.2024.104059
dc.identifier https://avesis.sdu.edu.tr/publication/details/09a611b0-5f28-45ca-81ef-a65290760880/oai
dc.identifier.uri http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/98845
dc.description Vulnerability detection is essential to the reliability of software systems. Although existing methods achieve remarkable success in vulnerability detection, they have two main disadvantages: (1) They remove irrelevant information from source code, which has a high noise ratio, and then apply deep learning methods to reach high accuracy in experiments. However, deep learning-based detection requires large-scale datasets, which makes vulnerability detection computationally difficult for small-scale software systems. (2) The majority of studies perform feature selection by processing vulnerability commits, and few works detect vulnerabilities directly from source code. To address these two problems, this study proposes a novel labeling and vulnerability detection algorithm. The algorithm first analyzes source code with the help of a keyword vulnerability matrix. A final encoded matrix is then generated with word2vec, and the labeling vector is combined with the source code matrix to produce a trainable dataset for a generalized linear model (GLM). Unlike preceding studies, our method performs vulnerability detection using source code alone, without requiring vulnerability commits. In addition, similar studies generally target a single programming language. In contrast, our study develops vulnerability keywords for three programming languages, C#, Java, and C++, and creates the related labeling vectors from the keyword matrix. The proposed method outperformed the baseline approaches on most of the experimental datasets, with an area under the curve (AUC) above 90%. Further, across five types of vulnerabilities, our method leads the alternatives by an average margin of 7.7% in Recall, Precision, and F1-score.
dc.language eng
dc.rights info:eu-repo/semantics/closedAccess
dc.title A cosine similarity-based labeling technique for vulnerability type detection using source codes
dc.type info:eu-repo/semantics/article
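
The labeling idea summarized in dc.description can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword lists, the function names (tokenize, cosine, label_snippet), and the threshold are hypothetical, and plain term counts stand in for the word2vec encoding mentioned in the abstract. The sketch only shows how cosine similarity between a code snippet and per-type keyword vectors could yield a vulnerability-type label.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists; the paper's actual keyword matrix is not
# reproduced in this record.
KEYWORDS = {
    "sql_injection": ["select", "executequery", "statement", "query"],
    "buffer_overflow": ["strcpy", "memcpy", "gets", "buffer"],
}


def tokenize(code):
    """Lowercase the code and split it into alphabetic/underscore tokens."""
    return re.findall(r"[a-z_]+", code.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_snippet(code, threshold=0.1):
    """Return (label, scores); label is "none" if no score reaches threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    vec = Counter(tokenize(code))
    scores = {
        name: cosine(vec, Counter(words)) for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    label = best if scores[best] >= threshold else "none"
    return label, scores


if __name__ == "__main__":
    label, scores = label_snippet("char buffer[8]; strcpy(buffer, input);")
    print(label, scores)
```

In a pipeline like the one the abstract describes, labels produced this way would be attached to the encoded source code matrix to form the training set for the GLM; that step is omitted here.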


