| dc.creator |
ÖZTÜRK, Muhammed Maruf |
|
| dc.date |
2022-05-01T00:00:00Z |
|
| dc.date.accessioned |
2023-01-09T12:08:11Z |
|
| dc.date.available |
2023-01-09T12:08:11Z |
|
| dc.identifier |
cdcd3dad-c3b8-4ac7-ab45-4e78ed807f5b |
|
| dc.identifier |
10.4108/eai.27-5-2022.174084 |
|
| dc.identifier |
https://avesis.sdu.edu.tr/publication/details/cdcd3dad-c3b8-4ac7-ab45-4e78ed807f5b/oai |
|
| dc.identifier.uri |
http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/98380 |
|
| dc.description |
Although various machine learning and text mining techniques exist to identify the programming language of complete code files, multi-label code snippet prediction has not been considered by the research community. This work aims at devising a tuner for multi-label programming language prediction of Stack Overflow posts. To that end, a Hyper Source Code Classifier (HyperSCC) is devised along with rule-based automatic labeling by considering the bottlenecks of multi-label classification. The proposed method is evaluated on seven multi-label predictors to conduct an extensive analysis. The method is further compared with three competitive alternatives in terms of one-label programming language prediction. HyperSCC outperformed the other methods in terms of the H score. Preprocessing reduces training time substantially (by 50%) when ensemble multi-label predictors are employed. In one-label programming language prediction, Gradient Boosting Machine (gbm) yields the highest accuracy (0.99) in predicting R posts, which contain many distinctive words that determine their labels. The findings support the hypothesis that multi-label predictors can be strengthened with sophisticated feature selection and labeling approaches. |
|
| dc.language |
eng |
|
| dc.rights |
info:eu-repo/semantics/openAccess |
|
| dc.title |
Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC |
|
| dc.type |
info:eu-repo/semantics/article |
|