DSpace Repository

Performance of Using Tag-based Feature Sets in Web Page Classification

dc.creator ÖZEL, Selma Ayşe
dc.creator ÜNAL, Havva Esin
dc.creator ÜNAL, İlker
dc.date 2018-08-15T00:00:00Z
dc.date.accessioned 2019-07-09T11:59:35Z
dc.date.available 2019-07-09T11:59:35Z
dc.identifier http://dergipark.org.tr/sdufenbed/issue/38975/456352
dc.identifier.uri http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/46368
dc.description As the Web is a large collection of data that grows daily, an automatic Web page classification mechanism is needed to reach useful information effectively. The majority of Web pages are HTML documents; the aim of this study is therefore to explore the effect of HTML tags on the classification process and to determine the most valuable HTML tags for feature extraction in the classification task. To achieve this goal, we employ 13 different datasets and 5 popular classifiers: SVM, naïve Bayes (NB), kNN, C4.5, and OneR. The statistical analysis shows that features extracted solely from the anchor, <p>, or <title> tags can be used as an alternative to features extracted from the whole Web page. SVM is the best among the classifiers used in this study. Using HTML tags for feature extraction improves classification accuracy.
dc.format application/pdf
dc.publisher Süleyman Demirel University
dc.publisher Süleyman Demirel Üniversitesi
dc.relation http://dergipark.org.tr/download/article-file/528958
dc.source Volume: 22, Issue: 2, Pages: 583-594 en-US
dc.source 1308-6529
dc.subject Web mining; Classification; HTML tags; Feature extraction
dc.title Performance of Using Tag-based Feature Sets in Web Page Classification en-US
dc.type info:eu-repo/semantics/article
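
The abstract describes building feature sets from the text inside individual HTML tags (anchor, <p>, <title>) rather than from the whole page. The record does not describe the authors' actual pipeline, so the following is only a minimal sketch of the general idea, assuming reasonably well-formed HTML with explicit closing tags and a plain bag-of-words representation. The names TagTextExtractor and tag_features are hypothetical and exist only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TagTextExtractor(HTMLParser):
    """Collects the text found inside a chosen set of HTML tags.

    Illustrative helper, not the method used in the study.
    Assumes every wanted tag is explicitly closed.
    """

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = {t.lower() for t in wanted_tags}
        self.open_counts = Counter()  # currently open wanted tags, per tag name
        self.chunks = []              # text fragments seen inside wanted tags

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted_tags:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.wanted_tags and self.open_counts[tag] > 0:
            self.open_counts[tag] -= 1

    def handle_data(self, data):
        # Keep the text only while at least one wanted tag is open.
        if any(count > 0 for count in self.open_counts.values()):
            self.chunks.append(data)


def tag_features(html, wanted_tags):
    """Return term counts for the text inside the given tags."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


if __name__ == "__main__":
    page = (
        "<html><head><title>Campus News</title></head>"
        "<body><h1>Welcome</h1><p>Read the latest campus news.</p>"
        '<a href="/sports">Sports results</a></body></html>'
    )
    for tags in (["title"], ["a"], ["p"]):
        print(tags, dict(tag_features(page, tags)))
```

Term counts produced this way for a single tag could be turned into feature vectors and passed to any of the classifiers named in the abstract; comparing them against counts taken from the full page text mirrors the comparison the abstract reports.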

