Destek vektör makineleri ve gauss karışım modeli ile istenmeyen e-postaların tespiti = Support vector machine and gauss mixture model detection of unsolicited e-mails /

Ateş, Nurullah, 1986- (author) 100308
Küçüksille, Ecir Uğur, 1976- (thesis advisor) 9288
Süleyman Demirel Üniversitesi. Fen Bilimleri Enstitüsü. Bilgisayar Mühendisliği Anabilim Dalı. (issuing body) 24579

dc.creator	Ateş, Nurullah, 1986- author 100308
dc.creator	Küçüksille, Ecir Uğur, 1976- thesis advisor 9288
dc.creator	Süleyman Demirel Üniversitesi. Fen Bilimleri Enstitüsü. Bilgisayar Mühendisliği Anabilim Dalı. 24579 issuing body
dc.date	2014.
dc.identifier	http://tez.sdu.edu.tr/Tezler/TF02605.pdf
dc.description	In this thesis, two content-based filtering methods for detecting spam e-mails were implemented, using Support Vector Machines (SVM), a supervised learning algorithm, and Gaussian Mixture Models (GMM), an unsupervised learning algorithm. In both methods, the subject line and body of each e-mail were used as the source of attributes, and the character strings in the messages were preprocessed to obtain accurate attributes. In the study carried out on Turkish messages, non-letter characters attached to the beginning and end of each character string were removed, each string was truncated to its first five characters, all letters were converted to lower case, and strings occurring fewer than three times were deleted from the candidate attribute set. Using the Mutual Information algorithm, the 49 character strings with the highest scores were then chosen as attributes. In the second method, the Lingspam data set, a dedicated spam-filtering corpus, was used. In content filters, the most important attributes are the words of the message. A word can appear in different written forms depending on tense, number (singular or plural), and so on: the English word “people” is the plural of “person”, “plays” is the plural of “play”, and “found” is the past form of “find”. Therefore, when words are examined during spam filtering, it is important to examine each word in its base form. The Lingspam data set uses the base forms of the words in its messages, a step it refers to as lemmatization.
Also, the most common words of the language (stop words) were removed from this data set: because they appear in almost every message, they cannot distinguish spam from legitimate e-mail and only increase the running time of the algorithm. To keep their mail from being filtered, spammers write words that may serve as attributes, such as “viagra”, in different ways: “viagr*a”, “v1a1g1r1a”, “v.iagra”, “viagraaaa” and even “v i ag r a”. These spellings reduce the chance of detecting spam. The original contribution of this study is the use of the Soundex algorithm, which many studies have used for pronunciation-based correction, to match the different spellings of a word. In the second method, the 98.6% correct classification rate obtained in tests with SVM indicates that using Soundex is effective. Keywords: Spam, electronic mail, Support Vector Machine, Gaussian Mixture Model, Soundex.
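The preprocessing and attribute-selection steps described for the Turkish messages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: messages are tokenized on whitespace, the three-occurrence threshold is applied to counts over the whole corpus, and mutual information is computed between the presence of a string in a message and its spam/legitimate label. The function names and the tiny example corpus are hypothetical.

```python
import math
from collections import Counter


def turkish_lower(s: str) -> str:
    # str.lower() turns "I" into "i" and "İ" into "i" + U+0307, which is wrong
    # for Turkish; map the dotless/dotted capitals explicitly first.
    return s.replace("I", "ı").replace("İ", "i").lower()


def normalize_token(token: str, prefix_len: int = 5) -> str:
    # Strip non-letter characters attached to the start and end of the token.
    start, end = 0, len(token)
    while start < end and not token[start].isalpha():
        start += 1
    while end > start and not token[end - 1].isalpha():
        end -= 1
    # Keep only the first `prefix_len` characters, then lowercase them.
    return turkish_lower(token[start:end][:prefix_len])


def normalize_message(message: str) -> list[str]:
    # Whitespace tokenization (an assumption); empty results are dropped.
    return [t for t in (normalize_token(w) for w in message.split()) if t]


def mutual_information(term: str, docs: list[set[str]], labels: list[int]) -> float:
    # MI (in nats) between the binary event "term occurs in the message"
    # and the class label (1 = spam, 0 = legitimate).
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, labels))
    class_prob = {c: labels.count(c) / n for c in (0, 1)}
    mi = 0.0
    for present in (False, True):
        p_x = (joint[(present, 0)] + joint[(present, 1)]) / n
        for c in (0, 1):
            p_xc = joint[(present, c)] / n
            if p_xc > 0:
                mi += p_xc * math.log(p_xc / (p_x * class_prob[c]))
    return mi


def select_attributes(messages: list[str], labels: list[int],
                      k: int = 49, min_count: int = 3) -> list[str]:
    tokenized = [normalize_message(m) for m in messages]
    # Drop strings that occur fewer than `min_count` times in the whole corpus.
    counts = Counter(t for toks in tokenized for t in toks)
    candidates = [t for t, c in counts.items() if c >= min_count]
    docs = [set(toks) for toks in tokenized]
    # Rank by MI, highest first; ties are broken alphabetically.
    ranked = sorted(candidates,
                    key=lambda t: (-mutual_information(t, docs, labels), t))
    return ranked[:k]


if __name__ == "__main__":
    messages = ["Hemen KAZANIN! bedava", "kazanın bedava hediye",
                "Bedava kazanın.", "toplantı yarın",
                "Toplantı notları", "toplantı saati"]
    labels = [1, 1, 1, 0, 0, 0]
    print(select_attributes(messages, labels))  # ['bedav', 'kazan', 'topla']
```

On this toy corpus, `bedav`, `kazan` and `topla` each separate the two classes perfectly, so they tie at ln 2 nats; the alphabetical tie-break is an arbitrary choice of this sketch.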
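A sketch of the standard American Soundex code shows how the obfuscated spellings listed above can collapse to a single attribute. The thesis does not state its exact Soundex variant or how non-letter characters and split-up words were handled; here it is assumed that every character other than an ASCII letter (digits, punctuation, spaces) is discarded before coding, so a spaced-out fragment such as “v i ag r a” is passed as one string. The function name `soundex` is hypothetical.

```python
# Standard American Soundex digit classes; vowels and h, w, y get no digit.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(text: str) -> str:
    # Assumption: every non-ASCII-letter character (digits, "*", ".", spaces)
    # is dropped before coding, so "v1a1g1r1a" is coded like "vagra".
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit:
            # Adjacent letters with the same digit are coded only once.
            if digit != prev:
                code += digit
                if len(code) == 4:
                    break
            prev = digit
        elif ch not in "hw":
            # A vowel (a, e, i, o, u, y) separates equal digits; h and w do not.
            prev = ""
    # Pad with zeros to the usual letter + three digits.
    return (code + "000")[:4]


if __name__ == "__main__":
    for variant in ["viagra", "viagr*a", "v1a1g1r1a", "v.iagra",
                    "viagraaaa", "v i ag r a"]:
        print(f"{variant!r:14} -> {soundex(variant)}")  # each line ends in V260
```

The coarseness that makes Soundex robust to these spellings can also merge unrelated words: “vigor” is coded V260 as well.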
dc.description	Thesis (Master's) - Süleyman Demirel Üniversitesi, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Anabilim Dalı, 2014.
dc.description	Includes bibliographical references.
dc.language	tur
dc.publisher	Isparta : Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü,
dc.subject	Süleyman Demirel Üniversitesi
dc.title	Destek vektör makineleri ve gauss karışım modeli ile istenmeyen e-postaların tespiti = Support vector machine and gauss mixture model detection of unsolicited e-mails /
dc.type	text

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

Fen Bilimleri Enstitüsü
Contains the collections of the Fen Bilimleri Enstitüsü (Graduate School of Natural and Applied Sciences).
