COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA

AKSOY, Bekir; Isparta Uygulamalı Bilimler Üniversitesi; UĞUZ, Sinan; Isparta Uygulamalı Bilimler Üniversitesi; ORAL, Okan; Akdeniz University

dc.creator	AKSOY, Bekir; ısparta uygulamalı bilimler üniversitesi
dc.creator	UĞUZ, Sinan; ısparta uygulamalı bilimler üniversitesi
dc.creator	ORAL, Okan; AKDENIZ UNIVERSITY
dc.date	2019-09-15T00:00:00Z
dc.date.accessioned	2020-01-02T08:28:01Z
dc.date.available	2020-01-02T08:28:01Z
dc.identifier	https://dergipark.org.tr/tr/pub/jesd/issue/48672/467036
dc.identifier	10.21923/jesd.467036
dc.identifier.uri	http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/49196
dc.description	Son yıllarda dünya turizmindeki büyük hareketlilik, bu sektörün büyük verinin çalışma alanları arasına girmesini sağlamıştır. Bu çalışmada farklı sağlayıcılardan gelen otel bilgilerinin, veritabanlarına farklı isim ve adreslerle girilmesi sonucu oluşan problemler için, büyük veri ve string similarity algoritmaları (SSA) kullanarak bir çözüm önerisi ortaya konulmuştur. Bunun için geniş bir otel ağına sahip bir turizm acentasının Londra’da bulunan 2599 oteli örneklem olarak seçilmiş ve bu oteller ile yetmiş farklı sağlayıcıdan gelen yaklaşık üç milyon otel bilgisinin eşleştirilmesi için, soundex algoritmasından faydalanılarak Map-Reduce işlemi gerçekleştirilmiştir. Map-Reduce ile eşleme işlem sayısı ve işlem süresinde önemli ölçüde azalma sağlanmıştır. Çalışmanın diğer aşamasında ise Dice coefficient, Levenshtein ve Longest common subsequence (LCS) algoritmaları, doğru eşleyebildikleri veri ve işlem süresi açısından kıyaslanmıştır. Bu aşamada algoritmalar uygulanmadan önce veri tabanında algoritmaların skorunu düşüren kelimeler tespit edilerek çıkartılmıştır. Doğru eşleme bakımından Dice coefficient algoritması, işlem süresi açısından ise Levenshtein algoritması daha iyi sonuçlar üretmiştir.
dc.description	The great mobility in world tourism in recent years has brought this sector into the study areas of big data. In this study, a solution based on big data and string similarity algorithms (SSA) is proposed for the problems that arise when hotel data coming from different providers is entered into databases under different names and addresses. To this end, 2599 London hotels of a tourism agency with a wide hotel network were selected as the sample, and a Map-Reduce process using the Soundex algorithm was performed to match these hotels with approximately three million hotel records coming from seventy different providers. Matching with Map-Reduce significantly reduced the number of matching operations and the processing time. In the next stage of the study, the Dice coefficient, Levenshtein and Longest common subsequence (LCS) algorithms were compared in terms of the data they matched correctly and their processing time. In this stage, the words in the database that lowered the scores of the algorithms were detected and removed before the algorithms were applied. The Dice coefficient algorithm yielded better results in terms of correct matching, and the Levenshtein algorithm yielded better results in terms of processing time.
dc.format	application/pdf
dc.language	en
dc.publisher	Süleyman Demirel University
dc.publisher	Süleyman Demirel Üniversitesi
dc.relation	https://dergipark.org.tr/tr/download/article-file/805989
dc.source	Volume: 7, Issue: 3 608-618	en-US
dc.source	1308-6693
dc.subject	Algoritmalar,Metin analizi,Doğal dil işleme,Veri analizi,Veri tabanları
dc.subject	Algorithms,Text Analysis,Natural Language processing,Data Analysis,Databases
dc.title	BÜYÜK VERİDE METİN BENZERLİK ALGORİTMALARININ VERİ EŞLEME PERFORMANSLARININ KARŞILAŞTIRILMASI	tr-TR
dc.title	COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA	en-US
dc.type	info:eu-repo/semantics/article
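
The pipeline summarized in the abstract (Soundex-based grouping in a Map-Reduce style, removal of score-lowering words, then Dice coefficient, Levenshtein and LCS scoring) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The noise-word list, the choice of character bigrams for the Dice coefficient, the use of the whole normalized name as the Soundex input, and all function names are assumptions made for illustration, and the grouping runs in memory rather than on a Map-Reduce cluster.

```python
from collections import defaultdict

# Hypothetical noise words; the record does not list the words actually removed.
NOISE_WORDS = {"hotel", "london", "the"}

def normalize(name):
    """Lowercase a name and drop the assumed noise words."""
    return " ".join(w for w in name.lower().split() if w not in NOISE_WORDS)

# Letter -> Soundex digit (vowels, h, w, y have no digit).
_CODES = {c: d for d, letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in letters}

def soundex(text):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], 0)
    for c in letters[1:]:
        code = _CODES.get(c, 0)
        if code and code != prev:
            result += str(code)
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]

def block_by_soundex(names):
    """Map: emit (Soundex key, name). Reduce: group names sharing a key."""
    blocks = defaultdict(list)
    for name in names:
        blocks[soundex(normalize(name))].append(name)
    return blocks

def dice(a, b):
    """Dice coefficient over sets of character bigrams."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def levenshtein(a, b):
    """Edit distance (insert, delete, substitute), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lcs_length(a, b):
    """Length of the longest common subsequence, row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Only names that land in the same block need to be scored against each other.
blocks = block_by_soundex(["Park Plaza Hotel", "Hotel Park Plazza", "Savoy"])
```

Grouping by Soundex key first means the pairwise similarity functions are applied only within each block rather than across every pair of records, which is the kind of reduction in matching operations the abstract attributes to the Map-Reduce step.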

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following Collection(s)

Mühendislik Bilimleri ve Tasarım Dergisi
Contains the journal Mühendislik Bilimleri ve Tasarım Dergisi.
