CLUDS: COMBINING LABELED AND UNLABELED DATA WITH LOGISTIC REGRESSION FOR SOCIAL MEDIA ANALYSIS

ALTINEL, Ayşe Berna; MARMARA ÜNİVERSİTESİ

dc.creator	ALTINEL, Ayşe Berna; MARMARA ÜNİVERSİTESİ
dc.date	2021-12-20T00:00:00Z
dc.date.accessioned	2022-05-10T10:56:43Z
dc.date.available	2022-05-10T10:56:43Z
dc.identifier	https://dergipark.org.tr/tr/pub/jesd/issue/66319/780002
dc.identifier	10.21923/jesd.780002
dc.identifier.uri	http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/96122
dc.description	Automatic text classification and sentiment polarity detection are two important research problems in social media analysis. Word meanings matter enough that a document classification algorithm must capture them to classify accurately. Another important issue in text classification is the scarcity of labeled data. This study presents a new semi-supervised methodology called Combining Labeled and Unlabeled Data with Semantic Values of Terms (CLUDS). CLUDS consists of four steps: preprocessing, instance labeling, combining labeled and unlabeled data, and prediction. The preprocessing step uses the Latent Dirichlet Allocation (LDA) algorithm, and the instance labeling step applies Logistic Regression. In CLUDS, relevance value computation is applied as a supervised term weighting methodology for text classification. According to the literature, CLUDS is the first attempt to use both relevance and weighting calculation in a semi-supervised semantic kernel for Support Vector Machines (SVM). Sprinkled-CLUDS and Adaptive-Sprinkled-CLUDS have also been implemented. Experimental results show that CLUDS, Sprinkled-CLUDS, and Adaptive-Sprinkled-CLUDS achieve a performance gain over the baseline algorithms on the test sets.
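The instance labeling and combining steps named in the abstract can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the confidence threshold, the bag-of-words features, the toy tweets, and every function name below are assumptions made for illustration. The LDA preprocessing step and the semantic SVM kernel are left out entirely; a plain logistic regression fitted by gradient descent stands in for both the labeler and the final predictor.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, epochs=300, lr=0.5):
    """Binary logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def label_and_combine(labeled, unlabeled, threshold=0.8):
    """Pseudo-label confident unlabeled texts, merge them, and retrain.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    texts = [t for t, _ in labeled] + list(unlabeled)
    vocab = sorted({w for t in texts for w in tokenize(t)})
    X = [featurize(t, vocab) for t, _ in labeled]
    y = [label for _, label in labeled]
    labeler = train_logreg(X, y)

    combined = list(labeled)
    for text in unlabeled:
        p = predict_proba(labeler, featurize(text, vocab))
        if p >= threshold:
            combined.append((text, 1))
        elif p <= 1.0 - threshold:
            combined.append((text, 0))
        # Otherwise the text stays unlabeled and is left out.

    final = train_logreg([featurize(t, vocab) for t, _ in combined],
                         [label for _, label in combined])
    return combined, final, vocab

# Toy data (invented): 1 = positive polarity, 0 = negative polarity.
labeled = [("great phone love it", 1), ("love this great movie", 1),
           ("terrible service hate it", 0), ("hate this awful movie", 0)]
unlabeled = ["love great", "awful terrible hate", "the weather"]

combined, model, vocab = label_and_combine(labeled, unlabeled)
print(len(combined), "training examples after combining")
```

Texts whose predicted probability falls between the two cut-offs are left out of the combined set, which keeps low-confidence pseudo-labels from being fed back into training.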
dc.format	application/pdf
dc.language	en
dc.publisher	Süleyman Demirel Üniversitesi
dc.publisher	Süleyman Demirel University
dc.relation	https://dergipark.org.tr/tr/download/article-file/1238564
dc.source	Volume: 9, Issue: 4, pp. 1048-1061	en-US
dc.source	1308-6693
dc.source	Mühendislik Bilimleri ve Tasarım Dergisi
dc.subject	Tweet Classification,Latent Dirichlet Allocation,Logistic Regression,Social Media Analysis,Sentiment Polarity Detection
dc.title	CLUDS: SOSYAL MEDYA ANALİZİ İÇİN ETİKETLİ VE ETİKETSİZ VERİLERİ LOJİSTİK REGRESYON İLE BİRLEŞTİRME	tr-TR
dc.title	CLUDS: COMBINING LABELED AND UNLABELED DATA WITH LOGISTIC REGRESSION FOR SOCIAL MEDIA ANALYSIS	en-US
dc.type	info:eu-repo/semantics/article
