DSpace Repository

CLUDS: COMBINING LABELED AND UNLABELED DATA WITH LOGISTIC REGRESSION FOR SOCIAL MEDIA ANALYSIS

dc.creator ALTINEL, Ayşe Berna; MARMARA ÜNİVERSİTESİ
dc.date 2021-12-20T00:00:00Z
dc.date.accessioned 2022-05-10T10:56:43Z
dc.date.available 2022-05-10T10:56:43Z
dc.identifier https://dergipark.org.tr/tr/pub/jesd/issue/66319/780002
dc.identifier 10.21923/jesd.780002
dc.identifier.uri http://acikerisim.sdu.edu.tr/xmlui/handle/123456789/96122
dc.description Automatic text classification and sentiment polarity detection are two important research problems in social media analysis. Word meanings matter enough that a document classification algorithm must capture them to classify accurately. Another important issue in text classification is the scarcity of labeled data. This study presents a new semi-supervised methodology called Combining Labeled and Unlabeled Data with Semantic Values of Terms (CLUDS). CLUDS has four steps: preprocessing, instance labeling, combining labeled and unlabeled data, and prediction. The preprocessing step uses the Latent Dirichlet Allocation (LDA) algorithm, and the instance labeling step applies Logistic Regression. In CLUDS, relevance value computation is applied as a supervised term weighting method for text classification. According to the literature, CLUDS is the first attempt to use both relevance and weighting calculations in a semi-supervised semantic kernel for Support Vector Machines (SVM). Sprinkled-CLUDS and Adaptive-Sprinkled-CLUDS have also been implemented. Experimental results show that CLUDS, Sprinkled-CLUDS, and Adaptive-Sprinkled-CLUDS yield a valuable performance gain over the baseline algorithms on the test sets.
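The instance labeling and combining steps named in the abstract can be illustrated with a small self-training sketch. This is not the authors' implementation: it omits the LDA preprocessing, the relevance-based term weighting, and the SVM semantic kernel, and stands in plain bag-of-words counts and a hand-written logistic regression. The function names (`featurize`, `train_logreg`, `combine_labeled_unlabeled`), the toy texts, and the 0.7 confidence threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (stand-in for LDA features)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=300, lr=0.5):
    """Binary logistic regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Probability of the positive class under a (weights, bias) model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def combine_labeled_unlabeled(labeled, unlabeled, threshold=0.7):
    """Pseudo-label confident unlabeled texts, then retrain on the union."""
    words = {w for text, _ in labeled for w in text.lower().split()}
    words |= {w for text in unlabeled for w in text.lower().split()}
    vocab = sorted(words)
    X = [featurize(text, vocab) for text, _ in labeled]
    y = [label for _, label in labeled]
    seed_model = train_logreg(X, y)  # instance labeling model
    for text in unlabeled:
        x = featurize(text, vocab)
        p = predict_proba(seed_model, x)
        # Keep only instances the seed model is confident about (assumed rule).
        if p >= threshold or p <= 1.0 - threshold:
            X.append(x)
            y.append(1 if p >= 0.5 else 0)
    return train_logreg(X, y), vocab

# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
labeled = [("great happy love", 1), ("good great fun", 1),
           ("bad awful hate", 0), ("terrible bad sad", 0)]
unlabeled = ["love great movie", "awful bad movie", "movie"]

model, vocab = combine_labeled_unlabeled(labeled, unlabeled)
print(predict_proba(model, featurize("great love", vocab)))
```

In this sketch the final predictor is the retrained logistic regression; the abstract instead describes prediction with an SVM using a semi-supervised semantic kernel, which is beyond what this record specifies.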
dc.format application/pdf
dc.language en
dc.publisher Süleyman Demirel Üniversitesi
dc.publisher Süleyman Demirel University
dc.relation https://dergipark.org.tr/tr/download/article-file/1238564
dc.source Volume: 9, Issue: 4, pp. 1048-1061 en-US
dc.source 1308-6693
dc.source Mühendislik Bilimleri ve Tasarım Dergisi
dc.subject Tweet Classification, Latent Dirichlet Allocation, Logistic Regression, Social Media Analysis, Sentiment Polarity Detection
dc.title CLUDS: COMBINING LABELED AND UNLABELED DATA WITH LOGISTIC REGRESSION FOR SOCIAL MEDIA ANALYSIS en-US
dc.type info:eu-repo/semantics/article