Computer Engineering / Bilgisayar Mühendisliği
Permanent URI for this collection: https://hdl.handle.net/11147/10
Search Results (4 results)
Conference Object (Citations: WoS 2, Scopus 4)
Çok-etiketli Film Türü Sınıflandırması için Türkçe Konu Modellemesi Veri Kümesi [A Turkish Topic Modeling Dataset for Multi-Label Movie Genre Classification]
(Institute of Electrical and Electronics Engineers, 2020) Jabrayilzade, Elgün; Poyraz Arslan, Algın; Para, Hasan; Polatbilek, Ozan; Sezerer, Erhan; Tekir, Selma
Statistical topic modeling aims to assign topics to documents in an unsupervised way. Latent Dirichlet Allocation (LDA) is the standard model for topic modeling. It performs well on collections of relatively long documents but poorly on short texts. Topic modeling on short texts is gaining importance because of the potential of social media, so approaches that can find topics in short texts as well as long ones are sought. However, there is a lack of datasets that include both long and short texts sharing the same ground-truth categories. In this work, we release a Turkish movie dataset that contains both short film descriptions and long subtitles, where film genre can be considered as the topic. Furthermore, we provide multi-label movie genre classification results using a Feed Forward Neural Network (FFNN) that takes either LDA document-topic or Doc2Vec dense representations as input. © 2020 IEEE.

Conference Object (Citations: Scopus 1)
Türkçe Tweetler Üzerinden Yapay Sinir Ağları ile Cinsiyet Tahminlemesi [Gender Prediction from Turkish Tweets with Artificial Neural Networks]
(Institute of Electrical and Electronics Engineers Inc., 2019) Sezerer, Erhan; Polatbilek, Ozan; Tekir, Selma
Author profiling is the identification of key attributes of an author, such as gender, age, and language, from a text whose author is unknown. It is especially important in security and marketing. In this study, users' genders are predicted from their tweets. A model combining a Recurrent Neural Network (RNN) with an attention mechanism is proposed. To the best of our knowledge, this is the first such study for Turkish on a Twitter dataset. The proposed model was tested on Turkish, English, Spanish, and Arabic, achieving accuracies of 80.63, 81.73, 78.22, and 78.5, respectively. These results are state of the art for Turkish and competitive for the other languages.

Article
Gender Prediction From Tweets: Improving Neural Representations With Hand-Crafted Features
(Cornell University, 2019) Tekir, Selma; Sezerer, Erhan; Polatbilek, Ozan
Author profiling is the characterization of an author through key attributes such as gender, age, and language. In this paper, an RNN model with Attention (RNNwA) is proposed to predict the gender of a Twitter user from their tweets. Both word-level and tweet-level attention are used to learn 'where to look'. This model is then improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user. Both models are tested on three languages: English, Spanish, and Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and competitive results on Spanish and Arabic.

Conference Object (Citations: Scopus 6)
Gender Prediction From Tweets With Convolutional Neural Networks: Notebook for PAN at CLEF 2018
(CEUR Workshop Proceedings, 2018) Sezerer, Erhan; Polatbilek, Ozan; Sevgili, Özge; Tekir, Selma
This paper presents a system developed for the author profiling task of PAN at CLEF 2018. The system uses style-based features to predict gender from the given tweets of each user. These features are extracted automatically by Convolutional Neural Networks (CNN). The system rests on the idea that tweets are not equally informative about a user's gender. Thus, an attention mechanism is applied to the CNN outputs to single out the tweets that carry more information. Our architecture obtained competitive results on the three languages provided by the PAN 2018 author profiling challenge, with an average accuracy of 75.1% on local runs and 70.23% on the submission run.
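
Several of the abstracts above (the RNN-with-attention models and the PAN 2018 CNN system) weight each tweet by how informative it is before predicting a user's gender. The sketch below illustrates one common form of this idea, additive attention pooling over per-tweet vectors. It is not the authors' implementation: the abstracts do not give the scoring function, so the tanh(W h + b) · context form, the helper names (matvec, softmax, attention_pool), and all parameter values are hypothetical, and the per-tweet vectors are assumed to come from an upstream encoder such as a CNN or RNN.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (one row per output unit) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    tweets: List[Vector], w: Matrix, b: Vector, context: Vector
) -> Tuple[Vector, Vector]:
    """Additive attention pooling over per-tweet vectors (hypothetical helper).

    w, b and context stand in for learned parameters. Returns the pooled
    user vector and the attention weight assigned to each tweet.
    """
    scores = []
    for h in tweets:
        # Hidden projection: u_i = tanh(W h_i + b)
        u = [math.tanh(z + bi) for z, bi in zip(matvec(w, h), b)]
        # Relevance score: e_i = u_i . context
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # User vector = attention-weighted sum of the tweet vectors.
    dim = len(tweets[0])
    user = [sum(a * h[k] for a, h in zip(weights, tweets)) for k in range(dim)]
    return user, weights


if __name__ == "__main__":
    # Three toy 2-d tweet vectors; identity projection and zero bias.
    tweets = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    user, weights = attention_pool(tweets, identity, [0.0, 0.0], [1.0, 0.0])
    print("weights:", weights)
    print("user vector:", user)
```

With a zero context vector every tweet receives the same score, so the pooling reduces to a plain average of the tweet vectors; learning the context vector is what lets such a model favor the tweets that say more about the author.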

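The first entry feeds LDA document-topic (or Doc2Vec) vectors into a Feed Forward Neural Network for multi-label genre classification. Its abstract does not describe the output layer; a common choice for multi-label problems, assumed here, is one independent sigmoid per genre with a fixed threshold, so that a single film can receive several genres. The minimal forward pass below illustrates that assumption only: the genre names, the hand-set weights, the single ReLU hidden layer, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def dense(w: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Fully connected layer: w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_genres(
    doc_topic: Sequence[float],
    w1: Matrix, b1: Vector,
    w2: Matrix, b2: Vector,
    genres: Sequence[str],
    threshold: float = 0.5,
) -> Tuple[List[str], Vector]:
    """One-hidden-layer FFNN with an independent sigmoid per genre (hypothetical)."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, z) for z in dense(w1, b1, doc_topic)]
    # One probability per genre; they need not sum to 1 (multi-label).
    probs = [sigmoid(z) for z in dense(w2, b2, hidden)]
    labels = [g for g, p in zip(genres, probs) if p >= threshold]
    return labels, probs


if __name__ == "__main__":
    genres = ["drama", "comedy", "horror"]  # hypothetical label set
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w2 = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
    b2 = [-1.0, -1.0, -1.0]
    # A 3-topic LDA document-topic vector that mixes two topics.
    labels, probs = predict_genres([0.5, 0.45, 0.05], identity, [0.0] * 3, w2, b2, genres)
    print(labels, probs)
```

Because each genre has its own sigmoid, the probabilities are not forced to sum to 1, and the example film is tagged with two genres; a softmax output layer would instead make the genres compete for a single label.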