Search Results

Now showing 1 - 3 of 3

Citation - WoS: 10
A Turkish Dataset for Gender Identification of Twitter Users
(Assoc Computational Linguistics-ACL, 2019) Tekir, Selma; Sezerer, Erhan; Tekir, Selma; 03.04. Department of Computer Engineering; 03. Faculty of Engineering; 01. Izmir Institute of Technology
Author profiling is the identification of an author's gender, age, and language from his/her texts. With the increasing trend of using Twitter as a means to express thought, profiling the gender of an author from his/her tweets has become a challenge. Although several datasets in different languages have been released on this problem, there is still a need for multilingualism. In this work, we propose a dataset of tweets of Turkish Twitter users which are labeled with their gender information. The dataset has 3368 users in the training set and 1924 users in the test set where each user has 100 tweets. The dataset is publicly available(1).
Citation - WoS: 2
Citation - Scopus: 4
Çok-etiketli Film Türü Sınıflandırması için Türkçe Konu Modellemesi Veri Kümesi
(Institute of Electrical and Electronics Engineers, 2020) Tekir, Selma; Sezerer, Erhan; Para, Hasan; Polatbilek, Ozan; Sezerer, Erhan; Tekir, Selma; 03.04. Department of Computer Engineering; 03. Faculty of Engineering; 01. Izmir Institute of Technology
Statistical topic modeling aims to assign topics to documents in an unsupervised way. Latent Dirichlet Allocation (LDA) is the standard model for topic modeling. It shows good performance on document collections, documents being relatively long texts but it has poor performance on short texts. Topic modeling on short texts is on the rise due to the potential of social media. Thus, approaches that are able to nd topics on short texts as well as long texts are sought. However, there is a lack of datasets that include both long and short texts which have the same ground-truth categories. In this work, we release a Turkish movie dataset which contain both short lm descriptions and long subscripts where lm genre can be considered as topic. Furthermore, we provide multi-label movie genre classication results using a Feed Forward Neural Network (FFNN) taking LDA document-topic or Doc2Vec dense representations. © 2020 IEEE.
Citation - Scopus: 1
Türkçe Tweetler Üzerinden Yapay Sinir Ağları ile Cinsiyet Tahminlemesi
(Institute of Electrical and Electronics Engineers Inc., 2019) Sezerer, Erhan; Tekir, Selma; Tekir, Selma; Sezerer, Erhan; 03.04. Department of Computer Engineering; 03. Faculty of Engineering; 01. Izmir Institute of Technology
Yazar ayrımlaması, yazarı bilinmeyen bir metin üzerinden yazarına dair cinsiyet, yaş ve dil gibi bazı anahtar özniteliklerin belirlenmesidir. Özellikle güvenlik ve pazarlama alanında önem arz etmektedir. Bu çalışmada, kullanıcıların tweetleri kullanılarak cinsiyetleri tahminlenmektedir. Yinelemeli Sinir Ağı (YSA) ve ilgi mekanizmasının birleşiminden oluşan bir model önerilmiştir. Bildiğimiz kadarıyla bu çalışma Twitter veri kümesi ile Türkçe’de ilk defa yapılmıştır. Önerilen model Türkçe, İngilizce, İspanyolca ve Arapça dillerinde sınanmış ve sırasıyla 80.63, 81.73, 78.22, 78.5 doğruluk değerlerine ulaşılmıştır. Elde edilen doğruluk değerleri Türkçe’de en gelişkin, diğer dillerde ise rekabetçi bir başarım ortaya koymaktadır.

WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection

Browse

Filters

Settings

Sort By

Results per page

Search Results