Search Results


Improvements on a Multi-Task BERT Model
(IEEE, 2024) Agrali, Mahmut; Tekir, Selma
Pre-trained language models have brought significant performance gains in natural language processing. Fine-tuning these models on the supervised data of downstream tasks improves the results further. In the fine-tuning process, combining the learning of several tasks is an effective approach. This paper proposes a multi-task learning framework based on BERT. To accomplish the tasks of sentiment analysis, paraphrase detection, and semantic text similarity, we add linear layers, a Siamese network with cosine similarity, and convolutional layers at the appropriate places in the architecture. We conducted an ablation study using the Stanford Sentiment Treebank (SST), Quora, and SemEval STS datasets, one for each task, to test the effectiveness of the framework and its components. The results demonstrate that the proposed multi-task framework improves the performance of BERT. The best results obtained for sentiment analysis, paraphrase detection, and semantic text similarity are accuracies of 0.534 and 0.697 and a Pearson correlation coefficient of 0.345, respectively.
Citation - WoS: 2
Citation - Scopus: 4
Çok-etiketli Film Türü Sınıflandırması için Türkçe Konu Modellemesi Veri Kümesi [A Turkish Topic Modeling Dataset for Multi-Label Movie Genre Classification]
(Institute of Electrical and Electronics Engineers, 2020) Jabrayilzade, Elgün; Poyraz Arslan, Algın; Para, Hasan; Polatbilek, Ozan; Sezerer, Erhan; Tekir, Selma
Statistical topic modeling aims to assign topics to documents in an unsupervised way. Latent Dirichlet Allocation (LDA) is the standard model for topic modeling. It performs well on collections of relatively long documents but poorly on short texts. Topic modeling on short texts is on the rise due to the potential of social media. Thus, approaches that are able to find topics in short texts as well as long texts are sought. However, there is a lack of datasets that include both long and short texts sharing the same ground-truth categories. In this work, we release a Turkish movie dataset which contains both short film descriptions and long subscripts, where the film genre can be considered as the topic. Furthermore, we provide multi-label movie genre classification results using a Feed Forward Neural Network (FFNN) that takes LDA document-topic or Doc2Vec dense representations as input. © 2020 IEEE.
Doğal Dil Çıkarımı Modellerinde Bert Vektörlerinin Başarım Değerlendirmesi [Performance Evaluation of BERT Vectors in Natural Language Inference Models]
(Institute of Electrical and Electronics Engineers Inc., 2021) Oğul, İskender Ülgen; Tekir, Selma
Natural language inference aims to classify the relationship between sentences that express an opinion as contradiction, entailment, or neutral. To perform the classification task, textual sources are converted into mathematical representations called vectors or embeddings. In this study, both static (GloVe, OntoNotes5) and contextual (BERT) word embedding methods were used. Classifying the logical relationships between opinion sentences is difficult because sentences have complex grammatical structures, and traditional natural language processing solutions fall short when sentences must be processed into logical representations. This study used the decomposable attention and enhanced LSTM for natural language inference (ESIM) deep learning models to perform the classification task. The best result, an accuracy of 88%, was obtained with ESIM-BERT on the SNLI dataset.
Citation - Scopus: 1
Türkçe Tweetler Üzerinden Yapay Sinir Ağları ile Cinsiyet Tahminlemesi [Gender Prediction from Turkish Tweets with Neural Networks]
(Institute of Electrical and Electronics Engineers Inc., 2019) Sezerer, Erhan; Polatbilek, Ozan; Tekir, Selma
Author profiling is the identification of key attributes of an author, such as gender, age, and language, from a text whose author is unknown. It is especially important in the fields of security and marketing. In this study, users' genders are predicted from their tweets. A model combining a Recurrent Neural Network (RNN) with an attention mechanism is proposed. To the best of our knowledge, this is the first such study carried out for Turkish with a Twitter dataset. The proposed model was tested on Turkish, English, Spanish, and Arabic, reaching accuracies of 80.63, 81.73, 78.22, and 78.5, respectively. These accuracies are state of the art for Turkish and competitive for the other languages.

WoS İndeksli Yayınlar Koleksiyonu / WoS Indexed Publications Collection
