Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/10


Search Results

Now showing 1 - 10 of 24
  • Conference Object
    A News Chain Evaluation Methodology Along With a Lattice-Based Approach for News Chain Construction
    (Association for Computational Linguistics (ACL), 2017) Toprak, Mustafa; Özkahraman, Ö.; Tekir, Selma
    Chain construction is an important requirement for understanding news and establishing context. A news chain can be defined as a coherent set of articles that explains an event or a story. There is a lack of well-established methods in this area. In this work, we propose a methodology to evaluate the "goodness" of a given news chain and implement the concept lattice-based news chain construction method of Hossain et al. The methodology part is vital, as it directly affects the growth of research in this area. Our proposed methodology consists of news chains collected from different studies and two "goodness" metrics: minedge and the dispersion coefficient. We assess the utility of the lattice-based news chain construction method using our proposed methodology. © EMNLP 2017. All rights reserved.
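The "goodness" metrics are defined formally in the paper; as an illustrative sketch only, under the common reading that minedge scores a chain by its weakest consecutive link, it might look as follows (the term vectors, the choice of cosine similarity, and the toy articles are all assumptions, not the paper's exact definition):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def minedge(chain):
    """Score a chain by its weakest consecutive link: a coherent
    chain should contain no abrupt topical jump."""
    return min(cosine(chain[i], chain[i + 1]) for i in range(len(chain) - 1))

# Three toy articles as term-frequency vectors over a shared vocabulary.
chain = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1]]
score = minedge(chain)
```

A dispersion-style metric would instead measure how evenly the topical overlap is spread along the chain; both kinds of score reward chains without weak links.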
  • Research Project
    Haber Zincirlerinde Tutarlılık ve Güvenilirlik Değerlendirmesi (Coherence and Credibility Evaluation in News Chains)
    (2017) Tekir, Selma
    In today's environment of very fast, high-volume news flow, news analysis is a major need. Following the news, verifying its accuracy, and interpreting it are especially important at the institutional level. Doing so requires running an information-processing cycle: information is expected to be collected, processed and analyzed in line with given objectives, and turned into actionable knowledge. The aim of the project is to develop an approach for measuring and evaluating news credibility. News credibility is indispensable to any news-gathering activity. Established media organizations (the BBC, The New York Times, etc.) already provide very large amounts of structured data, and the need to verify news and check source validity is at its peak. In the project, credibility will be measured and evaluated over a news chain. The news chain is treated via the connecting-the-dots approach, which defines a news chain as a sequence of news documents that coherently connects two news documents, one designated as the start point and the other as the end point. The credibility evaluation will be carried out together with a coherence evaluation of the news chain under consideration. News credibility means the accuracy of the news: its reliance on concrete facts rather than opinions. The credibility of a news source, in turn, is handled in two basic dimensions: the trust placed in the source and the source's expertise on the topic [21]. The measurement of news credibility will be based on the factors "Does it separate fact from opinion?" and "Does it rest on opinions or on facts?". Opinion mining will be used to distinguish facts from opinions. Whether the documents forming a news chain separate fact from opinion, and how fact/opinion sentences are organized within the document structure, will be examined.
    In addition to the fact/opinion structuring within the documents, the fact/opinion ratio will also be determined, and it will be tested whether the documents forming the news chain are consistent in this respect. Furthermore, an evaluation mechanism based on fact/opinion information will be built for the transitions between consecutive documents. Since no comparable approach to measuring credibility exists, the project is highly innovative. The credibility evaluation of a news chain is not independent of its coherence evaluation. Methods exist to capture coherence in the news-chain context, but new approaches are needed. Within the project, a new method will be developed to obtain coherent news chains. The method rests on the intuition that a coherent news chain is well represented by a lattice structure. The nodes of the lattice will be represented by pairs consisting of the words occurring in the news documents and the news documents in which those words occur. Lattices belonging to coherent news chains are expected to be complete lattices. Zaki and Ramakrishnan's [17] closed description set lattice construction algorithm will be used for this purpose, and it will be tested whether good news chains can be obtained. The proposed method is important, and equally original, in that it produces the news chains on which the credibility evaluation will be performed. Upon completion, the project's outputs are expected to make a scientific contribution to the field of knowledge discovery and data mining. As the proposed techniques mature, they can be employed in new technologies. Moreover, news gathering is a function that influences a society's socio-economic structure, and social media in particular is reshaping this area. Processing the received news correctly and raising awareness of news credibility are therefore of great importance.
  • Article
    Gender Bias in Occupation Classification From the New York Times Obituaries
    (Dokuz Eylül Üniversitesi, 2022) Atik, Ceren; Tekir, Selma
    Technological developments such as artificial intelligence can strengthen social prejudices prevailing in society, regardless of the developer's intention. Therefore, researchers should be aware of the ethical issues that may arise from a developed product or solution. In this study, we investigate the effect of gender bias on occupation classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website; it is provided in two versions: with and without gender indicators. Category distributions from this dataset show that the gender and occupation variables are dependent; thus, gender affects occupation classification. To test the effect, we perform occupation classification using SVM (Support Vector Machine), HAN (Hierarchical Attention Network), and DistilBERT-based classifiers. Moreover, to gain further insight into the relationship between gender and occupation in classification problems, a multi-task model in which occupation and gender are learned together is evaluated. Experimental results reveal that there is a gender bias in occupation classification.
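The "with and without gender indicators" dataset versions suggest a preprocessing step along these lines; the indicator word list below is an illustrative assumption, not the paper's actual list:

```python
# Hypothetical indicator list; the paper's exact list is not given here.
GENDER_INDICATORS = {"he", "she", "his", "her", "him", "hers",
                     "mr", "mrs", "ms", "himself", "herself"}

def strip_gender_indicators(text: str) -> str:
    """Drop gendered tokens so a classifier cannot rely on them."""
    kept = []
    for tok in text.split():
        # Compare case-insensitively, ignoring surrounding punctuation.
        if tok.strip(".,;:!?").lower() not in GENDER_INDICATORS:
            kept.append(tok)
    return " ".join(kept)

sample = "Mr. Smith, 90, died Tuesday. He was a celebrated surgeon."
clean = strip_gender_indicators(sample)
```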
  • Article
    Asking the Right Questions To Solve Algebraic Word Problems
    (TÜBİTAK - Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, 2022) Çelik, Ege Yiğit; Orulluoğlu, Zeynel; Mertoğlu, Rıdvan; Tekir, Selma
    Word algebra problems are among the challenging AI tasks, as they combine natural language understanding with a formal equation system. Traditional approaches work with equation templates and frame the task as template selection followed by number assignment to the selected template. Recent deep learning-based solutions exploit contextual language models such as BERT, encoding the natural language text in order to decode the corresponding equation system. The proposed approach is similar to the template-based methods in that it works with a template and fills in its number slots. Nevertheless, it has contextual understanding because it adopts a question generation and answering pipeline to create tuples of numbers and then performs the number assignment task with custom sets of rules. The inspiring idea is that by asking the right questions and answering them using a state-of-the-art language model-based system, one can learn the correct values for the number slots in an equation system. The empirical results show that the proposed approach significantly outperforms the other methods on the word algebra benchmark dataset alg514 and performs second best on the AI2 corpus of arithmetic word problems. It also has superior performance on the challenging SVAMP dataset. Though it is a rule-based system, the simple rule sets and the relatively slight differences between rules for different templates indicate that it is highly probable that a system can be developed which learns the patterns for the collection of all possible templates and produces the correct equations for an example instance.
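As a toy instance of the template-filling idea, the classic sum/difference template x + y = a, x − y = b can be filled with extracted numbers and solved directly; the plain regex below is only a stand-in for the paper's question generation and answering pipeline:

```python
import re
from fractions import Fraction

def solve_sum_difference(text: str):
    """Fill the template  x + y = a, x - y = b  with the first two
    numbers found in the text, then solve it in closed form."""
    a, b = (Fraction(n) for n in re.findall(r"\d+", text)[:2])
    x = (a + b) / 2
    y = (a - b) / 2
    return x, y

problem = "The sum of two numbers is 44 and their difference is 6. Find them."
x, y = solve_sum_difference(problem)
```

The rule here ("first number is the sum, second is the difference") is exactly the kind of slot-assignment rule the abstract describes replacing with answers to targeted questions such as "What is the sum of the two numbers?".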
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content
    (World Scientific Publishing, 2021) Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma
    In the field of software engineering, practitioners' share in the constructed knowledge cannot be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource, though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target the Stack Overflow (SO) and Stack Exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors' mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are utilized as classifiers and compared to SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification on the SO design patterns tag and 71.6% accuracy on the SE software engineering category. The superior performance on SE software engineering can be explained by its larger dataset size. In addition to the quantitative evaluation, we provide qualitative evidence that the system's predicted reputation labels match the quality of the provided answers.
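The class-labeling step can be sketched as follows; the thresholding rule used here (dataset median of per-author mean answer scores) is an assumption for illustration, not necessarily the paper's exact rule:

```python
from statistics import mean, median

def label_authors(author_scores):
    """Derive binary reputation labels from per-author answer scores.
    An author whose mean answer score exceeds the dataset-wide median
    of author means is labeled 1 (reputable), otherwise 0."""
    means = {a: mean(s) for a, s in author_scores.items()}
    cut = median(means.values())
    return {a: int(m > cut) for a, m in means.items()}

# Toy data: each author's list of answer scores.
scores = {"alice": [5, 7, 9], "bob": [0, 1, 2], "carol": [3, 4, 5]}
labels = label_authors(scores)
```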
  • Article
    Sales History-Based Demand Prediction Using Generalized Linear Models
    (Süleyman Demirel Üniversitesi, 2019) Özenboy, Başar; Tekir, Selma
    It is vital for commercial enterprises to accurately predict demand by utilizing existing sales data. Such predictive analytics is a crucial part of their decision support systems, increasing the profitability of the company. In predictive data analytics, regression modeling is used to predict a numerical response variable such as the sale amount. In this category, linear models are simple and easy to interpret, yet they permit generalization to very powerful and flexible families of models called generalized linear models (GLMs). The generalization over simple linear regression is twofold: first, GLMs relax the assumption of normally distributed error terms; second, the relationship between the set of predictor variables and the response variable can be represented by a set of link functions rather than the sole choice of the identity function. This work models the sales amount prediction problem through the use of GLMs. Unique company sales data are explored, and the response variable, sale amount, is fitted to the Gamma distribution. Then the inverse link function, which is the canonical one for a gamma-distributed response variable, is used. The experimental results are compared with other regression models and with classification algorithms. Model selection is performed via the MSE and AIC metrics. The results show that the GLM is better than linear regression. As for the classification algorithms, Random Forest and the GLM are the top performers. Moreover, categorization of the predictor variables improves the model fitting results significantly.
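The canonical inverse link mentioned in the abstract can be made concrete in a few lines; the coefficients below are hypothetical, not fitted values from the paper's data:

```python
def inverse_link_mean(beta, x):
    """Gamma GLM with the canonical inverse link: the linear predictor
    is eta = x . beta, and the modeled mean response is mu = 1 / eta
    (contrast with the identity link, where mu = eta)."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / eta

# Hypothetical fitted coefficients (intercept first) and one predictor row.
beta = [0.05, 0.002]
x = [1.0, 10.0]   # intercept term, then e.g. a price-like feature
mu = inverse_link_mean(beta, x)
```

In a library such as statsmodels, `sm.families.Gamma()` uses this inverse power link by default, so a fitted model's predicted mean is the reciprocal of the linear predictor, as above.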
  • Article
    Citation - WoS: 2
    Citation - Scopus: 2
    Incorporating Concreteness in Multi-Modal Language Models With Curriculum Learning
    (MDPI, 2021) Sezerer, Erhan; Tekir, Selma
    Over the last few years, there has been an increase in studies that consider experiential (visual) information by building multi-modal language models and representations. Several studies have shown that language acquisition in humans starts with learning concrete concepts through images and then continues with learning abstract ideas through text. In this work, the curriculum learning method is used to teach the model concrete and abstract concepts through images and their corresponding captions, in order to accomplish multi-modal language modeling/representation. We use the BERT and ResNet-152 models on each modality and combine them using attentive pooling to perform pre-training on the newly constructed dataset, which is collected from Wikimedia Commons based on concrete/abstract words. To show the performance of the proposed model, downstream tasks and ablation studies are performed. The contribution of this work is twofold: a new dataset is constructed from Wikimedia Commons based on concrete/abstract words, and a new multi-modal pre-training approach based on curriculum learning is proposed. The results show that the proposed multi-modal pre-training approach contributes to the success of the model.
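Attentive pooling over modalities can be sketched as softmax-weighted averaging of the per-modality vectors; here the attention scores are supplied by hand, whereas in the paper the scoring function is learned jointly with BERT and ResNet-152:

```python
import math

def attentive_pool(vectors, scores):
    """Combine modality vectors using softmax attention weights.
    `scores` would normally come from a learned scoring function;
    they are given directly here for illustration."""
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

text_vec = [1.0, 0.0]    # e.g. a (tiny) BERT text representation
image_vec = [0.0, 1.0]   # e.g. a (tiny) ResNet-152 image representation
pooled = attentive_pool([text_vec, image_vec], scores=[2.0, 0.0])
```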
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 4
    Çok-etiketli Film Türü Sınıflandırması için Türkçe Konu Modellemesi Veri Kümesi (A Turkish Topic Modeling Dataset for Multi-label Film Genre Classification)
    (Institute of Electrical and Electronics Engineers, 2020) Jabrayilzade, Elgün; Poyraz Arslan, Algın; Para, Hasan; Polatbilek, Ozan; Sezerer, Erhan; Tekir, Selma
    Statistical topic modeling aims to assign topics to documents in an unsupervised way. Latent Dirichlet Allocation (LDA) is the standard model for topic modeling. It performs well on document collections in which documents are relatively long texts, but poorly on short texts. Topic modeling on short texts is on the rise due to the potential of social media; thus, approaches that can find topics on short texts as well as long texts are sought. However, there is a lack of datasets that include both long and short texts sharing the same ground-truth categories. In this work, we release a Turkish movie dataset that contains both short film descriptions and long subtitles, where the film genre can be considered the topic. Furthermore, we provide multi-label movie genre classification results using a feed-forward neural network (FFNN) that takes LDA document-topic or Doc2Vec dense representations as input. © 2020 IEEE.
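Feeding an LDA document-topic vector to a multi-label genre classifier can be sketched with a single linear layer and sigmoid outputs; the weights below are hand-picked for illustration, whereas the paper's FFNN learns them from data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_genres(topic_dist, weights, bias, genres, threshold=0.5):
    """Multi-label prediction: each genre gets an independent sigmoid
    score over the document-topic vector; every genre whose score
    clears the threshold is emitted (zero, one, or many labels)."""
    labels = []
    for g, w, b in zip(genres, weights, bias):
        score = sigmoid(sum(wi * ti for wi, ti in zip(w, topic_dist)) + b)
        if score >= threshold:
            labels.append(g)
    return labels

doc_topics = [0.7, 0.2, 0.1]                      # toy 3-topic distribution
genres = ["drama", "comedy"]
W = [[4.0, -1.0, -1.0], [-1.0, 4.0, -1.0]]        # one weight row per genre
b = [-1.5, -0.5]
pred = predict_genres(doc_topics, W, b, genres)
```

Multi-label thresholded sigmoids (rather than a single softmax) are what allows a film to receive several genres at once.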
  • Conference Object
    Doğal Dil Çıkarımı Modellerinde Bert Vektörlerinin Başarım Değerlendirmesi (Performance Evaluation of BERT Vectors in Natural Language Inference Models)
    (Institute of Electrical and Electronics Engineers Inc., 2021) Oğul, İskender Ülgen; Tekir, Selma
    Natural language inference aims to classify the relation between sentences expressing propositions as contradiction, entailment, or neutral. To perform this classification task, textual sources are converted into mathematical representations called vectors or embeddings. In this study, both static (GloVe, OntoNotes 5) and contextual (BERT) word embedding methods are used. Classifying the logical relations between propositional sentences is difficult, since the sentences have complex grammatical structures and converting them into logical representations is beyond the reach of traditional natural language processing solutions. This study uses the decomposable attention and enhanced LSTM for natural language inference (ESIM) deep learning models to perform the classification task. The best result, 88% accuracy, was obtained with ESIM-BERT on the SNLI dataset.
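For readers unfamiliar with embedding-based NLI classifiers, a common premise/hypothesis input construction is [u; v; |u − v|; u ⊙ v]; note this is a simpler scheme than the decomposable attention and ESIM models used in the study, shown only to illustrate how two sentence embeddings feed one classifier:

```python
def pair_features(u, v):
    """Concatenate the premise embedding u, the hypothesis embedding v,
    their element-wise absolute difference, and their element-wise
    product into a single feature vector for a 3-way NLI classifier."""
    return (list(u) + list(v)
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])

premise = [0.2, -0.5, 0.1]      # toy sentence embedding
hypothesis = [0.1, -0.4, 0.3]
feats = pair_features(premise, hypothesis)
```

The difference and product terms give the classifier direct signals about agreement and disagreement between the two sentences.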
  • Article
    Estimating Spatiotemporal Focus of Documents Using Entropy With PMI
    (Türkiye Klinikleri Journal of Medical Sciences, 2020) Yaşar, Damla; Tekir, Selma
    Many text documents are spatiotemporal in nature, i.e. contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to year 1789 as time and France as place. Identifying this time period and location associated with the document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word’s temporal entropy is considered as a weight to its association with a time point and a single time point with the highest overall score is chosen as the focus time of a document. The proposed method is generic in the sense that it can also be applied for spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents. The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods.