Computer Engineering / Bilgisayar Mühendisliği
Permanent URI for this collection: https://hdl.handle.net/11147/10
Search Results (13 results)
Article
Asking the Right Questions To Solve Algebraic Word Problems (TÜBİTAK - Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, 2022)
Çelik, Ege Yiğit; Orulluoğlu, Zeynel; Mertoğlu, Rıdvan; Tekir, Selma
Word algebra problems are among the challenging AI tasks because they combine natural language understanding with a formal system of equations. Traditional approaches work with equation templates and frame the task as selecting a template and assigning numbers to its slots. Recent deep learning-based solutions exploit contextual language models such as BERT, encoding the natural language text and decoding the corresponding equation system. The proposed approach resembles the template-based methods in that it works with a template and fills in its number slots. Unlike them, however, it has contextual understanding: it adopts a question generation and answering pipeline to create tuples of numbers and then performs the number assignment with custom rule sets. The underlying idea is that by asking the right questions and answering them with a state-of-the-art language model-based system, one can learn the correct values for the number slots of an equation system. The empirical results show that the proposed approach significantly outperforms the other methods on the word algebra benchmark dataset alg514 and ranks second on the AI2 corpus of arithmetic word problems. It also achieves superior performance on the challenging SVAMP dataset.
Although the system is rule-based, its simple rule sets and the relatively small differences between the rules for different templates suggest that it is highly likely a system could be developed that learns the patterns across all possible templates and produces the correct equations for a given instance.

Article | Citation - WoS: 1 | Citation - Scopus: 1
Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content (World Scientific Publishing, 2021)
Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma
In the field of software engineering, practitioners' share in the constructed knowledge should not be underestimated, and it mostly takes the form of grey literature (GL). GL is a valuable resource, but it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites, specifically Stack Overflow (SO) and Stack Exchange (SE). We model author reputation measurement as a classification task on the answers the authors provide. The authors' mean, median, and total answer scores are used to derive the class labels. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are used as classifiers and compared with SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification on the SO design-patterns tag and 71.6% accuracy on the SE software engineering category. The better performance on the SE software engineering category can be explained by its larger dataset.
In addition to the quantitative evaluation, we provide qualitative evidence that the system's predicted reputation labels match the quality of the answers provided.

Article | Citation - WoS: 2 | Citation - Scopus: 2
Incorporating Concreteness in Multi-Modal Language Models With Curriculum Learning (MDPI, 2021)
Sezerer, Erhan; Tekir, Selma
Over the last few years, there has been an increase in studies that incorporate experiential (visual) information by building multi-modal language models and representations. Several studies have shown that language acquisition in humans starts with learning concrete concepts through images and then continues with learning abstract ideas through text. In this work, curriculum learning is used to teach the model concrete and abstract concepts through images and their corresponding captions for multi-modal language modeling and representation. We use BERT for the text modality and ResNet-152 for the image modality, and combine them with attentive pooling to pre-train on a newly constructed dataset collected from Wikimedia Commons based on concrete and abstract words. To evaluate the proposed model, downstream tasks and ablation studies are performed. The contribution of this work is two-fold: a new dataset is constructed from Wikimedia Commons based on concrete/abstract words, and a new multi-modal pre-training approach based on curriculum learning is proposed. The results show that the proposed multi-modal pre-training approach contributes to the model's performance.

Conference Object | Citation - WoS: 2 | Citation - Scopus: 4
Çok-etiketli Film Türü Sınıflandırması için Türkçe Konu Modellemesi Veri Kümesi [A Turkish Topic Modeling Dataset for Multi-Label Movie Genre Classification] (Institute of Electrical and Electronics Engineers, 2020)
Jabrayilzade, Elgün; Poyraz Arslan, Algın; Para, Hasan; Polatbilek, Ozan; Sezerer, Erhan; Tekir, Selma
Statistical topic modeling aims to assign topics to documents in an unsupervised way.
Latent Dirichlet Allocation (LDA) is the standard model for topic modeling. It performs well on collections of relatively long documents but poorly on short texts. Topic modeling on short texts is gaining attention because of the potential of social media, so approaches that can find topics in short texts as well as in long texts are sought. However, there is a lack of datasets that contain both long and short texts with the same ground-truth categories. In this work, we release a Turkish movie dataset that contains both short film descriptions and long subscripts, where the film genre can be treated as the topic. Furthermore, we provide multi-label movie genre classification results using a feed-forward neural network (FFNN) that takes either LDA document-topic vectors or dense Doc2Vec representations as input. © 2020 IEEE.

Conference Object
Doğal Dil Çıkarımı Modellerinde Bert Vektörlerinin Başarım Değerlendirmesi [Performance Evaluation of BERT Vectors in Natural Language Inference Models] (Institute of Electrical and Electronics Engineers Inc., 2021)
Oğul, İskender Ülgen; Tekir, Selma
Natural language inference aims to classify the relationship between sentences that express ideas as contradiction, entailment, or neutral. To perform this classification, the texts are converted into mathematical representations called vectors or embeddings. This study uses both static (GloVe, OntoNotes5) and contextual (BERT) word embedding methods. Classifying the logical relationships between such sentences is difficult, because the sentences have complex grammatical structures and traditional natural language processing solutions are inadequate for processing sentences and converting them into logical representations. To perform the classification, this study uses two deep learning models: decomposable attention and Enhanced LSTM for Natural Language Inference (ESIM).
The best result, 88% accuracy, is obtained with ESIM-BERT on the SNLI dataset.

Article
Estimating Spatiotemporal Focus of Documents Using Entropy With PMI (Türkiye Klinikleri Journal of Medical Sciences, 2020)
Yaşar, Damla; Tekir, Selma
Many text documents are spatiotemporal in nature; that is, their contents can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to the year 1789 as its time and France as its place. Identifying the time period and location associated with a document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. A word's temporal entropy is used as a weight on its association with a time point, and the single time point with the highest overall score is chosen as the focus time of the document. The proposed method is generic in the sense that it can also be applied to spatial focus estimation. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The proposed spatiotemporal focus estimation methods are evaluated on diverse datasets of text documents, and the experimental evaluation confirms the superiority of the proposed temporal and spatial focus estimation methods.

Article | Citation - WoS: 9 | Citation - Scopus: 14
Rule-Based Automatic Question Generation Using Semantic Role Labeling (Institute of Electronics, Information and Communication Engineers, 2019)
Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma
This paper proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analyzing both the syntactic and the semantic structure of a sentence.
Although the primary objective of the system is question generation from sentences, automatic evaluation results show that it also performs well on reading comprehension datasets, which focus on question generation from paragraphs. In particular, on the METEOR metric, the system significantly outperforms all other systems in the automatic evaluation. In the human evaluation, the system performs similarly well, generating the most natural (human-like) questions.

Conference Object | Citation - Scopus: 1
Türkçe Tweetler Üzerinden Yapay Sinir Ağları ile Cinsiyet Tahminlemesi [Gender Prediction from Turkish Tweets with Artificial Neural Networks] (Institute of Electrical and Electronics Engineers Inc., 2019)
Sezerer, Erhan; Polatbilek, Ozan; Tekir, Selma
Author profiling is the task of determining key attributes of the author of an anonymous text, such as gender, age, and language. It is especially important in security and marketing. In this study, users' genders are predicted from their tweets. A model combining a recurrent neural network (RNN) with an attention mechanism is proposed. To the best of our knowledge, this is the first such study for Turkish using a Twitter dataset. The proposed model is tested on Turkish, English, Spanish, and Arabic, reaching accuracy values of 80.63, 81.73, 78.22, and 78.5, respectively. These accuracy values represent state-of-the-art performance for Turkish and competitive performance for the other languages.

Conference Object | Citation - WoS: 1 | Citation - Scopus: 1
A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources (Springer Verlag, 2017)
Sezerer, Erhan; Tekir, Selma
The credibility of news cannot be isolated from that of its source, which is mainly associated with the source's trustworthiness and expertise. To measure the trustworthiness of a news source, whether it is factual or opinionated must be considered, among other factors.
In this work, we propose an unsupervised, probabilistic, lexicon-based opinion mining approach that characterizes a news source as factual or opinionated. We obtain words' positive, negative, and objective scores from a sentiment lexicon and normalize these scores using their cumulative distribution. This statistical approach is inspired by relativism: each word is evaluated by its difference from the average word. To test the effectiveness of the approach, three news sources that differ in how opinionated they are were chosen: editorials, New York Times articles, and Reuters articles. The experimental validation is therefore performed with an analysis of variance (ANOVA) on these groups of news. The results show that the technique distinguishes the news articles of these groups as factual or opinionated in a statistically significant way.

Conference Object
Overt information operations during peacetime (Curran Associates, 2012)
Tekir, Selma
Information superiority is the most critical asset in waging war. It directly targets the opponent's perception and, in the long term, the opponent's will to act. Sun Tzu's classic text expresses this through the concept of deception as the basis of all warfare. Success in warfare therefore depends on being aware of what is happening and accurately understanding the context. Broadly, this is the intelligence function, mostly open source intelligence, since it provides the context. Competitive intelligence is based mainly on open sources, and the open-source share of the intelligence product is increasing day by day. Today's diverse open sources and services represent a methodological shift in warfare. The two preceding methodologies were, respectively, overt physical acts against military targets in wartime and covert information operations conducted throughout peacetime against even nonmilitary targets.
The present methodology must be overt (open) information operations during peacetime. This also coincides with a change of metaphor: the paper proposes a transformation from a war metaphor into a game metaphor governed by rules of play. The existence of such rules helps draw the boundaries of the field of competitive intelligence and thus makes it a profession. The game metaphor is safer to adopt than the war metaphor, since it makes it easier to take responsibility in public disclosure scenarios. By following this metaphor, you stay within the boundaries of legitimate competition. In other words, you make a conscious choice among war intensities by avoiding the more intense forms of war, namely limited conflict and actual warfare. Finally, this choice is consistent with the fundamental point of Sun Tzu's entire argument: the vision of victory without fighting. To summarize, the dominance of open sources in competitive intelligence lays the groundwork for the game metaphor, which represents a transformation in warfare. The apparent outcome is overt information operations during peacetime, which emerge as the most important tool for fighting deception and thus for success in information warfare in the contemporary world.
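The focus-time estimation summarized in the abstract of "Estimating Spatiotemporal Focus of Documents Using Entropy With PMI" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation.

- Documents are assumed to be already reduced to word lists plus extracted time points (for example, years).
- PMI is estimated from word/time-point co-occurrence counts.
- The entropy-based weight `1 - H(w) / log|T|` is a hypothetical choice that favors temporally specific words.

The class name `TemporalFocusEstimator` and the toy corpus are illustrative only.

```python
import math
from collections import Counter, defaultdict


class TemporalFocusEstimator:
    """Sketch of focus-time estimation: PMI weighted by temporal entropy.

    Assumption: each training document is already reduced to (words, time_points),
    e.g. years found by a temporal tagger. This preprocessing is hypothetical.
    """

    def __init__(self, corpus):
        self.pair = Counter()  # c(w, t): documents where word w co-occurs with time t
        self.word = Counter()  # c(w): marginal of the pair counts
        self.time = Counter()  # c(t): marginal of the pair counts
        self.total = 0         # total number of (w, t) co-occurrences
        for words, times in corpus:
            for w in set(words):
                for t in set(times):
                    self.pair[(w, t)] += 1
                    self.word[w] += 1
                    self.time[t] += 1
                    self.total += 1
        self.times_of = defaultdict(list)  # time points observed with each word
        for w, t in self.pair:
            self.times_of[w].append(t)

    def pmi(self, w, t):
        # PMI(w, t) = log[P(w, t) / (P(w) P(t))], all estimated from the pair counts
        return math.log(self.pair[(w, t)] * self.total / (self.word[w] * self.time[t]))

    def entropy(self, w):
        # Temporal entropy H(w) = -sum_t P(t | w) log P(t | w)
        h = 0.0
        for t in self.times_of[w]:
            p = self.pair[(w, t)] / self.word[w]
            h -= p * math.log(p)
        return h

    def weight(self, w):
        # Hypothetical weighting: 1 for words tied to a single time point,
        # 0 for words spread uniformly over every time point in the corpus.
        h_max = math.log(len(self.time)) if len(self.time) > 1 else 1.0
        return 1.0 - self.entropy(w) / h_max

    def focus_time(self, words):
        # Sum entropy-weighted PMI per candidate time point and return the best one.
        scores = defaultdict(float)
        for w in set(words):
            if w not in self.word:
                continue  # unseen word: no association evidence
            wt = self.weight(w)
            for t in self.times_of[w]:
                scores[t] += wt * self.pmi(w, t)
        return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    corpus = [
        (["revolution", "bastille", "king"], [1789]),
        (["bastille", "paris", "revolution"], [1789]),
        (["moon", "landing", "apollo"], [1969]),
        (["apollo", "rocket", "moon"], [1969]),
        (["news", "revolution"], [1789]),
        (["news", "moon"], [1969]),
    ]
    estimator = TemporalFocusEstimator(corpus)
    print(estimator.focus_time(["bastille", "king", "news"]))  # 1789
```

In the toy corpus, "news" appears equally often with both years, so its weight is 0 and it does not sway the decision. Replacing the time points with place entities gives the spatial-focus variant that the abstract mentions; the counting and scoring code stays unchanged.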
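The cumulative-distribution normalization in "A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources" can also be illustrated with a small sketch. It assumes a SentiWordNet-style lexicon that maps each word to (positive, negative, objective) scores. Each score is mapped to its empirical CDF value over the whole lexicon, so a word is judged relative to the rest of the vocabulary rather than by its raw score.

The abstract does not specify how word scores are combined per document. The `opinion_score` aggregate, the class name, and the toy lexicon are therefore hypothetical.

```python
from bisect import bisect_right
from statistics import mean


class RelativeSentimentScorer:
    """Sketch: normalize lexicon scores by their empirical CDF over the lexicon.

    Assumption: `lexicon` maps word -> (pos, neg, obj), SentiWordNet-style.
    """

    def __init__(self, lexicon):
        self.lexicon = lexicon
        # One sorted column per score type: 0 = pos, 1 = neg, 2 = obj.
        self.columns = [sorted(s[i] for s in lexicon.values()) for i in range(3)]

    def cdf(self, i, x):
        # Empirical CDF: fraction of lexicon words whose i-th score is <= x.
        col = self.columns[i]
        return bisect_right(col, x) / len(col)

    def normalized(self, word):
        # Relative (pos, neg, obj) position of the word within the lexicon.
        return tuple(self.cdf(i, s) for i, s in enumerate(self.lexicon[word]))

    def opinion_score(self, words):
        # Hypothetical aggregate: mean relative subjectivity minus relative objectivity.
        vals = [self.normalized(w) for w in words if w in self.lexicon]
        if not vals:
            return 0.0
        return mean((p + n) / 2 - o for p, n, o in vals)


if __name__ == "__main__":
    lexicon = {
        "terrible": (0.0, 0.75, 0.25),
        "brilliant": (0.875, 0.0, 0.125),
        "announced": (0.0, 0.0, 1.0),
        "said": (0.0, 0.0, 1.0),
        "percent": (0.0, 0.0, 1.0),
        "good": (0.5, 0.0, 0.5),
    }
    scorer = RelativeSentimentScorer(lexicon)
    # The opinionated word list scores higher than the factual one.
    print(scorer.opinion_score(["terrible", "brilliant"]))
    print(scorer.opinion_score(["announced", "said", "percent"]))
```

Because the CDF is computed over the lexicon, a word whose positive score is 0 can still sit mid-distribution when most of the vocabulary also scores 0. This is the "difference from the average word" intuition in a simple form.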
