Master Degree / Yüksek Lisans Tezleri

Permanent URI for this collection: https://hdl.handle.net/11147/3008

Enrichment of Turkish Question Answering Systems Using Knowledge Graphs
(01. Izmir Institute of Technology, 2023) Çiftçi, Okan; Tekir, Selma; Soygazi, Fatih
In the era of digital communication, the ability to effectively process and interpret human language has become a key research area. Natural Language Processing (NLP) has emerged as a field that enables machines to better understand and analyze human language. One of the most important applications of NLP is the development of question answering systems, which are essential in domains such as customer service, search engines, and chatbots. To answer incoming queries, question answering systems rely on knowledge graphs as a reliable source. This thesis proposes a Turkish Question Answering (TRQA) system that utilizes a knowledge graph. The research focuses on the automatic construction of a knowledge graph specific to the film industry, as well as the creation of a multi-hop question answering dataset that can be queried from this graph. Building upon these constructions, we develop a deep-learning-based method for answering questions using the constructed knowledge graph. The constructed knowledge graph is compared with various knowledge graphs presented in the literature using the DistMult, ComplEx and SimplE methods on the link prediction task. Additionally, the proposed question answering system is compared with the baseline study and with a generative large language model through quantitative and qualitative analyses.
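As a reading aid for the link prediction comparison mentioned above: DistMult scores a triple (head, relation, tail) as the sum of the element-wise products of the three embedding vectors, and candidate tails are ranked by that score. The sketch below is only an illustration under stated assumptions. The entity names, relation name and embedding values are hypothetical toy data, not taken from the thesis, and a real model would learn the embeddings from the graph.

```python
def distmult_score(head, relation, tail):
    """DistMult triple score: sum over i of head[i] * relation[i] * tail[i]."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all embeddings must have the same dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical toy embeddings (assumption: 3 dimensions, hand-picked values).
ENTITIES = {
    "film_a": [1.0, 0.0, 2.0],
    "director_x": [0.5, 1.0, 1.0],
    "actor_y": [-1.0, 2.0, 0.0],
}
RELATIONS = {"directed_by": [1.0, 1.0, 0.5]}


def rank_tails(head_name, relation_name):
    """Rank every entity except the head as a candidate tail, best score first."""
    head = ENTITIES[head_name]
    relation = RELATIONS[relation_name]
    scored = [
        (name, distmult_score(head, relation, vector))
        for name, vector in ENTITIES.items()
        if name != head_name
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_tails("film_a", "directed_by")
```

Because the score is a plain product, swapping head and tail gives the same value, so DistMult cannot distinguish the direction of a relation; ComplEx and SimplE are commonly used where that matters.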
Reproducibility Assessment of Research Code Repositories
(01. Izmir Institute of Technology, 2023) Akdeniz, Eyüp Kaan; Tekir, Selma
The growth in machine learning research has not been accompanied by a corresponding improvement in the reproducibility of the results. This thesis presents a novel, fully automated end-to-end system that evaluates the reproducibility of machine learning studies based on the content of the associated GitHub project's Readme file. This evaluation relies on a Readme template derived from an analysis of popular repositories. The template suggests a structure that promotes reproducibility. Our system generates a reproducibility score for each Readme file assessed, and it employs two distinct models, one based on section classification and the other on hierarchical transformers. The experimental outcomes indicate that the system based on section similarity outperforms the hierarchical transformer model. Furthermore, it has an advantage in explainability, as its scores can be traced directly to the respective sections of the Readme files. The proposed framework provides an important tool for improving the quality of code sharing and ultimately helps to increase reproducibility in machine learning research.
Recognition of Counterfactual Statements in Turkish
(01. Izmir Institute of Technology, 2023) Acar, Ali; Tekir, Selma
Counterfactual statements describe an event that did not happen or cannot happen and, optionally, the consequence of this event if it were to happen. Counterfactual statements are the building blocks of human thought processes, as people constantly reflect upon past happenings and consider their future implications. Counterfactual reasoning is essential for machine intelligence and explainable artificial intelligence studies, so detecting counterfactuals automatically with machine learning algorithms is crucial for these areas. This thesis presents the development of the first Turkish counterfactual detection dataset. It presents a comprehensive classification baseline and expands the scope of counterfactual detection to include the Turkish language.
Automatic Quote Detection From Literary Work
(01. Izmir Institute of Technology, 2022) Güzel Altıntaş, Aybüke; Tekir, Selma
Literature inspires readers, and readers tend to share quotes from a literary work. The reader underlines the quotes in the book and shares them on social media or on an online platform used by book readers. A quote is defined as a span in a written text that is interesting to many readers and that readers can use in different contexts. In this study, a novel task in the field of Natural Language Processing is proposed: the Quote Detection Task. An original dataset was also formed from the Goodreads and Gutenberg websites with web scraping. The quotes are Goodreads data sourced from Kaggle, and only quotes voted on by 10 or more users are selected. These quotes have been validated against the books on the Project Gutenberg website. The final dataset consists of 4554 rows and contains quotes with their book spans. The span of a quote consists of the 10 sentences preceding the quote, the quote itself, and the 10 sentences following it. The Quote Detection Task is a span detection task that can be modeled with the sequence labeling solutions and neural extractive summarization systems in the literature. Conditional Random Field (CRF) and Extractive Summarization as Text Matching (MatchSum) were run as two different baselines for quote detection: the statistics-based CRF as the first baseline for this sequence tagging problem, and MatchSum as the second baseline for the experimental part. ROUGE-1 scores of 27.24% and 40.54%, respectively, were obtained from these baselines.
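The span construction described above (10 sentences before the quote, the quote, 10 sentences after) can be sketched as follows. This is a minimal illustration, not the thesis code: the function name, the assumption that a quote occupies exactly one sentence, and the "QUOTE"/"O" tag names for sequence labeling are all hypothetical.

```python
def build_quote_span(sentences, quote_index, context=10):
    """Return the sentences around a quote and a per-sentence tag list.

    Assumptions (not from the source): the book is already split into
    sentences, the quote is the single sentence at quote_index, and the
    window is clipped at the start and end of the book.
    """
    if not 0 <= quote_index < len(sentences):
        raise IndexError("quote_index is outside the sentence list")
    start = max(0, quote_index - context)
    end = min(len(sentences), quote_index + context + 1)
    span = sentences[start:end]
    # Hypothetical tag scheme: mark the quote sentence, leave the rest as "O".
    labels = ["QUOTE" if i == quote_index else "O" for i in range(start, end)]
    return span, labels


book = [f"sentence {i}" for i in range(30)]
span, labels = build_quote_span(book, quote_index=15)
```

With a quote in the middle of the book the span has 21 sentences; near the beginning or end it is shorter because the window is clipped rather than padded.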
Classification of Contradictory Opinions in Text Using Deep Learning Methods
(01. Izmir Institute of Technology, 2020) Oğul, İskender Ülgen; Tekir, Selma
The natural language inference (NLI) problem concerns ensuring the consistency as well as the accuracy of propositions when making sense of natural language. It aims to classify the relationship between two given sentences as contradiction, entailment or neutrality. To accomplish the classification task, sentences or words must be translated into mathematical representations called vectors or embeddings. The vectorization of a sentence is as important as the complexity of the classification model. In this study, both pre-trained (GloVe, fastText, Word2Vec) and contextual (BERT) word embedding methods were compared in order to obtain the best result. NLI, one of the natural language processing tasks, is highly complex, and conventional machine learning methods are insufficient for it, so more advanced solutions are required. This study used deep learning methods to perform the classification task. Unlike conventional machine learning approaches, deep learning approaches reduce errors while increasing accuracy by iterating over the data many times. Opinion sentences have complex grammatical structures that are difficult to classify. This study used Decomposable Attention and Enhanced LSTM for natural language inference to perform the NLI classification task. Using the Enhanced LSTM deep learning method and BERT contextual vectors for natural language inference on the SNLI dataset, an accuracy of 88.0% was obtained, close to the state-of-the-art result of 92.1%. To show the usability of the developed solution in different NLI tasks, an accuracy of 80.02% was obtained in the studies performed on the MNLI dataset.
A Language Modeling Approach To Detect Bias
(Izmir Institute of Technology, 2020) Atik, Ceren; Tekir, Selma
Technology is developing day by day and is involved in every area of our lives. Technological innovations such as artificial intelligence can strengthen social biases that already exist in society, regardless of the developers' intentions. Therefore, researchers should be aware of this ethical issue. In this thesis, the effect of gender bias, which is one of the social biases, on occupation classification is investigated. For this, a new dataset was created by collecting obituaries from the New York Times website, and the obituaries were handled in two different versions, with and without gender indicators. Since occupation and gender are independent variables, gender indicators should not have an impact on the models' occupation predictions. In this context, to investigate gender bias in occupation prediction, a model in which occupation and gender are learned together is evaluated alongside models that perform only occupation classification. The results obtained from the models indicate that gender bias plays a role in occupation classification.
Enriching Contextual Word Embeddings With Character Information
(Izmir Institute of Technology, 2020) Polatbilek, Ozan; Tekir, Selma
Natural Language Processing has become more and more popular with the recent advances in Artificial Intelligence. Fundamental improvements have been introduced in word representations to store semantic and/or syntactic features. With the recently published language model BERT, contextual word vectors can be generated. This model does not process character-level information. In morphologically rich languages such as Turkish, this model's perception of syntax could be improved. In this thesis, a new model called BERT-ELMo, which is a combination of BERT and ELMo, is proposed to enrich BERT with character-level information. This model combines the character-level processing part of ELMo and the contextual word representation part of the BERT model. To show the effectiveness of the proposed model, both quantitative (question answering) and qualitative (word analogy, word contextualization, morphological meaning, out-of-vocabulary word capturing) analyses are performed, and the model is compared with BERT on the Turkish language. Thanks to the character-level addition, the proposed model can be trained in any language without any pre-analysis. To the best of our knowledge, this is the first study which uses morphological analysis to train the BERT model in Turkish, and the first model to integrate a character-level module into BERT.
Automatic Story Construction From News Articles in an Online Fashion
(Izmir Institute of Technology, 2019) Can, Özgür; Tekir, Selma
Every day, thousands of local and global news items are published online. Each arriving news piece can have a connection with some previous information, but in a large-scale news flow, it is quite difficult for readers to integrate news and evaluate the agenda in light of the past. Thus, grouping news in a coherent way to construct news stories is a fundamental requirement. To meet this requirement, a meaningful representation of the documents on which the clustering is performed must first be extracted, and new story clusters must be generated on the fly in an online fashion. In this work, we analyze the complex relations among news articles and propose a system that generates continuously updated news stories in an online fashion. As part of the experimental validation, we provide a step-by-step construction of a meaningful news story out of news articles coming from different sources. The constructed news stories demonstrate the usefulness of the developed system.
Identifying Communities Using Collaboration and Word Association Networks in Turkish Social Media
(Izmir Institute of Technology, 2018) Atay, Abdullah Asil; Tekir, Selma
Social media content has always been an attractive topic for researchers. Scores of people use social media and share their ideas through pictures, videos or documents. Researchers analyze this information and try to derive useful data from it, and many of them regard the analysis of social media information as a very important research area. Many social media platforms host Turkish content. One example is Ekşisözlük, a popular social media platform in Turkey with Turkish content. Within the scope of this thesis, Ekşisözlük content was downloaded, decomposed and actively used. Social media consists of humans or human-made products, and shared content exhibits some similarities. In this thesis, several methods are used to calculate these similarities. Two different networks are created from the same content: a word association network and a collaboration network. The word association network is formed from the co-occurrence of words within a specific window size. The collaboration network is formed from different users entering content under the same title, which gives the similarity of users. These two networks are analyzed separately, and information is derived from each.
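The two network definitions above can be illustrated with a small sketch. It is written under stated assumptions and is not the thesis implementation: the function names, the edge-weighting choice (counting co-occurrences and shared titles), the undirected edges, and the toy input are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_association_edges(tokens, window_size):
    """Count undirected word pairs that co-occur within window_size tokens.

    Assumption: a window of size w links each token to the w - 1 tokens
    that follow it; pairs of identical words are skipped.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window_size]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return edges


def collaboration_edges(entries):
    """Count, for each pair of users, the titles both have written under.

    entries is an iterable of (title, user) pairs (hypothetical format).
    """
    users_by_title = {}
    for title, user in entries:
        users_by_title.setdefault(title, set()).add(user)
    edges = Counter()
    for users in users_by_title.values():
        for pair in combinations(sorted(users), 2):
            edges[pair] += 1
    return edges


word_edges = word_association_edges(["a", "b", "c", "a"], window_size=3)
user_edges = collaboration_edges(
    [("t1", "u1"), ("t1", "u2"), ("t2", "u1"), ("t2", "u2"), ("t2", "u3")]
)
```

The resulting edge counters can be loaded into any graph library as weighted edges; a higher weight in the collaboration network means two users share more titles, which is one simple reading of the user similarity the abstract mentions.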
Automatic Question Generation Using Natural Language Processing Techniques
(Izmir Institute of Technology, 2018) Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma
This thesis proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analysis of both the syntactic and the semantic structure of a sentence. The design and implementation of the proposed approach are also explained in detail. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results show that it also achieves strong performance on reading comprehension datasets, which focus on question generation from paragraphs. In human evaluations, the designed system significantly outperforms all other systems and generates the most natural (human-like) questions.
