Master Degree / Yüksek Lisans Tezleri
Permanent URI for this collection: https://hdl.handle.net/11147/3008

Master Thesis: Enrichment of Turkish Question Answering Systems Using Knowledge Graphs
(01. Izmir Institute of Technology, 2023) Çiftçi, Okan; Tekir, Selma; Soygazi, Fatih
In the era of digital communication, the ability to effectively process and interpret human language has become a key research area. Natural Language Processing (NLP) has emerged as a field that enables machines to better understand and analyze human language. One of the most important applications of NLP is the development of question answering systems, which are essential in various domains such as customer service, search engines, and chatbots. To answer incoming queries, question answering systems rely on knowledge graphs as a reliable source. This thesis proposes a Turkish Question Answering (TRQA) system that utilizes a knowledge graph. The research focuses on the automatic construction of a knowledge graph specific to the film industry, as well as the creation of a multi-hop question answering dataset that can be queried from this graph. Building upon these constructions, we develop a deep learning-based method for answering questions using the constructed knowledge graph. The constructed knowledge graph is compared with various knowledge graphs presented in the literature on the link prediction task, using the DistMult, ComplEx, and SimplE methods. Additionally, the proposed question answering system is compared with the baseline study and with a generative large language model through quantitative and qualitative analyses.

Master Thesis: Reproducibility Assessment of Research Code Repositories
(01. Izmir Institute of Technology, 2023) Akdeniz, Eyüp Kaan; Tekir, Selma
The growth in machine learning research has not been accompanied by a corresponding improvement in the reproducibility of the results.
This thesis presents a novel, fully automated end-to-end system that evaluates the reproducibility of machine learning studies based on the content of the README file of the associated GitHub project. The evaluation relies on a README template derived from an analysis of popular repositories. The template suggests a structure that promotes reproducibility. Our system generates a reproducibility score for each README file assessed, and it employs two distinct models, one based on section classification and the other on hierarchical transformers. The experimental outcomes indicate that the system based on section similarity outperforms the hierarchical transformer model. Furthermore, it has an edge in explainability, as it allows the scores to be traced directly to the respective sections of the README files. The proposed framework provides an important tool for improving the quality of code sharing and ultimately helps to increase reproducibility in machine learning research.

Master Thesis: Automatic Quote Detection From Literary Work
(01. Izmir Institute of Technology, 2022) Güzel Altıntaş, Aybüke; Tekir, Selma
Literature inspires readers, and readers tend to share quotes from a literary work. The reader underlines the quotes in the book and shares them on social media or on an online platform used by book readers. A quote is defined as a span in a written text that is interesting to many readers and that readers can use in different contexts. In this study, a novel task in the field of Natural Language Processing is proposed: the Quote Detection Task. In addition, an original dataset was formed from the Goodreads and Gutenberg websites with web scraping. The quotes are Goodreads data sourced from Kaggle, and only quotes voted for by 10 or more users are selected. These quotes have been validated against the books on the Project Gutenberg website. The final dataset consists of 4554 rows. The dataset contains quotes with their book spans.
The span of a quote consists of the 10 sentences preceding the quote, the quote itself, and the 10 sentences following it. The Quote Detection Task is a span detection task that can be modeled with sequence labeling solutions and with neural extractive summarization systems from the literature. Conditional Random Field (CRF) and Extractive Summarization as Text Matching (MatchSum) were run as two different baselines for quote detection. For the sequence tagging formulation, the statistics-based CRF was run as the first baseline. MatchSum is the second baseline chosen for the experimental part. ROUGE-1 scores of 27.24% and 40.54%, respectively, were obtained from these baselines.

Master Thesis: Enriching Contextual Word Embeddings With Character Information
(Izmir Institute of Technology, 2020) Polatbilek, Ozan; Tekir, Selma
Natural Language Processing has become more and more popular with the recent advances in Artificial Intelligence. Fundamental improvements have been introduced in word representations to store semantic and/or syntactic features. With the recently published language model BERT, contextual word vectors can be generated. This model does not process character-level information. In morphologically rich languages such as Turkish, this model's perception of syntax could be improved. In this thesis, a new model called BERT-ELMo, which is a combination of BERT and ELMo, is proposed to enrich BERT with character-level information. This model combines the character-level processing part of ELMo with the contextual word representation part of the BERT model. To show the effectiveness of the proposed model, both quantitative (question answering) and qualitative (word analogy, word contextualization, morphological meaning, out-of-vocabulary word capturing) analyses are performed, and the model is compared with BERT on the Turkish language.
Thanks to the character-level addition, the proposed model can be trained in any language without any pre-analysis. To the best of our knowledge, this is the first study that uses morphological analysis to train the BERT model in Turkish, and the first model to integrate a character-level module into BERT.
