Master Degree / Yüksek Lisans Tezleri

Permanent URI for this collection: https://hdl.handle.net/11147/3008

A Language Modeling Approach To Detect Bias
(Izmir Institute of Technology, 2020) Atik, Ceren; Tekir, Selma
Technology is advancing rapidly and is involved in every area of our lives. Technological innovations such as artificial intelligence can reinforce social biases that already exist in society, regardless of the developers' intentions. Researchers should therefore be aware of this ethical issue. This thesis investigates the effect of gender bias, one of these social biases, on occupation classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website, and each obituary was prepared in two versions: with and without gender indicators. Since occupation and gender are independent variables, gender indicators should not affect a model's occupation prediction. To investigate gender bias in occupation prediction, models that perform only occupation classification are evaluated alongside a model that learns occupation and gender jointly. The results indicate that gender bias plays a role in occupation classification.
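The with/without-gender-indicator comparison can be sketched as follows. This is a minimal illustration, not the thesis's pipeline: the word list `GENDER_WORDS`, the `[MASK]` placeholder, and the hypothetical "flip rate" measure (the share of documents whose predicted occupation changes after masking) are assumptions, and `predict` stands for any occupation classifier.

```python
import re
from typing import Callable, Iterable

# Hypothetical list of English gender indicators; the thesis does not publish its exact list.
GENDER_WORDS = {"he", "she", "him", "her", "his", "hers", "himself", "herself",
                "mr", "mrs", "ms", "man", "woman", "husband", "wife", "son", "daughter"}
_WORD = re.compile(r"[A-Za-z]+")


def strip_gender_indicators(text: str) -> str:
    """Replace every gender-indicating word with a neutral [MASK] token, keeping punctuation."""
    return _WORD.sub(
        lambda m: "[MASK]" if m.group(0).lower() in GENDER_WORDS else m.group(0), text)


def prediction_flip_rate(texts: Iterable[str], predict: Callable[[str], str]) -> float:
    """Hypothetical bias measure: fraction of documents whose predicted occupation
    differs between the original and the gender-masked version."""
    texts = list(texts)
    if not texts:
        return 0.0
    flips = sum(predict(t) != predict(strip_gender_indicators(t)) for t in texts)
    return flips / len(texts)
```

For an unbiased classifier the flip rate would be near zero; a classifier that keys on pronouns produces flips.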
Enriching Contextual Word Embeddings With Character Information
(Izmir Institute of Technology, 2020) Polatbilek, Ozan; Tekir, Selma
Natural Language Processing has become increasingly popular with recent advances in Artificial Intelligence. Fundamental improvements have been made in word representations that capture semantic and/or syntactic features. The recently published language model BERT generates contextual word vectors, but it does not process character-level information. For morphologically rich languages such as Turkish, this model's perception of syntax could be improved. This thesis proposes a new model, BERT-ELMo, which combines BERT and ELMo to enrich BERT with character-level information: it joins the character-level processing part of ELMo with the contextual word representation part of BERT. To show the effectiveness of the proposed model, both quantitative (question answering) and qualitative (word analogy, word contextualization, morphological meaning, out-of-vocabulary word capturing) analyses are performed, and the model is compared with BERT on Turkish. Thanks to the character-level addition, the proposed model can be trained on any language without any pre-analysis. To the best of our knowledge, this is the first study that uses morphological analysis to train the BERT model in Turkish, and the first model to integrate a character-level module into BERT.
Automatic Story Construction From News Articles in an Online Fashion
(Izmir Institute of Technology, 2019) Can, Özgür; Tekir, Selma
Every day, thousands of local and global news articles are published online. Each arriving article may be connected to earlier information, but in a large-scale news flow it is difficult for readers to integrate the news and evaluate the current agenda in light of the past. Grouping news coherently into news stories is therefore a fundamental requirement. Meeting this requirement takes two steps: first, a meaningful representation of the documents to be clustered must be extracted; second, new story clusters must be generated on the fly, in an online fashion. In this work, we analyze the complex relations among news articles and propose a system that generates continuously updated news stories online. As part of the experimental validation, we present the step-by-step construction of a meaningful news story from articles arriving from different sources. The constructed news stories demonstrate the usefulness of the developed system.
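Online story construction can be illustrated with single-pass clustering, a common baseline for grouping a stream of documents. This sketch assumes bag-of-words vectors, cosine similarity, and a fixed similarity `threshold`; these choices and the class name `OnlineStoryClusterer` are illustrative assumptions, not the representation or algorithm used in the thesis.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class OnlineStoryClusterer:
    """Single-pass clustering: each arriving article joins the most similar existing
    story if the similarity reaches the threshold; otherwise it starts a new story."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.centroids: list = []   # one summed term-count vector per story
        self.stories: list = []     # article ids per story

    def add(self, article_id: str, text: str) -> int:
        vec = Counter(text.lower().split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Summing counts keeps the centroid's direction equal to the mean vector's,
            # which is all that cosine similarity depends on.
            self.centroids[best].update(vec)
            self.stories[best].append(article_id)
            return best
        self.centroids.append(vec)
        self.stories.append([article_id])
        return len(self.stories) - 1
```

Because each article is compared only with the current story centroids, the cost per arrival grows with the number of stories rather than the number of past articles, which is what makes the approach usable on a live feed.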
Identifying Communities Using Collaboration and Word Association Networks in Turkish Social Media
(Izmir Institute of Technology, 2018) Atay, Abdullah Asil; Tekir, Selma
Social media content has always been an attractive subject for researchers. Many people use social media to share their ideas through pictures, videos, or documents, and researchers analyze this information to extract useful knowledge. Many researchers consider social media analysis an important research area. Numerous social media platforms host Turkish content; Ekşisözlük, for example, is a popular Turkish social media platform. In this thesis, Ekşisözlük content was downloaded, parsed, and used extensively. Social media consists of people and the content they produce, and shared content exhibits similarities. Several methods are used in this thesis to calculate these similarities. Two different networks are built from the same content: a word association network and a collaboration network. The word association network is built from the co-occurrence of words within a window of a specific size. The collaboration network connects different users who contribute entries to the same title, which indicates how similar the users are. The two networks are analyzed separately, and findings are derived from each.
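The two networks described above can be sketched as follows. This is a minimal illustration, assuming already-tokenized text and (title, user) pairs as input; the function names, the window semantics (words at most `window - 1` positions apart), and raw co-occurrence counts as edge weights are assumptions, since the abstract does not specify them.

```python
from collections import defaultdict
from itertools import combinations


def word_association_network(tokens, window=3):
    """Undirected weighted edges between distinct words that appear together
    inside a sliding window of `window` consecutive tokens."""
    edges = defaultdict(int)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:  # skip self-loops
                edges[tuple(sorted((w, u)))] += 1
    return dict(edges)


def collaboration_network(entries):
    """entries: iterable of (title, user) pairs. Users who wrote under the same
    title are linked; the edge weight is the number of titles they share."""
    users_by_title = defaultdict(set)
    for title, user in entries:
        users_by_title[title].add(user)
    edges = defaultdict(int)
    for users in users_by_title.values():
        for a, b in combinations(sorted(users), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

Both functions return plain edge dictionaries, which can be loaded into any graph library for the community analysis itself.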
Automatic Question Generation Using Natural Language Processing Techniques
(Izmir Institute of Technology, 2018) Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma
This thesis proposes a new rule-based approach to automatic question generation. The approach analyzes both the syntactic and the semantic structure of a sentence, and its design and implementation are explained in detail. Although the primary objective of the system is generating questions from sentences, automatic evaluation results show that it also achieves strong performance on reading comprehension datasets, which focus on generating questions from paragraphs. In human evaluations, the system significantly outperforms all other systems and generates the most natural (human-like) questions.
A Systematic Evaluation of Semantic Representations in Natural Language Processing
(Izmir Institute of Technology, 2018) Sevgili Ergüven, Özge; Tekir, Selma
In the study of semantics, the main aim is to address meaning. Computationally, this goal is accomplished by encoding language constructs, either as information-theoretic measures or as vector representations. We focus on the representation of words. Earlier word representation approaches rely on counting co-occurrence statistics between a word and the words that accompany it, whereas current methods are based on learning. We investigate the relation between these two approaches and observe that both use context as the normalization factor. We support this observation by evaluating word representations on several Natural Language Processing (NLP) tasks. Furthermore, we study polysemous words, which carry more than one meaning. A single representation of a polysemous word mixes all of its meanings together. To overcome this issue, we provide a method that creates a separate representation for each sense of a polysemous word.
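On the count-based side, a standard information-theoretic measure is pointwise mutual information (PMI), where dividing by P(w)P(c) normalizes the joint count by how often the word and the context occur on their own. The sketch below computes maximum-likelihood PMI from (word, context) observations; it illustrates the general measure and is an assumption about which count-based statistic is meant, not the thesis's code.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """pairs: list of (word, context) co-occurrence observations.
    Returns {(w, c): log(P(w, c) / (P(w) * P(c)))} using maximum-likelihood estimates."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    n = len(pairs)
    return {
        (w, c): math.log((n_wc / n) / ((word_counts[w] / n) * (ctx_counts[c] / n)))
        for (w, c), n_wc in pair_counts.items()
    }


pairs = [("bank", "money"), ("bank", "river"), ("bank", "money"), ("fish", "river")]
table = pmi_table(pairs)
```

Here "fish" co-occurs only with "river", so its PMI with "river" is positive (log 2), while "bank" meets "river" less often than chance predicts, which gives a negative score.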
Spatio-Temporal Modeling of Documents
(Izmir Institute of Technology, 2017) Yaşar, Damla; Tekir, Selma
Temporal and geographic information are important aspects of text documents and appear frequently in many types of documents in the form of temporal and geographic expressions. Spatio-temporal expressions can be normalized so that their meaning is unambiguous and they can be placed on a timeline or pinpointed on a map. However, a general text document can contain many spatio-temporal expressions that are unrelated to its content. In this thesis, we propose estimating the focus time and focus place of documents, defined as the time and place that the document's content refers to. We use statistical knowledge from English Wikipedia to calculate association scores, which are then used to estimate the focus time and place of a document. We implement two different association score calculation methods and compare their accuracy. The effectiveness of our methods is evaluated on three time-tagged datasets of documents about historical events, spanning a total time frame of 4000 years. Our methods achieve an average error of less than 15 years and are also able to estimate the focus place of each document correctly.
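The final estimation step can be sketched as a simple aggregation: given precomputed word-to-year association scores, pick the year whose total association with the document's words is highest. This assumes summation as the aggregation rule and a hypothetical `assoc` dictionary; the thesis's two association score methods are not reproduced here. The same function works for focus place if the keys are places instead of years.

```python
from collections import defaultdict


def estimate_focus_time(doc_tokens, assoc):
    """assoc: {word: {year: score}}, assumed to be precomputed from a reference corpus.
    Returns the year with the highest summed association to the document's words,
    or None if no word in the document has any association scores."""
    totals = defaultdict(float)
    for tok in doc_tokens:
        for year, score in assoc.get(tok, {}).items():
            totals[year] += score
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Words without scores are simply ignored, so unrelated expressions in a document only matter through the association scores they carry.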
Automatic, Fast and Accurate Sequence Decontamination
(Izmir Institute of Technology, 2016) Bağcı, Caner; Allmer, Jens; Tekir, Selma
The introduction of massively parallel sequencing technologies was a revolutionary step in genomics. Their decreasing cost and powerful features have made them increasingly in demand over the last decade. Using these technologies, even small laboratories around the world can now sequence complete genomes of organisms. However, this powerful technology comes with challenges on both the technological and the computational side. This work addresses one of these computational challenges and offers a novel algorithm to solve it. Sequencing by synthesis is a method used in many different massively parallel sequencing instruments. It exploits the biological process of DNA replication and, with the help of different means of detection, sequences a DNA molecule while it is being replicated. Since DNA polymerase requires a primer to start the replication reaction, sequencing-by-synthesis methods use short oligonucleotide adapters to initiate the reaction. Under certain circumstances, however, these adapters contaminate the final sequence reads. Several tools have been proposed to trim adapters from reads, but all of them depend on the bioinformatician's prior knowledge of the adapter sequence. In this work, an algorithm is offered that detects and trims adapters using only the sequences of the reads, without relying on prior knowledge of adapter sequences. The algorithm was shown to perform better than, or on par with, existing methods in terms of speed and efficiency.
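The idea of finding adapters without knowing their sequence can be illustrated with a naive heuristic: adapter-derived k-mers are heavily over-represented across reads, so the most frequent k-mers point to the adapter, and each read is cut where the first of them starts. This is a toy sketch under that assumption, not the algorithm from the thesis; the function names, `k=8`, and `min_count` are hypothetical, and a real tool would also need to handle partial adapters at read ends and sequencing errors.

```python
from collections import Counter


def over_represented_kmers(reads, k=8, min_count=2):
    """Return the set of k-mers sharing the highest count across all reads,
    or an empty set if that count is below `min_count` (no clear contaminant)."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    if not counts:
        return set()
    top = max(counts.values())
    if top < min_count:
        return set()
    return {kmer for kmer, c in counts.items() if c == top}


def trim_reads(reads, kmers, k=8):
    """Cut each read at the earliest position where an over-represented k-mer starts."""
    trimmed = []
    for r in reads:
        hits = [i for i in range(len(r) - k + 1) if r[i:i + k] in kmers]
        trimmed.append(r[:hits[0]] if hits else r)
    return trimmed
```

Collecting all tied top k-mers, rather than a single one, makes the cut land at the adapter's first base regardless of how ties are ordered.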
Sales History-Based Demand Prediction by Using Generalized Linear Models
(Izmir Institute of Technology, 2016) Özenboy, Başar; Tekir, Selma
Improved data collection and storage capabilities make vast amounts of data available in appropriate formats, and commercial enterprises store their sales data. Accurately predicting demand from existing sales data is vital for companies, because such predictive analytics is a crucial part of the decision support systems that increase a company's profitability. In predictive data analytics, regression modeling is commonly used to predict a numerical response variable such as sales amount. Generalized linear models provide a generalization that better addresses the specifics of the problem at hand. First, they relax the assumption of normally distributed error terms. Second, the relationship between the predictor variables and the response variable can be represented through a choice of link functions rather than only the identity function. This thesis models the sales amount prediction problem using generalized linear models. The company's unique sales data are explored and fitted with an appropriate distribution function for the response variable together with a suitable link function. The experimental results are compared with other regression models, classification algorithms, and time series models. Model selection is performed using the MSE and AIC metrics.
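To make the "distribution plus link function" idea concrete, here is a minimal sketch of one common GLM for count-like sales data: a Poisson response with a log link, log E[y] = b0 + b1·x, fitted by iteratively reweighted least squares (IRLS), together with the MSE and AIC used for model selection. The Poisson family, the single predictor, and the function names are assumptions for illustration; the abstract does not say which distribution and link were chosen, and in practice a statistics library would be used instead of hand-written IRLS.

```python
import math


def fit_poisson_glm(x, y, max_iter=100, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x (Poisson family, log link) by IRLS. Returns (b0, b1)."""
    eta = [math.log(yi + 0.5) for yi in y]  # start from shifted observed log-values
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Log link: working response z = eta + (y - mu) / mu, working weight w = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted least-squares normal equations in closed form.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1


def mse(x, y, b0, b1):
    """Mean squared error of the fitted means on the response scale."""
    return sum((yi - math.exp(b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


def poisson_aic(x, y, b0, b1):
    """AIC = 2k - 2 log L with k = 2 estimated coefficients."""
    ll = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        ll += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return 2 * 2 - 2 * ll
```

With a binary predictor, for example, the fitted means reduce to the two group averages, so the log link turns b1 into the log of their ratio.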
Finding Out Subject-Matter Experts and Research Trends Using Bibliographic Data
(Izmir Institute of Technology, 2015) Karataş, Arzum; Tekir, Selma
With the prevalent use of information technology, nearly any information is easy to access. However, anyone who wants to specialize in an area must first know who the experts in that area are. Since experts have valuable knowledge, finding them is important. Awareness of trends is also vital for researchers who want to become experts in a topic or to enter a new area. This work presents an empirical study for finding experts and research trends in the academic world. We created a citation network from KDD proceedings and an author-keyword bipartite graph from the bibliographic data of the same proceedings. Then, we applied the link analysis algorithms HITS and PageRank, respectively. The results show that it is possible to detect two types of experts: one who works intensively on a single subject, and another who has high-level knowledge of various subtopics of a subject matter. Moreover, topical trends are found to peak, to be periodic, or to keep the same shape, rather than showing an absolute increase, decrease, or stationary pattern.
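The two link analysis algorithms can be sketched in plain Python: HITS on a directed citation network (edges run from citing to cited paper, so authorities are heavily cited works) and PageRank on the author-keyword graph. Treating the bipartite graph as undirected, the damping factor of 0.85, and the fixed iteration counts are conventional assumptions, not settings reported in the thesis.

```python
import math


def hits(edges, iterations=50):
    """edges: list of (citing, cited). Returns (hub, authority) dicts, L2-normalised."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a paper = sum of hub scores of the papers citing it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a paper = sum of authority scores of the papers it cites.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


def pagerank(edges, damping=0.85, iterations=100):
    """edges: list of (u, v) treated as undirected, e.g. author-keyword links."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    n = len(nbrs)
    rank = {node: 1.0 / n for node in nbrs}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nbrs}
        for node, out in nbrs.items():
            share = damping * rank[node] / len(out)  # spread rank evenly over neighbours
            for m in out:
                new[m] += share
        rank = new
    return rank
```

For example, with citations p1→p3, p2→p3, p2→p4, and p4→p3, p3 receives the highest authority score and p2, which cites both p3 and p4, the highest hub score.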