Computer Engineering / Bilgisayar Mühendisliği
Permanent URI for this collection: https://hdl.handle.net/11147/10
18 results
Search Results
Article
Gender Bias in Occupation Classification From the New York Times Obituaries (Dokuz Eylül Üniversitesi, 2022)
Atik, Ceren; Tekir, Selma
Technological developments such as artificial intelligence can reinforce prevailing social prejudices, regardless of the developer's intention. Therefore, researchers should be aware of the ethical issues that may arise from a developed product or solution. In this study, we investigate the effect of gender bias on occupation classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website and is provided in two versions: with and without gender indicators. Category distributions from this dataset show that the gender and occupation variables are dependent. Thus, gender affects occupation classification. To test the effect, we perform occupation classification using SVM (Support Vector Machine), HAN (Hierarchical Attention Network), and DistilBERT-based classifiers. Moreover, to gain further insight into the relationship between gender and occupation in classification problems, a multi-task model in which occupation and gender are learned together is evaluated. Experimental results reveal that there is a gender bias in occupation classification.

Article (Citation - WoS: 1, Citation - Scopus: 1)
Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content (World Scientific Publishing, 2021)
Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma
In the field of software engineering, practitioners' share in the constructed knowledge should not be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource, though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target Stack Overflow (SO) and Stack Exchange (SE) sites.
We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors' mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are used as classifiers and compared to SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification on the SO design patterns tag and 71.6% accuracy on the SE software engineering category. The superior performance on SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence that the system's predicted reputation labels match the quality of the provided answers.

Article
Sales History-Based Demand Prediction Using Generalized Linear Models (Süleyman Demirel Üniversitesi, 2019)
Özenboy, Başar; Tekir, Selma
It is vital for commercial enterprises to predict demand accurately from existing sales data. Such predictive analytics is a crucial part of their decision support systems for increasing the profitability of the company. In predictive data analytics, regression modeling is used to predict a numerical response variable such as sale amount. In this category, linear models are simple and easy to interpret, yet they permit generalization to a powerful and flexible family of models called generalized linear models (GLMs). The generalization over simple linear regression is twofold: First, GLMs relax the assumption of normally distributed error terms. Second, the relationship between the predictor variables and the response variable can be represented by a set of link functions rather than by the identity function alone. This work models the sales amount prediction problem using GLMs. Unique company sales data are explored, and the response variable, sale amount, is fitted to the Gamma distribution.
Then the inverse link function, which is the canonical link for a gamma-distributed response variable, is used. The experimental results are compared with other regression models and with classification algorithms. Model selection is performed using the MSE and AIC metrics. The results show that the GLM outperforms linear regression. Among the classification algorithms, Random Forest and GLM are the top performers. Moreover, categorization of the predictor variables improves model fitting results significantly.

Article
Estimating Spatiotemporal Focus of Documents Using Entropy With PMI (Türkiye Klinikleri Journal of Medical Sciences, 2020)
Yaşar, Damla; Tekir, Selma
Many text documents are spatiotemporal in nature, i.e., the contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to the year 1789 as time and France as place. Identifying the time period and location associated with a document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word's temporal entropy is used as a weight on its association with a time point, and the single time point with the highest overall score is chosen as the focus time of the document. The proposed method is generic in the sense that it can also be applied to spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents.
The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods.

Article (Citation - WoS: 9, Citation - Scopus: 14)
Rule-Based Automatic Question Generation Using Semantic Role Labeling (Institute of Electronics, Information and Communication Engineers, 2019)
Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma
This paper proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analysis of both the syntactic and the semantic structure of a sentence. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results show that it also performs well on reading comprehension datasets, which focus on question generation from paragraphs. In particular, with respect to the METEOR metric, the designed system significantly outperforms all other systems in automatic evaluation. In human evaluation, the designed system performs similarly well, generating the most natural (human-like) questions.

Conference Object (Citation - Scopus: 1)
Türkçe Tweetler Üzerinden Yapay Sinir Ağları ile Cinsiyet Tahminlemesi [Gender Prediction From Turkish Tweets With Artificial Neural Networks] (Institute of Electrical and Electronics Engineers Inc., 2019)
Sezerer, Erhan; Polatbilek, Ozan; Tekir, Selma
Author profiling is the identification of key attributes of an author, such as gender, age, and language, from a text whose author is unknown. It is particularly important in the fields of security and marketing. In this study, users' genders are predicted from their tweets. A model combining a recurrent neural network (RNN) with an attention mechanism is proposed. To the best of our knowledge, this is the first such study on a Turkish Twitter dataset. The proposed model was tested on Turkish, English, Spanish, and Arabic and reached accuracy values of 80.63, 81.73, 78.22, and 78.5, respectively.
These accuracy values represent state-of-the-art performance for Turkish and competitive performance for the other languages.

Conference Object
13. Ulusal Yazılım Mühendisliği Sempozyumu (Izmir Institute of Technology, 2019)
Ayav, Tolga; Tekir, Selma; Erten, Murat
The 13th National Software Engineering Symposium (UYMS) of Turkey was held at Izmir Institute of Technology on 23-25 September 2019. There was great interest in this year's symposium, as in previous years. UYMS is a platform that helps bring together the software industry and the academics working in this area. It has been organized since 2003 and plays an important role in shaping the future of the software industry in Turkey. We would like to thank all the participants whose contributions led to the successful realization of this symposium. We would also like to express our belief that these contributions will lead to better and more productive efforts in the field of software engineering. Along with the main area of UYMS, in the thematic areas of Software Test Engineering, Software Engineering for Health, Software Modeling, and Graduate Theses, a total of 77 papers were accepted this year. At least three referees reviewed each paper, and the papers were evaluated based on these reviews. We thank all the program committee members who served as referees.

Article
Gender Prediction From Tweets: Improving Neural Representations With Hand-Crafted Features (Cornell University, 2019)
Tekir, Selma; Sezerer, Erhan; Polatbilek, Ozan
Author profiling is the characterization of an author through key attributes such as gender, age, and language. In this paper, an RNN model with Attention (RNNwA) is proposed to predict the gender of a Twitter user from their tweets. Both word-level and tweet-level attention are used to learn 'where to look'. This model is improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user.
Both models are tested on three languages: English, Spanish, and Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and competitive results on Spanish and Arabic.

Conference Object (Citation - Scopus: 6)
Gender Prediction From Tweets With Convolutional Neural Networks: Notebook for PAN at CLEF 2018 (CEUR Workshop Proceedings, 2018)
Sezerer, Erhan; Polatbilek, Ozan; Sevgili, Özge; Tekir, Selma
This paper presents a system developed for the author profiling task of PAN at CLEF 2018. The system uses style-based features to predict gender from the given tweets of each user. These features are automatically extracted by Convolutional Neural Networks (CNN). The system rests on the idea that tweets are not equally informative about the gender of a user. Thus, an attention mechanism is applied to the CNN outputs in order to single out the tweets carrying more information. Our architecture obtained competitive results on the three languages provided by the PAN 2018 author profiling challenge, with an average accuracy of 75.1% on local runs and 70.23% on the submission run.

Conference Object
Doğruluk Problemi için Veri Kümesi Hazırlanması (CEUR Workshop Proceedings, 2018)
Karabayır, Arif Kürşat; Tek, Ozan Onur; Çınar, Özgür Fırat; Tekir, Selma
The Internet has become one of the most important information sources. With the advent of the Internet, the ease of accessing and sharing information has led to the emergence of conflicting information. The increase in conflicting information makes it a challenge to find the truth within it. This problem is known as the veracity problem. The algorithms developed in response to this problem accept structured data as input. Thus, to be able to use these algorithms on the Internet, there is a need to transform the unstructured data on the Internet into a structured form.
This need is hard to fulfill in a domain-independent and automatic way, considering the variety on the Internet. In this work, structured data are prepared to test the effectiveness of truth-finder algorithms. The process of transforming the unstructured data on the Internet into a structured form is described step by step, to contribute to its generalization in a domain-independent way. As a result of this process, a new quotes dataset is constructed, and a truth-finder algorithm is tested on this dataset and commented on.
