Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collectionhttps://hdl.handle.net/11147/10

Browse

Search Results

Now showing 1 - 10 of 14
  • Article
    Gender Bias in Occupation Classification From the New York Times Obituaries
    (Dokuz Eylül Üniversitesi, 2022) Atik, Ceren; Tekir, Selma
    Technological developments such as artificial intelligence can strengthen social prejudices prevailing in society, regardless of the developer's intention. Therefore, researchers should be aware of the ethical issues that may arise from a developed product/solution. In this study, we investigate the effect of gender bias on occupational classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website and is provided in two different versions: With and without gender indicators. Category distributions from this dataset show that gender and occupation variables have dependence. Thus, gender affects occupation classification. To test the effect, we perform occupation classification using SVM (Support Vector Machine), HAN (Hierarchical Attention Network), and DistilBERT-based classifiers. Moreover, to get further insights into the relationship of gender and occupation in classification problems, a multi-tasking model in which occupation and gender are learned together is evaluated. Experimental results reveal that there is a gender bias in job classification.
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content
    (World Scientific Publishing, 2021) Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma
    In the field of software engineering, practitioners' share in the constructed knowledge cannot be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target stack overflow (SO) and stack exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors' mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are utilized as classifiers and compared to SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification in SO design patterns tag and 71.6% accuracy in SE software engineering category. Superior performance in SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence, which supports that the system's predicted reputation labels match the quality of provided answers.
  • Article
    Sales History-Based Demand Prediction Using Generalized Linear Models
    (Süleyman Demirel Üniversitesi, 2019) Özenboy, Başar; Tekir, Selma
    It’s vital for commercial enterprises to accurately predict demand by utilizing the existing sales data. Such predictive analytics is a crucial part of their decision support systems to increase the profitability of the company.In predictive data analytics, the branch of regression modeling is used to predict a numerical response variable like sale amount. In this category, linear models are simple and easy to interpret yet they permit generalization to very powerful and flexible families of models which are called Generalized linear models (GLM). The generalization potential over simple linear regression can be explained twofold: First, GLM relax the assumption of normally distributed error terms. Moreover, the relationship of the set of predictor variables and the response variable could be represented by a set of link functions rather than the sole choice of the identity function. This work models the sales amount prediction problem through the use of GLM. Unique company sales data are explored and the response variable, sale amount is fitted to the Gamma distribution. Then, inverse link function, which is the canonical one in the case of gamma-distributed response variable is used. The experimental results are compared with the other regression models and the classification algorithms. The model selection is performed via the use of MSE and AIC metrics respectively. The results show that GLM is better than the linear regression. As for the classification algorithms, Random Forest and GLM are the top performers. Moreover, categorization on the predictor variables improves model fitting results significantly.
  • Article
    Estimating Spatiotemporal Focus of Documents Using Entropy With Pmi
    (Türkiye Klinikleri Journal of Medical Sciences, 2020) Yaşar, Damla; Tekir, Selma
    Many text documents are spatiotemporal in nature, i.e. contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to year 1789 as time and France as place. Identifying this time period and location associated with the document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word’s temporal entropy is considered as a weight to its association with a time point and a single time point with the highest overall score is chosen as the focus time of a document. The proposed method is generic in the sense that it can also be applied for spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents. The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods.
  • Article
    Citation - WoS: 9
    Citation - Scopus: 14
    Rule-Based Automatic Question Generation Using Semantic Role Labeling
    (Institute of Electronics, Information and Communication Engineers, 2019) Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma
    This paper proposes a new rule-based approach to automatic question generation. The proposed approach focuses on analysis of both syntactic and semantic structure of a sentence. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results shows that, it also achieves great performance on reading comprehension datasets, which focus on question generation from paragraphs. Especially, with respect to METEOR metric, the designed system significantly outperforms all other systems in automatic evaluation. As for human evaluation, the designed system exhibits similar performance by generating the most natural (human-like) questions.
  • Article
    Gender Prediction From Tweets: Improving Neural Representations With Hand-Crafted Features
    (Cornell University, 2019) Tekir, Selma; Sezerer, Erhan; Polatbilek, Ozan
    Author profiling is the characterization of an author through some key attributes such as gender, age, and language. In this paper, a RNN model with Attention (RNNwA) is proposed to predict the gender of a twitter user using their tweets. Both word level and tweet level attentions are utilized to learn ’where to look’. This model1 is improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user. Both models are tested on three languages: English, Spanish, Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and has competitive results on Spanish and Arabic.
  • Conference Object
    Citation - Scopus: 6
    Gender Prediction From Tweets With Convolutional Neural Networks: Notebook for Pan at Clef 2018
    (CEUR Workshop Proceedings, 2018) Sezerer, Erhan; Polatbilek, Ozan; Sevgili, Özge; Tekir, Selma
    This paper presents a system1 developed for the author profiling task of PAN at CLEF 2018. The system utilizes style-based features to predict the gender information from the given tweets of each user. These features are automatically extracted by Convolutional Neural Networks (CNN). The system mainly depends on the idea that the informativeness of each tweet is not the same in terms of the gender of a user. Thus, the attention mechanism is included to the CNN outputs in order to discriminate the tweets carrying more information. Our architecture was able to obtain competitive results on three languages provided by the PAN 2018 author profiling challenge with an average accuracy of 75.1% on local runs and 70.23% on the submission run.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources
    (Springer Verlag, 2017) Sezerer, Erhan; Tekir, Selma
    The credibility of news cannot be isolated from that of its source. Further, it is mainly associated with a news source’s trustworthiness and expertise. In an effort to measure the trustworthiness of a news source, the factor of “is factual or opinionated” must be considered among others. In this work, we propose an unsupervised probabilistic lexicon-based opinion mining approach to describe a news source as “being factual or opinionated”. We get words’ positive, negative, and objective scores from a sentiment lexicon and normalize these scores through the use of their cumulative distribution. The idea behind the use of such a statistical approach is inspired from the relativism that each word is evaluated with its difference from the average word. In order to test the effectiveness of the approach, three different news sources are chosen. They are editorials, New York Times articles, and Reuters articles, which differ in their characteristic of being opinionated. Thus, the experimental validation is done by the analysis of variance on these different groups of news. The results prove that our technique can distinguish the news articles from these groups with respect to “being factual or opinionated” in a statistically significant way.
  • Conference Object
    Sosyal Çizgeler için Arama Motoru Geliştirilmesi
    (CEUR Workshop Proceedings, 2016) Yafay, Erman; Tekir, Selma
    Sosyal ağlara giderek artan ilgi, beraberinde büyük ölçeklerde bağlantılı veri açığa çıkarmıştır. Bu büyük veriler üzerinde arama yapabilmek için özelleştirilmiş sistemlere gereksinim duyulmaktadır. Bu gereksinimi karşılamak üzere Facebook, 2013 yılında kendi arama motoru olan Unicorn’u[1] hizmete sunmuştur. Bu çalışmada, Unicorn’un asgari fakat temel özellikleri tasarlanıp gerçekleştirilmiştir. Yaklaşımımızda sosyal ağ bir çizge olarak modellenmiştir ve çizgedeki düğümler ve kenarlar farklı türlere sahip olabilecek şekilde genel olarak tanımlanmıştır. Düğümler, kişi veya sayfa gibi varlıkları ifade ederken; kenarlar, düğümler arasındaki arkadaşlık veya beğenme ilişkisini ortaya koyar. Verimlilik sorununu çözebilmek için tamamen bellek üzerinde çalışan bir indisleme sistemi geliştirilmiştir. Bu sistem geniş ölçekte veri işlenmesini sağlamak üzere geliştirilen dağıtık motor Spark[2] üzerinde gerçekleştirilmiştir. Son olarak, sosyal ağ yapısına uygun işleçler (ve, veya, zayıf- ve, güçlü-veya, uygula) tasarlanmıştır. Bu işleçler sayesinde kolayca kişilerin ortak arkadaşları veya arkadaşlarının arkadaşları gibi sorgular ifade edilip çalıştırılabilmektedir. Çalışmanın son bölümünde bu tip bir sistemin gerçekleştirilmesinde dikkate alınması gereken nitelikler, bu niteliklere ilişkin ödünleşimler ve karar mekanizmaları ele alınıp değerlendirilmiştir.
  • Conference Object
    Overt information operations during peacetime
    (Curran Associates, 2012) Tekir, Selma
    Information superiority is the most critical asset in war making. It directly addresses the perception of the opponent and in the long term the will of him to act. Sun Tzu's classical text states this fact by the concept of deception as the basis of all warfare. The success in warfare then is dependent on being aware of what's happening, accurately realizing the context. This is the intelligence function in broad terms and mostly open source intelligence as it provides the context. Competitive intelligence is based mainly on open sources and day by day the open source share in the intelligence product is increasing. Present diversified open sources & services represent a methodology shift in war. The two preceding ways have been overt physical acts against military targets in wartime and covert information operations conducted throughout peacetime against even nonmilitary targets respectively. The present methodology must be overt (open) information operations during peacetime. This coincides with a metaphor change as well. It proposes a transformation from a war metaphor into a game metaphor in which there are some playing rules. In fact, the existence of such rules helps in drawing the boundary of the field of competitive intelligence and thus making it a profession. Game metaphor is safer to adopt than war as it's easier to take responsibility in public disclosure scenarios in this case. By following this metaphor, you continue to stay in the boundary of legitimate competition. In other terms, you make a conscious preference in terms of war intensities by choosing to avoid the more intense war forms limited conflict, and actual warfare respectively. Finally, this preference is in accordance with the fundamental point of the Sun Tzu's entire argument: The vision of victory without fighting. To summarize, open source domination in the competitive intelligence lays the ground for the game metaphor that represents a transformation in warfare. The apparent outcome is overt information operations during peacetime. It emerges as the most important tool to fight against deception, thus success in information warfare in the contemporary world.