Computer Engineering / Bilgisayar Mühendisliği
Permanent URI for this collection: https://hdl.handle.net/11147/10
14 results
Search Results
Article
Gender Bias in Occupation Classification From the New York Times Obituaries (Dokuz Eylül Üniversitesi, 2022)
Atik, Ceren; Tekir, Selma
Technological developments such as artificial intelligence can strengthen social prejudices prevailing in society, regardless of the developer's intention. Therefore, researchers should be aware of the ethical issues that may arise from a developed product/solution. In this study, we investigate the effect of gender bias on occupational classification. For this purpose, a new dataset was created by collecting obituaries from the New York Times website and is provided in two different versions: with and without gender indicators. Category distributions from this dataset show that the gender and occupation variables are dependent. Thus, gender affects occupation classification. To test the effect, we perform occupation classification using SVM (Support Vector Machine), HAN (Hierarchical Attention Network), and DistilBERT-based classifiers. Moreover, to get further insights into the relationship of gender and occupation in classification problems, a multi-tasking model in which occupation and gender are learned together is evaluated. Experimental results reveal that there is a gender bias in job classification.

Article
Asking the Right Questions To Solve Algebraic Word Problems (TÜBİTAK - Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, 2022)
Çelik, Ege Yiğit; Orulluoğlu, Zeynel; Mertoğlu, Rıdvan; Tekir, Selma
Word algebra problems are among the challenging AI tasks as they combine natural language understanding with a formal equation system. Traditional approaches to the problem work with equation templates and frame the task as template selection followed by number assignment to the selected template. The recent deep learning-based solutions exploit contextual language models like BERT and encode the natural language text to decode the corresponding equation system.
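The template-based framing mentioned above can be illustrated with a minimal sketch. It is not the authors' system: the template (x + y = n1, n2*x + n3*y = n4), the function names, and the sample numbers are all hypothetical and chosen only to show what "select a template, then assign numbers to its slots" means.

```python
from fractions import Fraction


def solve_two_by_two(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 with Cramer's rule."""
    # Convert to exact rationals so the result has no rounding error.
    a11, a12, b1, a21, a22, b2 = map(Fraction, (a11, a12, b1, a21, a22, b2))
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("this slot assignment has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


def fill_template(n1, n2, n3, n4):
    """Hypothetical template: x + y = n1 ; n2*x + n3*y = n4."""
    return solve_two_by_two(1, 1, n1, n2, n3, n4)


# Illustrative problem: 10 items in total, priced 2 and 3 each, costing 24 overall.
x, y = fill_template(10, 2, 3, 24)
print(x, y)
```

The hard part of the task, which this sketch leaves out, is deciding which number from the text goes into which slot.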
The proposed approach is similar to the template-based methods as it works with a template and fills in the number slots. Nevertheless, it has contextual understanding because it adopts a question generation and answering pipeline to create tuples of numbers, and then performs the number assignment task with custom sets of rules. The inspiring idea is that by asking the right questions and answering them using a state-of-the-art language model-based system, one can learn the correct values for the number slots in an equation system. The empirical results show that the proposed approach outperforms the other methods significantly on the word algebra benchmark dataset alg514 and performs the second best on the AI2 corpus for arithmetic word problems. It also has superior performance on the challenging SVAMP dataset. Though it is a rule-based system, the simple rule sets and the relatively slight differences between rules for different templates suggest that it should be feasible to develop a system that can learn the patterns for the collection of all possible templates and produce the correct equations for an example instance.

Article
Citation - WoS: 7; Citation - Scopus: 12
A Survey on Organizational Choices for Microservice-Based Software Architectures (TÜBİTAK, 2022)
Ünlü, Hüseyin; Bilgin, Burak; Demirörs, Onur
During the last decade, the demand for more flexible, responsive, and reliable software applications increased exponentially. The availability of internet infrastructure and new software technologies to respond to this demand led to a new generation of applications. As a result, cloud-based, distributed, independently deployable web applications working together in a microservice-based software architecture style have gained popularity. The style has become a common practice in the industry and has been successfully utilized by companies. Adopting this style requires software organizations to transform their culture.
However, there is a lack of research studies that explore common practices for microservices. Thus, we performed a survey to explore the organizational choices on software analysis, design, size measurement, and effort estimation when working with microservices. The results provide a snapshot of the software industry that utilizes microservices. We provide insight for software organizations to transform their culture and suggest challenges researchers can focus on in the area.

Article
Citation - Scopus: 1
Performance Analysis and Feature Selection for Network-Based Intrusion Detection With Deep Learning (Türkiye Klinikleri, 2022)
Caner, Serhat; Erdoğmuş, Nesli; Erten, Yusuf Murat
An intrusion detection system is an automated monitoring tool that analyzes network traffic and detects malicious activities by looking out either for known patterns of attacks or for an anomaly. In this study, the intrusion detection and classification performances of different deep learning-based systems are examined. For this purpose, 24 deep neural networks with four different architectures are trained and evaluated on the CICIDS2017 dataset. Furthermore, the best performing model is utilized to inspect raw network traffic features and rank them with respect to their contributions to success rates. By selecting features with respect to their ranks, sets of varying size from 3 to 77 are assessed in terms of classification accuracy and time efficiency.
The results show that recurrent neural networks with a certain level of complexity can achieve success rates comparable to state-of-the-art systems using a small feature set of size 9, while the average time required to classify a test sample is halved compared to the complete set.

Article
Citation - WoS: 1; Citation - Scopus: 2
Information Retrieval-Based Bug Localization Approach With Adaptive Attribute Weighting (TÜBİTAK - Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, 2021)
Erşahin, Mustafa; Utku, Semih; Kılınç, Deniz; Erşahin, Buket
Software quality assurance is one of the crucial factors for the success of software projects. Bug fixing has an essential role in software quality assurance, and bug localization (BL) is the first step of this process. BL is difficult and time-consuming since the developers should understand the flow, coding structure, and the logic of the program. Information retrieval-based bug localization (IRBL) uses the information of bug reports and source code to locate the section of code in which the bug occurs. It is difficult to apply other tools because of the diversity of software development languages, design patterns, and development standards. The aim of this study is to build an adaptive IRBL tool and make it usable by more companies. BugSTAiR solves the aforementioned problem by means of the adaptive attribute weighting (AAW) algorithm and is evaluated on four open-source projects which are well-known benchmark datasets on BL. One of them is BLIA, which is the state of the art in the bug localization area, and another is BLUIR, which is a well-known BL tool. According to the experimental results, the Top1 rank of BugSTAiR is 2% better and its MAP is 10% better than BLIA's on AspectJ; on SWT, it has localized 4.6% of all bugs in Top1 and its precision is 6.1% better than BLIA's.
Compared with BLUIR, it is 20% better in the Top1 metric and 30% better in precision.

Article
Sales History-Based Demand Prediction Using Generalized Linear Models (Süleyman Demirel Üniversitesi, 2019)
Özenboy, Başar; Tekir, Selma
It’s vital for commercial enterprises to accurately predict demand by utilizing the existing sales data. Such predictive analytics is a crucial part of their decision support systems to increase the profitability of the company. In predictive data analytics, the branch of regression modeling is used to predict a numerical response variable like sale amount. In this category, linear models are simple and easy to interpret, yet they permit generalization to very powerful and flexible families of models which are called generalized linear models (GLM). The generalization potential over simple linear regression is twofold: First, GLM relax the assumption of normally distributed error terms. Second, the relationship between the set of predictor variables and the response variable can be represented by a set of link functions rather than the sole choice of the identity function. This work models the sales amount prediction problem through the use of GLM. Unique company sales data are explored and the response variable, sale amount, is fitted to the Gamma distribution. Then the inverse link function, which is the canonical one for a gamma-distributed response variable, is used. The experimental results are compared with the other regression models and the classification algorithms. Model selection is performed using the MSE and AIC metrics, respectively. The results show that GLM is better than linear regression. As for the classification algorithms, Random Forest and GLM are the top performers.
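For reference, the model class and selection metrics named above can be written out as follows. This is a generic sketch in our own notation, not taken from the paper: $x_i$ is the predictor vector, $\beta$ the coefficient vector, $\nu$ the Gamma shape parameter, $k$ the number of estimated parameters, and $\hat{L}$ the maximized likelihood.

```latex
\begin{align}
  y_i \mid x_i &\sim \mathrm{Gamma}(\mu_i, \nu),
    \qquad \mathbb{E}[y_i \mid x_i] = \mu_i, \\
  g(\mu_i) &= \frac{1}{\mu_i} = x_i^{\top}\beta
    \quad\Longleftrightarrow\quad \mu_i = \frac{1}{x_i^{\top}\beta}, \\
  \mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - \hat{\mu}_i\bigr)^2,
    \qquad \mathrm{AIC} = 2k - 2\ln\hat{L}.
\end{align}
```

The inverse link requires $x_i^{\top}\beta > 0$ so that the predicted mean stays positive. Strictly, the canonical Gamma link is $-1/\mu$; the inverse link $1/\mu$ differs only by sign and is the form conventionally used.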
Moreover, categorization on the predictor variables improves model fitting results significantly.

Article
Citation - WoS: 2; Citation - Scopus: 1
Mutant Selection by Using Fourier Expansion (Türkiye Klinikleri Journal of Medical Sciences, 2020)
Takan, Savaş; Ayav, Tolga
Mutation analysis is a widely used technique to evaluate the effectiveness of test cases in both hardware and software testing. The original model is mutated systematically under certain fault assumptions and test cases are checked against the mutants created to see whether the test cases can detect the faults or not. Mutation analysis is usually a computationally intensive task, particularly in finite state machine (FSM) testing, due to a possibly huge number of mutants. Random selection could be a practical reduction method under the assumption that each mutant is identical in terms of the probability of occurrence of its associated fault. The present study proposes a mutant selection method based on Fourier analysis of Boolean functions. Fourier analysis helps to identify the transitions that are most effective on the output so that the mutants related to those transitions can be selected. Such mutants are considered more important since they are more likely to be killed. To evaluate the method, test cases are generated by the well-known W method, which has the capability of detecting every potential fault. The original and reduced sets of mutants are compared with respect to their importance values. Evaluations show that the mutants selected by the proposed technique are more effective, which reduces the cost of mutation analysis without sacrificing its performance.

Article
Privacy Issues on Social Networks (Gazi Üniversitesi, 2018)
Şahin, Serap
Privacy has been a human need since the creation of civilizations. In social networks the common point is that each user has to create a profile to define his or her own identity.
The profile description includes many items with privacy settings to tune their visibility: owner only, friends, friends of friends, and sometimes the public. After the enrolment stage, users extend their social connection graphs with newly accepted friends, and these graphs grow beyond the individual's control because of the new friends of friends. Hence, with high probability, a member's shared information is generally available to the public and can be retrieved by users around the world. This article gives an overview of the reasons for privacy concerns and the risks of social networks (SNs), and summarizes the current and possible future solution directions for researchers and governments.

Article
Estimating Spatiotemporal Focus of Documents Using Entropy With PMI (Türkiye Klinikleri Journal of Medical Sciences, 2020)
Yaşar, Damla; Tekir, Selma
Many text documents are spatiotemporal in nature, i.e., the contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to the year 1789 as time and France as place. Identifying the time period and location associated with the document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word’s temporal entropy is used as a weight on its association with a time point, and the single time point with the highest overall score is chosen as the focus time of a document. The proposed method is generic in the sense that it can also be applied for spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities.
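The scoring idea described above can be sketched in a few lines. This is an illustrative sketch, not the paper's formulation: the weight (one minus normalized entropy), the treatment of unseen word-time pairs as zero, the summation over words, and all function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def build_counts(corpus):
    """corpus: iterable of (tokens, time_label) pairs with known time labels."""
    word_time = defaultdict(Counter)  # word -> how often it occurs per time label
    time_totals = Counter()           # time label -> total token count
    for tokens, t in corpus:
        for w in tokens:
            word_time[w][t] += 1
            time_totals[t] += 1
    return word_time, time_totals


def pmi(word, t, word_time, time_totals):
    """Pointwise mutual information log(p(w,t) / (p(w) p(t)))."""
    total = sum(time_totals.values())
    joint = word_time[word][t] / total
    if joint == 0:
        return 0.0  # assumption: unseen pairs contribute nothing
    p_w = sum(word_time[word].values()) / total
    p_t = time_totals[t] / total
    return math.log(joint / (p_w * p_t))


def entropy_weight(word, word_time, n_times):
    """Assumed weight: 1 - normalized entropy of the word's time distribution."""
    counts = word_time[word]
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - h / math.log(n_times) if n_times > 1 else 1.0


def focus_time(tokens, word_time, time_totals):
    """Return the time label with the highest entropy-weighted PMI score."""
    times = list(time_totals)
    scores = {t: 0.0 for t in times}
    for w in tokens:
        if w not in word_time:
            continue  # skip words never seen with a time label
        weight = entropy_weight(w, word_time, len(times))
        for t in times:
            scores[t] += weight * pmi(w, t, word_time, time_totals)
    return max(scores, key=scores.get)


# Toy corpus, invented for illustration only.
corpus = [
    (["bastille", "revolution", "king"], 1789),
    (["revolution", "king"], 1789),
    (["moon", "landing", "king"], 1969),
    (["moon", "apollo"], 1969),
]
word_time, time_totals = build_counts(corpus)
print(focus_time(["bastille", "revolution"], word_time, time_totals))
```

Words spread evenly over many time points (here "king") get a weight near zero, so time-specific words dominate the score. Replacing the time labels with place entities gives the spatial variant.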
The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents. The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods.

Article
Systematic Reviews in Model-Driven Engineering: a Tertiary Study (Hezârfen Havacılık ve Uzay Teknolojileri Enstitüsü, 2020)
Akdur, Deniz; Demirörs, Onur
To cope with the growing complexity of software-intensive systems, model-driven engineering (MDE) has become a widely used approach in the industry by providing many (potential) benefits with different purposes. Although there has been an increasing interest among MDE researchers in conducting secondary studies such as surveys, systematic mappings (SM), and systematic literature reviews (SLR), there has been no tertiary study that synthesizes the findings from all these existing secondary studies and also examines various characteristics of software modeling (e.g., purposes, benefits, and challenges) as a meta-analysis. The objective of this paper is to investigate and understand the state of the practice in MDE based on the modeling characteristics by presenting a tertiary study (i.e., a systematic review of systematic reviews). To this end, we collected the set of all the existing 64 secondary studies in this field using a well-defined search strategy. This article presents inputs for different modeling stakeholders to better understand and use different purposes, benefits, and challenges of MDE by aggregating consolidated findings on this approach.
