Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
Permanent URI for this collection: https://hdl.handle.net/11147/7148
Search Results
11 results
Article (Citation - WoS: 1 | Citation - Scopus: 1)
Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content
(World Scientific Publishing, 2021) Sezerer, Erhan; Tenekeci, Samet; Acar, Ali; Baloğlu, Bora; Tekir, Selma

In the field of software engineering, practitioners' share in the constructed knowledge cannot be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource, though it is subjective and lacks an objective quality-assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target Stack Overflow (SO) and Stack Exchange (SE) sites. We model the problem of author reputation measurement as a classification task on author-provided answers. The authors' mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are used as classifiers and compared to SVM and random baselines. Our best model achieves 63.8% accuracy in binary classification on the SO design-patterns tag and 71.6% accuracy in the SE software-engineering category. The superior performance on SE software engineering can be explained by its larger dataset size. In addition to quantitative evaluation, we provide qualitative evidence that the system's predicted reputation labels match the quality of the provided answers.

Article
Estimating Spatiotemporal Focus of Documents Using Entropy With PMI
(Türkiye Klinikleri Journal of Medical Sciences, 2020) Yaşar, Damla; Tekir, Selma

Many text documents are spatiotemporal in nature, i.e., the contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to the year 1789 as time and France as place.
Identifying the time period and location associated with a document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word's temporal entropy is used as a weight on its association with a time point, and the single time point with the highest overall score is chosen as the focus time of the document. The proposed method is generic in the sense that it can also be applied to spatial focus estimation: in the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of the proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents, and the experimental evaluation confirms the superiority of the proposed temporal and spatial focus estimation methods.

Article (Citation - WoS: 9 | Citation - Scopus: 14)
Rule-Based Automatic Question Generation Using Semantic Role Labeling
(Institute of Electronics, Information and Communication Engineers, 2019) Keklik, Onur; Tuğlular, Tuğkan; Tekir, Selma

This paper proposes a new rule-based approach to automatic question generation. The proposed approach analyzes both the syntactic and the semantic structure of a sentence. Although the primary objective of the designed system is question generation from sentences, automatic evaluation results show that it also performs strongly on reading comprehension datasets, which focus on question generation from paragraphs. In particular, with respect to the METEOR metric, the designed system significantly outperforms all other systems in automatic evaluation.
As for human evaluation, the designed system exhibits similar performance, generating the most natural (human-like) questions.

Conference Object (Citation - Scopus: 6)
Gender Prediction From Tweets With Convolutional Neural Networks: Notebook for PAN at CLEF 2018
(CEUR Workshop Proceedings, 2018) Sezerer, Erhan; Polatbilek, Ozan; Sevgili, Özge; Tekir, Selma

This paper presents a system developed for the author profiling task of PAN at CLEF 2018. The system uses style-based features to predict gender from the given tweets of each user. These features are extracted automatically by convolutional neural networks (CNNs). The system rests on the idea that tweets are not equally informative with respect to a user's gender; an attention mechanism is therefore applied to the CNN outputs to emphasize the tweets that carry more information. Our architecture obtained competitive results on the three languages of the PAN 2018 author profiling challenge, with an average accuracy of 75.1% on local runs and 70.23% on the submission run.

Conference Object (Citation - WoS: 1 | Citation - Scopus: 1)
A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources
(Springer Verlag, 2017) Sezerer, Erhan; Tekir, Selma

The credibility of news cannot be isolated from that of its source, and it is mainly associated with a news source's trustworthiness and expertise. To measure the trustworthiness of a news source, the factor of being "factual or opinionated" must be considered among others. In this work, we propose an unsupervised probabilistic lexicon-based opinion mining approach to describe a news source as factual or opinionated. We obtain words' positive, negative, and objective scores from a sentiment lexicon and normalize these scores through their cumulative distribution.
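As a hedged illustration of the cumulative-distribution normalization just described, the sketch below maps each word's raw lexicon score to its rank fraction within the lexicon; the word scores are invented examples, not taken from the paper's sentiment lexicon.

```python
# Sketch: normalize sentiment-lexicon scores via their empirical
# cumulative distribution (ECDF), so each word is judged relative
# to the rest of the lexicon rather than in isolation.
from bisect import bisect_right

def ecdf_normalize(scores):
    """Map each raw score to the fraction of lexicon scores <= it."""
    ordered = sorted(scores.values())
    n = len(ordered)
    return {w: bisect_right(ordered, s) / n for w, s in scores.items()}

# Invented positive-polarity scores for four words
positive_scores = {"excellent": 0.9, "good": 0.6, "report": 0.1, "said": 0.0}
normalized = ecdf_normalize(positive_scores)
# Words above the lexicon median receive normalized scores > 0.5.
```

A document-level "factual vs. opinionated" signal could then aggregate such relative scores over the document's words, in the spirit of evaluating each word against the average word.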
The idea behind this statistical approach is inspired by relativism: each word is evaluated by its difference from the average word. To test the effectiveness of the approach, three news sources that differ in how opinionated they are were chosen: editorials, New York Times articles, and Reuters articles. The experimental validation is thus done by analysis of variance on these groups of news. The results show that our technique can distinguish the news articles of these groups with respect to being factual or opinionated in a statistically significant way.

Conference Object
Sosyal Çizgeler için Arama Motoru Geliştirilmesi (Developing a Search Engine for Social Graphs)
(CEUR Workshop Proceedings, 2016) Yafay, Erman; Tekir, Selma

The growing interest in social networks has produced linked data at large scale, and specialized systems are needed to search over it. To meet this need, Facebook launched its own search engine, Unicorn [1], in 2013. In this work, the minimal but essential features of Unicorn are designed and implemented. In our approach, the social network is modeled as a graph whose nodes and edges are defined generically so that they can have different types: nodes represent entities such as people or pages, while edges express relationships between nodes such as friendship or likes. To address the efficiency problem, a fully in-memory indexing system was developed, implemented on Spark [2], a distributed engine for large-scale data processing. Finally, operators suited to the social network structure (and, or, weak-and, strong-or, apply) were designed; with these operators, queries such as users' mutual friends or friends of friends can easily be expressed and executed.
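As an illustration only (a toy adjacency map, not the paper's Unicorn-style index or operators), mutual-friend and friends-of-friends queries over a social graph reduce to set intersection and union:

```python
# Toy sketch of social-graph queries as set operations: an "and" over
# two friend lists yields mutual friends; expanding each friend's
# friend list and taking the union yields friends-of-friends.
# The graph below is invented.
friends = {
    "ali": {"bora", "cem", "deniz"},
    "bora": {"ali", "cem", "ege"},
    "cem": {"ali", "bora"},
}

def mutual_friends(g, u, v):
    # "and"-style operator: users present in both friend lists
    return g[u] & g[v]

def friends_of_friends(g, u):
    # "apply"-style expansion: union of each friend's friends,
    # excluding the user and their direct friends
    result = set()
    for f in g[u]:
        result |= g.get(f, set())
    return result - g[u] - {u}
```

A production system would evaluate such operators over inverted indexes of edge lists rather than in-memory sets, but the query semantics are the same.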
The final part of the study discusses and evaluates the qualities that must be considered when implementing such a system, along with the trade-offs and decision mechanisms associated with those qualities.

Conference Object
Overt Information Operations During Peacetime
(Curran Associates, 2012) Tekir, Selma

Information superiority is the most critical asset in war making. It directly addresses the opponent's perception and, in the long term, the opponent's will to act. Sun Tzu's classical text expresses this through the concept of deception as the basis of all warfare. Success in warfare thus depends on being aware of what is happening and accurately grasping the context. In broad terms this is the intelligence function, and mostly open-source intelligence, as it provides the context. Competitive intelligence is based mainly on open sources, and the open-source share in the intelligence product is increasing day by day. Today's diversified open sources and services represent a methodological shift in war. The two preceding methodologies were overt physical acts against military targets in wartime and covert information operations conducted throughout peacetime, even against non-military targets. The present methodology must be overt (open) information operations during peacetime. This coincides with a metaphor change as well: a transformation from a war metaphor into a game metaphor in which there are playing rules. In fact, the existence of such rules helps draw the boundary of the field of competitive intelligence and thus makes it a profession. The game metaphor is safer to adopt than the war metaphor, as it makes responsibility easier to take in public disclosure scenarios. By following this metaphor, you stay within the boundary of legitimate competition; in other terms, you make a conscious choice among war intensities by avoiding the more intense forms of war, limited conflict and actual warfare, respectively.
Finally, this preference accords with the fundamental point of Sun Tzu's entire argument: the vision of victory without fighting. To summarize, open-source dominance in competitive intelligence lays the ground for the game metaphor, which represents a transformation in warfare. The apparent outcome is overt information operations during peacetime, which emerges as the most important tool against deception and thus for success in information warfare in the contemporary world.

Conference Object (Citation - WoS: 3 | Citation - Scopus: 4)
Recent Cyberwar Spectrum and Its Analysis
(Curran Associates, 2012) Aslanoğlu, Rabia; Tekir, Selma

War is an organized, armed, and often prolonged conflict carried on between states, nations, or other parties. Every instance of war includes basic components such as rising conditions, battlespace, weapons, strategy, tactics, and consequences. Recent developments in information and communication technologies have changed the nature of war, and cyberwar has become its new form. In this new form, the battlespace is cyberspace, and the contemporary weapons are constantly evolving viruses, worms, trojans, denial-of-service attacks, botnets, and advanced persistent threats. In this work, we present the recent cyberwar spectrum along with its analysis. The spectrum is composed of the Estonia attack, the Georgia attack, Operation Aurora, and the Stuxnet worm. The methodology of the analysis is to identify the reasons, timeline, effects, responses, and evaluation of each individual case. Moreover, we enumerate the fundamental war components for each incident. The analysis results provide evidence of the evolution of the weapons into new forms such as the advanced persistent threat. Another outcome of the analysis is that, toward the end of the spectrum, the confidentiality and integrity attributes of information are compromised in addition to its availability.
Another important observation is that in the last two cases, responsive actions were not possible because the identities of the offending parties were unknown; attribution thus appears as a significant concern for modern warfare. The current sophistication level of cyber weapons poses critical threats to society. Developed countries with high dependence on information and communication technologies are particularly likely targets, since the safety of critical infrastructures such as healthcare, oil and gas production, water supply, transportation, and telecommunication depends on the safety of computer networks. Aware of this fact, every nation should attach high priority to cyber security in its agenda and behave proactively.

Article (Citation - WoS: 1)
Reading CS Classics
(Association for Computing Machinery (ACM), 2012) Tekir, Selma

Knowledge of the theories of computer science (CS) helps in understanding the limitations of the field by providing users with new perspectives and insights. It can be a good practice for CS professionals to compile their own list of classics that highlights some key scientific concepts of the field. "An Axiomatic Basis for Computer Programming," by C.A.R. Hoare, is a CS classic that tells about the computing industry of the 1960s and 1970s in Britain. Hoare provides a foundation for the formal proofs of programs through an algebraic, assertion-based approach. "Computing Machinery and Intelligence," by A.M. Turing, tells about the computer numbering systems that provide a unique representation to every programming construct. Dijkstra's realization of the high intellectual challenge of programming and his encouragement made him one of the greatest minds of computer programming.
Donald Knuth is extraordinary with his perspective on computer programming.

Conference Object (Citation - Scopus: 6)
Geodesic Distances for Web Document Clustering
(Institute of Electrical and Electronics Engineers Inc., 2011) Tekir, Selma; Mansmann, Florian; Keim, Daniel

While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and curvature values in the geodesic distance formula. To estimate these curvature values, we calculate a clustering coefficient for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic, as these clustering coefficients are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both the traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article. © 2011 IEEE.
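The abstract above does not give the exact geodesic formula, so the following is only a plausible sketch: the arc length between two documents on a sphere whose radius is derived from a curvature estimate (e.g. averaged link-graph clustering coefficients). The specific functional form and the curvature value are assumptions for illustration, not the paper's definition.

```python
# Hedged sketch: combine cosine similarity between term vectors with a
# curvature estimate in a spherical geodesic-distance formula.
# Assumed form: distance = R * arccos(similarity), with R = 1/sqrt(k)
# for curvature k; k would come from link-graph clustering coefficients.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def geodesic_distance(a, b, curvature):
    """Arc length between two documents on a sphere of curvature k."""
    sim = max(-1.0, min(1.0, cosine_similarity(a, b)))  # guard rounding
    radius = 1.0 / math.sqrt(curvature)                 # k = 1 / R^2
    return radius * math.acos(sim)

# Toy term vectors and an invented curvature value
d1, d2 = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
distance = geodesic_distance(d1, d2, curvature=0.25)
```

Such a distance could then replace the plain cosine distance inside a standard k-means loop, which is how the abstract describes the evaluation.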
