Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/10

Search Results

Now showing 1 - 9 of 9

Citation - Scopus: 1
An Interestingness Measure for Knowledge Bases
(Elsevier, 2023) Oğuz, Damla; Soygazi, Fatih
Association rule mining and logical rule mining both aim to discover interesting relationships in data or knowledge. In association rule mining, relationships are identified based on the occurrence of items in a dataset, while in logical rule mining, relationships are determined based on logical relationships between atoms in a knowledge base. Association rule mining has been widely studied in transactional databases, mainly for market basket analysis. Confidence has become the most widely used interestingness measure to assess the strength of a rule. Many other interestingness measures have been proposed, since confidence can be insufficient to filter negatively associated relationships. Recently, logical rule mining has become an important area of research, as new facts can be inferred by applying discovered logical rules. These rules can be used for reasoning, identifying potential errors in knowledge bases, and better understanding data. However, there are currently only a few measures for logical rule mining. Furthermore, current measures do not consider relations that can have several objects, called quasi-functions, which can dramatically alter the interestingness of a rule. In this paper, we focus on effectively assessing the strength of logical rules. We propose a new interestingness measure that takes into account two categories of relations, functions and quasi-functions, to assess the degree of certainty of logical rules. We compare our proposed measure with a widely used measure on both synthetic test data and real knowledge bases. We show that it is more effective in indicating rule quality, making it an appropriate interestingness measure for logical rule evaluation. © 2023 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
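As a point of reference for the abstract above, the classical confidence of an association rule X -> Y is the fraction of transactions containing X that also contain Y. The following is a minimal sketch of that standard definition only; it is not the interestingness measure proposed in the paper, and the toy transactions and function names are hypothetical.

```python
# Hypothetical toy market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """Classical confidence of the rule antecedent -> consequent."""
    body = support_count(antecedent, transactions)
    if body == 0:
        # The rule never applies; returning 0.0 is a convention chosen here.
        return 0.0
    return support_count(antecedent | consequent, transactions) / body

# Two of the three transactions containing "bread" also contain "milk".
print(confidence({"bread"}, {"milk"}, transactions))
```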
Citation - WoS: 1
Citation - Scopus: 3
Ignoring Internal Utilities in High-Utility Itemset Mining
(MDPI, 2022) Oğuz, Damla
High-utility itemset mining discovers sets of items that are sold together and have utility values higher than a given minimum utility threshold. The utilities of these itemsets are calculated by considering their internal and external utility values, which correspond, respectively, to the quantity sold of each item in each transaction and profit units. Therefore, internal and external utilities have symmetric effects on deciding whether an itemset is high-utility. The symmetric contributions of both utilities cause two major related challenges. First, itemsets with low external utility values can easily exceed the minimum utility threshold if they are sold extensively. In this case, such itemsets can be found more efficiently using frequent itemset mining. Second, a large number of high-utility itemsets are generated, which can result in interesting or important high-utility itemsets being overlooked. This study presents an asymmetric approach in which the internal utility values are ignored when finding high-utility itemsets with high external utility values. The experimental results on two real datasets reveal that the external utility values have fundamental effects on the high-utility itemsets. The results of this study also show that this effect tends to increase for high values of the minimum utility threshold. Moreover, the proposed approach reduces the execution time.
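The symmetric utility calculation described in the abstract above can be illustrated with a brute-force sketch: an itemset's utility is the sum, over transactions containing the whole itemset, of quantity times unit profit for each of its items. This shows the conventional definition only, not the asymmetric approach the study proposes; the data, the function names, and the use of `>=` for the threshold test are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: item -> quantity sold per transaction (internal utility).
transactions = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
]
# Hypothetical profit per unit of each item (external utility).
unit_profit = {"a": 5, "b": 2, "c": 1}

def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total

def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Enumerate every itemset (exponential; toy sizes only) and keep the
    ones whose utility reaches min_utility (assumed convention: >=)."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result

print(high_utility_itemsets(transactions, unit_profit, min_utility=15))
```

Exhaustive enumeration is used here only to keep the definition visible; practical miners prune the search space instead.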
Citation - WoS: 9
Citation - Scopus: 11
A Qualitative Survey on Frequent Subgraph Mining
(De Gruyter, 2018) Güvenoğlu, Büşra; Ergenç Bostanoğlu, Belgin
Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that are encountered more frequently than a user-defined threshold in a database are called frequent subgraphs. Frequent subgraphs in a database can give important information about this database. Using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.
Citation - Scopus: 4
Survey: Running and Comparing Stream Clustering Algorithms
(CEUR Workshop Proceedings, 2018) Ahmed, Rowanda D.; Dalkılıç, Gökhan; Erten, Murat
Recently, clustering data streams has become an incredibly important research area for knowledge discovery, as applications produce more and more unstoppable streaming data. In this paper, we introduce clustering, streams and data stream clustering algorithms, and discuss the most important stream clustering algorithms with respect to their structure. As an additional contribution of our work, and unlike other review and survey papers on stream clustering, we offer a practical evaluation of the best-known stream clustering algorithms, namely (i) CluStream, (ii) DenStream, (iii) D-Stream, and (iv) ClusTree, showing their experimental results along with some performance metrics computed for each using the MOA framework.
Citation - Scopus: 3
Ontology Supported Policy Modeling in Opinion Mining Process
(Springer Verlag, 2012) Husaini, Mus'ab; Ko, Andrea; Tapucu, Dilek; Saygın, Yücel
In e-Society, the spread of services offered by the Social Web has changed the way citizens, policy-makers, governance bodies and civil society actors communicate and cooperate. One of the main goals of policy-makers is to motivate citizens to participate in policy-making processes. UbiPOL (Ubiquitous Participation Platform for Policy-making, ICT-2009.7.3 (ICT for Governance and Policy Modelling), 2009-2011) aimed to develop a ubiquitous solution that emphasizes citizens' participation in policy-making processes (PMPs) regardless of their current location and time. The ontology-based opinion mining component of the UbiPOL system has a crucial role in citizens' commitment, because it empowers them to contribute to policy making. This paper presents the ontology-based semi-automatic approach and tool for sentiment analysis in the UbiPOL system, which includes lexicon extraction from a large corpus of documents. Aspect-based opinion summarization of user reviews and its combination with domain ontology development are discussed as well.
Citation - Scopus: 2
Comparison of Dynamic Itemset Mining Algorithms for Multiple Support Thresholds
(Association for Computing Machinery (ACM), 2017) Abuzayed, Nourhan; Ergenç, Belgin
Mining frequent itemsets is an important part of the association rule mining process. Handling the dynamic aspect of databases and the multiple support threshold requirements of items are two important challenges for frequent itemset mining algorithms. Most of the existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms are static. This work focuses on the dynamic update problem of frequent itemsets under multiple support thresholds and proposes the tree-based Dynamic CFP-Growth++ algorithm. The proposed algorithm is compared to our previous dynamic algorithm, Dynamic MIS [50], and a recent static algorithm, CFP-Growth++ [2]. The findings for a dynamic database are: 1) both dynamic algorithms are better than the static algorithm CFP-Growth++; 2) in terms of memory usage, Dynamic CFP-Growth++ performs better than Dynamic MIS; 3) in terms of execution time, Dynamic MIS is better than Dynamic CFP-Growth++. In short, Dynamic CFP-Growth++ and Dynamic MIS have a trade-off relationship in terms of memory usage and execution time.
Citation - WoS: 1
Citation - Scopus: 1
A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources
(Springer Verlag, 2017) Sezerer, Erhan; Tekir, Selma
The credibility of news cannot be isolated from that of its source. Further, it is mainly associated with a news source’s trustworthiness and expertise. In an effort to measure the trustworthiness of a news source, the factor of “is factual or opinionated” must be considered among others. In this work, we propose an unsupervised probabilistic lexicon-based opinion mining approach to describe a news source as “being factual or opinionated”. We get words’ positive, negative, and objective scores from a sentiment lexicon and normalize these scores through the use of their cumulative distribution. The use of such a statistical approach is inspired by the relativistic idea that each word is evaluated by its difference from the average word. In order to test the effectiveness of the approach, three different news sources are chosen: editorials, New York Times articles, and Reuters articles, which differ in how opinionated they are. The experimental validation is then done by analysis of variance on these different groups of news. The results prove that our technique can distinguish the news articles from these groups with respect to “being factual or opinionated” in a statistically significant way.
A Data Coding and Screening System for Accident Risk Patterns: A Learning System
(WITPress, 2011) Geçer Sargın, Feral; Duvarcı, Yavuz; İnan, E.; Kumova, Bora İsmail; Atay Kaya, İlgi
Accidents on urban roads can occur for many reasons, and the contributing factors together complicate the analysis of casualties. In order to simplify the analysis and track changes from one accident to another for comparability, authentic data coding and category analysis methods are developed, leading to data mining rules. To deal with a huge number of parameters, first, most qualitative data are converted into categorical (alphanumeric) codes, so that computing capacity is also increased. Second, the whole data entry for each accident is turned into an ID code, called an 'accident combination', meaning that each crash is possibly unique in its attributes; this reduces the large number of accident records with similar values into smaller sets of data. This genetic code technique allows us to learn accident types together with their solid attributes. The learning (output averages) provides a decision support mechanism for taking necessary precautions for similar combinations. The results can be analyzed by inputs, outputs (attributes), time (years) and space (streets). In the Izmir case study, sampled data and their accident combinations are obtained for 3 years (2005 - 2007), and their attributes are learned. © 2011 WIT Press.
Citation - Scopus: 16
Comparison of Two Association Rule Mining Algorithms Without Candidate Generation
(ACTA Press, 2010) Yıldız, Barış; Ergenç, Belgin
Association rule mining techniques play an important role in data mining research, where the aim is to find interesting correlations among sets of items in databases. Although the Apriori algorithm of association rule mining is the one that boosted data mining research, it has a bottleneck in its candidate generation phase that requires multiple passes over the source data. FP-Growth and Matrix Apriori are two algorithms that overcome that bottleneck by keeping the frequent itemsets in compact data structures, eliminating the need for candidate generation. To our knowledge, there is no work that compares these two similar algorithms with a focus on their performance in different phases of execution. In this study, we compare the Matrix Apriori and FP-Growth algorithms. Two case studies analyzing the algorithms are carried out phase by phase using two synthetic datasets, generated in order i) to see their performance with datasets having different characteristics and ii) to understand the causes of performance differences in different phases. Our findings are that i) the performance of the algorithms is related to the characteristics of the given dataset and the threshold value, ii) Matrix Apriori outperforms FP-Growth in total performance for threshold values below 10%, and iii) although building the matrix data structure has a higher cost, finding itemsets is faster.
