Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection

Permanent URI for this collection: https://hdl.handle.net/11147/7148

Search Results

  • Article
    Citation - Scopus: 1
    An Interestingness Measure for Knowledge Bases
    (Elsevier, 2023) Oğuz, Damla; Soygazi, Fatih
    Association rule mining and logical rule mining both aim to discover interesting relationships in data or knowledge. In association rule mining, relationships are identified based on the occurrence of items in a dataset, while in logical rule mining, relationships are determined based on logical relationships between atoms in a knowledge base. Association rule mining has been widely studied in transactional databases, mainly for market basket analysis. Confidence has become the most widely used interestingness measure to assess the strength of a rule. Many other interestingness measures have been proposed since confidence can be insufficient to filter negatively associated relationships. Recently, logical rule mining has become an important area of research, as new facts can be inferred by applying discovered logical rules. These rules can be used for reasoning, identifying potential errors in knowledge bases, and better understanding data. However, there are currently only a few measures for logical rule mining. Furthermore, current measures do not consider relations that can have several objects, called quasi-functions, which can dramatically alter the interestingness of the rule. In this paper, we focus on effectively assessing the strength of logical rules. We propose a new interestingness measure that takes into account two categories of relations, functions and quasi-functions, to assess the degree of certainty of logical rules. We compare our proposed measure with a widely used measure on both synthetic test data and real knowledge bases. We show that it is more effective in indicating rule quality, making it an appropriate interestingness measure for logical rule evaluation. © 2023 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
  • Article
    Citation - WoS: 1
    Citation - Scopus: 3
    Ignoring Internal Utilities in High-Utility Itemset Mining
    (MDPI, 2022) Oğuz, Damla
    High-utility itemset mining discovers sets of items that are sold together and have utility values higher than a given minimum utility threshold. The utilities of these itemsets are calculated by considering their internal and external utility values, which correspond, respectively, to the quantity sold of each item in each transaction and profit units. Therefore, internal and external utilities have symmetric effects on deciding whether an itemset is high-utility. The symmetric contributions of both utilities cause two major related challenges. First, itemsets with low external utility values can easily exceed the minimum utility threshold if they are sold extensively. In this case, such itemsets can be found more efficiently using frequent itemset mining. Second, a large number of high-utility itemsets are generated, which can cause interesting or important high-utility itemsets to be overlooked. This study presents an asymmetric approach in which the internal utility values are ignored when finding high-utility itemsets with high external utility values. The experimental results on two real datasets reveal that the external utility values have fundamental effects on the high-utility itemsets. The results of this study also show that this effect tends to increase for high values of the minimum utility threshold. Moreover, the proposed approach reduces the execution time.
  • Article
    Citation - WoS: 9
    Citation - Scopus: 11
    A Qualitative Survey on Frequent Subgraph Mining
    (De Gruyter, 2018) Güvenoğlu, Büşra; Ergenç Bostanoğlu, Belgin
    Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining. Consequently, graph mining is one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database in which they occur. Using this information, data can be classified, clustered and indexed. The purpose of this survey is to (i) examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, underlying algorithmic approach, algorithmic design and graph representation, and (iii) discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 1
    A Relativistic Opinion Mining Approach To Detect Factual or Opinionated News Sources
    (Springer Verlag, 2017) Sezerer, Erhan; Tekir, Selma
    The credibility of news cannot be isolated from that of its source. Further, it is mainly associated with a news source’s trustworthiness and expertise. In an effort to measure the trustworthiness of a news source, the factor of “is factual or opinionated” must be considered among others. In this work, we propose an unsupervised probabilistic lexicon-based opinion mining approach to describe a news source as “being factual or opinionated”. We get words’ positive, negative, and objective scores from a sentiment lexicon and normalize these scores through the use of their cumulative distribution. The use of such a statistical approach is inspired by the relativist notion that each word is evaluated by its difference from the average word. In order to test the effectiveness of the approach, three different news sources are chosen. They are editorials, New York Times articles, and Reuters articles, which differ in how opinionated they are. The experimental validation is therefore done by analysis of variance on these different groups of news. The results show that our technique can distinguish the news articles from these groups with respect to “being factual or opinionated” in a statistically significant way.
  • Conference Object
    Citation - WoS: 1
    Citation - Scopus: 2
    Fisher's Linear Discriminant Analysis Based Prediction Using Transient Features of Seismic Events in Coal Mines
    (Institute of Electrical and Electronics Engineers Inc., 2016) Köktürk Güzel, Başak Esin; Karaçalı, Bilge
    Identification of seismic activity levels in coal mines is important to avoid accidents such as rockbursts. Creating an early warning system that can save lives requires an automated way of predicting such events. This study proposes a prediction algorithm for the AAIA'16 Data Mining Challenge: Predicting Dangerous Seismic Events in Active Coal Mines that is based on transient activity features along with average indicators evaluated by a Fisher's linear discriminant analysis. Performance evaluation experiments on the training datasets revealed an accuracy level of around 0.9438 while the performance on the test dataset was at a level of 0.9297. These results suggest that the proposed approach achieves high accuracy in predicting dangerous seismic events while maintaining low complexity.
  • Article
    Citation - Scopus: 1
    Annealing-Based Model-Free Expectation Maximisation for Multi-Colour Flow Cytometry Data Clustering
    (Inderscience Enterprises Ltd., 2016) Köktürk, Başak Esin; Karaçalı, Bilge
    This paper proposes an optimised model-free expectation maximisation method for automated clustering of high-dimensional datasets. The method is based on a recursive binary division strategy that successively divides an original dataset into distinct clusters. Each binary division is carried out using a model-free expectation maximisation scheme that exploits the posterior probability computation capability of the quasi-supervised learning algorithm subjected to a line-search optimisation over the reference set size parameter analogous to a simulated annealing approach. The divisions are continued until a division cost exceeds an adaptively determined limit. Experiment results on synthetic as well as real multi-colour flow cytometry datasets showed that the proposed method can accurately capture the prominent clusters without requiring any prior knowledge on the number of clusters or their distribution models.
  • Conference Object
    Citation - WoS: 4
    Citation - Scopus: 4
    Mining Frequent Patterns From Microarray Data
    (Institute of Electrical and Electronics Engineers Inc., 2011) Yıldız, Barış; Şelale, Hatice
    The rapid development of computers and the increasing amount of collected data have made data mining a popular analysis tool. Data mining research is interrelated with many fields, and one of the most important is bioinformatics. Mining association rules or frequent patterns is one of the most studied techniques in computer science, and it is applicable to bioinformatics. Association analysis of gene expressions may be used as a decision support mechanism for finding genes that are in the same pathway. In this work, publicly available yeast microarray data has been analyzed using an efficient frequent pattern mining algorithm, Matrix Apriori, and frequently co-over-expressed genes have been identified. © 2011 IEEE.
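
The frequent pattern mining described in the last entry can be illustrated with a small sketch. It uses the plain level-wise Apriori procedure, not the Matrix Apriori algorithm named in the abstract, and the function name, the sample data, and the gene labels (g1, g2, g3) are all hypothetical. Each transaction stands for one experiment and lists the genes assumed to be over-expressed in it.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets occurring in at least
    min_support transactions (plain Apriori, absolute support count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets and keep a
        # candidate only if every (k-1)-subset is itself frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Frequency calculation: count the transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical toy data: genes over-expressed in each of four experiments.
experiments = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
]
patterns = frequent_itemsets(experiments, min_support=3)
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 3, the pair g1, g3 appears in only two experiments, so it is discarded and the triple g1, g2, g3 is never counted. This pruning step is what keeps level-wise mining tractable on wider datasets.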