Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/10

Search Results

Now showing 1 - 4 of 4
  • Article
    Citation - WoS: 9
    Citation - Scopus: 11
    A Qualitative Survey on Frequent Subgraph Mining
    (De Gruyter, 2018) Güvenoğlu, Büşra; Ergenç Bostanoğlu, Belgin
    Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, so graph mining is one of its most popular subdivisions. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database; using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to general attributes such as input type, dynamicity of graphs, result type, underlying algorithmic approach, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.
  • Conference Object
    Citation - WoS: 2
    Citation - Scopus: 2
    Dynamic Itemset Mining Under Multiple Support Thresholds
    (IOS Press, 2016) Abuzayed, Nourhan; Ergenç Bostanoğlu, Belgin; Ergenç, Belgin
    Handling the dynamic aspect of databases and the multiple support threshold requirements of items are two important challenges for frequent itemset mining algorithms. Existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms assume that databases are static. This paper focuses on the dynamic update problem of frequent itemsets under MIS (Multiple Item Support) thresholds and introduces the Dynamic MIS algorithm. It (i) is tree based and scans the database once, (ii) considers multiple support thresholds, and (iii) handles increments of additions, additions with new items, and deletions. The proposed algorithm is compared to CFP-Growth++, and the findings are that, on a dynamic database, (1) Dynamic MIS performs better than CFP-Growth++ since it runs only on increments and (2) Dynamic MIS can achieve a speed-up of up to 56 times over CFP-Growth++.
  • Conference Object
    Citation - Scopus: 4
    Itemset Hiding Under Multiple Sensitive Support Thresholds
    (SCITEPRESS, 2017) Öztürk, Ahmet Cumhur; Ergenç Bostanoğlu, Belgin
    Itemset mining is the challenging step of association rule mining that aims to extract patterns among items from transactional databases. When itemset mining is applied to the shared data of organizations, each party needs to hide its sensitive knowledge before global knowledge is extracted for mutual benefit. Ensuring the privacy of the sensitive itemsets is not the only challenge in the itemset hiding process; the distortion caused to the non-sensitive knowledge and data should also be kept to a minimum. Most previous work on itemset hiding allows the database owner to assign a unique sensitive threshold for each sensitive itemset; however, itemsets may have different counts and utilities. In this paper we propose a new heuristic-based hiding algorithm which (1) allows the database owner to assign multiple sensitive threshold values for sensitive itemsets, (2) hides all user-defined sensitive itemsets, and (3) uses heuristics that minimize loss of information and distortion on the shared database. In order to speed up the hiding steps, we represent the database as a Pseudo Graph and perform scan operations on this data structure rather than on the actual database. The performance of our algorithm, Pseudo Graph Based Sanitization (PGBS), is evaluated on 4 real databases. We measure the distortion caused to the non-sensitive itemsets (information loss), the distortion caused to the shared data (distance) and the execution time in comparison to three similar algorithms. Experimental results show that PGBS is competitive in terms of execution time and distortion and achieves reasonable performance in terms of information loss compared to the other algorithms. © 2017 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
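To make the idea of hiding itemsets under per-itemset sensitive thresholds concrete, the following is a minimal Python sketch. It is not PGBS and does not use a Pseudo Graph; it is a naive illustration under stated assumptions: thresholds are absolute transaction counts, an itemset counts as hidden once its support is strictly below its threshold, and the victim-selection heuristic (shortest supporting transactions first, most frequent item removed) is chosen here only for illustration. The names `support` and `hide_itemsets` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in db if itemset <= transaction)


def hide_itemsets(
    db: List[Set[str]], sensitive: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Return a sanitized copy of db in which each sensitive itemset has
    support strictly below its own (absolute) sensitive threshold.

    Assumption: this greedy item-removal heuristic is illustrative only.
    """
    sanitized = [set(transaction) for transaction in db]  # leave db untouched
    for itemset, threshold in sensitive.items():
        supporters = [t for t in sanitized if itemset <= t]
        # How many supporting transactions must stop supporting the itemset.
        excess = len(supporters) - threshold + 1
        if excess <= 0:
            continue  # already below the threshold
        # Victim item: the item of the itemset that is most frequent overall,
        # so that removing it changes its own support the least in relative terms.
        victim = max(
            sorted(itemset),
            key=lambda item: sum(1 for t in sanitized if item in t),
        )
        # Prefer short transactions, which support fewer other itemsets.
        supporters.sort(key=len)
        for transaction in supporters[:excess]:
            transaction.discard(victim)
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}, {"a", "c"}]
    sensitive = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 2}
    for transaction in hide_itemsets(db, sensitive):
        print(sorted(transaction))
```

Because items are only ever removed, the support of an itemset hidden earlier cannot rise again while later itemsets are processed, so a single pass over the sensitive itemsets is enough in this sketch.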
  • Conference Object
    Citation - WoS: 7
    Citation - Scopus: 20
    Vertical Pattern Mining Algorithm for Multiple Support Thresholds
    (Elsevier Ltd., 2017) Darrab, Sadeq; Ergenç Bostanoğlu, Belgin; Ergenç, Belgin
    Frequent pattern mining is an important task in discovering hidden sets of items (itemsets) that co-occur more often than a predefined threshold in a database. Mining frequent itemsets has drawn attention, although rarely occurring ones might offer more interesting insights. In existing studies, to find these interesting patterns (rare itemsets), a single user-defined threshold must be set low enough, but this results in the generation of a huge number of redundant itemsets. We present the Multiple Item Support-eclat (MIS-eclat) algorithm to mine frequent patterns, including rare itemsets, under multiple support thresholds (MIS) by utilizing a vertical representation of data. We compare MIS-eclat to our previous tree-based algorithm, MISFP-growth [28], and another recent algorithm, CFP-growth++ [22], in terms of execution time, memory usage and scalability on both sparse and dense databases. Experimental results reveal that MIS-eclat and MISFP-growth outperform CFP-growth++ in terms of execution time, memory usage and scalability.
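As an illustration of mining on a vertical representation under multiple item supports, here is a minimal Python sketch. It is not the MIS-eclat algorithm from the paper; it is a simplified Eclat-style search under stated assumptions: each item has its own minimum support given as an absolute count, and an itemset is frequent when its support reaches the lowest MIS value among its items (the usual convention in the multiple minimum support model). The names `to_vertical` and `mine_mis_vertical` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Vertical layout: map each item to the set of transaction ids (tidset)
    in which it occurs."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def mine_mis_vertical(
    transactions: List[Set[str]], mis: Dict[str, int]
) -> Dict[Tuple[str, ...], int]:
    """Return {itemset: support} for itemsets whose support is at least the
    smallest MIS value among their items (absolute counts assumed)."""
    tidsets = to_vertical(transactions)
    # Ascending MIS order: the first item of every generated itemset carries
    # the itemset's minimum MIS, so the threshold is fixed within a subtree
    # and ordinary support-based pruning stays valid there.
    order = sorted(tidsets, key=lambda item: (mis[item], item))
    result: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], tids: Set[int], threshold: int, start: int) -> None:
        for pos in range(start, len(order)):
            item = order[pos]
            new_tids = tids & tidsets[item]  # support via tidset intersection
            if len(new_tids) >= threshold:
                itemset = prefix + (item,)
                result[itemset] = len(new_tids)
                grow(itemset, new_tids, threshold, pos + 1)

    for pos, item in enumerate(order):
        if len(tidsets[item]) >= mis[item]:
            result[(item,)] = len(tidsets[item])
            grow((item,), tidsets[item], mis[item], pos + 1)
    return result


if __name__ == "__main__":
    transactions = [
        {"bread", "milk"},
        {"bread", "milk", "caviar"},
        {"bread"},
        {"milk", "caviar"},
        {"bread", "milk"},
    ]
    mis = {"bread": 3, "milk": 3, "caviar": 2}
    for itemset, count in mine_mis_vertical(transactions, mis).items():
        print(itemset, count)
```

In this toy data the rare item `caviar` and the pair `(caviar, milk)` are reported because `caviar` has a lower threshold, whereas a single threshold of 3 would miss them and a single threshold of 2 would also have to be applied to the common items.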