Computer Engineering / Bilgisayar Mühendisliği

Permanent URI for this collection: https://hdl.handle.net/11147/10

Search Results

  • Conference Object
    Citation - Scopus: 3
    Integrated Approach for Privacy Preserving Itemset Mining
    (Springer, 2012) Yıldız, Barış; Ergenç, Belgin
    In this work, we propose an integrated itemset hiding algorithm that eliminates the need for pre-mining and post-mining and uses a simple heuristic to select the itemset, and the item within it, for distortion. The base algorithm (Matrix Apriori) works without candidate generation, so efficiency is increased. Performance evaluation demonstrates (1) the side effect (lost itemsets) and runtime as the number of sensitive itemsets and the support of the itemsets increase and (2) the speed-up gained by integrating the post-mining. © 2012 Springer Science+Business Media, LLC.
  • Conference Object
    Citation - Scopus: 12
    Incremental Itemset Mining Based on Matrix Apriori Algorithm
    (Springer Verlag, 2012) Oğuz, Damla; Ergenç, Belgin
    Databases are updated continuously with increments, and re-running frequent itemset mining algorithms with every update is inefficient. Studies addressing the incremental update problem generally propose incremental itemset mining methods based on the Apriori and FP-Growth algorithms. Besides inheriting the disadvantages of the base algorithms, incremental itemset mining faces challenges such as handling i) increments without re-running the algorithm, ii) support changes, iii) new items and iv) additions/deletions in increments. In this paper, we address the incremental update problem by proposing the Incremental Matrix Apriori algorithm. It scans only new transactions, allows the minimum support to change and handles new items in the increments. The base algorithm, Matrix Apriori, works without candidate generation, scans the database only twice and brings additional advantages. Performance studies show that Incremental Matrix Apriori provides a speed-up between 41% and 92% as the increment size is varied between 5% and 100%.
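The idea of scanning only the new transactions can be illustrated with a minimal sketch. This is not the Incremental Matrix Apriori algorithm itself: it is a brute-force counter, and the names `count_itemsets`, `IncrementalCounter`, and the `max_size` cap are hypothetical choices made for illustration. It only shows that, once counts are kept, an increment can be folded in without rescanning old data and the minimum support can be changed at query time.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items (brute force, for illustration)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


class IncrementalCounter:
    """Keeps itemset counts so that an increment is scanned only once."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add_increment(self, transactions):
        # Only the new transactions are scanned; earlier counts are reused.
        transactions = list(transactions)
        self.counts.update(count_itemsets(transactions, self.max_size))
        self.n_transactions += len(transactions)

    def frequent(self, min_support):
        # The minimum support (a fraction of all transactions) may change per call.
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


db = IncrementalCounter()
db.add_increment([["a", "b"], ["a", "c"]])
db.add_increment([["a", "b", "d"], ["b", "d"]])  # new item "d" arrives later
print(db.frequent(0.5))
print(db.frequent(0.75))
```

Storing counts for all itemsets is only feasible for tiny examples; the compact matrix structure mentioned in the abstract is what makes the real approach practical.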
  • Conference Object
    Citation - Scopus: 2
    Adaptive Join Operator for Federated Queries Over Linked Data Endpoints
    (Springer Verlag, 2016) Oğuz, Damla; Yin, Shaoyi; Hameurlain, Abdelkader; Ergenç, Belgin; Dikenelli, Oğuz
    Traditional static query optimization is not adequate for query federation over linked data endpoints due to unpredictable data arrival rates and missing statistics. In this paper, we propose an adaptive join operator for federated query processing that can change the join method during execution. Our approach always begins with symmetric hash join in order to produce the first result tuple as soon as possible and switches to bind join when it estimates that bind join is more efficient than symmetric hash join for the rest of the process. We compare our approach with symmetric hash join and bind join. Performance evaluation shows that our approach provides optimal response time and can adapt to different data arrival rates.
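The reason symmetric hash join yields the first result early can be seen in a minimal sketch. It assumes tuples from both inputs arrive interleaved on a single stream tagged "L" or "R"; the function name and the key extractors are hypothetical. The switch to bind join is deliberately left out, because the abstract does not describe the cost estimate that triggers it.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_left, key_right):
    """Join two inputs as tuples arrive; stream yields ("L", row) or ("R", row)."""
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    keys = {"L": key_left, "R": key_right}
    for side, row in stream:
        other = "R" if side == "L" else "L"
        k = keys[side](row)
        # Insert into this side's hash table, then probe the other side's table.
        tables[side][k].append(row)
        for match in tables[other].get(k, ()):
            # Always emit pairs as (left row, right row).
            yield (row, match) if side == "L" else (match, row)


arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),  # first result is produced here, before either input ends
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
for pair in symmetric_hash_join(arrivals, lambda r: r[0], lambda r: r[0]):
    print(pair)
```

Because neither input has to be read completely before probing starts, results appear as soon as matching tuples from both sides have arrived, at the cost of keeping both hash tables in memory.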
  • Conference Object
    Citation - Scopus: 2
    Comparison of Dynamic Itemset Mining Algorithms for Multiple Support Thresholds
    (Association for Computing Machinery (ACM), 2017) Abuzayed, Nourhan; Ergenç, Belgin
    Mining frequent itemsets is an important part of the association rule mining process. Handling the dynamic aspect of databases and the multiple support threshold requirements of items are two important challenges for frequent itemset mining algorithms. Most existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms are static. This work focuses on the dynamic update problem of frequent itemsets under multiple support thresholds and proposes the tree-based Dynamic CFP-Growth++ algorithm. The proposed algorithm is compared to our previous dynamic algorithm Dynamic MIS [50] and a recent static algorithm CFP-Growth++ [2]. The findings are that, in a dynamic database, 1) both dynamic algorithms are better than the static algorithm CFP-Growth++, 2) in memory usage, Dynamic CFP-Growth++ performs better than Dynamic MIS, and 3) in execution time, Dynamic MIS is better than Dynamic CFP-Growth++. In short, Dynamic CFP-Growth++ and Dynamic MIS trade off memory usage against execution time.
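What a multiple support threshold means can be shown with a small sketch. It assumes the common definition from the multiple-minimum-support literature, in which each item has its own minimum item support (MIS) and an itemset is frequent when its support reaches the lowest MIS among its items; the abstract does not restate this definition, and the function name and data below are hypothetical.

```python
def is_frequent(itemset, transactions, mis):
    """Frequent if support >= the smallest MIS value among the itemset's items."""
    items = set(itemset)
    support = sum(1 for t in transactions if items <= set(t)) / len(transactions)
    return support >= min(mis[i] for i in items)


transactions = [{"bread", "milk"}, {"bread"}, {"bread", "caviar"}, {"milk"}]
# Rare but interesting items get a lower threshold than common ones.
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.2}

print(is_frequent({"bread", "caviar"}, transactions, mis))
print(is_frequent({"bread", "milk"}, transactions, mis))
```

Both itemsets occur in one of four transactions, yet only the one containing the low-threshold item qualifies, which is the behaviour a single global threshold cannot express.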
  • Conference Object
    Citation - Scopus: 7
    OrderBased Labeling Scheme for Dynamic XML Query Processing
    (Springer Verlag, 2012) Assefa, Beakal Gizachew; Ergenç, Belgin
    The need for robust, high-performance XML database systems has increased due to the growing volume of XML data produced by today's applications. Like indexes in relational databases, XML labeling is the key to XML querying. Assigning unique labels to the nodes of a dynamic XML tree such that the labels encode all structural relationships between the nodes is a challenging problem. Early labeling schemes designed for static XML documents generate short labels; however, their performance degrades in update-intensive environments due to the need for relabeling. On the other hand, dynamic labeling schemes achieve dynamicity at the cost of large label size or complexity, which results in poor query performance. This paper presents the OrderBased labeling scheme, which is dynamic, simple and compact yet able to identify structural relationships among nodes. A set of performance tests shows promising labeling, querying and update performance and optimum label size. © 2012 IFIP International Federation for Information Processing.
  • Conference Object
    Citation - Scopus: 2
    Hiding Sensitive Predictive Frequent Itemsets
    (International Association of Engineers, 2011) Yıldız, Barış; Ergenç, Belgin
    In this work, we propose an itemset hiding algorithm with four versions that use different heuristics to select the item in the itemset and the transaction for distortion. The main strengths of the itemset hiding algorithm are that i) it works without pre-mining, so the privacy breach caused by revealing frequent itemsets in advance is prevented and efficiency is increased, ii) the base algorithm (Matrix Apriori) works without candidate generation, so efficiency is increased, iii) the sanitized database and its frequent itemsets are given as outputs, so no post-mining is required and iv) simple heuristics such as the length of the pattern and the frequency of the item in the pattern are used to select the item for distortion. We compare the versions of our itemset hiding algorithm by their side effects, runtimes and distortion of the original database.
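Hiding by distortion can be sketched as removing one item of the sensitive itemset from just enough supporting transactions to push its support below the threshold. The sketch below is not one of the four versions described in the abstract: picking the most frequent item as the victim and distorting the shortest supporting transactions first are hypothetical heuristics chosen only for illustration, and `min_count` is an assumed absolute support threshold.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemset(transactions, sensitive, min_count):
    """Return a sanitized copy in which `sensitive` has support below min_count."""
    sanitized = [set(t) for t in transactions]  # leave the original untouched
    sensitive = set(sensitive)
    # Hypothetical heuristic: distort the item that is most frequent overall.
    victim = max(sorted(sensitive),
                 key=lambda i: sum(1 for t in sanitized if i in t))
    # Hypothetical heuristic: distort the shortest supporting transactions first.
    supporting = sorted((t for t in sanitized if sensitive <= t), key=len)
    excess = len(supporting) - (min_count - 1)
    for t in supporting[:max(excess, 0)]:
        t.discard(victim)
    return sanitized


original = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"a", "c"}, {"b"}]
clean = hide_itemset(original, {"a", "b"}, min_count=2)
print(support_count({"a", "b"}, original), support_count({"a", "b"}, clean))
```

Each removed item can also lower the support of non-sensitive itemsets that contain it, which is the side effect (lost itemsets) that the heuristics aim to keep small.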
  • Conference Object
    Citation - Scopus: 16
    Comparison of Two Association Rule Mining Algorithms Without Candidate Generation
    (ACTA Press, 2010) Yıldız, Barış; Ergenç, Belgin
    Association rule mining techniques play an important role in data mining research, where the aim is to find interesting correlations among sets of items in databases. Although the Apriori algorithm of association rule mining is the one that boosted data mining research, it has a bottleneck in its candidate generation phase, which requires multiple passes over the source data. FP-Growth and Matrix Apriori are two algorithms that overcome that bottleneck by keeping the frequent itemsets in compact data structures, eliminating the need for candidate generation. To our knowledge, no existing work compares these two similar algorithms with a focus on their performance in different phases of execution. In this study, we compare the Matrix Apriori and FP-Growth algorithms. Two case studies analyzing the algorithms are carried out phase by phase using two synthetic datasets generated in order i) to see their performance with datasets having different characteristics and ii) to understand the causes of performance differences in different phases. Our findings are that i) the performance of the algorithms is related to the characteristics of the given dataset and the threshold value, ii) Matrix Apriori outperforms FP-Growth in total performance for threshold values below 10% and iii) although building the matrix data structure has a higher cost, finding itemsets is faster.
  • Conference Object
    Coefficient-Based Exact Approach for Frequent Itemset Hiding
    (IARIA, 2014) Leloğlu, Engin; Ayav, Tolga; Ergenç, Belgin
    Concealing sensitive relationships before sharing a database is of utmost importance in many circumstances. This implies hiding the frequent itemsets corresponding to sensitive association rules by removing some items from the database. Research efforts generally aim at finding more effective methods in terms of convenience, execution time and side effects. This paper presents a practical approach for hiding sensitive patterns while retaining as many nonsensitive patterns as possible in the sanitized database. We model the itemset hiding problem as an integer program in which the objective coefficients allow finding a solution with minimum loss of nonsensitive itemsets. We evaluate our method using three real datasets and compare the results with a previous work. The results show that information loss is dramatically reduced without sacrificing accuracy.
  • Conference Object
    Citation - WoS: 4
    Citation - Scopus: 6
    Robust Placement of Mobile Relational Operators for Large Scale Distributed Query Optimization
    (Institute of Electrical and Electronics Engineers Inc., 2007) Ergenç, Belgin; Morvan, Franck; Hameurlain, Abdelkader
    This paper presents a compile-time placement method for mobile relational operators (MROs) in a large-scale environment. MROs adapt to changing runtime conditions by deciding where to execute if they discover compile-time estimation errors. Previously proposed placement methods tend to have a main drawback when MROs run over a large-scale environment: they focus on finding optimal performance based on a single-point estimation at compile time, instead of optimal performance over an estimation interval. We propose: (i) to determine the migration space of an MRO, comprising the sites to which the MRO is allowed to migrate during its execution, and (ii) to find the robust site that allows an acceptable response time over an estimation interval. A performance study shows that, at the risk of losing around 6% in response time, it is possible to gain up to 300% with the proposed robust placement.
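The difference between placing an operator for a single-point estimate and placing it for an estimation interval can be sketched as follows. This is only an illustration under stated assumptions: the cost functions, the site names, and the worst-case (minimax) criterion are hypothetical stand-ins, since the abstract does not define how an "acceptable" response time over the interval is computed.

```python
def point_optimal_site(costs, estimate):
    """Site with the lowest response time at a single estimated cardinality."""
    return min(sorted(costs), key=lambda s: costs[s](estimate))


def robust_site(costs, low, high, samples=11):
    """Site with the lowest worst-case response time over [low, high] (assumed criterion)."""
    step = (high - low) / (samples - 1)
    points = [low + i * step for i in range(samples)]
    return min(sorted(costs), key=lambda s: max(costs[s](p) for p in points))


# Hypothetical response-time models as functions of the input cardinality n.
costs = {
    "site_a": lambda n: 0.001 * n,       # cheap for small inputs, grows quickly
    "site_b": lambda n: 5 + 0.0002 * n,  # fixed overhead, grows slowly
}

print(point_optimal_site(costs, 1000))
print(robust_site(costs, 500, 20000))
```

In this toy setting the point estimate favours `site_a`, while the interval favours `site_b`: a little response time is given up at the estimate in exchange for avoiding the large penalty when the estimate turns out to be wrong.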