Search Results

Citation - Scopus: 2
Comparison of Dynamic Itemset Mining Algorithms for Multiple Support Thresholds
(Association for Computing Machinery (ACM), 2017) Abuzayed, Nourhan; Ergenç, Belgin
Mining frequent itemsets is an important part of the association rule mining process. Handling the dynamic aspect of databases and the multiple support threshold requirements of items are two important challenges for frequent itemset mining algorithms. Most existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms are static. This work focuses on the dynamic update problem of frequent itemsets under multiple support thresholds and proposes the tree-based Dynamic CFP-Growth++ algorithm. The proposed algorithm is compared to our previous dynamic algorithm, Dynamic MIS [50], and a recent static algorithm, CFP-Growth++ [2]. The findings for a dynamic database are: 1) both dynamic algorithms perform better than the static algorithm CFP-Growth++; 2) in terms of memory usage, Dynamic CFP-Growth++ performs better than Dynamic MIS; 3) in terms of execution time, Dynamic MIS performs better than Dynamic CFP-Growth++. In short, Dynamic CFP-Growth++ and Dynamic MIS have a trade-off relationship in terms of memory usage and execution time.
Citation - Scopus: 19
Data Mining for MicroRNA Gene Prediction: On the Impact of Class Imbalance and Feature Number for MicroRNA Gene Prediction
(Institute of Electrical and Electronics Engineers Inc., 2013) Saçar, Müşerref Duygu; Allmer, Jens
MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in the posttranscriptional modulation of gene expression. Their short (18-24 nucleotide), single-stranded mature sequences are involved in targeting specific genes. Experimental methods are limited, and it is difficult, if not impossible, to establish all miRNAs and their targets experimentally. Therefore, many tools for the prediction of miRNA genes and miRNA targets have been proposed. Most of these tools are based on machine learning methods, and within that area mostly two-class classification is employed. Unfortunately, truly negative data is impossible to obtain and only approximations of negative data are currently available. Also, we recently showed that the available positive data is not flawless. Here we investigate the impact of class imbalance on learner accuracy and find a difference of up to 50% between the best and worst precision and recall values. In addition, we examined an increasing number of features and found a curve maximizing at 0.97 recall and 0.91 precision, with quickly decaying performance after inclusion of more than 100 features. © 2013 IEEE.
Citation - WoS: 4
Citation - Scopus: 4
Mining Frequent Patterns From Microarray Data
(Institute of Electrical and Electronics Engineers Inc., 2011) Yıldız, Barış; Şelale, Hatice
The rapid development of computers and the increasing amount of collected data have made data mining a popular analysis tool. Data mining research is interrelated with many fields, and one of the most important is bioinformatics. Mining association rules or frequent patterns is one of the most studied techniques in computer science, and it is applicable to bioinformatics. Association analysis of gene expressions may be used as a decision support mechanism for finding genes that are in the same pathway. In this work, publicly available yeast microarray data has been analyzed using an efficient frequent pattern mining algorithm, Matrix Apriori, and frequently co-over-expressed genes have been identified. © 2011 IEEE.
Citation - Scopus: 16
Comparison of Two Association Rule Mining Algorithms Without Candidate Generation
(ACTA Press, 2010) Yıldız, Barış; Ergenç, Belgin
Association rule mining techniques play an important role in data mining research, where the aim is to find interesting correlations among sets of items in databases. Although the Apriori algorithm is the one that boosted data mining research, it has a bottleneck in its candidate generation phase, which requires multiple passes over the source data. FP-Growth and Matrix Apriori are two algorithms that overcome that bottleneck by keeping the frequent itemsets in compact data structures, eliminating the need for candidate generation. To our knowledge, no existing work compares these two similar algorithms with a focus on their performance in different phases of execution. In this study, we compare the Matrix Apriori and FP-Growth algorithms. Two case studies analyzing the algorithms phase by phase are carried out using two synthetic datasets, generated in order i) to see their performance on datasets with different characteristics and ii) to understand the causes of performance differences in different phases. Our findings are: i) the performance of the algorithms is related to the characteristics of the given dataset and the threshold value; ii) Matrix Apriori outperforms FP-Growth in total performance for threshold values below 10%; iii) although building the matrix data structure has a higher cost, finding itemsets is faster.

Scopus İndeksli Yayınlar Koleksiyonu / Scopus Indexed Publications Collection
