Computer Engineering / Bilgisayar Mühendisliği
Permanent URI for this collection: https://hdl.handle.net/11147/10
5 results
Search Results
Research Project
DFIS- Çoklu destek eşiklerinde dinamik sık kümeler madenciliği ve gizleme platformu [DFIS: A platform for dynamic frequent itemset mining and hiding under multiple support thresholds] (2018)
Ergenç Bostanoğlu, Belgin
This project aims to develop a test platform that simultaneously addresses several of the challenges facing association rule mining, the most widely used method in data mining: data size, data dynamism, taking the specific support thresholds of frequent itemsets into account, and hiding the sensitive knowledge that may be exposed when data is shared (sensitive knowledge hiding). To cope with data size, the baseline association rule mining function of the proposed platform will not scan the database multiple times, will use easily manageable data types, and will use memory efficiently. Instead of working with a single support threshold for the whole platform, this function will be able to work with support thresholds specific to data sets. One component of the platform is the dynamic version of the baseline association rule mining function. When data updates arrive, this version does not rerun the entire association rule discovery process from scratch; it finds the up-to-date frequent itemsets dynamically by considering the part of the database that contains the update together with the previous results. Finally, the platform includes a dynamic itemset hiding function that can prepare the database for sharing in a form from which sensitive knowledge cannot be inferred.

Article
A Qualitative Survey on Frequent Subgraph Mining (De Gruyter, 2018)
Güvenoğlu, Büşra; Ergenç Bostanoğlu, Belgin
Citation - WoS: 9, Citation - Scopus: 11
Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining, so graph mining is one of the most popular subdivisions of data mining.
Subgraphs that occur in a database more often than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database in which they occur. Using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have faced recently.

Conference Object
Dynamic Itemset Mining Under Multiple Support Thresholds (IOS Press, 2016)
Abuzayed, Nourhan; Ergenç Bostanoğlu, Belgin; Ergenç, Belgin
Citation - WoS: 2, Citation - Scopus: 2
Handling the dynamic aspect of databases and the multiple support threshold requirements of items are two important challenges for frequent itemset mining algorithms. Existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms assume that databases are static. This paper focuses on the dynamic update problem of frequent itemsets under MIS (Multiple Item Support) thresholds and introduces the Dynamic MIS algorithm. It (i) is tree based and scans the database once, (ii) considers multiple support thresholds, and (iii) handles increments of additions, additions with new items, and deletions.
The proposed algorithm is compared to CFP-Growth++. The findings are that, in a dynamic database, (1) Dynamic MIS performs better than CFP-Growth++ since it runs only on increments, and (2) Dynamic MIS can achieve a speed-up of up to 56 times over CFP-Growth++.

Conference Object
Itemset Hiding Under Multiple Sensitive Support Thresholds (SCITEPRESS, 2017)
Öztürk, Ahmet Cumhur; Ergenç Bostanoğlu, Belgin
Citation - Scopus: 4
Itemset mining is the challenging step of association rule mining that aims to extract patterns among items from transactional databases. When itemset mining is applied to data shared among organizations, each party needs to hide its sensitive knowledge before global knowledge is extracted for mutual benefit. Ensuring the privacy of the sensitive itemsets is not the only challenge in the itemset hiding process; the distortion given to the non-sensitive knowledge and data should also be kept to a minimum. Most previous work on itemset hiding allows the database owner to assign a unique sensitive threshold for each sensitive itemset; however, itemsets may have different count and utility. In this paper we propose a new heuristic-based hiding algorithm that (1) allows the database owner to assign multiple sensitive threshold values to sensitive itemsets, (2) hides all user-defined sensitive itemsets, and (3) uses heuristics that minimize loss of information and distortion on the shared database. To speed up the hiding steps, we represent the database as a Pseudo Graph and perform scan operations on this data structure rather than on the actual database. The performance of our algorithm, Pseudo Graph Based Sanitization (PGBS), is evaluated on 4 real databases. We measure the distortion given to the nonsensitive itemsets (information loss), the distortion given to the shared data (distance), and the execution time in comparison to three similar algorithms.
Experimental results show that PGBS is competitive in terms of execution time and distortion and achieves reasonable performance in terms of information loss compared with the other algorithms. © 2017 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.

Conference Object
Vertical Pattern Mining Algorithm for Multiple Support Thresholds (Elsevier Ltd., 2017)
Darrab, Sadeq; Ergenç Bostanoğlu, Belgin; Ergenç, Belgin
Citation - WoS: 7, Citation - Scopus: 20
Frequent pattern mining is an important task in discovering hidden items that co-occur (itemsets) more often than a predefined threshold in a database. Mining frequent itemsets has drawn attention, although rarely occurring ones might offer more interesting insights. In existing studies, to find these interesting patterns (rare itemsets), the user-defined single threshold must be set low enough, but this results in the generation of a huge number of redundant itemsets. We present the Multiple Item Support-eclat (MIS-eclat) algorithm to mine frequent patterns, including rare itemsets, under multiple support thresholds (MIS) by utilizing a vertical representation of data. We compare MIS-eclat to our previous tree-based algorithm, MISFP-growth [28], and another recent algorithm, CFP-growth++ [22], in terms of execution time, memory usage and scalability on both sparse and dense databases. Experimental results reveal that MIS-eclat and MISFP-growth outperform CFP-growth++ in terms of execution time, memory usage and scalability.
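
Illustrative sketch (not part of the repository record). The idea of mining under multiple item supports with a vertical data representation can be sketched roughly as follows. This is not the authors' MIS-eclat implementation. It is a minimal illustration that assumes absolute support counts, assumes that an itemset is frequent when its support reaches the lowest MIS value among its items, and prunes only with the smallest MIS value overall, which is a conservative bound chosen for this sketch. The function name `mine_mis_vertical` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def mine_mis_vertical(
    transactions: List[Set[str]], mis: Dict[str, int]
) -> Dict[Tuple[str, ...], int]:
    """Return {itemset: support} for itemsets frequent under per-item thresholds.

    Assumption: `mis` holds an absolute minimum support count for every item.
    """
    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Least MIS value: no itemset with support below it can be frequent,
    # and neither can any of its supersets, so it is a safe pruning bound.
    least_mis = min(mis[item] for item in tidsets)
    candidates = sorted(i for i in tidsets if len(tidsets[i]) >= least_mis)

    frequent: Dict[Tuple[str, ...], int] = {}

    def extend(
        prefix: Tuple[str, ...],
        prefix_tids: Optional[Set[int]],
        prefix_mis: float,
        items: List[str],
    ) -> None:
        for idx, item in enumerate(items):
            # Support counting is a tidset intersection, not a database scan.
            tids = tidsets[item] if prefix_tids is None else prefix_tids & tidsets[item]
            if len(tids) < least_mis:
                continue  # this itemset and all its supersets are pruned
            itemset = prefix + (item,)
            threshold = min(prefix_mis, mis[item])
            if len(tids) >= threshold:
                frequent[itemset] = len(tids)
            extend(itemset, tids, threshold, items[idx + 1:])

    extend((), None, float("inf"), candidates)
    return frequent


# Hypothetical data: "caviar" is rare, so it gets a lower threshold.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk", "caviar"},
    {"bread", "milk"},
]
mis = {"bread": 3, "milk": 3, "caviar": 2}
frequent = mine_mis_vertical(transactions, mis)
print(frequent)
```

With a single threshold of 3 for every item, the itemsets containing "caviar" would be missed. Giving the rare item its own lower threshold keeps them without lowering the bar for the common items.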
