GCRIS

Now showing 1 - 3 of 3

Graphlet mining in big data
(01. Izmir Institute of Technology, 2024) Ergenç Bostanoğlu, Belgin; Çalmaz, Büşra; Bostanoğlu, Belgin Ergenç
This thesis explores graphlet counting algorithms, which are crucial for understanding the structural principles of complex networks such as bioinformatics, social networks, and network model evaluation. Counting graphlets in large networks is computationally challenging due to the combinatorial explosion of possibilities, particularly for larger graphlet sizes. To address this, we focus on clique graphlets, fully connected subgraphs, which reveal critical patterns in areas like protein structure analysis, social network modeling, community detection, and spam detection. Counting k-cliques (subgraphs with $k$ nodes) becomes infeasible for large datasets and high $k$ values. Existing exact and approximate algorithms struggle with large $k$, often failing when $k$ exceeds 10. To tackle these limitations, we propose BDAC (Boundary-Driven Approximations of K-Cliques), a novel algorithm that efficiently approximates k-clique counts using classical extremal graph theorems. BDAC uniquely provides lower and upper bounds for k-clique counts at both local (per vertex) and global levels, making it particularly suited for large, dense graphs with high $k$ values. Unlike existing methods, the algorithm's complexity remains unaffected by the value of $k$. We validate BDAC's efficiency and scalability through extensive comparisons with leading algorithms on diverse datasets, spanning k values from minor (e.g., 8) to large (e.g., 50). Parallelization techniques enhance its performance, making it highly scalable for analyzing large and dense networks. BDAC offers a significant advancement in k-clique counting, enabling the analysis of previously considered computationally intractable networks.
Frequent Subgraph Mining Over Dynamic Graphs
(Izmir Institute of Technology, 2022) Abuzayed, Nourhan N. I.; Ergenç Bostanoğlu, Belgin
Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications. Modern applications employ evolving graphs, so FSM is more challenging with evolving graphs due to the streaming nature of the input, and the exponential time complexity of the algorithms. Sampling schemes are used if approximate results serve the purpose. This thesis introduces three approximate frequent subgraph mining algorithms in evolving graphs. those algorithms use novel controlled reservoir sampling. A sample reservoir of the evolving graph and an auxiliary heap reservoir data structure are kept together in a fixed sized reservoir. When the whole reservoir is full, and space has required the edges of lower degree or higher nodes are deleted. This selection is done by utilizing the heap data structure as a heap reservoir, which keeps the node degrees. By keeping the edges of higher degree nodes in the sample reservoir, accuracy is maximized without sacrificing time and space, in contrast, keeping the edges of lower degree nodes in the sample reservoir, accuracy is minimized with higher time and space. The first algorithm is Controlled Reservoir Sampling with Unlimited heap size (UCRS), where the used heap reservoir size is unlimited. The second algorithm is Controlled Reservoir Sampling with Limited heap size (LCRS). It is a modified version of UCRS, but the heap reservoir size is limited, as a result; sample reservoir size in the whole reservoir increases since the total number of nodes dedicated for the whole reservoir includes the nodes of the heap reservoir also. The third algorithm is Maximum Controlled Reservoir Sampling (MCRS). It is a modified version of UCRS, but the candidate edge for deletion is an edge with maximum node degrees. Experimental evaluations to measure scalability and recall performances of the three algorithms in comparison to state of art algorithms are performed on dense and sparse evolving graphs. Findings show that UCRS and LCRS algorithms are scalable and achieve better recall than edge based reservoir algorithms. LCRS achieves the best recall in comparison to edge or subgraph based reservoir algorithms. MCRS has the worst speed-up and recall among the other proposed and competitor algorithms.
Dynamic Itemset Hiding Under Multiple Support Thresholds
(Izmir Institute of Technology, 2018) Öztürk, Ahmet Cumhur; Ergenç Bostanoğlu, Belgin
Data sharing is commonly performed between organizations for mutual benefits. However, if confidential knowledge is not hidden before the data is published it may pose threat to security and privacy. The privacy preserving frequent itemset mining is the process of hiding sensitive itemsets from being discovered with any frequent itemset mining algorithm. The privacy constraint of sensitive itemset hiding is sensitive threshold. If support of a given sensitive itemset is under the sensitive threshold, then this sensitive itemset is considered as non-interesting and hidden. One possible way of decreasing support of sensitive itemsets under predefined sensitive threshold is deleting items from a set of transaction. This type of frequent itemset sanitization is called distortion based frequent itemset hiding. The main focus of this thesis is to preserve sensitive itemsets with considering the multiple sensitive thresholds on both static and dynamic environments. Three different distortion based frequent itemset hiding algorithms proposed; Pseodo Graph Based Sanitization (PGBS), Itemset Oriented Pseudo Graph Based Sanitization (IPGBS) and DynamicPGBS are proposed. Both PGBS and IPGBS algorithms are designed for static environment and the DynamicPGBS algorithm is designed for the dynamic environment. The main objective of these three algorithms is to hide all sensitive itemsets with giving minimum distortion on non-sensitive knowledge and data in the resulting sanitized database.

Phd Degree / Doktora

Browse

Filters

Settings

Sort By

Results per page

Search Results