Phd Degree / Doktora
Permanent URI for this collection: https://hdl.handle.net/11147/2869
Doctoral Thesis
Frequent Subgraph Mining Over Dynamic Graphs
(Izmir Institute of Technology, 2022) Abuzayed, Nourhan N. I.; Ergenç Bostanoğlu, Belgin
Frequent subgraph mining (FSM) is an essential and challenging graph mining task used in several applications. Modern applications employ evolving graphs, which make FSM more challenging because of the streaming nature of the input and the exponential time complexity of the algorithms. Sampling schemes are used when approximate results serve the purpose. This thesis introduces three approximate frequent subgraph mining algorithms for evolving graphs. These algorithms use a novel controlled reservoir sampling scheme. A sample reservoir of the evolving graph and an auxiliary heap reservoir data structure are kept together in a fixed-size reservoir. When the whole reservoir is full and space is required, the edges of lower or higher degree nodes are deleted. This selection is made with the heap data structure acting as a heap reservoir, which keeps the node degrees. Keeping the edges of higher degree nodes in the sample reservoir maximizes accuracy without sacrificing time and space. In contrast, keeping the edges of lower degree nodes in the sample reservoir minimizes accuracy at a higher time and space cost. The first algorithm is Controlled Reservoir Sampling with Unlimited heap size (UCRS), where the size of the heap reservoir is unlimited. The second algorithm is Controlled Reservoir Sampling with Limited heap size (LCRS). It is a modified version of UCRS in which the heap reservoir size is limited. As a result, the share of the whole reservoir available to the sample reservoir increases, since the total number of nodes dedicated to the whole reservoir also includes the nodes of the heap reservoir. The third algorithm is Maximum Controlled Reservoir Sampling (MCRS). It is a modified version of UCRS in which the candidate edge for deletion is an edge with maximum node degrees.
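The eviction idea described in this abstract can be pictured with a small sketch. It is not the thesis's implementation: the class name `DegreeControlledReservoir`, the `evict` switch, the use of the sum of endpoint degrees as the eviction score, and the standard reservoir admission test are all assumptions made for illustration. The abstract describes a heap that tracks node degrees; the sketch uses a linear scan instead to stay short.

```python
import random
from collections import Counter


class DegreeControlledReservoir:
    """Illustrative fixed-size edge sample with degree-based eviction.

    Hypothetical sketch: evict="min" drops the edge whose endpoints have the
    lowest degree sum (keeps edges of high-degree nodes); evict="max" drops
    the edge with the highest degree sum.
    """

    def __init__(self, capacity, evict="min", seed=0):
        if evict not in ("min", "max"):
            raise ValueError("evict must be 'min' or 'max'")
        self.capacity = capacity
        self.evict = evict
        self.rng = random.Random(seed)
        self.edges = []          # sampled edges, as (u, v) tuples
        self.degree = Counter()  # node degrees within the sample only
        self.seen = 0            # number of stream edges observed so far

    def _score(self, edge):
        u, v = edge
        return self.degree[u] + self.degree[v]

    def _insert(self, edge):
        self.edges.append(edge)
        for node in edge:
            self.degree[node] += 1

    def _remove(self, edge):
        self.edges.remove(edge)
        for node in edge:
            self.degree[node] -= 1
            if self.degree[node] == 0:
                del self.degree[node]

    def add(self, u, v):
        """Offer one stream edge; return True if it entered the sample."""
        self.seen += 1
        edge = (u, v)
        if len(self.edges) < self.capacity:
            self._insert(edge)
            return True
        # Admission test borrowed from classic reservoir sampling (assumption).
        if self.rng.random() >= self.capacity / self.seen:
            return False
        pick = min if self.evict == "min" else max
        self._remove(pick(self.edges, key=self._score))  # linear scan, not a heap
        self._insert(edge)
        return True
```

With `evict="min"` the sample tends to retain edges around high-degree nodes, which is the behaviour the abstract associates with better accuracy; `evict="max"` mirrors the MCRS choice of deleting an edge with maximum node degrees.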
Experimental evaluations measuring the scalability and recall of the three algorithms against state-of-the-art algorithms are performed on dense and sparse evolving graphs. Findings show that the UCRS and LCRS algorithms are scalable and achieve better recall than edge-based reservoir algorithms. LCRS achieves the best recall in comparison to edge-based or subgraph-based reservoir algorithms. MCRS has the worst speed-up and recall among the proposed and competitor algorithms.

Doctoral Thesis
Planar Geometry Estimation With Deep Learning
(Izmir Institute of Technology, 2022) Uzyıldırım, Furkan Eren; Özuysal, Mustafa
Understanding the geometric structure of a scene is one of the oldest problems in Computer Vision. Most scenes include planar regions that provide information about the geometric structure, and their automatic detection and segmentation play an important role in many computer vision applications. In recent years, convolutional neural network architectures have been introduced for piece-wise planar segmentation. They outperform the traditional approaches that generate plane candidates with 3D segmentation methods from an explicitly reconstructed 3D point cloud. However, most convolutional neural network architectures are not designed and trained for outdoor scenes, because they require manual annotation, a time-consuming task that results in a lack of training data. In this thesis, we propose and develop a deep learning based framework for piece-wise plane detection and segmentation of outdoor scenes that does not require manually annotated training data. We exploit a network trained on imagery with annotated targets, together with a point cloud automatically reconstructed from either a Structure from Motion-Multi View Stereo pipeline or a monocular depth estimation network, to estimate the training ground truth on the outdoor images in an iterative energy minimization framework.
We show that the resulting ground truth estimate for various sets of images in the outdoor domain is good enough to improve the network weights of different architectures trained on ground truth annotated images. Moreover, we demonstrate that this transfer learning scheme can be repeated iteratively to further improve the accuracy of plane detection and segmentation on monocular images of outdoor scenes.

Doctoral Thesis
Discovering Specific Semantic Relations Among Words Using Neural Network Methods
(Izmir Institute of Technology, 2021) Sezerer, Erhan; Tekir, Selma
Human-level language understanding is one of the oldest challenges in computer science. Much scientific work has been dedicated to finding good representations for semantic units (words, morphemes, characters) in languages. Recently, contextual language models, such as BERT and its variants, have shown great success in downstream natural language processing tasks through the use of masked language modelling and transformer structures. Although these methods solve many problems in this domain and have proved useful, they still lack one crucial aspect of language acquisition in humans: experiential (visual) information. Over the last few years, there has been an increase in studies that consider experiential information by building multi-modal language models and representations. Several studies show that language acquisition in humans starts with learning concrete concepts through images and then continues with learning abstract ideas through text. In this work, the curriculum learning method is used to teach the model concrete and abstract concepts through images and their corresponding captions, in order to accomplish the task of multi-modal language modeling and representation. BERT and ResNet-152 models are used for the respective modalities, with an attentive pooling mechanism, on a newly constructed dataset collected from Wikimedia Commons.
To show the performance of the proposed model, downstream tasks and ablation studies are performed. The contribution of this work is two-fold: a new dataset is constructed from Wikimedia Commons, and a new multi-modal pre-training approach based on curriculum learning is proposed. Results show that the proposed multi-modal pre-training approach increases the success of the model.

Doctoral Thesis
Improved Image Based Localization Using Semantic Descriptors
(Izmir Institute of Technology, 2021) Çınaroğlu, İbrahim; Baştanlar, Yalın
Place recognition and Visual Localization (VL) for autonomous driving remain popular topics in the field of Computer Vision. In this study, semantically improved Hybrid-VL approaches that use localization-aware semantic information in street-level driving images are proposed. Initially, a Semantic Descriptor (SD) is extracted from semantically segmented images with a Convolutional Neural Network (CNN) trained for the localization task. Then, the image retrieval based VL task is performed using approximate nearest neighbor search (ANNS) in a 2D-2D matching context. This proposed method is named SD-VL, and its success is compared with that of the state-of-the-art Local Descriptor (LD) based VL method (LD-VL), which is frequently used in the literature. Furthermore, with the aim of alleviating the shortcomings of both methods, a novel decision-level Hybrid-VL (Hybrid-VL_DL) method is proposed by combining SD-VL and LD-VL in the post-processing stage. A feature-level Hybrid-VL (Hybrid-VL_FL) method is also proposed in order to produce an automatically tuned hybrid result. These proposed VL methods are examined on two challenging benchmarks: the RobotCar Seasons and Malaga Downtown data sets. Moreover, a new VL data set, Malaga Streetview Challenge, is generated by collecting Google Streetview images along the same path as Malaga Downtown in order to observe the impact of environmental and wide-baseline changes.
This newly generated test set will be useful for researchers working in this field. Overall, the proposed semantically boosted Hybrid-VL_DL method increases localization performance on the RobotCar Seasons and Malaga Streetview Challenge data sets by 11.6% and 4.5% in Top-1 recall@5, and by 4% and 5.4% in recall@1, respectively. Additionally, the reliability of our hyper-parameter (W) based Hybrid-VL_DL approach is supported by the very close performance of the Hybrid-VL_FL method.

Doctoral Thesis
Density Grid Based Stream Clustering Algorithm
(Izmir Institute of Technology, 2019) Ahmed, Rowanda Daoud; Ayav, Tolga; Dalkılıç, Gökhan
As applications produce overwhelming data streams, the need for strategies to analyze and cluster streaming data has become an urgent and crucial research area for knowledge discovery. The main objective of data stream clustering is to gain insights into incoming data. Recognizing all probable patterns in this boundless data, which arrives at varying speeds and structures and evolves over time, is very important in this analysis process. Existing data stream clustering strategies all suffer from different limitations, such as the inability to find arbitrarily shaped clusters and to handle outliers, in addition to requiring some parameter information for data processing. To handle all these challenges quickly, accurately and efficiently, we propose DGStream, a new online-offline grid and density-based stream clustering algorithm. We conducted many experiments and evaluated the performance of DGStream over different simulated databases and for different parameter settings, considering a wide variety of concept drifts, novelty, evolving data, numbers and sizes of clusters, and outlier detection.
Our algorithm is suitable for applications where the interest lies in the most recent information, such as the stock market, for cases where analysis of existing information is required, and for cases where old and recent information are equally important. The experiments, over synthetic and real datasets, show that our proposed algorithm outperforms the other algorithms in efficiency.

Doctoral Thesis
Fourier Analysis Based Testing of Finite State Machines
(Izmir Institute of Technology, 2019) Takan, Savaş; Ayav, Tolga
The finite state machine (FSM) is a widely used modeling technique for circuit and software testing. FSM testing is a well-studied topic in the literature, and there are several test case generation methods such as W, Wp, UIO, UIOv, DS, HSI and H. Despite the existing methods, there is still a need for alternative techniques with better performance in terms of test suite size, fault detection ratio and test generation time. In this thesis, two new test case generation methods, F and Fw, are proposed. The proposed test generation methods are based on Fourier analysis of Boolean functions. Fourier transformations have been studied extensively in mathematics, computer science and engineering. The proposed F method tests only the outputs, whereas the Fw method tests the next state along with the outputs. In this context, the proposed methods are compared with the UIO and W methods in terms of characteristics, cost, fault detection ratio and effectiveness. The evaluation data are analyzed using the t-test and Hedges' g. Results show that the F and Fw methods outperform the existing methods in terms of fault detection ratio per test.

Doctoral Thesis
Test Case Prioritization for Regression Testing Using Change Impact Analysis
(Izmir Institute of Technology, 2019) Ufuktepe, Ekincan; Tuğlular, Tuğkan
Test case prioritization aims to order test cases so as to increase the rate of fault detection and to reduce the time needed to detect faults.
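The rate of fault detection that prioritization tries to increase is commonly quantified with the APFD (Average Percentage of Faults Detected) measure, which this abstract also uses for evaluation. As a minimal sketch of the standard formula, APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the position of the first test that reveals fault i. The function name `apfd` and the input layout (a dictionary from test id to the set of faults it reveals) are assumptions for illustration, not taken from the thesis.

```python
def apfd(order, faults_detected_by):
    """Average Percentage of Faults Detected for one test ordering.

    order: list of test ids in execution order (hypothetical input layout).
    faults_detected_by: dict mapping test id -> set of fault ids it reveals.
    """
    all_faults = set().union(*faults_detected_by.values())
    n, m = len(order), len(all_faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one fault")

    # 1-based position of the first test that reveals each fault.
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in faults_detected_by.get(test, ()):
            first_position.setdefault(fault, position)

    if len(first_position) < m:
        raise ValueError("ordering does not reveal every known fault")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Toy data: t3 reveals both faults, so running it first scores higher.
detects = {"t1": {"f1"}, "t2": set(), "t3": {"f1", "f2"}}
good = apfd(["t3", "t1", "t2"], detects)
poor = apfd(["t2", "t1", "t3"], detects)
```

Higher values mean faults are revealed earlier in the run, so two orderings of the same suite can be compared directly.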
In this study, a static source code analysis based approach that uses change impact analysis is proposed. The proposed change impact analysis approach uses the program slicing technique, method change information and a Bayesian Network. Based on the change impact analysis results, two test case prioritization approaches called LoM and LoM-Addtl are proposed, which are inspired by the "Law of Minimum" from biology and agronomy. The change impact analysis and test case prioritization approaches are applied to three well-known projects. The proposed change impact analysis results are evaluated with precision and recall metrics. The proposed test case prioritization methods LoM and LoM-Addtl are compared with five other test case prioritization techniques and evaluated with the APFD measure. The results of the change impact analysis show that when a software project has completed 75% of its development, 97%-100% of the affected and changed methods are predicted. The LoM and LoM-Addtl test case prioritization approaches show consistent results when compared to the traditional test case prioritization techniques. However, it has been observed that LoM and LoM-Addtl perform better than the traditional methods when version jumps are smaller. Furthermore, applying the additional strategy to LoM (LoM-Addtl) has shown better results compared to LoM.

Doctoral Thesis
Dynamic Itemset Hiding Under Multiple Support Thresholds
(Izmir Institute of Technology, 2018) Öztürk, Ahmet Cumhur; Ergenç Bostanoğlu, Belgin
Data sharing is commonly performed between organizations for mutual benefit. However, if confidential knowledge is not hidden before the data is published, it may pose a threat to security and privacy. Privacy preserving frequent itemset mining is the process of hiding sensitive itemsets so that they cannot be discovered by any frequent itemset mining algorithm. The privacy constraint of sensitive itemset hiding is the sensitive threshold.
If the support of a given sensitive itemset is under the sensitive threshold, then this sensitive itemset is considered non-interesting and hidden. One possible way of decreasing the support of sensitive itemsets below a predefined sensitive threshold is deleting items from a set of transactions. This type of frequent itemset sanitization is called distortion based frequent itemset hiding. The main focus of this thesis is to protect sensitive itemsets while considering multiple sensitive thresholds in both static and dynamic environments. Three different distortion based frequent itemset hiding algorithms are proposed: Pseudo Graph Based Sanitization (PGBS), Itemset Oriented Pseudo Graph Based Sanitization (IPGBS) and DynamicPGBS. The PGBS and IPGBS algorithms are designed for the static environment, and the DynamicPGBS algorithm is designed for the dynamic environment. The main objective of these three algorithms is to hide all sensitive itemsets while causing minimum distortion to non-sensitive knowledge and data in the resulting sanitized database.
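Distortion based hiding with one threshold per sensitive itemset can be illustrated with a naive greedy sketch. It is not PGBS, IPGBS or DynamicPGBS: the function names, the use of absolute support counts as thresholds (assumed to be at least 1), and the rule of deleting the smallest item of the itemset are all assumptions, and the sketch makes no attempt to minimize distortion of non-sensitive knowledge as the thesis algorithms do.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemsets(transactions, thresholds):
    """Return a sanitized copy in which each sensitive itemset is hidden.

    thresholds: dict mapping frozenset -> sensitive threshold, given as an
    absolute support count >= 1 (hypothetical layout). An itemset counts as
    hidden once its support is strictly below its own threshold.
    """
    db = [set(t) for t in transactions]  # leave the caller's data untouched
    for itemset, threshold in thresholds.items():
        for t in db:
            if support(db, itemset) < threshold:
                break  # this itemset is already hidden
            if itemset <= t:
                # Deleting any single item removes this transaction's support;
                # picking the smallest item is an arbitrary deterministic choice.
                t.discard(min(itemset))
    return db


transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
thresholds = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 3}
sanitized = hide_itemsets(transactions, thresholds)
```

Because deleting items can only lower support, an itemset hidden in an earlier pass stays hidden while later itemsets are processed. The cost of the greedy choice is that non-sensitive itemsets sharing the deleted items also lose support, which is the side effect the thesis algorithms aim to keep small.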
