Master Degree / Yüksek Lisans Tezleri

Permanent URI for this collection: https://hdl.handle.net/11147/3008

Now showing 1 - 5 of 5
  • Master Thesis
    Comparison of Classification Algorithms in Pitch Type Prediction Problem
    (Izmir Institute of Technology, 2020) Türkmen, Fatih; Ergenç Bostanoğlu, Belgin
    The dramatic increase in the use of IoT devices has produced a huge amount of valuable data waiting to be explored. Extracting knowledge from data at this scale requires an organized, scientific set of processes, which underlines the importance of data mining applications. As a major data mining application, classification is a supervised learning technique that requires a feature set and a target class during training. The key point of the training process is determining the appropriate feature set for the classification algorithm. Advances in technologies such as high-resolution camera systems have made it possible to extract insights about the next pitch, and pitch type prediction has consequently stood out as an important research topic. To predict the next pitch type, existing research mostly relies on pitcher profile, batter profile and previous pitch data in the feature set. No study has analyzed the effect of zone information on the prediction of the next pitch type. Therefore, this study analyzes the contribution of zone information to pitch type prediction. Our approach aims to reveal the contribution of zones with high strike and low bat rates to the pitch type decision in a pitcher-batter matchup. This aim directed us to analyze the pitch type prediction problem with both zone-based and non-zone-based approaches, so that we can show how much zone information contributes to the problem across different classification algorithms.
  • Master Thesis
    Analyzing Social Media Data by Frequent Pattern Mining Methods
    (Izmir Institute of Technology, 2018) Güvenoğlu, Büşra; Ergenç Bostanoğlu, Belgin
    Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large datasets. Social media data is one of the most popular kinds of large heterogeneous data, collected from social networking sites, microblogs, and photo or video sharing sites. Social media represents entities and their relations. One of the popular data structures used to represent large heterogeneous data in data mining is the graph: the nodes of a graph represent entities and its edges represent the relations between those entities. Consequently, graph mining is one of the most popular subdivisions of data mining. A frequent pattern is a pattern that is encountered in a dataset more frequently than a user-defined threshold. Frequent patterns can give important information about a dataset, and this information can be used to classify or cluster the data. Frequent patterns can also provide different perspectives on social media data with respect to sociology, consumer behaviour, marketing and communities. In this thesis, popular frequent pattern mining algorithms are examined, and it is observed that most of them are not suitable for large datasets. Since today's data, especially data from social networks, is very large, the existing pattern mining algorithms are not suitable for it. The aim of this thesis is to implement an existing frequent pattern mining algorithm in a parallel manner and to find frequent patterns in social media data.
  • Master Thesis
    Development of Framework for Frequent Itemset Mining Under Multiple Support Thresholds
    (Izmir Institute of Technology, 2016) Darrab, Sadeq Hussein Saleh; Ergenç Bostanoğlu, Belgin
    Frequent pattern mining is an essential data mining method used to extract interesting patterns from massive databases. Traditional methods use a single minimum support threshold to find the complete set of frequent patterns. However, in real-world applications, a single minimum support threshold is not adequate, since it does not reflect the nature of each item and causes a problem called the rare item problem. Recently, several methods have been studied to tackle this problem by avoiding a single minimum item support threshold. The nature of each item is considered by assigning different minimum support thresholds to different items. In this way, the complete set of frequent patterns is generated without creating uninteresting patterns or losing substantial ones. In this thesis, we propose an efficient method, the Multiple Item Support Frequent Pattern growth algorithm (MISFP-growth), to mine the complete set of frequent patterns with multiple item support thresholds. In this method, a Multiple Item Support Frequent Pattern tree (MISFP-Tree) is constructed to store all the information needed to mine frequent patterns. Since the MISFP-Tree is constructed with respect to the minimum of the Multiple Itemset Support values, pruning and reconstruction phases are not required. To show its efficiency, the proposed method is compared with a recent tree-based algorithm, CFP-growth++, in various experiments on both real and synthetic datasets. Experimental results reveal that MISFP-growth outperforms the previous algorithm in terms of execution time, memory space and scalability.
  • Master Thesis
    Development of an Application for Dynamic Itemset Mining Under Multiple Support Thresholds
    (Izmir Institute of Technology, 2016) Abuzayed, Nourhan; Ergenç Bostanoğlu, Belgin
    Handling the dynamic aspect of databases and the multiple support threshold requirement of items are two important challenges for frequent itemset mining algorithms. Frequent itemsets should be updated when the database is updated, without re-running the mining algorithm. A frequent itemset mining algorithm should also consider different support thresholds in order not to cause the rare item problem. Existing dynamic itemset mining algorithms are devised for a single support threshold, whereas multiple support threshold algorithms are static. This thesis focuses on the dynamic update problem of frequent itemsets under multiple support thresholds and introduces the Dynamic MIS1 and Dynamic MIS2 algorithms. They i) are tree-based and scan the database once, ii) consider multiple support thresholds, and iii) handle increments of additions, additions with new items, and deletions. The proposed algorithms are compared to CFP-Growth++. In static databases, 1) Dynamic MIS1 achieves up to a 5-fold speed-up over CFP-Growth++ since it does not require tree pruning and merging, 2) the execution times of Dynamic MIS2 and CFP-Growth++ are similar, and 3) the memory usage of Dynamic MIS1 is higher than that of CFP-Growth++ since it keeps the whole tree in memory. In dynamic databases, 1) Dynamic MIS1 and Dynamic MIS2 perform better than CFP-Growth++ since they run only on increments, 2) Dynamic MIS1 can achieve a speed-up of 56 times over CFP-Growth++, whereas the speed-up of Dynamic MIS2 cannot exceed 2 times, and 3) Dynamic MIS2 is slightly better than CFP-Growth++ as long as the increment size is below 85% when the database is large and sparse, and below 25% when the database is small and dense.
  • Master Thesis
    Dynamic Frequent Itemset Mining Based on Matrix Apriori Algorithm
    (Izmir Institute of Technology, 2012) Oğuz, Damla; Ergenç Bostanoğlu, Belgin
    Frequent itemset mining algorithms discover the frequent itemsets in a database. When the database is updated, the frequent itemsets should be updated as well. However, re-running a frequent itemset mining algorithm after every update is inefficient. This is called the dynamic update problem of frequent itemsets, and the solution is to devise an algorithm that can mine the frequent itemsets dynamically. In this study, a dynamic frequent itemset mining algorithm called Dynamic Matrix Apriori is proposed and explained. In addition, the proposed algorithm is compared, using two datasets, with the base algorithm Matrix Apriori, which has to be re-run whenever the database is updated.
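
Two ideas recur in the abstracts listed above: giving each item its own minimum support so that rare items are not lost, and updating the frequent itemsets from the increment alone instead of re-running the miner. The following is a minimal Python sketch of both ideas, not an implementation of MISFP-growth, Dynamic MIS1/MIS2 or Dynamic Matrix Apriori. It assumes the commonly used rule that an itemset is frequent when its count reaches the lowest minimum among its items. The class name, the `max_size` cap, the brute-force enumeration and the toy transactions are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


class IncrementalItemsetCounter:
    """Keeps itemset counts current as transactions are added or removed."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size tracked (illustrative cap)
        self.counts = Counter()   # itemset -> number of supporting transactions

    def _itemsets(self, transaction):
        # Enumerate every itemset of size 1..max_size contained in the transaction.
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            for combo in combinations(items, size):
                yield frozenset(combo)

    def add(self, transactions):
        # Only the increment is scanned; earlier transactions are not revisited.
        for transaction in transactions:
            self.counts.update(self._itemsets(transaction))

    def remove(self, transactions):
        # Assumes each removed transaction was added earlier.
        for transaction in transactions:
            self.counts.subtract(self._itemsets(transaction))

    def frequent(self, min_counts):
        # Assumed rule: compare the count with the lowest per-item minimum
        # among the items of the itemset.
        return {
            itemset: count
            for itemset, count in self.counts.items()
            if count >= min(min_counts[item] for item in itemset)
        }


# Hypothetical per-item minimum counts: the rare item gets a lower threshold.
min_counts = {"bread": 3, "milk": 3, "caviar": 2}

counter = IncrementalItemsetCounter()
counter.add([["bread", "milk"], ["bread", "milk"],
             ["bread", "caviar"], ["bread", "milk"]])
before = counter.frequent(min_counts)

# A later increment: only these two transactions are processed.
counter.add([["bread", "caviar"], ["milk"]])
after = counter.frequent(min_counts)
```

With a single threshold of 3 for every item, `caviar` could never become frequent in this toy data. With its own lower threshold it does so once the increment arrives, and no earlier transaction is rescanned. Enumerating all sub-itemsets is exponential in transaction length, which is why the theses above rely on tree- or matrix-based structures instead.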