Minimizing Information Loss in Shared Data: Hiding Frequent Patterns With Multiple Sensitive Support Thresholds

dc.contributor.author Bostanoğlu, Belgin Ergenç
dc.contributor.author Öztürk, Ahmet Cumhur
dc.coverage.doi 10.1002/sam.11458
dc.date.accessioned 2020-07-18T08:34:02Z
dc.date.available 2020-07-18T08:34:02Z
dc.date.issued 2020
dc.description.abstract Privacy preserving data mining (PPDM) is the process of protecting sensitive knowledge from being discovered by data mining techniques when data is shared. Privacy preserving frequent itemset mining (PPFIM) is a subtask of PPDM and an NP-hard problem. Its objective is to modify a given database so that none of the database owner's sensitive itemsets can be obtained from the modified database by any frequent itemset mining technique. The main challenge of PPFIM is to minimize the distortion of the data and of nonsensitive knowledge while sanitizing all given sensitive itemsets. Distortion-based sensitive itemset hiding algorithms sanitize the database by decreasing the support of each sensitive itemset below a predefined sensitive threshold. Most distortion-based itemset hiding algorithms allow the database owner to define only a single sensitive threshold that applies to every sensitive itemset. This limits the database owner, since the importance of each sensitive itemset varies. In this paper we propose a distortion-based itemset hiding algorithm, the itemset oriented pseudo graph based sanitization (IPGBS) algorithm, that allows the database owner to assign multiple sensitive thresholds. The purpose of the IPGBS algorithm is to cause minimum distortion to the nonsensitive knowledge and data while hiding all sensitive itemsets. To this end, the IPGBS algorithm modifies as few transactions and as little transaction content as possible. The IPGBS algorithm is evaluated against two counterparts on four different databases. The results show that the IPGBS algorithm is more efficient in terms of nonsensitive frequent itemset loss on both dense and sparse databases. It also achieves considerably good results in terms of number of transactions modified, number of items deleted, execution time and total memory allocation. en_US
dc.identifier.doi 10.1002/sam.11458 en_US
dc.identifier.issn 1932-1864
dc.identifier.issn 1932-1872
dc.identifier.scopus 2-s2.0-85083673180
dc.identifier.uri https://doi.org/10.1002/sam.11458
dc.identifier.uri https://hdl.handle.net/11147/8831
dc.language.iso en en_US
dc.publisher Wiley en_US
dc.relation.ispartof Statistical Analysis and Data Mining en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Information loss en_US
dc.subject Itemset mining en_US
dc.subject Privacy preserving itemset mining en_US
dc.title Minimizing Information Loss in Shared Data: Hiding Frequent Patterns With Multiple Sensitive Support Thresholds en_US
dc.type Article en_US
dspace.entity.type Publication
gdc.author.institutional Bostanoğlu, Belgin Ergenç
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::journal::journal article
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology. Computer Engineering en_US
gdc.description.endpage 323 en_US
gdc.description.issue 4 en_US
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
gdc.description.scopusquality Q1
gdc.description.startpage 309 en_US
gdc.description.volume 13 en_US
gdc.description.wosquality Q1
gdc.identifier.openalex W3019359384
gdc.identifier.wos WOS:000527077200001
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 2.0
gdc.oaire.influence 3.1032652E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 3.6690024E-9
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration National
gdc.openalex.fwci 0.2937191
gdc.openalex.normalizedpercentile 0.62
gdc.opencitations.count 2
gdc.plumx.crossrefcites 2
gdc.plumx.mendeley 6
gdc.plumx.scopuscites 1
gdc.scopus.citedcount 1
gdc.wos.citedcount 1
relation.isAuthorOfPublication.latestForDiscovery 3b51d444-157d-4dff-a209-e28543a80dcd
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4014-8abe-a4dfe192da5e
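
Illustrative note: the abstract describes distortion-based hiding as lowering the support of each sensitive itemset below its own sensitive threshold. The Python sketch below shows that general idea only. It is a naive greedy baseline, not the IPGBS algorithm, whose pseudo graph structure is not described in this record. The names `support` and `hide_itemsets`, the use of absolute support counts, the shortest-transaction-first heuristic, and the toy database are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def support(db: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_itemsets(
    db: List[Set[str]], thresholds: Dict[FrozenSet[str], int]
) -> List[Set[str]]:
    """Naive greedy sanitization (hypothetical baseline, not IPGBS).

    For each sensitive itemset, delete one of its items from just enough
    supporting transactions that its support falls below its own threshold.
    Thresholds are absolute support counts (an assumption).
    """
    sanitized = [set(t) for t in db]  # work on a copy, keep the input intact
    for itemset, threshold in thresholds.items():
        # Number of supporting transactions that must stop supporting it.
        excess = support(sanitized, itemset) - threshold + 1
        if excess <= 0:
            continue  # already hidden
        supporting = [t for t in sanitized if itemset <= t]
        # Heuristic assumption: shorter transactions carry fewer other
        # itemsets, so modifying them first loses less nonsensitive knowledge.
        supporting.sort(key=len)
        victim = min(itemset)  # deterministic but arbitrary item choice
        for t in supporting[:excess]:
            t.discard(victim)
    return sanitized


# Toy database and per-itemset thresholds (made up for illustration).
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
thresholds = {frozenset({"a", "b"}): 2, frozenset({"b", "c"}): 3}
sanitized = hide_itemsets(db, thresholds)
for itemset, threshold in thresholds.items():
    print(sorted(itemset), support(sanitized, itemset), "<", threshold)
```

Because deleting an item can never raise the support of any itemset, an itemset hidden in an earlier iteration stays hidden in later ones. The sketch makes no attempt to minimize the loss of nonsensitive frequent itemsets, which is the quantity the paper reports improving.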

Files

Original bundle

Name: Statistical Analysis.pdf
Size: 4.41 MB
Format: Adobe Portable Document Format