Improvements in K-Means Algorithm To Execute on Large Amounts of Data

Sülün, Erhan; Püskülcü, Halis

dc.contributor.advisor	Püskülcü, Halis
dc.contributor.author	Sülün, Erhan
dc.contributor.author	Püskülcü, Halis
dc.contributor.other	03.04. Department of Computer Engineering
dc.contributor.other	03. Faculty of Engineering
dc.contributor.other	01. Izmir Institute of Technology
dc.date.accessioned	2014-07-22T13:51:15Z
dc.date.available	2014-07-22T13:51:15Z
dc.date.issued	2004
dc.description	Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2004	en_US
dc.description	Includes bibliographical references (leaf 78)	en_US
dc.description	Text in English; Abstract: Turkish and English	en_US
dc.description	ix, 79 leaves	en_US
dc.description.abstract	Thanks to the large storage capacities of current computer systems, companies' datasets have expanded dramatically in recent years. This rapid growth of company databases has raised the need for faster data mining algorithms, as time is critical for these companies. Large datasets contain historical data about company transactions, which hold valuable hidden patterns that can provide a competitive advantage. Because time is also very important, companies need to mine these huge databases and make accurate decisions quickly in order to gain a marketing advantage. Classical data mining algorithms therefore need to be revised so that they discover hidden patterns and relationships in databases in less time. In this project, the performance of the K-means data mining algorithm is improved so that it clusters large datasets in a shorter time. Parallelization was chosen as the means of improvement, because a popular way of increasing computation power is to connect computers and execute algorithms simultaneously on a network of computers. This popularity also increases the availability of parallel computation clusters day by day. A parallel version of the K-means algorithm was designed and implemented in C, using the MPI (Message Passing Interface) library for parallelization. A serial version was also implemented in C for comparison. Both algorithms were then run several times under the same conditions and the results are discussed. The results, summarized in tables and graphs, show that parallelizing the K-means algorithm provided a performance gain almost proportional to the number of computers used for parallel execution.	en_US
dc.identifier.uri	https://hdl.handle.net/11147/3296
dc.language.iso	en	en_US
dc.publisher	Izmir Institute of Technology	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject.lcc	QA76.9.D343 .S62 2004	en
dc.subject.lcsh	Data mining	en
dc.title	Improvements in K-Means Algorithm To Execute on Large Amounts of Data	en_US
dc.type	Master Thesis	en_US
dspace.entity.type	Publication
gdc.author.institutional	Sülün, Erhan
gdc.coar.access	open access
gdc.coar.type	text::thesis::master thesis
gdc.description.department	Thesis (Master)--İzmir Institute of Technology, Computer Engineering	en_US
gdc.description.publicationcategory	Tez	en_US
gdc.description.scopusquality	N/A
gdc.description.wosquality	N/A
relation.isAuthorOfPublication	f3844554-c555-4f40-8a31-c2b1f5f2d3e6
relation.isAuthorOfPublication.latestForDiscovery	f3844554-c555-4f40-8a31-c2b1f5f2d3e6
relation.isOrgUnitOfPublication	9af2b05f-28ac-4014-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4004-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication	9af2b05f-28ac-4003-8abe-a4dfe192da5e
relation.isOrgUnitOfPublication.latestForDiscovery	9af2b05f-28ac-4014-8abe-a4dfe192da5e
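
The abstract describes parallelizing K-means across several computers, but this record does not show the thesis's actual design. As a rough illustration only, the sketch below shows one common data-parallel scheme: the points are split into slices, each slice produces per-cluster partial sums and counts, and the partial results are added together before the centroids are recomputed. Everything here is an assumption made for illustration: the names `kmeans_step`, `partial_sums`, `sq_dist`, `DIM` and `MAX_K` are hypothetical, the "workers" are simulated by a plain loop in a single process, and in a real MPI program the combining step would typically be a collective sum such as `MPI_Allreduce`.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 2     /* illustrative assumption: 2-D points */
#define MAX_K 16  /* illustrative upper bound on the number of clusters */

/* Squared Euclidean distance between two DIM-dimensional points. */
static double sq_dist(const double *a, const double *b) {
    double s = 0.0;
    for (int d = 0; d < DIM; d++) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Work of one simulated worker: assign each point in [begin, end) to its
 * nearest centroid and accumulate per-cluster coordinate sums and counts. */
static void partial_sums(const double *points, size_t begin, size_t end,
                         const double *centroids, int k,
                         double *sums, size_t *counts) {
    for (size_t i = begin; i < end; i++) {
        int best = 0;
        double best_d = sq_dist(&points[i * DIM], &centroids[0]);
        for (int c = 1; c < k; c++) {
            double d = sq_dist(&points[i * DIM], &centroids[c * DIM]);
            if (d < best_d) { best_d = d; best = c; }
        }
        for (int d = 0; d < DIM; d++)
            sums[best * DIM + d] += points[i * DIM + d];
        counts[best]++;
    }
}

/* One K-means iteration with the n points split into `workers` slices.
 * The centroids array (k * DIM doubles) is updated in place. */
void kmeans_step(const double *points, size_t n,
                 double *centroids, int k, int workers) {
    assert(k > 0 && k <= MAX_K && workers > 0);
    double sums[MAX_K * DIM] = {0};
    size_t counts[MAX_K] = {0};

    for (int w = 0; w < workers; w++) {
        /* Contiguous slice of the data owned by worker w. */
        size_t begin = n * (size_t)w / (size_t)workers;
        size_t end = n * (size_t)(w + 1) / (size_t)workers;
        double local_sums[MAX_K * DIM] = {0};
        size_t local_counts[MAX_K] = {0};
        partial_sums(points, begin, end, centroids, k,
                     local_sums, local_counts);

        /* Combine step; with MPI this would be a collective sum
         * (for example MPI_Allreduce) instead of a local loop. */
        for (int c = 0; c < k; c++) {
            counts[c] += local_counts[c];
            for (int d = 0; d < DIM; d++)
                sums[c * DIM + d] += local_sums[c * DIM + d];
        }
    }

    /* New centroid = mean of its assigned points; a cluster that
     * received no points keeps its previous centroid. */
    for (int c = 0; c < k; c++) {
        if (counts[c] == 0) continue;
        for (int d = 0; d < DIM; d++)
            centroids[c * DIM + d] = sums[c * DIM + d] / (double)counts[c];
    }
}
```

A caller would pass a flat array of `n * DIM` coordinates and repeat `kmeans_step` until the centroids stop changing. Because only the small sums and counts arrays are combined, the amount of data exchanged per iteration depends on the number of clusters rather than the number of points, which is one plausible reason such a scheme can scale with the number of machines.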

Files

Original bundle

Name: T000441.pdf
Size: 483.4 KB
Format: Adobe Portable Document Format
Description: MasterThesis

Collections

Master Degree / Yüksek Lisans Tezleri