Geodesic Distances for Web Document Clustering

dc.contributor.author Tekir, Selma
dc.contributor.author Mansmann, Florian
dc.contributor.author Keim, Daniel
dc.coverage.doi 10.1109/CIDM.2011.5949449
dc.date.accessioned 2017-03-09T06:53:15Z
dc.date.available 2017-03-09T06:53:15Z
dc.date.issued 2011
dc.description Symposium Series on Computational Intelligence, IEEE SSCI2011 - 2011 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011; Paris; France; 11 April 2011 through 15 April 2011 en_US
dc.description.abstract While traditional distance measures are often capable of properly describing similarity between objects, in some application areas there is still potential to fine-tune these measures with additional information provided in the data sets. In this work we combine such traditional distance measures for document analysis with link information between documents to improve clustering results. In particular, we test the effectiveness of geodesic distances as similarity measures under the space assumption of spherical geometry in a 0-sphere. Our proposed distance measure is thus a combination of the cosine distance of the term-document matrix and some curvature values in the geodesic distance formula. To estimate these curvature values, we calculate clustering coefficient values for every document from the link graph of the data set and increase their distinctiveness by means of a heuristic as these clustering coefficient values are rough estimates of the curvatures. To evaluate our work, we perform clustering tests with the k-means algorithm on the English Wikipedia hyperlinked data set with both traditional cosine distance and our proposed geodesic distance. The effectiveness of our approach is measured by computing micro-precision values of the clusters based on the provided categorical information of each article. © 2011 IEEE. en_US
dc.identifier.citation Tekir, S., Mansmann, F., and Keim, D. (2011, April 11-15). Geodesic distances for web document clustering. Paper presented at the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011. doi:10.1109/CIDM.2011.5949449 en_US
dc.identifier.doi 10.1109/CIDM.2011.5949449 en_US
dc.identifier.isbn 9781424499274
dc.identifier.scopus 2-s2.0-79961193406
dc.identifier.uri http://doi.org/10.1109/CIDM.2011.5949449
dc.identifier.uri https://hdl.handle.net/11147/5014
dc.language.iso en en_US
dc.publisher Institute of Electrical and Electronics Engineers Inc. en_US
dc.relation.ispartof IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011 en_US
dc.rights info:eu-repo/semantics/openAccess en_US
dc.subject Cluster analysis en_US
dc.subject Geodesic distances en_US
dc.subject Wikipedia en_US
dc.subject User interfaces en_US
dc.subject Web document clustering en_US
dc.title Geodesic Distances for Web Document Clustering en_US
dc.type Conference Object en_US
dspace.entity.type Publication
gdc.author.institutional Tekir, Selma
gdc.author.yokid 114496
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.coar.access open access
gdc.coar.type text::conference output
gdc.collaboration.industrial false
gdc.description.department İzmir Institute of Technology. Computer Engineering en_US
gdc.description.endpage 21 en_US
gdc.description.publicationcategory Conference Item - International - Institutional Faculty Member en_US
gdc.description.scopusquality N/A
gdc.description.startpage 15 en_US
gdc.description.wosquality N/A
gdc.identifier.openalex W2146013359
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 1.0
gdc.oaire.influence 2.87191E-9
gdc.oaire.isgreen true
gdc.oaire.keywords Geodesic distances
gdc.oaire.keywords User interfaces
gdc.oaire.keywords Cluster analysis
gdc.oaire.keywords info:eu-repo/classification/ddc/004
gdc.oaire.keywords Wikipedia
gdc.oaire.keywords Web document clustering
gdc.oaire.popularity 6.0352434E-10
gdc.oaire.publicfunded false
gdc.oaire.sciencefields 0202 electrical engineering, electronic engineering, information engineering
gdc.oaire.sciencefields 02 engineering and technology
gdc.openalex.collaboration International
gdc.openalex.fwci 0.93196449
gdc.openalex.normalizedpercentile 0.75
gdc.opencitations.count 0
gdc.plumx.mendeley 3
gdc.plumx.scopuscites 6
gdc.scopus.citedcount 6
relation.isAuthorOfPublication.latestForDiscovery 57639474-3954-4f77-a84c-db8a079648a8
relation.isOrgUnitOfPublication.latestForDiscovery 9af2b05f-28ac-4014-8abe-a4dfe192da5e
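
The abstract mentions two quantities that are simple to illustrate: the clustering coefficient computed per document from the link graph, and the micro-precision used to score clusters against article categories. The following is a minimal Python sketch of both, not the authors' implementation. It assumes an undirected link graph stored as a dictionary of neighbor sets, and it uses one common definition of micro-precision (the fraction of documents that belong to the majority category of their cluster). The function names and the toy data are hypothetical. The geodesic distance formula and the distinctiveness heuristic are not given in this record, so they are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def local_clustering_coefficient(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves linked.

    `adj` maps each node to the set of its neighbors (undirected graph,
    an assumption made for this sketch).
    """
    neighbors = adj[node] - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs exist, so the coefficient is taken as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def micro_precision(cluster_ids, categories):
    """Share of items whose category is the majority category of their cluster."""
    by_cluster = {}
    for cluster, category in zip(cluster_ids, categories):
        by_cluster.setdefault(cluster, Counter())[category] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy link graph: A links to B, C, D; B and C link to each other.
toy_links = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

coefficient_a = local_clustering_coefficient(toy_links, "A")
score = micro_precision([0, 0, 0, 1, 1], ["x", "x", "y", "y", "y"])
```

In the toy graph, only one of the three neighbor pairs of A is linked, so its coefficient is 1/3. In the toy clustering, four of the five items match the majority category of their cluster, giving a micro-precision of 0.8.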

Files

Original bundle

Name: 5014.pdf
Size: 143.33 KB
Format: Adobe Portable Document Format
Description: Conference Paper

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission