文摘
The popularity of GPS-equipped gadgets and mapping mashup applications has motivated the growth of geotagged Web resources as well as georeferenced multimedia applications. More and more research attention have been put on mining collaborative knowledge from mass user-contributed geotagged contents. However, little attention has been paid to generating high-quality geographical clusters, which is an important preliminary data-cleaning process for most geographical mining works. Previous works mainly use geotags to derive geographical clusters. Simply using one channel information is not sufficient for generating distinguishable clusters, especially when the location ambiguity problem occurs. In this paper, we propose a two-level clustering framework to utilize both the spatial and the semantic features of photographs for clustering. For the first-level geoclustering phase, we cluster geotagged photographs according to their spatial ties to roughly partition the dataset in an efficient way. Then we leverage the textual semantics in photographs' annotation to further refine the grouping results in the second-level semantic clustering phase. To effectively measure the semantic correlation between photographs, a semantic enhancement method as well as a new term weighting function have been proposed. We also propose a method for automatic parameter determination for the second-level spectral clustering process. Evaluation of our implementation on real georeferenced photograph dataset shows that our algorithm performs well, producing distinguishable geographical cluster with high accuracy and mutual information.