SimUSF: an efficient and effective similarity measure that is invariant to violations of the interval scale assumption

Authors: Thilak L. Fernando ; Geoffrey I. Webb
Keywords: Similarity measure ; Interval scale ; Clustering ; CBMIR
Journal: Data Mining and Knowledge Discovery
Publication year: 2017
Publication date: January 2017
Volume: 31
Issue: 1
Pages: 264-286
Journal category: Computer Science
Journal subjects: Data Mining and Knowledge Discovery; Artificial Intelligence (incl. Robotics); Information Storage and Retrieval; Statistics for Engineering, Physics, Computer Science, Chemistry and Earth Sciences
Publisher: Springer US
ISSN: 1573-756X

Abstract

Similarity measures are central to many machine learning algorithms. There are many different similarity measures, each catering for different applications and data requirements. Most similarity measures used with numerical data assume that the attributes are interval scale. In the interval scale, it is assumed that a unit difference has the same meaning irrespective of the magnitudes of the values separated. When this assumption is violated, accuracy may be reduced. Our experiments show that removing the interval scale assumption by transforming data to ranks can improve the accuracy of distance-based similarity measures on some tasks. However, the rank transform has high time and storage overheads. In this paper, we introduce an efficient similarity measure which does not consider the magnitudes of inter-instance distances. We compare the new similarity measure with popular similarity measures in two applications, DBScan clustering and content-based multimedia information retrieval, using real-world datasets and different transform functions. The results show that the proposed similarity measure provides good performance on a range of tasks and is invariant to violations of the interval scale assumption.
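
The rank transform mentioned in the abstract can be illustrated with a small sketch. This is not SimUSF, whose definition is not given in this record; it only shows the rank-transform baseline under assumed details: each attribute is ranked independently, ties share the average rank, and Euclidean distance is applied to the ranks. The function names and toy data are hypothetical. The sketch shows that a monotone rescaling of the attributes (here a logarithm) changes raw distances but leaves rank-based distances unchanged.

```python
import math

def rank_transform(column):
    """Replace each value by its 1-based rank; tied values share the average rank."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with column[order[i]].
        while j + 1 < len(order) and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_dataset(rows):
    """Rank-transform every attribute (column) of a list of rows independently."""
    columns = [rank_transform(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data: four instances, two attributes.
data = [[1.0, 10.0], [2.0, 100.0], [4.0, 1000.0], [8.0, 10000.0]]
# A monotone rescaling of every value; it preserves order but not differences.
warped = [[math.log(v) for v in row] for row in data]

ranked = rank_dataset(data)
ranked_warped = rank_dataset(warped)

raw_before = euclidean(data[0], data[1])
raw_after = euclidean(warped[0], warped[1])
rank_before = euclidean(ranked[0], ranked[1])
rank_after = euclidean(ranked_warped[0], ranked_warped[1])
```

In this sketch every attribute has to be sorted and a full copy of the data stored as ranks, which gives a sense of where the time and storage overheads noted in the abstract come from.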
