Understanding data fusion within the framework of coupled matrix and tensor factorizations

Authors: Evrim Acar; Morten Arendt Rasmussen; Francesco Savorani; Tormod Næs; Rasmus Bro
Keywords: Data fusion; Matrix factorization; Tensor factorization; Missing data; CANDECOMP; PARAFAC
Journal: Chemometrics and Intelligent Laboratory Systems
Publication date: 15 November 2013
Year: 2013
Volume: 129
Issue: Complete
Pages: 53-63
Full-text size: 1602 K

Abstract

Recent technological advances enable us to collect huge amounts of data from multiple sources. Jointly analyzing such multi-relational data from different sources, i.e., data fusion (also called multi-block, multi-view or multi-set data analysis), often enhances knowledge discovery. For instance, in metabolomics, biological fluids are measured using a variety of analytical techniques such as Liquid Chromatography-Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy. Data measured using different analytical methods may be complementary and their fusion may help in the identification of chemicals related to certain diseases. Data fusion has proved useful in many fields including social network analysis, collaborative filtering, neuroscience and bioinformatics.

In this paper, unlike many studies demonstrating the success of data fusion, we explore the limitations as well as the advantages of data fusion. We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem, which jointly factorizes multiple data sets in the form of higher-order tensors and matrices by extracting a common latent structure from the shared mode. Using numerical experiments on simulated and real data sets, we assess the performance of coupled analysis compared to the analysis of a single data set in terms of missing data estimation and demonstrate cases where coupled analysis outperforms analysis of a single data set and vice versa.
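To make the coupled formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes one third-order tensor X and one matrix Y that share their first mode, models X with a CANDECOMP/PARAFAC decomposition (factors A, B, C) and Y as A V^T, and minimizes the sum of the two squared reconstruction errors. Alternating least squares is used here purely for brevity; the abstract does not state which optimizer the paper uses. The sketch also assumes complete data, so it does not cover the missing-data setting studied in the paper. All function and variable names (`cmtf_als`, `khatri_rao`, `joint_relative_error`) are hypothetical.

```python
import numpy as np


def khatri_rao(P, Q):
    """Column-wise Kronecker product; row index is p * Q.shape[0] + q."""
    r = P.shape[1]
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, r)


def cmtf_als(X, Y, rank, n_iter=200, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] and Y ~ A @ V.T with a shared A.

    Illustrative alternating least squares for complete data only.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    V = rng.standard_normal((Y.shape[1], rank))

    # Unfoldings of X along each mode, matching the khatri_rao row ordering.
    X1 = X.reshape(I, J * K)
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)

    for _ in range(n_iter):
        # Shared factor: stack both data sets so A is fitted to X and Y jointly.
        M = np.vstack([khatri_rao(B, C), V])
        A = np.linalg.lstsq(M, np.hstack([X1, Y]).T, rcond=None)[0].T
        # Tensor-only factors depend on X alone.
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        # Matrix-only factor depends on Y alone.
        V = np.linalg.lstsq(A, Y, rcond=None)[0].T
    return A, B, C, V


def joint_relative_error(X, Y, A, B, C, V):
    """Squared error of both reconstructions relative to the squared data norm."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    num = np.linalg.norm(X - X_hat) ** 2 + np.linalg.norm(Y - A @ V.T) ** 2
    den = np.linalg.norm(X) ** 2 + np.linalg.norm(Y) ** 2
    return num / den


if __name__ == "__main__":
    # Synthetic, noise-free data built from a shared first-mode factor A0.
    rng = np.random.default_rng(1)
    A0, B0, C0, V0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4, 7))
    X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
    Y = A0 @ V0.T
    print(joint_relative_error(X, Y, *cmtf_als(X, Y, rank=2)))
```

The coupling lives in the update for A: because the unfolded tensor and the matrix are stacked side by side, a single least-squares solve forces the same latent structure to explain both data sets. Dropping Y and V from that step would reduce the sketch to an ordinary single-tensor CP fit, which is the baseline the abstract compares against.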
