Inferential Theory for Factor Models of Large Dimensions under Monotone-Missing Data.
Details
  • Author: Cahan, Ercument
  • Degree: Doctor
  • Year: 2013
  • Institution: University of Washington
  • Department: Economics
  • ISBN: 9781303673894
  • CBH: 3608892
  • Country: USA
  • Language: English
  • File size: 1511725
  • Pages: 126
Abstract
In this dissertation we investigate the inferential theory for factor models with large cross-section (N) and time series (T) dimensions under monotone-missing data. The major contribution of the dissertation is the development and testing of an intuitive and parsimonious factor-based imputation (FBI) algorithm that preserves the desirable asymptotic properties of the standard (complete-data) factor models. FBI uses the principal component (PC) estimator obtained from balanced subpanels of the data set to fill in the missing values. Well-behaved asymptotic properties of the factor model estimators obtained from FBI-imputed data sets decrease the need for truncation, making it possible to exploit larger data sets and to carry out factor model inference with greater confidence. We also provide the asymptotic distribution of the imputation error arising from FBI. In addition, we compare the small sample performance of FBI with that of the Expectation-Maximization (EM) algorithm and show that FBI outperforms EM in every measure we consider.

Recent advances in information technology have allowed researchers to access a myriad of economic time series over an increasingly long span at reasonable cost. While the increased availability of data has made it possible to test and understand economic phenomena better, it has also created the problem of organizing the data in an easily interpretable form. Factor analysis, a useful method for summarizing the information in data-rich environments, has therefore received increasing attention, and the econometric analysis of large dimensional factor models has become a heavily researched topic. Recently, factor models have received particular attention in the macroeconomic forecasting literature. New generation approximate factor models allow the number of observations to be large in both the cross-section and time series dimensions. Stock and Watson (2002b) showed that the principal components are consistent estimators of the true latent factors when both N and T approach infinity, without imposing any restriction on their relative rates of increase. Bai (2003) proved that the estimated factors are consistent and in general asymptotically normal in the presence of serial correlation and heteroskedasticity, while Bai and Ng (2002) studied the consistent estimation of the number of factors when both N and T are large. A method of rising interest is to estimate common factors by principal components from large data panels, and then to augment an otherwise standard regression with the estimated factors. Stock and Watson (2002a) showed that the feasible forecasts constructed from the estimated factors together with the estimated coefficients converge to the infeasible forecast that would be obtained if the factors and coefficients were known. Bai and Ng (2006) determined the limiting distributions of the forecast errors and least squares estimates obtained from factor-augmented regressions, showing that the least squares estimates are $\sqrt{T}$-consistent and asymptotically normal if $\sqrt{T}/N \to 0$.

The studies cited above obtained their results under the assumption that the data panel from which the factors are estimated is balanced, i.e., it does not suffer from any missingness. However, in practice most data panels are unbalanced. When missing data is present, factor model estimates cannot be obtained directly: one must first transform the unbalanced panel into a balanced one, either by truncation or by imputation.
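Before comparing the two routes, a minimal Python sketch of the PC estimator and of a subpanel-based fill-in step in the spirit of FBI may help make the mechanics concrete. This is one plausible reading of the fill-in idea, not the dissertation's exact procedure; `pc_factors` and `fbi_impute` are illustrative names, and the balanced subpanel is taken here to be the set of fully observed series.

```python
import numpy as np

def pc_factors(X, r):
    """Principal component (PC) estimator of an r-factor model on a
    balanced T x n panel X, under the normalization F'F / T = I_r."""
    T, _ = X.shape
    # Factors: sqrt(T) times the eigenvectors of XX' belonging to the
    # r largest eigenvalues (eigh returns eigenvalues in ascending order).
    _, eigvec = np.linalg.eigh(X @ X.T)
    Fhat = np.sqrt(T) * eigvec[:, -r:][:, ::-1]   # T x r estimated factors
    Lhat = X.T @ Fhat / T                         # n x r estimated loadings
    return Fhat, Lhat

def fbi_impute(X, r):
    """Fill the missing cells (np.nan) of a monotone-missing T x N panel:
    estimate the factors by PC from the balanced subpanel of fully observed
    series, then project each incomplete series on those factors over its
    observed span.  Assumes at least r series are observed for all T dates."""
    X = X.copy()
    complete = ~np.isnan(X).any(axis=0)           # fully observed series
    Fhat, _ = pc_factors(X[:, complete], r)
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[:, i])
        # Loadings of series i from a regression on the estimated factors.
        lam_i, *_ = np.linalg.lstsq(Fhat[obs], X[obs, i], rcond=None)
        X[~obs, i] = Fhat[~obs] @ lam_i           # imputed values
    return X
```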
When using a truncated data panel one can rely on standard large sample theory, since the resulting data set is balanced and contains no estimation error. However, the data remaining after truncation may not be representative of the entire population (e.g., survivorship bias). Therefore, the small sample performance of truncated data sets may become questionable. On the other hand, when missing data is imputed, standard large sample theory is no longer valid if the imputation algorithm does not take into account the estimation error incurred during imputation. If the level of missingness (i.e., the number of missing cells in the data panel) is kept fixed while the dimensions of the data panel are allowed to increase indefinitely, the effect of missingness on estimation eventually dies out, and standard large sample theory applies as in the truncation case. To prevent this bias, a large sample theory for factor models that accounts for missing data should allow missingness to grow indefinitely together with N and T, and determine the asymptotic properties of the factor model estimators in that setting. This is the path we take in this study: we consider the large sample properties of factor model estimators extracted from monotone-missing data sets that are imputed with the FBI algorithm. We refer to these estimators as the imputed data (ID) estimators. To our knowledge, our research is the first to focus on the asymptotic properties of factor model estimators obtained with PC from large data sets that have a considerable missing data problem.

The organization of the dissertation is as follows. Chapter I begins by introducing large dimensional factor models, their estimation, forecasting, and standard large sample theory. It then discusses missing data mechanisms. Next, we briefly introduce the FBI algorithm and discuss how it exploits the balanced subpanels of the unbalanced data panel to impute the missing values. We show via an extensive Monte Carlo study that the ID estimators obtained from the imputed balanced data sets are consistent and asymptotically normal. This indicates that the FBI algorithm preserves the desirable properties of consistency and asymptotic normality that the factor model estimators enjoy under complete data. We also consider the large sample behavior of different partitions of the factor estimator. We find that partitioned factor estimators that are more exposed to missingness converge more slowly and have a higher asymptotic variance. In Chapter II, we focus on the workings and main statistical properties of the FBI algorithm. First, we study the statistical properties of the auxiliary (interim) factor estimators and show that they are consistent and asymptotically normal. These results are instrumental in determining the large sample distribution of the imputation error. Then, we derive the asymptotic distribution of the imputation error under FBI and show that the imputed values are consistent and asymptotically normal. Finally, we express the partitioned ID factor estimators in terms of the observed and imputed components of the completed data set $\hat{X}$. This analysis reveals that the differences in convergence rates and asymptotic variances across the partitioned estimators found in Chapter I can be explained by each partition's exposure to missingness. That is, the higher the cross-section missingness in a partition, the slower the convergence and the larger the asymptotic variance, and vice versa.
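The Monte Carlo evidence described for Chapter I can be imitated on a toy scale with the sketches above. The simulation below is purely illustrative (i.i.d. standard normal factors, loadings, and idiosyncratic errors; a simple monotone pattern in which the last series stop being observed at random dates) and is not the dissertation's experimental design; it reuses `pc_factors` and `fbi_impute` from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_panel(T, N, r, frac_missing=0.3):
    """Simulate X = F L' + e and impose a monotone-missing pattern on
    the last frac_missing share of the series."""
    F = rng.standard_normal((T, r))
    L = rng.standard_normal((N, r))
    X = F @ L.T + rng.standard_normal((T, N))
    for i in range(N - int(frac_missing * N), N):
        cutoff = rng.integers(T // 2, T)          # observed up to cutoff
        X[cutoff:, i] = np.nan
    return X, F

T, N, r = 200, 100, 2
X, F = simulate_panel(T, N, r)
Fhat, _ = pc_factors(fbi_impute(X, r), r)         # ID factor estimator
# Trace R^2 of regressing the true factors on the estimated ones:
# values near one mean the estimated factor space spans the truth.
proj = Fhat @ np.linalg.lstsq(Fhat, F, rcond=None)[0]
print("trace R^2:", np.trace(proj.T @ proj) / np.trace(F.T @ F))
```

Repeating this over many draws while growing N and T is the kind of exercise that would trace out the consistency and asymptotic normality claimed for the ID estimators.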
Chapter III serves two main purposes: (i) characterizing the correlation structure implied by a factor model, and (ii) showing how the correlation structure affects the relative performance of the FBI and Expectation-Maximization algorithms in small samples. To this end, we first establish the relation between the factor and correlation structures. We start with a very general factor model under very weak assumptions and determine the analytical form of the absolute correlation coefficients in terms of the factor model components. Then, assuming that all factor variances are equal, we derive the exact probability density function of the absolute pairwise correlations. Next, we propose the average absolute correlation, denoted by $\mu_{|\rho|}$, as a summary statistic measuring linear comovement among the series in the data set. Utilizing the derived absolute correlation density, we show that there is a negative relation between the number of factors and $\mu_{|\rho|}$ among series that admit a factor representation. This is a key finding for imputation purposes, since all imputation methods exploit the correlation structure of the data set in one way or another. In the second part of the chapter, we compare the small sample performance of the FBI and EM algorithms from various perspectives. To this end, we develop performance metrics that measure imputation accuracy along different dimensions. Using these measures, we show that FBI outperforms the EM algorithm under many different scenarios. We also show that FBI is more robust to higher $r$ (lower $\mu_{|\rho|}$) values than EM, and should be preferred for data sets admitting a factor structure even if the data is multivariate normal.
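The summary statistic $\mu_{|\rho|}$ is simple to compute on a balanced panel; the sketch below (function name illustrative) averages the absolute off-diagonal entries of the sample correlation matrix.

```python
import numpy as np

def avg_abs_correlation(X):
    """Average absolute pairwise correlation mu_|rho| over the
    N(N-1)/2 distinct series pairs of a balanced T x N panel."""
    R = np.corrcoef(X, rowvar=False)       # N x N sample correlation matrix
    iu = np.triu_indices_from(R, k=1)      # strictly upper-triangular pairs
    return np.abs(R[iu]).mean()
```

By the chapter's finding, a panel driven by many factors (high $r$) will display a low $\mu_{|\rho|}$ even though it has an exploitable factor structure, which is why FBI's robustness to higher $r$ matters for imputation.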
