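Spatial Prediction of Topsoil Organic Matter Content in Cultivated Land Based on a Random Forest Model: A Case Study of Huixian City, Henan Province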
  • English title: Spatial Prediction of SOM Content in Topsoil Based on Random Forest Algorithm: A Case Study of Huixian City, Henan Province
  • Authors: HAN Xingxing (韩杏杏); CHEN Jie (陈杰); WANG Haiyang (王海洋); WU Zhenfu (巫振富); CHENG Daoquan (程道全)
  • Affiliations: School of Water Conservation and Environment, Zhengzhou University; School of Public Administration, Zhengzhou University; Soil and Fertilizer Station of Henan Province
  • Keywords: Random forest; Soil organic matter; Agricultural land predictive mapping; Huixian City
  • Chinese journal name: TURA
  • English journal name: Soils
  • Publication date: 2019-02-15
  • Publisher: 土壤 (Soils)
  • Year: 2019
  • Volume and cumulative issue number: v.51; No.299
  • Funding: Supported by the National Natural Science Foundation of China (40971128)
  • Language: Chinese
  • Record ID: TURA201901021
  • Page count: 8
  • Issue: 01
  • CN: 32-1118/P
  • Pages: 154-161
Abstract
Topsoil organic matter content in cultivated land is closely linked to crop growth and development, so knowing its spatial distribution is important for targeted improvement of soil fertility and for guiding agricultural production. This study used the center points of 5,922 polygons from the map of cultivated land resource management units of Huixian City, Henan Province, as the base data, and randomly divided them into training and validation data sets at ratios of 8:2, 7:3 and 6:4. With soil type as an auxiliary qualitative variable, a random forest (RF) model was used to model the complex nonlinear relationship between soil organic matter (SOM) content and natural environmental variables (aspect, curvature, slope, elevation, soil texture and the normalized difference vegetation index, NDVI) and socio-economic factors (drainage capacity and irrigation status). The results showed that: (1) with a training-to-validation ratio of 8:2, the RF model generally gave higher prediction accuracy; (2) when 80% of the base data were used for training, the correlation between the predicted map and the existing SOM map reached 0.859; (3) when validated against 303 field-measured points, the Pearson correlation coefficient between predicted and measured values was 0.595. Ranking the influencing factors by importance showed that soil texture was the most important factor affecting topsoil SOM content in the agricultural land of the study area. The RF model, as a machine learning and data mining method, can therefore capture the relationship between the input variables and SOM content reasonably well, and the predicted maps agree with actual conditions, but they do not reflect fine-scale differences in SOM content well.
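The random splitting and the Pearson correlation check described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the authors' implementation: the function names `split_points` and `pearson_r`, the fixed seed, and the rounding of the training-set size are all assumptions made for the example, and the random forest model itself is left out.

```python
import math
import random


def split_points(n_points, train_fraction, seed=0):
    """Randomly split point indices 0..n_points-1 into training and validation lists.

    Hypothetical helper: the seed and the rounding rule are illustrative assumptions.
    """
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    n_train = round(n_points * train_fraction)
    return indices[:n_train], indices[n_train:]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The three training fractions named in the abstract (8:2, 7:3, 6:4),
# applied to the 5,922 polygon center points.
N_POINTS = 5922
split_sizes = {}
for train_fraction in (0.8, 0.7, 0.6):
    train_idx, valid_idx = split_points(N_POINTS, train_fraction)
    split_sizes[train_fraction] = (len(train_idx), len(valid_idx))
    print(train_fraction, len(train_idx), len(valid_idx))
```

In a full workflow, a model fitted on the training indices would predict SOM at the validation points (or at the 303 field points), and `pearson_r(predicted, measured)` would give the kind of coefficient reported in the abstract.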
