Dokuchaev Soil Bulletin

The large scale digital mapping of soil organic carbon using machine learning algorithms

https://doi.org/10.19047/0136-1694-2018-91-46-62

Abstract

This paper presents the results of digital mapping of organic carbon content in the arable horizons of soils, together with an assessment of the accuracy of the models obtained with machine learning methods, for the Central Russian Upland area in Voronezh Oblast. The mapping was based on 22 soil sampling points, used for training and verifying the models, and on several sets of predictor variables. The predictor variables were a digital elevation model, its derivatives, and remote sensing data of different spatial resolutions. Models of the spatial variability of the investigated property were built with several decision-tree-based methods: random forest, boosted regression trees and Bayesian regression trees. Model accuracy was assessed by cross-validation, using the coefficient of determination, the mean absolute error and the root mean square error as accuracy indices. The results showed that predictor variables consisting of the digital elevation model, its derivatives and Landsat 8 data gave more stable models: the coefficient of determination varied from 0.6 to 0.7, and the cross-validated root mean square error (RMSEcv), i.e. the prediction error, varied from 0.5791 to 0.6520; the best model was obtained with Bayesian regression trees. With predictor variables consisting of the digital elevation model, its derivatives and Sentinel 2 data, the coefficient of determination varied from 0.47 to 0.55 and the prediction error varied from 0.7031 to 0.7909. The most significant predictor variables differed between the models built on different data sets.
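The three accuracy indices named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the choice of k = 5 folds, the definition of the coefficient of determination as 1 - SS_res / SS_tot, and the placeholder learner `mean_model` are all assumptions made for illustration. Any tree-based learner could be passed in place of `mean_model`.

```python
import math
import random


def accuracy_metrics(observed, predicted):
    """Return (R2, MAE, RMSE) for paired observed and predicted values.

    R2 is computed as 1 - SS_res / SS_tot (an assumed definition).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


def k_fold_predictions(x, y, fit_predict, k=5, seed=0):
    """Predict every sample with a model trained only on the other folds."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    predicted = [None] * len(y)
    for fold in range(k):
        test_idx = indices[fold::k]          # every k-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        preds = fit_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            predicted[i] = p
    return predicted


def mean_model(train_x, train_y, test_x):
    """Hypothetical placeholder learner: predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y] * len(test_x)
```

Calling `accuracy_metrics(y, k_fold_predictions(x, y, mean_model))` yields cross-validated values of the three indices; the RMSE obtained this way corresponds to what the abstract calls RMSEcv.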

About the Authors

A. V. Chinilin
RSAU-MTAA
Russian Federation


I. Yu. Savin
V.V. Dokuchaev Soil Science Institute
Russian Federation



For citations:


Chinilin A.V., Savin I.Yu. The large scale digital mapping of soil organic carbon using machine learning algorithms. Dokuchaev Soil Bulletin. 2018;(91):46-62. (In Russ.) https://doi.org/10.19047/0136-1694-2018-91-46-62

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 0136-1694 (Print)
ISSN 2312-4202 (Online)