Identifying Finest Machine Learning Algorithm for Climate Data Imputation in the State of Minas Gerais, Brazil

Abstract

Climate prediction matters to society, and a successful climate forecast depends on a good historical database. However, for several reasons, large gaps are found in the historical records of many meteorological stations, and studies on how to estimate such missing weather values are still scarce. This article describes a study that combines several machine learning techniques to estimate missing climatic values. The study produced a computational framework comprising five methods: linear regression, neural networks, support vector machines, bagged regression trees, and random forest. An in-depth data analysis and a statistical study were conducted to compare these five methods. The statistical comparison showed that the random forest technique was successful in estimating missing climatic values for the state of Minas Gerais; it can therefore be widely used by the responsible agencies to improve their historical databases and, consequently, their climate forecasts.
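As an illustration only, the kind of comparison described above can be sketched by hiding values that are actually known, imputing them with each candidate method, and scoring the estimates against the hidden truth. The sketch below is not the authors' framework: it uses only the Python standard library, compares a mean-fill baseline with a single-predictor linear regression (one of the five methods named in the abstract), and runs on synthetic temperatures. The function names, the neighbouring-station predictor, the 20% gap fraction, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(truth, estimate):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def compare_imputers(neighbour, target, gap_fraction=0.2, seed=0):
    """Hide a random share of `target`, impute it two ways, return RMSE per method."""
    rng = random.Random(seed)
    indices = list(range(len(target)))
    rng.shuffle(indices)
    n_gap = max(1, int(gap_fraction * len(target)))
    gap, observed = indices[:n_gap], indices[n_gap:]

    # Baseline: fill every gap with the mean of the observed target values.
    mean_value = sum(target[i] for i in observed) / len(observed)
    mean_pred = [mean_value] * len(gap)

    # Linear regression: predict the target station from a neighbouring station.
    intercept, slope = fit_line(
        [neighbour[i] for i in observed], [target[i] for i in observed]
    )
    reg_pred = [intercept + slope * neighbour[i] for i in gap]

    truth = [target[i] for i in gap]
    return {
        "mean": rmse(truth, mean_pred),
        "linear_regression": rmse(truth, reg_pred),
    }


def synthetic_stations(n_days=365, seed=1):
    """Hypothetical daily temperatures for two correlated stations (not real data)."""
    rng = random.Random(seed)
    neighbour = [
        22 + 6 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 1.5)
        for d in range(n_days)
    ]
    target = [1.5 + 0.9 * t + rng.gauss(0, 0.5) for t in neighbour]
    return neighbour, target


neighbour, target = synthetic_stations()
scores = compare_imputers(neighbour, target)
print(scores)
```

The same masking-and-scoring loop would extend to the other methods named in the abstract by adding one prediction step per method, and repeating it over several random seeds would give the paired error samples that a statistical comparison needs.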

Published
2018-12-30