Africa is considered to be highly vulnerable to climate change, yet
the availability of observational data and derived products is limited. As
one element of the SASSCAL initiative (Southern African Science Service
Centre for Climate Change and Adaptive Land Management), a cooperation of
Angola, Botswana, Namibia, Zambia, South Africa and Germany, networks of
automatic weather stations have been installed or improved
(
Precise monitoring of climate variability and climate change is a challenge for many regions of the world. For Africa, it has been noted that the lack of adequate data and observation systems seriously hinders the ability of scientists to assess the past and current state of the climate (ACC, 2013). This applies to several developing regions of the world. Among others, Southern Africa lacks historical and current ground-based climate records, which in turn hinders the understanding of climate variability in the region (Niang et al., 2014).
As part of the SASSCAL initiative (Southern African Science Service Centre
for Climate Change and Adaptive Land Management), 148 automatic weather
stations (AWS) have been installed since 2013 to improve this situation
(Kaspar et al., 2015; Posada et al., 2016). Most of them are located
in Angola, Botswana, Namibia and Zambia, a few in South Africa, and together
they cover an area of approximately 3.4 million km².
In an earlier study, Krähenmann et al. (2013) published an observational
reference dataset for daily minimum and maximum temperature at
0.22
The selection of an adequate interpolation method to produce a gridded dataset can be challenging, since various factors, such as station density, data coverage and orographic features, have to be considered (Krähenmann and Ahrens, 2013). Several interpolation methods have been proposed in the past (e.g. Meyers, 1994; Bolstad et al., 1998; Daly, 2006; Stahl et al., 2006), but, as Tveito et al. (2006) suggested, the interpolation of monthly mean temperature is relatively simple (at least compared with precipitation) owing to its high correlation with elevation or land surface characteristics.
Here, three commonly used algorithms were compared using cross-validation (Isaaks and Srivastava, 1989; Wackernagel, 2003): (1) regression kriging (RK), (2) regression combined with two-dimensional inverse distance weighting (2D-IDW) (R2D, distance measure based on geographical coordinates) and (3) regression combined with three-dimensional inverse distance weighting (3D-IDW) (R3D, distance measure based on geographical coordinates and elevation). Kriging and 2D-IDW are common interpolation methods in geostatistics and in the earth sciences in general (Bartier and Keller, 1996; Oliver and Webster, 2015), whereas 3D-IDW is a newly developed approach (Krähenmann et al., 2013).
The study region (11 and 34
Digital elevation model (color coded) for the study area. Circles show the locations of the SASSCAL Weathernet stations. The circle filling illustrates data availability, ranging from black (data available over the whole period) to empty (no data available for the period).
Our dataset includes 148 stations (Fig. 1) from the SASSCAL Weathernet
(
We compared three different hybrid interpolation methods that combine
multiple linear regression with residual interpolation. The general
procedure of the three hybrid algorithms is a three-step approach: (a) linear
regression using specific predictors to explain the variation of the monthly
variable, (b) interpolation of the monthly regression residuals to account
for the unexplained variation and (c) summation to yield the final result.
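The three-step procedure can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' R implementation: the function names are hypothetical, ordinary least squares via `numpy.linalg.lstsq` stands in for the regression step, and the residual interpolator is passed in as a callable so that 2D-IDW, 3D-IDW or simple kriging could be plugged in.

```python
import numpy as np
from typing import Callable

# Hypothetical interface for the residual interpolator used in step (b):
# (station_coords, station_residuals, grid_coords) -> residuals at grid points.
ResidualInterpolator = Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray]


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients [b0, b1, ..., bk] of y = b0 + X @ [b1..bk]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate the fitted regression for a 2-D predictor array X (rows = points)."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


def hybrid_interpolate(X_sta, y_sta, coords_sta, X_grid, coords_grid,
                       interp: ResidualInterpolator) -> np.ndarray:
    X_sta = np.asarray(X_sta, dtype=float)
    y_sta = np.asarray(y_sta, dtype=float)
    # (a) regression on the predictors at the station locations
    coef = fit_ols(X_sta, y_sta)
    # unexplained part of the monthly value at each station
    residuals = y_sta - predict_ols(coef, X_sta)
    # (b) interpolation of the station residuals to the grid
    res_grid = interp(np.asarray(coords_sta, dtype=float), residuals,
                      np.asarray(coords_grid, dtype=float))
    # (c) summation of regression trend and interpolated residuals
    return predict_ols(coef, X_grid) + res_grid
```

Because the residuals are added back in step (c), any residual interpolator that reproduces the residuals exactly at the stations also reproduces the station observations there.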
The following methods have been applied to perform residual interpolation:
two-dimensional inverse distance weighting (2D-IDW), three-dimensional IDW (3D-IDW),
and simple kriging (SK). For all statistics, methods and plots we
used R version 3.1.2 (R Development Core Team, 2016,
The regression coefficients necessary for the hybrid methods can be
estimated from point data (observations and predictors at station locations)
if the regression function applied is linear (Heuvelink and Pebesma, 1999).
Multiple linear regression amounts to the following equation:
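In generic notation (the symbols are assumed here, since the original equation is not reproduced), a multiple linear regression of a monthly temperature value T_i at station i on k predictors can be written as:

```latex
T_i = \beta_0 + \sum_{j=1}^{k} \beta_j \, x_{ij} + \varepsilon_i , \qquad i = 1, \dots, n
```

Here x_ij is the value of predictor j at station i (for example elevation, latitude, longitude or a continentality index), β_0, ..., β_k are the least-squares coefficients, and ε_i is the regression residual that is subsequently interpolated.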
IDW and kriging interpolation are based on the assumption that points closer
together are more alike than points further apart. For IDW, the value at a
target point is estimated as a weighted linear combination of the observations
at nearby stations (Naoum and Tsanis, 2004):
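A common form of the IDW estimator (Shepard's weighting) is given below as an illustration; the notation is assumed, and details such as a search radius or a maximum number of neighbours are not specified in the text:

```latex
\hat{z}(\mathbf{s}_0) = \frac{\sum_{i=1}^{n} d_{0i}^{-\beta} \, z(\mathbf{s}_i)}{\sum_{i=1}^{n} d_{0i}^{-\beta}}
```

Here z(s_i) is the residual at station i, d_0i is the distance between the target point s_0 and station i, and β is the distance weighting power. In 2D-IDW, d_0i is computed from the geographical coordinates; in 3D-IDW, elevation also enters the distance.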
To determine the distance weighting power β, cross-validation was run over the
whole period for weighting powers ranging from 1.5 to 3; overall, β = 2 proved
to be optimal. To prevent over-fitting, and because of the low number of
available observing stations, a constant distance weighting power (
3D-IDW is based on a Euclidean distance measure that is extended to account
for the elevation
As in the 2-D case, cross-validation revealed a distance weighting power of 2
as the most suitable.
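The difference between 2D- and 3D-IDW lies only in the distance measure. The sketch below is a hypothetical NumPy implementation that assumes coordinates already projected to a common unit and an assumed scaling factor `elev_weight` that converts elevation differences into equivalent horizontal distances; the exact formulation of Krähenmann et al. (2013) may differ.

```python
import numpy as np


def idw_distance(sta, tgt, elev_weight=1.0):
    """Distances between m target points and n stations, shape (m, n).

    Columns are [x, y] for 2D-IDW or [x, y, z] for 3D-IDW. For 3D-IDW the
    elevation difference is multiplied by elev_weight, a hypothetical factor
    that makes vertical and horizontal separations commensurate."""
    sta = np.asarray(sta, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    diff = tgt[:, None, :] - sta[None, :, :]  # shape (m, n, ndim)
    if diff.shape[2] == 3:
        diff[:, :, 2] *= elev_weight
    return np.sqrt((diff ** 2).sum(axis=2))


def idw(sta, values, tgt, beta=2.0, elev_weight=1.0):
    """Inverse distance weighted estimates at the target points."""
    values = np.asarray(values, dtype=float)
    d = idw_distance(sta, tgt, elev_weight)
    out = np.empty(d.shape[0])
    for k, row in enumerate(d):
        exact = row == 0.0
        if exact.any():
            # target coincides with a station: return the observed value
            out[k] = values[exact][0]
        else:
            w = row ** -beta  # weights decay with distance to the power beta
            out[k] = np.sum(w * values) / np.sum(w)
    return out
```

With two stations that are horizontally equidistant from a target point, 2D-IDW returns their plain average, whereas 3D-IDW gives most of the weight to the station at a similar elevation, which illustrates why R3D residual fields tend to follow the terrain.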
Simple kriging involves solving a set of linear equations to minimize the mean squared error of the residuals from the interpolating surface. To ensure spatial homogeneity of the residuals, which is required to solve this least squares problem, a normal score transformation was applied to the data before interpolation (Deutsch and Journel, 1998). For simple kriging, we applied a spherical variogram model including a nugget variance. The accurate estimation of a variogram model requires a large sample size; however, the station coverage over Southern Africa is poor. Therefore, we chose to derive a global variogram from all available stations. Additionally, the preceding regression removes major geographic features and thus yields a more homogeneous distribution of the residuals.
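The simple kriging step can be sketched as follows, assuming projected coordinates, residuals with zero mean after the normal score transformation, and a spherical variogram with nugget. The variogram parameters in the example are arbitrary placeholders rather than values estimated with the approach of Ahrens and Beck (2008), and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def normal_score(values):
    """Rank-based normal score transform; returns the scores and a back-transform."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    ranks = values.argsort().argsort()  # rank 0..n-1 of each value
    nd = NormalDist()
    scores = np.array([nd.inv_cdf((r + 0.5) / n) for r in ranks])
    order = np.argsort(scores)

    def back(z):
        # linear interpolation between score and data quantiles (clamped at the ends)
        return np.interp(z, scores[order], values[order])

    return scores, back


def spherical_cov(h, nugget, psill, vrange):
    """Covariance C(h) = nugget + psill - gamma(h) for a spherical variogram with nugget."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / vrange, 1.0)
    gamma = nugget + psill * (1.5 * r - 0.5 * r ** 3)
    gamma = np.where(h == 0.0, 0.0, gamma)  # gamma(0) = 0 by definition
    return nugget + psill - gamma


def simple_kriging(sta, z, tgt, nugget, psill, vrange, mean=0.0):
    """Simple kriging estimates and variances at target points (known mean)."""
    sta, tgt, z = (np.asarray(a, dtype=float) for a in (sta, tgt, z))

    def dist(a, b):
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

    K = spherical_cov(dist(sta, sta), nugget, psill, vrange)  # station-station, (n, n)
    k = spherical_cov(dist(sta, tgt), nugget, psill, vrange)  # station-target, (n, m)
    lam = np.linalg.solve(K, k)                               # kriging weights, (n, m)
    est = mean + lam.T @ (z - mean)
    var = (nugget + psill) - np.sum(lam * k, axis=0)
    return est, var
```

In this formulation the nugget is treated as micro-scale variance, so the estimate honours the data exactly at station locations. Between stations, a larger nugget relative to the partial sill shrinks the weights and pulls the estimate towards the mean, narrowing the value range. Implementations that treat the nugget as measurement error do not reproduce the data exactly.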
The variogram parameters were determined using an approach suggested by
Ahrens and Beck (2008). We used a data pool containing all monthly values of
the period September 2014 to October 2015, and estimated a range of
approximately 12
For the validation of the three interpolation methods we employed a leave-one-out cross-validation approach, which involves leaving out each station in turn and estimating its value from the remaining observations using an interpolation method. The estimated value is then compared with the actually observed value, which provides an estimate of the model error at this point (Oliver and Webster, 2015).
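Leave-one-out cross-validation, and its use for choosing the IDW weighting power from a range such as 1.5 to 3, can be illustrated with the following sketch. It uses plain 2D-IDW on projected coordinates; the helper names are hypothetical and the list of candidate powers is an assumption.

```python
import numpy as np


def idw_predict(sta, values, tgt, beta):
    """Plain 2D-IDW; assumes no target point coincides with a station."""
    d = np.sqrt(((tgt[:, None, :] - sta[None, :, :]) ** 2).sum(axis=2))
    w = d ** -beta
    return (w @ values) / w.sum(axis=1)


def loocv_predictions(sta, values, beta):
    """Leave each station out in turn and estimate it from the remaining stations."""
    sta = np.asarray(sta, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    pred = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        pred[i] = idw_predict(sta[keep], values[keep], sta[i:i + 1], beta)[0]
    return pred


def select_beta(sta, values, betas):
    """Return the weighting power with the lowest cross-validated RMSE, plus all scores."""
    values = np.asarray(values, dtype=float)
    scores = {}
    for b in betas:
        err = loocv_predictions(sta, values, b) - values
        scores[float(b)] = float(np.sqrt(np.mean(err ** 2)))
    best = min(scores, key=scores.get)
    return best, scores
```

For example, `select_beta(coords, residuals, [1.5, 2.0, 2.5, 3.0])` would return the power with the lowest leave-one-out RMSE for one month; pooling the errors over all months before choosing yields a single constant power.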
The root mean square error (RMSE) is a commonly used skill score in model
evaluation. Small RMSE values indicate high model accuracy, with the theoretical
optimum of 0 denoting a perfect model prediction. The RMSE is
calculated as follows:
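In standard form (notation assumed here), with n stations, cross-validated estimates T̂_i and observations T_i:

```latex
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{T}_i - T_i \right)^2 }
```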
As a second performance criterion, we used a measure of the variance preserved in the interpolated data relative to the variance of the observations, “VARI” (Krähenmann et al., 2013). VARI is the ratio of the variance of the interpolated values (obtained by cross-validation) to that of the observed values. It usually lies between 0 and 1, with high values indicating that a large portion of the variance has been retained in the interpolated data: a value between 0 and 1 implies that the observed temperature variability is reduced, whereas a VARI greater than 1 indicates an enhancement of the spatial temperature variation. The explained variance, in contrast, describes how well the method and parameters used explain the spatial temperature variability.
The VARI equates to:
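Following this definition, VARI = Var(T̂)/Var(T), where T̂ are the cross-validated estimates and T the observations. A minimal NumPy sketch of both skill scores is given below; the function names are hypothetical. Because both variances are computed over the same n stations, the choice of n or n − 1 in the denominator cancels in the ratio.

```python
import numpy as np


def rmse(pred, obs):
    """Root mean square error of cross-validated estimates against observations."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def vari(pred, obs):
    """Variance of the cross-validated estimates divided by the variance of the observations."""
    return float(np.var(np.asarray(pred, dtype=float)) / np.var(np.asarray(obs, dtype=float)))
```

For observations [10, 12, 14, 16] and estimates [11, 12, 14, 15], the estimates are pulled towards the mean, giving an RMSE of about 0.71 and a VARI of 0.5, i.e. half of the observed variance is retained.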
The suitability of the selected predictors for multiple linear regression
for minimum temperature (
The VARI was used to assess how much of the variance in the data was retained
by the model, with low values indicating little and high values strong
retention of variance. As a single predictor,
RMSEs and VARI averaged over the first year of the interpolation period
(September 2014–August 2015). Predictors: elevation (
In conclusion, the combination of all predictors yields the best result in terms of both RMSE and VARI. However, the more predictors are used in the linear regression, the greater the risk of over-fitting. Previous studies reported high correlations between air temperature and elevation (Benavides et al., 2007; Kurtzman and Kadmon, 1999; Hudson and Wackernagel, 1994) and between air temperature and latitude (Wackernagel, 1994). One reason for the poor performance of elevation as a predictor in the study region could be that other variables (e.g. continentality) and latitudinal effects mask the elevation dependence of air temperature, which can be explained by the vast size of the study area. This issue may be reduced by splitting the target area into several more homogeneous climate regions, which, however, is not feasible due to the low number of stations. Furthermore, applying a non-linear temperature profile may significantly improve the gridding result; however, the low station density and the underrepresentation of higher elevations prevent its application.
Furthermore, we found a correlation between longitude (
While the RMSEs (Fig. 2) for all methods exhibit a small decline over the
whole interpolation period for
In general, the evaluated interpolation methods show a performance in terms
of VARI (Fig. 3) similar to that previously observed for the RMSE. For
Results of the RMSE calculation for all interpolation models,
where R
Overall, the RMSE shows a strong seasonal cycle for minimum temperature, whereas no such pattern is evident for maximum temperature.
The southern winter season (June, July and August) is also the dry season, when the lowest minimum temperatures occur, particularly at lower elevations. This is due to the low relative humidity in this season, which favours strong radiative cooling at night. As a result, temperatures drop particularly in valleys and depressions, whereas air temperatures do not drop as much in elevated areas.
The resulting temperature inversions produce non-linear vertical temperature profiles, which increase the residuals of the linear regression; interpolating these larger residuals in turn leads to larger interpolation errors.
In such cases, modelling temperature profiles using non-linear regression or varying regression predictors (Hofer et al., 2012) would be beneficial. However, the low station density (Fig. 1) and the underrepresentation of summits and elevated areas do not allow for a robust estimate of non-linear temperature profiles.
Results of the VARI calculation for all interpolation models,
where R
Maps of the interpolated residuals of RI
Temperature map of July 2015 based on the linear regression.
Temperature map of July 2015 based on RI
Maximum temperature is less affected by changes in relative humidity. During
the southern winter months, insolation at the surface is high and near-surface
air temperature rises markedly during the day. The air temperature therefore
rises more strongly in low-lying areas than in elevated areas, where
temperature remains more constant. This removes the temperature inversion,
leading to a more linear temperature gradient during daytime. Thus, linear
regression and residual interpolation yield better results and lower
interpolation errors for
As recommended by Tveito et al. (2006), the interpolation results were also visually examined. Figure 4 gives an example of the interpolated residuals using SK, 2D- and 3D-IDW for July 2015, in which the characteristic features of the respective methods can be observed. In the case of the 2D-IDW interpolation (Fig. 4, top panels), small circular structures of high or low values can be observed, centred on the measuring stations, while in the areas between stations the values quickly tend to zero. In comparison, SK (Fig. 4, middle panels) yields structures of larger extent with more gradually changing values. The value range of SK is also smaller than that of 2D-IDW, which can be attributed to the nugget effect (i.e. unexplained variance at short distances) and the resulting stronger smoothing. 3D-IDW (Fig. 4, bottom panels) yields a larger value range and smaller spatial structures than SK, which tend to follow terrain features such as mountains and the coastline.
Examining the final interpolation results, linear regression (Fig. 5) and
RK (Fig. 6, middle panels) exhibit the least small-scale variation, with
temperature patterns following the continentality index and the DEM. Some of
these terrain-following patterns can also be found in the RI grids (Fig. 6,
top panels); however, the RI-based residual maps also contain some circular
small-scale structures surrounding the observing stations. R3D-based grids
(Fig. 6, bottom panels) exhibit some small-scale structures; however, these
are mostly non-circular, and the station distribution is less evident.
Overall, the R3D method performed best in terms of RMSE, VARI and visual
inspection, and was thus chosen for the gridding of the
In this study, we generated a 0.22
In this period, temperature values range from
Minimum temperature maps based on the three-dimensional inverse
distance weighting method, September 2014 to August 2015. Unit:
Maximum temperature maps based on the three-dimensional inverse
distance weighting method, September 2014 to August 2015. Unit:
Analysing the interpolation maps for
In the maps of
A general problem is station data availability; to obtain results that are as
reliable as possible, we used all available data for each month. Since
several stations did not deliver data over the whole period, we had to
calculate the REM for each single map. Given the data sparsity of the region,
this approach was a compromise between data density and homogeneity
over time. More stations, especially in Angola, and more reliable data
transmission would improve the model outcome. Furthermore, longer time series,
covering a period of ten years, are required to evaluate the model results more
robustly. A multi-model approach could further improve the results with regard
to the alternation of dry and wet seasons in the region (e.g. Hofer et al.,
2012). Our dataset therefore shares the typical problems of gridded data (as
widely documented in the literature); hence, information on the number of
stations entering the gridding procedure is vital for users to assess the
limitations for a specific place and time (e.g. Mitchell and Jones, 2005).
All data used in this publication are
available at (
A gridded dataset of monthly mean daily maximum (
Splitting the interpolation process into regression and IDW increases the robustness of the interpolation, since the regression and the IDW distance parameters can be modified separately. The multi-dimensional IDW distance allows residuals in elevated areas to be explicitly separated from those in valleys and depressions. Although several caveats remain, we decided to use one model for the whole time period to reduce computational time (and thus to make later applications of the tool more feasible).
The topography of the region is mainly flat. More complex topographic patterns occur only along the Atlantic coast, in the north of Angola and in the Zambezi valley. We assume that the results would not improve significantly with a more complex model.
In general, the aim of this work was to apply simple methods for temperature interpolation that are easy to implement and require only station observations, geographical coordinates, elevation and continentality, in order to provide an interpolation tool for local meteorological services.
Further studies could quantify the gain in information from more complex models, such as separate models for different seasons or non-Euclidean distance measures.
Splitting the target area and applying a non-linear temperature profile would also be beneficial; however, this is not feasible due to the low station density and the underrepresentation of elevated areas.
The data used in this publication are available at
The authors declare that they have no conflict of interest.
DWD's contribution to SASSCAL is funded by Germany's Ministry for Education and Research (BMBF, grant no. 01LG1201J). The SASSCAL-Weathernet dataset has been provided by Gerhard Muche, Thomas Hillmann and Katrin Josenhans (University of Hamburg).
Edited by: O. E. Tveito
Reviewed by: two anonymous referees