Validation of three satellite-derived databases of surface solar radiation using measurements performed at 42 stations in Brazil

The SoDa website (www.soda-pro.com) hosts numerous solar-related Web services. Among them, three satellite-derived irradiation databases can be accessed manually or automatically to retrieve radiation values within the geographical coverage of the Meteosat Second Generation (MSG) satellite: the two most advanced versions of the HelioClim-3 database (versions 4 and 5, respectively HC3v4 and HC3v5), and the CAMS radiation service. So far, these databases have been validated against measurements from several stations in Europe and North Africa only. As the quality of such databases depends on the geographical region and the climate, this paper extends the validation campaign with an extensive comparison over Brazil for the global irradiation received on a horizontal surface, i.e. the surface solar irradiation (SSI). Eleven stations from the Brazilian Institute of Space Research (INPE) network offer 1 min observations, and thirty-one stations from the Instituto Nacional de Meteorologia (INMET) network offer hourly observations. The satellite-derived estimates have been compared to the corresponding observations on an hourly, daily and monthly basis. The bias relative to the mean of the measurements is mostly between 1 and 3 % for HC3v5, and between 2 and 5 % for HC3v4. These are very satisfactory results and they demonstrate that HC3v5, and to a lesser extent HC3v4, may be used in studies of long-term changes in SSI in Brazil. The situation is not so good with the CAMS radiation service, for which the relative bias is mostly between 5 and 10 %. For hourly irradiation, the relative RMSE ranges from 15 to 33 %. The correlation coefficient is very large for all stations and the three databases, with an average of 0.96. The three databases reproduce the hour-to-hour changes in SSI well. The errors tend to increase with the viewing angle of the MSG satellite. They are greater in tropical areas where the relative humidity of the atmosphere is high.
It is concluded that except for the overestimation by CAMS radiation service, the three databases are suitable for studies of the solar resources in Brazil.


Introduction
Knowledge of the solar resource at ground level is a critical issue for developing solar energy. Of particular interest here is the surface solar irradiation (SSI), i.e. the downwelling broadband solar irradiation received at ground level on a horizontal plane. Many studies have demonstrated the potential of satellite images to assess the SSI (Rigollier et al., 2004), and their use within the framework of feasibility studies of solar plants.
The SoDa Service (www.soda-pro.com) is dedicated to professionals in solar energy (Gschwind et al., 2006) and provides access to different solar-related resources, including databases containing estimates of the SSI (Wald et al., 2002). Among them are the HelioClim-3 database (abbreviated HC3) and the CAMS radiation service, both derived from images acquired by the Meteosat Second Generation (MSG) satellite. CAMS stands for Copernicus Atmosphere Monitoring Service and is a follow-up of the successive European-funded MACC (Monitoring Atmospheric Composition & Climate) projects. The three databases contain SSI data over the area observed by MSG, i.e. Europe, Africa, the eastern part of South America and the Middle East. They span from 2004 up to the current day. HC3 was set up in 2005. It is widely used by professionals in solar energy, whether companies or academics, with approximately 4 million requests for time series of SSI per year. The CAMS radiation service was introduced more recently, in 2014, and is becoming increasingly popular, especially among academics.
Published by Copernicus Publications.
So far, these databases have been validated against measurements of several stations in Europe and North Africa only. As the quality of such databases depends on the geographical regions and the climates, this article extends this validation campaign to Brazil, i.e. to a geographical area located on the edge of the field of view of MSG. It presents the results of an objective evaluation of HC3 version 4 (HC3v4) and 5 (HC3v5) and CAMS radiation service against the measurements of SSI performed at 31 stations belonging to the Instituto Nacional de Meteorologia (INMET) and 11 stations of the Brazilian Institute of Space Research (INPE) network.

HC3
MSG images are acquired every 15 min at Transvalor and MINES ParisTech premises. They are routinely processed with the Heliosat-2 method (Rigollier et al., 2004) to update the HC3 database (Blanc et al., 2011). Heliosat-2 combines a clear-sky model with a "cloud index". The cloud index approach is based on the assumption that the appearance of a cloud over a pixel results in an increase of reflectance in visible imagery (Moussu et al., 1989); the attenuation of the downwelling shortwave irradiance by the atmosphere over a pixel is related to the magnitude of change between the reflectance that should be observed under a cloud-free sky and that currently observed. This magnitude of change is quantified by the cloud index. HC3v4 and HC3v5 are the two most advanced versions of HC3. HC3v4 uses the ESRA clear-sky model (Rigollier et al., 2000) with the climatological database of the Linke turbidity factor of Remund et al. (2003) as input. The major drawback of this database is that it is never updated to take into account changes in the atmospheric turbidity due to local effects such as maritime inputs, volcanoes, fires, evolution of the water vapor content, pollution, etc. The McClear clear-sky model (Lefèvre et al., 2013) is an outcome of the European-funded MACC (Monitoring Atmospheric Composition & Climate) projects. It takes as input information on the properties of the cloud-free atmosphere that is updated every 3 h, and provides estimates of the SSI that should be observed if the sky were cloud-free for any site in the world since 2004. HC3v5 is an attempt to overcome the limitation of the climatological database of Remund et al. (2003) by combining HC3v4 and the McClear model.
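The cloud index principle described above can be illustrated with a minimal sketch. It assumes the usual normalized form of the cloud index and a simplified linear relation between the cloud index and the ratio of actual to clear-sky irradiation; the function names, the reflectance arguments and the clipping bounds are illustrative assumptions, and the operational relation used in Heliosat-2 may differ.

```python
def cloud_index(rho, rho_ground, rho_cloud):
    """Cloud index n: about 0 for a cloud-free pixel, about 1 for an overcast one.

    rho        -- reflectance currently observed by the satellite
    rho_ground -- reflectance expected under a cloud-free sky (assumed known)
    rho_cloud  -- reflectance of the brightest clouds (assumed known)
    """
    return (rho - rho_ground) / (rho_cloud - rho_ground)


def clear_sky_index(n, lower=0.05, upper=1.2):
    """Simplified relation Kc = 1 - n, clipped to illustrative bounds."""
    return min(upper, max(lower, 1.0 - n))


def ssi_from_cloud_index(g_clear, rho, rho_ground, rho_cloud):
    """All-sky SSI as the clear-sky SSI attenuated by the clear-sky index."""
    n = cloud_index(rho, rho_ground, rho_cloud)
    return g_clear * clear_sky_index(n)
```

With hypothetical values, a pixel whose reflectance lies halfway between the ground and cloud reflectances receives about half of the clear-sky irradiation given by the clear-sky model.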
HC3 estimates of SSI are available at integration periods (or summarizations) of 15 min, 1 h, 1 day and 1 month. The data cover the period from 1 February 2004 up to two days before the current day (d−2) for version 5. Version 4 extends to the previous day (d−1) and also offers real-time data and forecasts for the following day (d+1). HC3 provides the global irradiation received on a horizontal surface, and empirical decomposition models are applied to compute all the components of the radiation on horizontal, fixed-tilt and normal planes for the actual weather conditions. When a request is launched, post-processing layers are applied, for instance to modulate the radiation values inside the MSG pixels to take into account the actual elevation of the requested location, or to compute the shadowing effect of the far horizon. HC3 time series can be retrieved manually via the SoDa website, or automatically via a machine-to-machine access. Several other value-added services based on this resource are also available as one-shot requests, such as the purchase of a volume of HC3 time series or Typical Meteorological Years on a given area, irradiation maps, in-situ measurement completion, etc.

CAMS radiation service
The CAMS radiation service, previously named MACC-RAD database (Hoyer-Klick et al., 2015), is available for free via the CAMS and SoDa portals. It makes use of the Heliosat-4 method, which models the radiative transfer in the atmosphere to compute the SSI (Qu, 2013). The SSI can be approximated as the product of the clear-sky irradiance given by the McClear model and a modification factor due to APOLLO cloud properties and ground albedo (Oumbe et al., 2014). The database of APOLLO cloud properties is the property of the German DLR, and results from the processing of the different channels of the MSG satellite. The ground albedo is that from Blanc et al. (2014).
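The approximation stated above can be written compactly as follows. The symbols are chosen here for illustration only and are not taken from the cited papers: G is the all-sky SSI, G_clear the clear-sky irradiance given by McClear, and K_mod the modification factor, which depends on the cloud properties and the ground albedo.

```latex
G \approx G_{\mathrm{clear}} \times K_{\mathrm{mod}}
% Hypothetical worked example:
% G_clear = 800 W m^{-2}, K_mod = 0.6  =>  G \approx 480 W m^{-2}
```

One consequence of this product form is that a relative error in the clear-sky term carries over to the all-sky estimate in the same proportion.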
The CAMS radiation service provides time series of global, direct, and diffuse irradiations on horizontal surface, and direct irradiation on plane normal to sun rays (DNI for direct normal irradiation) for the actual weather conditions. The time coverage of data is from 1 February 2004 up to 2 days ago. Data are available with a time summarization ranging from 15 min to 1 month. CAMS radiation service as well as McClear can be accessed directly on the SoDa website using the corresponding interface, or using the interoperable OGC-compliant Web Processing Service (Lefèvre et al., 2013).

Brief overview of the stations and quality control
The in-situ measurements comprise two distinct data sets: the hourly measurements collected by 31 stations of the INMET network, and the 1 min measurements collected by the INPE network. Figures 1 and 2 show the locations of the stations of the two networks. The geographical coordinates of the stations as well as the period of availability of the data are not given here for the sake of conciseness. They are available in the following pages:
- INPE stations: www.soda-pro.com/help/helioclim/helioclim-3-validation/brazil-inpe
- INMET stations: www.soda-pro.com/help/helioclim/helioclim-3-validation/brazil-inmet
All stations but one have measurements spanning more than 1 year between 2005 and 2014, depending on the station. Prior to the comparison at the hourly summarization, a thorough quality check procedure as recommended by WMO (1981) was applied to the measurements of the stations. As only the global SSI is available, no consistency check was possible via cross-comparison of the global irradiation and its direct and diffuse components. This procedure turned out to be insufficient for these measurements. The test to discard the "not-plausible" values (i.e. "Extremely Rare Limits" and "Physical Possible Limits") was too permissive and too many outliers remained. We therefore added tests based on G_dry, which is the SSI that should be observed if the sky were cloud-free with a null turbidity ("dry" sky). G_dry was computed with the ESRA model, setting the Linke turbidity factor to 0. A measurement G was kept only if it passed these additional tests.
A time shift was also observed for several days in the time series of INPE measurements. We decided to discard the measurements of a whole day if the time shift exceeded 10 % of the day length.
Remaining INPE 1 min measurements were aggregated to generate hourly irradiation if at least 85 % of the 1 min slots are available and valid. Aggregation is based on a smart average technique that takes into account the sun position at each minute and uses the clearness index.
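The availability rule above can be sketched as follows. This is only an illustration: a plain arithmetic mean of the valid 1 min values stands in for the sun-position and clearness-index weighting described in the text, and the function name, the use of NaN to flag missing or rejected values, and the units are assumptions.

```python
import math


def hourly_irradiation(irradiance_1min, min_valid_fraction=0.85):
    """Aggregate the 1 min irradiances (W m-2) of one hour into an hourly irradiation (Wh m-2).

    Missing or rejected values are expected as math.nan. The hour is kept only
    if at least `min_valid_fraction` of the 1 min slots are valid.
    """
    valid = [v for v in irradiance_1min if not math.isnan(v)]
    if not irradiance_1min or len(valid) < min_valid_fraction * len(irradiance_1min):
        return math.nan
    # Plain mean of the valid slots (simplification of the "smart average"),
    # multiplied by the duration of 1 h to obtain Wh m-2.
    mean_irradiance = sum(valid) / len(valid)
    return mean_irradiance * 1.0


# Hypothetical hour with 9 missing minutes out of 60: 51 valid slots, i.e. 85 %
hour = [600.0] * 51 + [math.nan] * 9
print(hourly_irradiation(hour))
```

With one more missing minute, the hour would fall below the 85 % threshold and be rejected.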

Protocol of evaluation
The protocol addressed the comparison of hourly, daily and monthly satellite-based irradiation values against the measurements of the 42 stations.
The three satellite-derived databases under study are derived from the MSG imagery, i.e. from images acquired every 15 min. Satellite estimates need to be aggregated to the hourly time step. This is usually done by summing up the four 15 min values. Occasionally, MSG images are unavailable. When HelioClim-3 and the CAMS radiation service were created, the decision was taken that if at least one image was available in the day, an intelligent interpolation based on the clearness index and taking into account the sun position every minute would be applied to synthesize all 15 min irradiation values within this day. Such cases are rare and were included in the comparison. Since February 2004, only 12 days do not have any valid image during the day and as a consequence are missing in the satellite-based databases.
At that step, both estimates and measurements were available at the hourly time step, and the comparison was carried out. First:
- Measurements at night, sunrise and sunset were set to 0, as well as slots when the hourly mean of irradiance does not exceed 10 W m−2.
- Estimates were set to "Not a Number" (NaN) when the measurements are missing, and reciprocally. In this way, the data sets contain the same number of data with coincidence in time.
- For the validation of daily and monthly values, hourly values were aggregated to generate partial daily and monthly sums if at least respectively 65 and 50 % of the hourly slots were available and valid.
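The three preparation steps listed above can be sketched as follows. The function names are hypothetical, NaN is assumed to flag missing values, and the set of hourly slots counted in the availability test (all hours or daytime hours only) is not specified in the text, so the caller is assumed to pass the relevant slots.

```python
import math


def zero_low_irradiance(measured, threshold=10.0):
    """Set to 0 the hourly mean irradiances (W m-2) that do not exceed the threshold."""
    return [0.0 if (not math.isnan(m) and m <= threshold) else m for m in measured]


def mask_pairs(estimated, measured):
    """Set both series to NaN wherever either one is missing."""
    est_out, mea_out = [], []
    for e, m in zip(estimated, measured):
        if math.isnan(e) or math.isnan(m):
            e, m = math.nan, math.nan
        est_out.append(e)
        mea_out.append(m)
    return est_out, mea_out


def partial_sum(hourly_values, min_valid_fraction):
    """Sum the valid hourly values if enough slots are valid, else return NaN.

    Use min_valid_fraction=0.65 for daily sums and 0.50 for monthly sums.
    """
    valid = [v for v in hourly_values if not math.isnan(v)]
    if not hourly_values or len(valid) < min_valid_fraction * len(hourly_values):
        return math.nan
    return sum(valid)
```

Masking the two series jointly before summing ensures that the partial sums of the estimates and of the measurements are built from exactly the same hourly slots.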
Then the deviations (estimates − measurements) were computed. They were summarized by the bias, the relative bias (i.e. the bias relative to the mean of the observations, in percent), the root mean square error (RMSE) and the relative RMSE in percent. The correlation coefficient was also computed.
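A minimal sketch of these summary statistics is given below, assuming coincident series in which NaN flags missing values. The function name and the returned keys are illustrative; the relative quantities are normalized by the mean of the measurements, as stated above.

```python
import math


def validation_statistics(estimates, measurements):
    """Bias, RMSE, their relative values (%) and the correlation coefficient."""
    pairs = [(e, m) for e, m in zip(estimates, measurements)
             if not (math.isnan(e) or math.isnan(m))]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two valid pairs are needed")
    mean_e = sum(e for e, _ in pairs) / n
    mean_m = sum(m for _, m in pairs) / n
    deviations = [e - m for e, m in pairs]  # estimates - measurements
    bias = sum(deviations) / n
    rmse = math.sqrt(sum(d * d for d in deviations) / n)
    cov = sum((e - mean_e) * (m - mean_m) for e, m in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_m = sum((m - mean_m) ** 2 for _, m in pairs)
    # Correlation is undefined if one of the series is constant
    corr = cov / math.sqrt(var_e * var_m) if var_e > 0 and var_m > 0 else math.nan
    return {
        "bias": bias,
        "relative_bias_percent": 100.0 * bias / mean_m,
        "rmse": rmse,
        "relative_rmse_percent": 100.0 * rmse / mean_m,
        "correlation": corr,
    }
```

A positive bias means that the database overestimates the SSI on average.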
For the sake of conciseness, not all results are provided here. The complete results are available online for the three summarizations (hourly, daily and monthly).
The correlation coefficient is very large for all stations and the three databases, with an average of 0.96. The three databases reproduce the hour-to-hour changes in SSI well, and their correlation coefficients are similar.
HC3v4 shows a tendency to overestimate the SSI: the relative bias ranges from −3 to 13 %, with most values between 2 and 5 %. HC3v5 exhibits better results with a smaller bias: between −3 and 7 %, with most values between 1 and 3 %. These are very satisfactory results and they demonstrate that HC3v5, and to a lesser extent HC3v4, may be used in studies of long-term changes in SSI in Brazil. The situation is not so good with the CAMS radiation service. The relative bias ranges between 2 and 16 %, with most values between 5 and 10 %. This fairly large overestimation of the SSI by the CAMS radiation service has been noted by Thomas et al. (2016) for stations located in Europe and Africa under different climates. The relative RMSE follows the trend found for the relative bias. HC3v5 exhibits the smallest RMSE: from 13 to 31 %, with most values between 17 and 23 %. These values are satisfactory for most applications in the solar energy domain.
Figures 3, 4 and 5 respectively exhibit the 2-D histogram of HC3v4, HC3v5 and CAMS radiation service versus the hourly measurements for the INMET station of Ararangua. The results are in line with the average values indicated in Table 1, with a relative bias, a relative RMSE and a correlation coefficient of 6.9 %, 22 % and 0.965 for HC3v4; 1.6 %, 19 % and 0.969 for HC3v5; and 7.6 %, 25.9 % and 0.949 for the CAMS radiation service.
The 31 INMET stations are mainly located in the western and southern parts of the country. Although the overall ranking of the databases is the same in every area, the results are worse for the stations located in the west, close to the border, such as Porto-Velho, Cacoal and Vilhena, than for the other stations. For these stations, the relative bias is close to 7, 9 and 11 % for HC3v5, HC3v4 and CAMS radiation service respectively; the relative RMSE always exceeds 30 %, and the correlation coefficient is still good but only just above 0.9. The main explanation is that the western part of the country lies on the edge of the MSG coverage. As a consequence, the pixel size in the east-west direction exceeds 12 km. In addition, the climate of Rondônia is tropical with frequent heavy rains and clouds, which makes it difficult to model the SSI accurately from satellite imagery. This was observed by Marie-Joseph et al. (2013) for a similar climate in Guiana.
Though the number of stations is low compared to INMET, the 11 INPE stations are more evenly distributed over the Brazilian territory. The overall ranking of the databases is the same, with HC3v5 first and HC3v4 second. The statistical results behave differently in the north-eastern (NE) area, with the stations of Sao Luiz, Natal and Petrolina, and in the south-western (SW) area, with the stations of Cuiaba, Campogrande and Chapeco. In NE, the RMSE and correlation values are noticeably better than in SW, as expected since the stations in NE are closer to the nadir of the satellite and the pixel size is consequently smaller. What was unexpected is the increase in bias of more than 3 % for both HC3v4 and HC3v5, and of more than 5 % for CAMS radiation service. This is probably due to the climate of this area, in particular along the coast, where humidity is very high.
This tropical area experiences wet and dry seasons, and is known for its heavy rainfall and the large amount of water vapour in the atmosphere. Though more work should be devoted to this aspect, a preliminary conclusion is that in this NE area the SSI estimated from satellite imagery is too large, likely because the Heliosat-2 and Heliosat-4 methods do not accurately reproduce the extinction of the radiation by clouds.

Conclusions and perspectives
This paper presents an objective comparison of three satellite-derived radiation databases against the measurements of 42 stations in Brazil. Great attention has been paid to the quality check of the measurements: the procedure relies on published papers and was completed with a few criteria based on clear-sky data to discard the remaining outliers.
Very satisfactory results have been obtained for HC3v5, and to a lesser extent HC3v4. Both may be used in studies of long-term changes in SSI in Brazil. The CAMS radiation service is a recent service and is still in its infancy. An overestimation by McClear of approx. 4 % has already been observed for the Baseline Surface Radiation Network station of Brasilia (Lefèvre et al., 2013). At least part of the overestimation by the CAMS radiation service could therefore be explained by the overestimation of the radiation in clear-sky conditions. Further work is needed to confirm or refute this observation, and also to improve the quality of the CAMS radiation service for applications in solar energy. Nevertheless, it is concluded that except for the overestimation by CAMS radiation service, the three databases are suitable for studies of the solar resources in Brazil.
This work is an extension of the validation survey carried out over several BSRN stations located in the Meteosat coverage (Thomas et al., 2016). It provides better knowledge of the capacity of these databases to estimate the SSI. These validation results also pave the way for the geographical extension of the HelioClim-3 database scheduled in 2016-2017, by providing a first insight into the quality in the area covered by both the MSG and GOES-East satellites, where cross validation can be carried out.