Validation of the new HelioClim-3 version 4 real-time and short-term forecast service using 14 BSRN stations

Abstract. Meteosat Second Generation (MSG) satellite images acquired every 15 min during daytime are currently processed every night by the Heliosat-2 method to generate the HelioClim-3 (HC3) database of surface solar irradiation for the previous day. A new service based on version 4 of HC3 (HC3v4) is proposed that offers real-time and forecast irradiation for horizons of up to a few hours. The service is based on local persistence of the clear-sky index. Its results were compared with coincident high-quality 15 min global irradiation measured at fourteen stations of the Baseline Surface Radiation Network (BSRN). For 15 min ahead forecasts, the relative bias and the relative root mean square error (RMSE) range respectively from 0 to 2 % and from 20 to 23 % for most stations, and the correlation coefficient ranges from 0.94 to 0.95. These performances are similar to those of HC3v4 for the same stations. As expected, the quality of the forecasts degrades as the temporal horizon increases. For 1 h ahead forecasts of 15 min irradiation, the relative bias, relative RMSE and correlation coefficient range respectively from −3 to 1 %, from 30 to 37 %, and from 0.90 to 0.91.


Introduction
Knowledge of the solar potential of a site or an area is essential for many applications: for instance, it ensures a good return on investment for photovoltaic and concentrated solar power plants, and it is also important in other domains such as agriculture and health. Satellite-derived surface solar irradiation (SSI) databases, such as HelioClim-3 (HC3), have already demonstrated their ability to supplement ground station measurements by providing a long-term archive of irradiation values over a large area and on a regular grid (Lefèvre et al., 2014).
The SoDa Service (www.soda-pro.com) offers one-stop access to many resources and Web services related to solar energy (Gschwind et al., 2006). It is populated by several SSI databases, including the successive versions of HC3. The two most advanced versions of HC3 are version 4 (HC3v4) and version 5 (HC3v5), which differ in the clear-sky model used to derive SSI (Qu et al., 2014). The latest version, HC3v5, exploits the McClear model (Lefèvre et al., 2013) and is the most advanced and most accurate version (Eissa et al., 2015; Thomas et al., 2016a, b).
Users have expressed a need for real-time and forecast capability for horizons of up to a few hours. Though McClear itself may be used in forecasting mode (Lefèvre et al., 2013), its inputs as used in HC3v5 are only available with a delay of 2 days, which so far makes it impossible to create a real-time and short-term forecast service based on HC3v5. The current HC3v4 service did not have such a built-in capability either: Meteosat Second Generation (MSG) satellite images acquired every 15 min during daytime via the EUMETCast system are processed by the Heliosat-2 method (Rigollier et al., 2004) during the following night to generate the HC3 database. All valid images of the day are required by the method, since temporal interpolation may be used to eliminate the effects of gaps due to missing images.
A service with real-time and forecast capability for horizons of up to a few hours would help in managing photovoltaic plants and intelligent buildings, and could increase their return on investment. This paper presents such a service based on HC3v4. The technical challenge was twofold. On the one hand, the method should be fast enough to process large areas in real time, i.e. within a few minutes after image acquisition. On the other hand, its outputs should be directly usable in the customers' own processes. The paper describes the method developed to provide HC3v4 real-time irradiation and short-term forecasts until the end of the current day. The service was validated in two successive steps. It was first tested by fifteen users for about one month in April 2015 to assess its fitness for use. Then, its performance was assessed scientifically by back-testing, or hindcasting: images for a past period were input to the method and the outputs were compared with coincident high-quality measurements of the global irradiation received on a horizontal surface (GHI) at fourteen stations of the Baseline Surface Radiation Network (BSRN).

The HC3v4 real-time and short-term forecast service
The HC3 database stores 15 min GHI values estimated from MSG imagery since February 2004, when the first MSG satellite became operational. HC3v4 exploits the ESRA clear-sky model (Rigollier et al., 2000) with the climatological database of the Linke turbidity factor of Remund et al. (2003) as input. This database contains one value per day of the year on a grid of 5′ of arc in both latitude and longitude; 5′ of arc represents approximately 9 km along a meridian. As this Linke turbidity factor is never updated, HC3v4 may fail to capture the actual atmospheric content and the resulting changes in SSI due to local effects such as maritime inputs, volcanoes, fires, changes in water vapour content, or pollution. The HC3v4 archive covers the MSG field of view, from February 2004 up to the previous day. The Heliosat-2 method is applied only for solar elevation angles above 12°.
In 2015, a new service was developed, called the HC3v4 real-time and short-term forecast service and hereafter abbreviated as HC3v4 forecast. Its principle is the persistence of the clear-sky index Kc, defined as the ratio of the GHI to the GHI in clear-sky, i.e. cloud-free, conditions. Figure 1 illustrates this principle with an example of 15 min slots available until 11:25 a.m. (Fig. 1a) on a given day d. The last available slot indicates half-cloudy, half-sunny weather; the persistence assumption states that the same type of weather will persist until the end of the current day, modulated by the sun position (Fig. 1b), i.e. Kc keeps the value of the last available instant. Then a new MSG image is acquired, showing sunnier weather than forecast 15 min earlier, and the forecast is adjusted accordingly (Fig. 1c). Every 15 min, the service provides data until the end of the day: estimates based on all slots available up to the current instant, and forecasts based on local persistence afterwards.
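In equation form, the persistence principle can be written as follows. The notation is introduced here only for illustration: G is the 15 min GHI, G_clear the clear-sky GHI (given by the ESRA model in HC3v4), t_0 the last slot with a processed image and h the forecast horizon.

```latex
K_c(t) = \frac{G(t)}{G_{\mathrm{clear}}(t)}, \qquad
\hat{G}(t_0 + h) = K_c(t_0)\, G_{\mathrm{clear}}(t_0 + h), \quad h = 15\,\mathrm{min},\ 30\,\mathrm{min},\ \ldots
```

As a purely hypothetical numerical example, if the last image gives G(t_0) = 80 Wh m−2 while G_clear(t_0) = 160 Wh m−2, then K_c(t_0) = 0.5, and a later slot with G_clear = 200 Wh m−2 would be forecast at 0.5 × 200 = 100 Wh m−2.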
Each received MSG image is processed with the same Heliosat-2 method and the same inputs as for HC3v4, but in real time and with a minimum solar elevation angle of 1° instead of 12°.
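The persistence principle, together with the 1° solar elevation threshold of the real-time processing, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the operational SoDa code: the function name `persistence_forecast` and its inputs are hypothetical, the clear-sky GHI is assumed to be supplied by a clear-sky model such as ESRA, and slots below the threshold are simply set to zero.

```python
# Minimum solar elevation angle (degrees) for real-time processing (1° in the text).
MIN_ELEVATION_DEG = 1.0


def persistence_forecast(ghi_est, ghi_clear, elevation_deg, now):
    """Hypothetical sketch: full-day series of 15 min GHI made of satellite
    estimates up to slot `now` and persistence-of-Kc forecasts afterwards.

    ghi_est       -- satellite-derived GHI per slot (only slots <= now are used)
    ghi_clear     -- clear-sky GHI per slot (assumed to come from a clear-sky model)
    elevation_deg -- solar elevation angle per slot, in degrees
    now           -- index of the last slot with a processed MSG image
    """
    # Clear-sky index of the last available slot, if it can be computed.
    kc_last = None
    if (elevation_deg[now] >= MIN_ELEVATION_DEG and ghi_clear[now] > 0
            and ghi_est[now] is not None):
        kc_last = ghi_est[now] / ghi_clear[now]

    series = []
    for t, clear in enumerate(ghi_clear):
        if elevation_deg[t] < MIN_ELEVATION_DEG:
            series.append(0.0)              # sun too low: value set to zero
        elif t <= now:
            series.append(ghi_est[t])       # estimate from an actual image
        elif kc_last is None:
            series.append(None)             # no Kc available to persist yet
        else:
            series.append(kc_last * clear)  # same Kc, modulated by sun position
    return series
```

Calling the function again each time a new image is processed, with `now` advanced by one slot, mimics the 15 min update cycle illustrated in Fig. 1: the estimates grow by one slot and the remaining forecasts are rescaled with the new Kc.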

Brief overview of the stations and quality control
The short-term forecasts provided by the service are compared with observations from several BSRN stations for different temporal horizons. BSRN provides high-quality measurements of GHI, of diffuse irradiation on a horizontal surface, and of direct irradiation at normal incidence (DNI), suitable for validation (König-Langlo et al., 2014; Ohmura et al., 1998; Roesch et al., 2011a, b). Measurements are acquired every 1 min. Figure 2 and Table 1 present the stations used for the quality assessment of the data, all located within the MSG coverage. The last column of Table 1 gives the periods of data used in this evaluation. Prior to the comparison at the different temporal aggregations, a thorough quality-check procedure was applied to the 1 min BSRN data, as recommended by WMO (1981). The major steps can be summarized as follows:
- set night, sunrise and sunset values, i.e. when the solar elevation angle is less than 1°, to zero;
- discard values beyond the "extremely rare limits" and the "physically possible limits";
- perform consistency checks when the three radiation components are available.
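The quality-check steps above can be sketched for a single 1 min GHI sample as follows. The paper does not list its numerical thresholds, so the limits used here are assumptions taken from widely used BSRN practice: "physically possible" bounds of −4 W m−2 and S_a·1.5·µ0^1.2 + 100 W m−2, "extremely rare" bounds of −2 W m−2 and S_a·1.2·µ0^1.2 + 50 W m−2, and a component-closure tolerance of 8 % below 75° solar zenith angle and 15 % above. The solar constant value and the function name `qc_ghi` are also hypothetical.

```python
import math

SOLAR_CONSTANT = 1367.0  # W m-2, assumed value


def qc_ghi(ghi, sza_deg, earth_sun_dist_au=1.0, dhi=None, dni=None):
    """Hypothetical QC of one 1 min GHI sample (W m-2); returns (value, flag).

    Thresholds follow common BSRN-style limits (assumed, not from the paper).
    """
    if 90.0 - sza_deg < 1.0:
        return 0.0, "night"  # night, sunrise and sunset values set to zero
    mu0 = math.cos(math.radians(sza_deg))
    sa = SOLAR_CONSTANT / earth_sun_dist_au ** 2  # corrected for Earth-Sun distance
    ppl_max = sa * 1.5 * mu0 ** 1.2 + 100.0       # physically possible upper limit
    erl_max = sa * 1.2 * mu0 ** 1.2 + 50.0        # extremely rare upper limit
    if ghi is None or ghi < -4.0 or ghi > ppl_max:
        return None, "outside physically possible limits"
    if ghi < -2.0 or ghi > erl_max:
        return None, "outside extremely rare limits"
    # Consistency (closure) check when diffuse and direct components exist.
    if dhi is not None and dni is not None:
        total = dhi + dni * mu0
        if total > 50.0:
            tol = 0.08 if sza_deg < 75.0 else 0.15
            if abs(ghi / total - 1.0) > tol:
                return None, "inconsistent components"
    return ghi, "ok"
```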
Then, a temporal aggregation was performed to generate the values at the different temporal resolutions. This procedure is of utmost importance since it directly affects the validation results. Our approach was as follows:
- fill the 1 min gaps by interpolating the irradiation values, taking into account the sun position at each instant; then generate the 15 min irradiation from the 1 min BSRN measurements if at least 85 % of the slots were available before interpolation, and compute the quantities summarizing the deviation at 15 min;
- generate the hourly, daily and monthly irradiation by summing the 15 min irradiation if at least 75, 65 and 50 % of the slots, respectively, are available. No temporal interpolation is applied, so these may be partial sums. Compute the quantities summarizing the deviation at the hourly, daily and monthly time steps.
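The aggregation rules can be sketched as below. The 85, 75, 65 and 50 % thresholds come from the text; the way the sun position is taken into account during gap filling (linear interpolation of the clear-sky index between valid neighbours, then multiplication by the clear-sky irradiance) is our assumption, as are the function names. Inputs are 1 min mean irradiances in W m−2 with missing values as `None`, and the 15 min irradiation is returned in Wh m−2.

```python
from bisect import bisect_left


def fill_gaps(ghi_1min, clear_1min):
    """Fill missing 1 min GHI values (None) by linearly interpolating the
    clear-sky index between the nearest valid neighbours and multiplying by the
    clear-sky irradiance, so that the sun position is accounted for (assumed)."""
    kc = [g / c if g is not None and c > 0 else None
          for g, c in zip(ghi_1min, clear_1min)]
    valid = [i for i, k in enumerate(kc) if k is not None]
    filled = list(ghi_1min)
    for i, g in enumerate(ghi_1min):
        if g is not None or clear_1min[i] <= 0:
            continue
        pos = bisect_left(valid, i)
        if pos == 0 or pos == len(valid):
            continue  # no valid value on one side: leave the gap
        left, right = valid[pos - 1], valid[pos]
        w = (i - left) / (right - left)
        filled[i] = ((1 - w) * kc[left] + w * kc[right]) * clear_1min[i]
    return filled


def to_15min(ghi_1min, clear_1min, min_fraction=0.85):
    """15 min irradiation (Wh m-2) from 1 min mean irradiance (W m-2), or None
    if fewer than `min_fraction` of the 1 min values were originally present."""
    filled = fill_gaps(ghi_1min, clear_1min)
    out = []
    for start in range(0, len(ghi_1min), 15):
        raw = ghi_1min[start:start + 15]
        block = filled[start:start + 15]
        present = sum(v is not None for v in raw) / 15
        if present >= min_fraction and all(v is not None for v in block):
            out.append(sum(block) / 60.0)  # each 1 min value covers 1/60 h
        else:
            out.append(None)
    return out


def partial_sum(values, block_size, min_fraction):
    """Sum consecutive blocks of 15 min irradiation without interpolation;
    None for a block with too few values (otherwise possibly a partial sum)."""
    out = []
    for start in range(0, len(values), block_size):
        present = [v for v in values[start:start + block_size] if v is not None]
        out.append(sum(present) if len(present) / block_size >= min_fraction else None)
    return out
```

Hourly values would then follow from `partial_sum(values_15min, 4, 0.75)` and daily values from `partial_sum(values_15min, 96, 0.65)` for a series starting at midnight; for monthly values the block size would be the number of 15 min slots in the month, with a threshold of 0.50.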

Protocol of evaluation
This section describes the protocol for comparing the short-term forecasts with coincident high-quality irradiation measured at the fourteen stations.
Estimates were set to "Not a Number" (NaN) when the measurements were missing, and vice versa. In this way, both data sets contain the same number of values, coincident in time. Then, both the estimates and the observations were aggregated to generate the data at the different temporal resolutions for comparison.
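The NaN-masking step can be sketched as below. This is a minimal illustration with a hypothetical function name: each pair of coincident values is kept only if both the estimate and the observation are valid, so that the two series stay aligned in time before aggregation.

```python
import math


def make_coincident(estimates, observations):
    """Set each series to NaN wherever the other one is missing (NaN)."""
    est_out, obs_out = [], []
    for e, o in zip(estimates, observations):
        if math.isnan(e) or math.isnan(o):
            # Missing on either side: mark both as missing.
            est_out.append(math.nan)
            obs_out.append(math.nan)
        else:
            est_out.append(e)
            obs_out.append(o)
    return est_out, obs_out
```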
Among all the statistical quantities that can be computed to assess the deviation between two datasets, we selected the bias, the bias relative to the mean of the observations in percent (relative bias), the root mean square error (RMSE), the relative RMSE in percent, and the correlation coefficient (correl. coeff.). These quantities were computed for the GHI, and for: For the sake of conciseness, not all results are provided here. Tables A1 and A2, which give the statistics for the 15 min GHI for temporal horizons of 15 min and 1 h, are placed in the Appendix. All other results are available at http://www.soda-pro.com/soda-products/ by selecting the "HC3 time series in real time" service.
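The five quantities can be computed as sketched below. The paper does not state its sign convention for the bias; here it is assumed to be estimate minus observation, so a negative bias denotes underestimation. The relative quantities are expressed in percent of the mean of the observations, the correlation coefficient is Pearson's, and the function name is hypothetical.

```python
import math


def deviation_stats(estimates, observations):
    """Bias, relative bias (%), RMSE, relative RMSE (%) and Pearson correlation
    between coincident estimates and observations; NaN pairs are skipped."""
    pairs = [(e, o) for e, o in zip(estimates, observations)
             if not (math.isnan(e) or math.isnan(o))]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two coincident pairs are required")
    mean_est = sum(e for e, _ in pairs) / n
    mean_obs = sum(o for _, o in pairs) / n
    diffs = [e - o for e, o in pairs]  # estimate minus observation (assumed)
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    cov = sum((e - mean_est) * (o - mean_obs) for e, o in pairs)
    var_est = sum((e - mean_est) ** 2 for e, _ in pairs)
    var_obs = sum((o - mean_obs) ** 2 for _, o in pairs)
    return {
        "bias": bias,
        "rel_bias_pct": 100.0 * bias / mean_obs,
        "rmse": rmse,
        "rel_rmse_pct": 100.0 * rmse / mean_obs,
        "correl": cov / math.sqrt(var_est * var_obs),
    }
```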

Interpretation of the validation results
The validation results are summarized in Table 2, which presents the range of the relative bias, relative RMSE and correlation coefficient over the fourteen stations. The "Av" column gives the range of the most frequently found values. The new method based on the persistence of Kc exhibits satisfactory results for temporal horizons of 15 min and 1 h. The quality of the 15 min irradiation forecast 15 min ahead is comparable to that of the archived HC3v4 data (Eissa et al., 2015; Thomas et al., 2016a, b). Consequently, the assumption that the current weather persists until the next acquisition instant is plausible.
For 1 h ahead forecasts, the bias does not change much, so the method introduces no systematic trend in the errors. The RMSE increases, which means a larger spread of the data, and thus an increased difficulty for the method to predict the correct weather as the temporal horizon increases. This is confirmed by slightly lower correlation coefficients. As expected, this tendency keeps growing as the temporal horizon increases. Figure 3 supports this last remark. It shows the 15 min GHI values for 27 March 2007 at Carpentras. The black, blue and green lines correspond respectively to the in situ measurements, the 15 min ahead forecasts and the 1 h ahead forecasts. The sky on this day is clear at the beginning and end of the day, with a cloudy event around midday. The blue line is most often very close to the black line, demonstrating the accuracy of the 15 min ahead forecast in this case. The green line agrees fully with the black line at the beginning and the end of the day, i.e. when the sky is clear and the persistence assumption holds. Otherwise, the green line has the same shape as the black one but with a time lag of 1 h, which is fully consistent with the persistence assumption. The difference between the blue and green lines demonstrates the increasing difficulty of the method in correctly assessing the radiation as the temporal horizon increases. Figure 4 shows the same type of plot for a perfectly clear-sky day, 14 May 2004. The persistence assumption is clearly valid in this case. Consequently, the method faces no difficulty in estimating the next slot from the previous one, whatever the temporal horizon. That is why the three lines are almost superimposed, except for the first 1 h ahead forecasts of the day, which have to wait until an estimate from 1 h earlier is available; this creates the step at 05:00 UT on the green line.
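The time lag seen on the green line of Fig. 3, and the step at the start of the day in Fig. 4, follow directly from how a forecast at horizon h is built under persistence: the forecast valid at slot t reuses the Kc estimated h slots earlier. The sketch below illustrates this; the function name and the use of `None` for slots without an issuing image are assumptions for illustration.

```python
def forecast_at_horizon(kc_est, ghi_clear, horizon_slots):
    """Persistence forecast valid at each slot, issued `horizon_slots` earlier.

    kc_est        -- clear-sky index estimated from each image (None if unavailable)
    ghi_clear     -- clear-sky GHI for each slot
    horizon_slots -- forecast horizon in 15 min slots (4 for 1 h ahead)
    """
    out = []
    for t, clear in enumerate(ghi_clear):
        src = t - horizon_slots  # slot of the image the forecast was issued from
        if src < 0 or kc_est[src] is None:
            out.append(None)     # no issuing image yet, e.g. early in the day
        else:
            out.append(kc_est[src] * clear)  # Kc(t - h) * GHI_clear(t)
    return out
```

For example, with a cloud dip in `kc_est` at slots 3 and 4, the forecasts at a horizon of one slot show the same dip at slots 4 and 5, i.e. shifted by the horizon, and at a horizon of four slots the first forecast only appears once an estimate four slots older exists.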
Adv. Sci. Res., 13, 129-136, 2016 www.adv-sci-res.net/13/129/2016/
Figures 3 and 4 demonstrate that the method has more difficulty forecasting SSI in cloudy situations. This is consistent with the largest RMSE being obtained for the Brazilian sites located in tropical areas (between 20 and 23 Wh m−2 for the 15 min horizon, and between 25 and 27 Wh m−2 for the 1 h horizon). In particular, Brasilia experiences numerous storms with heavy rainfall during the rainy season, from October to April, i.e. for more than half of the year. RMSE values are also greater for cloudy sites in Europe than for the other European sites. Nevertheless, the relative error is high for the northern stations because the mean irradiation there is very low. Figures 5 and 6 illustrate the link between the performance of the method and the temporal variability of the radiation. Figure 5 shows a 2-D histogram of the daily mean of Kc (horizontal axis) and the daily standard deviation of Kc (vertical axis) for Carpentras. The frequency in percent is colour-coded with the scale given on the right. Red and brown squares close to a mean Kc of 0.8 indicate a high frequency of clear-sky conditions. Conversely, there are few overcast situations with Kc less than 0.2. Note the arch shape of this 2-D histogram. As expected, the left part exhibits small variability: for overcast days (Kc ≤ 0.2), the observed standard deviation remains within a narrow range, between 0.05 and 0.2. This range cannot be large because the hourly irradiation, hence Kc, is small in these conditions. The right part exhibits high Kc with a fairly low standard deviation of Kc during each day. Indeed, the intra-day variability of Kc at Carpentras is usually low in clear-sky conditions, which means that a high Kc quite often corresponds to an entirely cloud-free day at Carpentras. Figure 6 shows the same graph for Brasilia.
The shape is similar, but the values are more widely spread. For any Kc, the observed standard deviations are largely scattered. There are more cases of intermediate Kc values than at Carpentras. There is also a very large spread of the standard deviations for high Kc, which means that on many days cloud-free conditions prevail but clouds appear intermittently. This alternation of cloudy and cloud-free situations makes an accurate forecast more difficult than at Carpentras.
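The quantities plotted in Figs. 5 and 6 can be derived as sketched below: the mean and standard deviation of Kc are computed for each day, then days are counted in a regular grid of (mean, standard deviation) cells and converted to percentages. The use of the population standard deviation, the bin width and the function names are assumptions; the input Kc values could be, for instance, the hourly values mentioned above.

```python
import math
from collections import Counter


def daily_kc_stats(kc_by_day):
    """Mean and population standard deviation of Kc for each day.

    kc_by_day -- dict mapping a day label to the list of valid Kc values that day
    """
    stats = {}
    for day, values in kc_by_day.items():
        if not values:
            continue  # no valid Kc that day
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[day] = (mean, std)
    return stats


def histogram_2d(stats, bin_width=0.05):
    """Frequency (%) of days per (mean Kc, std Kc) cell, keyed by bin indices."""
    counts = Counter(
        (math.floor(mean / bin_width), math.floor(std / bin_width))
        for mean, std in stats.values()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: 100.0 * count / total for cell, count in counts.items()}
```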
A final remark on these validation results is that, despite good RMSE and correlation coefficients, the method tends to underestimate the irradiation at Sede Boqer. This appears to be because the satellite-derived databases have difficulty capturing the actual atmospheric content in clear-sky conditions at this specific location.

Conclusions and perspectives
A new service has been presented that provides satellite-based irradiation values in real time, i.e. right after the MSG image acquisition, and short-term forecasts until the end of the current day. The validation results demonstrate that, though very simple from a scientific point of view, the method is adequate and reliable enough to support a sustainable value-added service.
A pre-operational version of the service was tested by 15 users for approximately one month in April 2015, who assessed the benefits of the new service in their daily work. The overall conclusion of this user survey was very positive. The feedback from the different testers enabled the refinement of the service, which has been fully operational since July 2015. In particular, the whole SoDa infrastructure has been duplicated to ensure a more robust service to users. So far, five customers have purchased this service and are satisfied with the supplied data; three of them were testers during the test period.