CNR-ISAC 2 m temperature monthly forecasts: a first probabilistic evaluation

Abstract. The 2 m temperature probabilistic forecasts collected, on a weekly basis, in about one year of CNR-ISAC monthly forecasting activity are evaluated in this work. The ranked probability skill score (RPSS) and reliability diagrams are computed on a tercile classification of forecast and observed temperatures. The RPSS, averaged over all the available cases, shows that the system has a residual predictive skill beyond week 2 over some specific regions. Reliability diagrams show that, in general, the probability forecasts of above-normal observed temperature are more reliable than those of below-normal temperature. Although the results are based on a limited period, they can represent a reference for similar works based on other subseasonal forecasting systems.


Introduction
The improvements in modelling and in data observation and assimilation techniques, together with the continuous development of computing capacity, have allowed numerical weather prediction to reach new goals in recent decades: besides the advancements in medium-range forecasts, the possibility of producing skilful subseasonal forecasts has been increasingly explored in the last 10-15 years (e.g. Hudson et al., 2011; Vitart, 2014). Both ensemble initialization and higher resolution have enhanced the capability of global modelling systems to exploit the main sources of subseasonal predictability (Brunet et al., 2010) that, on this timescale, can be both internal and external to the atmosphere. In particular, predictive skill is favoured by initial atmospheric states that are characterized by persistent low-frequency variability patterns, such as the Madden-Julian Oscillation (Mani et al., 2014), and by anomalies in the boundary conditions, such as the land surface and snow cover (Vitart et al., 2012).
Currently, several meteorological centres in the world produce operational dynamical subseasonal forecasts. Most of these forecasting efforts have been recently gathered in the framework of a World Weather Research Programme/World Climate Research Programme initiative on subseasonal to seasonal (S2S) prediction (Robertson et al., 2015). At the ISAC Institute of the Italian National Research Council (CNR), after an initial period of experimental monthly forecasting activity (Mastrangelo et al., 2012), the forecasting system based on the GLOBO model (Malguzzi et al., 2011) has been renewed to become one of the current 11 operational forecasting systems participating in the S2S project. Reforecast and forecast simulations from these 11 centres contribute to the S2S database (Vitart et al., 2017), a major aim of the same project.
The main goal of this work is to provide a first assessment of the probabilistic predictive skill of the CNR-ISAC forecasting system. The evaluation is based on outputs collected in about one year of operational activity and is limited to the 2 m temperature parameter. The prediction of this parameter is considered of primary relevance since 2 m temperature is intrinsically related to human activities and closely connected to some atmospheric phenomena typical of the subseasonal timescale, such as heat waves and dry spells (e.g. Hudson et al., 2015).
Following this introductory section, the CNR-ISAC forecasting system is described in Sect. 2 together with the data used for this work. Results are presented in Sect. 3 and conclusions are discussed in Sect. 4.

Forecast and verification data and methods
The CNR-ISAC monthly forecasting system is built upon the aforementioned experimental version of the same system and is briefly described in this section. More details on the GLOBO model and the design of the forecasting system are given in Mastrangelo et al. (2012).
Published by Copernicus Publications.
In the current version, the atmospheric general circulation model GLOBO is run with a horizontal grid spacing of about 0.56° × 0.80° latitude/longitude, 54 vertical hybrid levels and 7 soil levels; the output is stored on a regular grid with a horizontal resolution of 1.5° × 1.5° latitude/longitude in compliance with the S2S database requirements. A mixed lagged-perturbed ensemble forecast made up of 41 members is produced once a week: 10 perturbed members are initialized every 6 h (at each synoptic hour) on each Sunday with the data of the National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS) runs at lead time 0; the control run is initialized at 00:00 UTC on Monday with the GEFS control forecast data at lead time 0. The resulting ensemble covers 31 days starting at 00:00 UTC on each Monday. A modelled 30-year reference climatology, from 1981 to 2010, is obtained through 31-day reforecast simulations initialized, on 73 equally spaced calendar days, with data derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalyses (Dee et al., 2011). The resulting reference climatological dataset is used to compute calibrated forecast anomalies and probabilities of several meteorological variables. Specifically, a bias-reduction calibration of the forecast ensemble mean is applied by combining a large number of reforecasts through the weighted averaging technique described in Mastrangelo et al. (2012). This technique has been shown to effectively smooth the annual cycle of the model bias, improving the predictive skill. Similarly, a climatological distribution is obtained from the reforecast simulations to compute the terciles used to produce probabilistic categorical forecasts. The same technique is applied to reanalysis data to reconstruct a verifying climatological distribution homogeneous with the modelled one.
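The tercile-based categorical probabilities described above can be illustrated with a minimal sketch. It assumes that the probability of each category is simply the fraction of ensemble members falling in it, and that terciles are taken from a climatological sample by linear interpolation; neither detail is stated in the text, and the function names `tercile_bounds` and `tercile_probabilities` are hypothetical. The weighted averaging of reforecasts used operationally is not reproduced here.

```python
def tercile_bounds(clim_values):
    """Return (lower, upper) terciles of a climatological sample.

    Assumption: quantiles are obtained by linear interpolation between
    the sorted sample values.
    """
    s = sorted(clim_values)

    def quantile(k):
        # Position of the k/3 quantile on the sorted sample (k = 1 or 2).
        pos = (len(s) - 1) * k / 3
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return quantile(1), quantile(2)


def tercile_probabilities(members, lower, upper):
    """Return (P_below, P_normal, P_above) as fractions of ensemble members."""
    n = len(members)
    below = sum(1 for m in members if m < lower) / n
    above = sum(1 for m in members if m > upper) / n
    return below, 1.0 - below - above, above


# Illustrative values only (not taken from the forecasting system).
climatology = list(range(31))            # 0, 1, ..., 30
lower, upper = tercile_bounds(climatology)
ensemble = [5, 8, 9, 12, 15, 22, 25, 30]
p_below, p_normal, p_above = tercile_probabilities(ensemble, lower, upper)
```

In this toy case three of the eight members fall below the lower tercile and three above the upper one, so the two events evaluated in this work would each receive a probability of 3/8.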
In this preliminary work, a total of 58 monthly forecasts, initialized once a week from 29 March 2015 to 1 May 2016, are verified against the ERA-Interim reanalyses.
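As a quick consistency check on the sample size, the weekly initialization dates in the stated period can be enumerated. The sketch below only assumes a fixed 7-day step between the first and last dates given above; the helper name `weekly_dates` is hypothetical.

```python
from datetime import date, timedelta


def weekly_dates(first, last):
    """List the dates from first to last (inclusive) at 7-day steps."""
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += timedelta(days=7)
    return dates


init_dates = weekly_dates(date(2015, 3, 29), date(2016, 5, 1))
n_forecasts = len(init_dates)                             # number of weekly cases
all_sundays = all(d.weekday() == 6 for d in init_dates)   # Monday is 0, Sunday is 6
```

The enumeration gives 58 dates, all of them Sundays, in agreement with the number of forecasts and the initialization day described in this section.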

Results
The evaluation exercise described in this work is based on the weekly averaged (starting from the first forecast day) forecast probabilities of two main dichotomous events: 2 m temperature above (below) the upper (lower) tercile. An overview of the probabilistic predictive skill is provided through the ranked probability skill score (RPSS; Wilks, 2011). This score is positive if the examined forecast outperforms a reference forecast that, here, is the climatological one. Figure 1 shows the mean RPSS of the 58 cases over land grid points. The skill decreases along the forecast range over most areas except the equatorial regions, where the decrease is small. Desert or dry areas of Africa, the Middle East and Australia also show slightly positive RPSS values, suggesting that soil properties may strongly affect temperature prediction. In addition, several mid-latitude areas, for example the broad Mediterranean region and north-western North America, preserve a residual predictive skill beyond week 2. However, a wide area of the Asian continent, in particular, shows negative values from week 1 onwards. This feature could be related to soil scheme initialization issues, in particular over snow-covered or complex-orography areas, rather than to a systematic actual lack of predictive signal.
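For reference, the RPSS of three-category (tercile) forecasts at a single grid point can be sketched as follows. This is a minimal illustration under stated assumptions, not the verification code used for Fig. 1: the climatological reference is taken as equal probabilities of 1/3 per category, and the skill score is computed from the RPS accumulated over all cases rather than by averaging case-by-case scores, which the text does not specify. The function names `rps` and `rpss` are hypothetical.

```python
def rps(probs, obs_cat):
    """Ranked probability score of one categorical forecast.

    probs   -- forecast probabilities of the ordered categories
    obs_cat -- index of the observed category (0 = below, 1 = normal, 2 = above)
    """
    cum_f = 0.0   # cumulative forecast probability
    cum_o = 0.0   # cumulative observation (0 before the observed category, 1 after)
    total = 0.0
    for k, p in enumerate(probs):
        cum_f += p
        if k == obs_cat:
            cum_o = 1.0
        total += (cum_f - cum_o) ** 2
    return total


def rpss(forecasts, observations, clim=(1 / 3, 1 / 3, 1 / 3)):
    """RPSS against a climatological reference (assumed equiprobable terciles)."""
    rps_fc = sum(rps(p, o) for p, o in zip(forecasts, observations))
    rps_cl = sum(rps(clim, o) for o in observations)
    return 1.0 - rps_fc / rps_cl


# Illustrative values only.
obs = [0, 2, 1, 2]
perfect = [(1, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 1)]
climatological = [(1 / 3, 1 / 3, 1 / 3)] * 4
skill_perfect = rpss(perfect, obs)          # a perfect forecast scores 1
skill_clim = rpss(climatological, obs)      # the reference itself scores 0
```

With this convention, positive values indicate forecasts that beat climatology and negative values forecasts that are worse than climatology, as in the interpretation of Fig. 1 given above.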
Reliability diagrams (Wilks, 2011) are shown in Fig. 2 to give more insight into the statistical relationship between forecast probabilities and observed occurrences of the two temperature events. Hit rates have been computed from weekly averaged forecasts for 5 probability bins on each land grid point, and subsequently averaged over the Northern Hemisphere extratropical area (> 20° N latitude). Figure 2 shows that the forecast system loses most of its reliability by week 2, with only minor differences between weeks 3 and 4. The forecast of above-normal temperature events (red curves) is, in general, more reliable than the forecast of below-normal events (blue curves): the latter is more overforecast (cold bias), especially for large forecast probabilities, in weeks 1 and 2. The system loses resolution in weeks 3 and 4, as suggested by the flattening of the reliability curves and by the forecast probability frequencies that peak around the observed climatological value (see histograms in Fig. 2). Cold (below-normal observed temperature) events, however, are reliably forecast in the case of very low probabilities, which is also the most frequent case. The frequency of high forecast probabilities of cold events is too low to evaluate the associated reliability.
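The points of a reliability curve for one dichotomous event can be sketched as below. The sketch assumes 5 equally wide probability bins over [0, 1]; the text states the number of bins but not their edges, and the spatial averaging over grid points is not reproduced. The function name `reliability_curve` is hypothetical.

```python
def reliability_curve(probs, outcomes, n_bins=5):
    """Return one (mean forecast probability, observed frequency, count) per bin.

    probs    -- forecast probabilities of the event, each in [0, 1]
    outcomes -- 1 if the event was observed, 0 otherwise
    Bins are assumed to be equally wide; empty bins give (None, None, 0).
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    prob_sum = [0.0] * n_bins
    for p, o in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)   # p = 1 goes into the last bin
        counts[b] += 1
        hits[b] += 1 if o else 0
        prob_sum[b] += p
    rows = []
    for b in range(n_bins):
        if counts[b] == 0:
            rows.append((None, None, 0))
        else:
            rows.append((prob_sum[b] / counts[b], hits[b] / counts[b], counts[b]))
    return rows


# Illustrative values only.
forecast_probs = [0.1, 0.1, 0.5, 0.9, 0.9, 0.9]
observed = [0, 0, 1, 1, 1, 0]
curve = reliability_curve(forecast_probs, observed)
```

A perfectly reliable system would have the observed frequency equal to the mean forecast probability in every bin; the per-bin counts correspond to the forecast probability histograms shown in each panel of Fig. 2 and indicate which bins are too sparsely populated to be assessed.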

Summary and conclusions
The CNR-ISAC monthly forecasting system is operationally run on a weekly basis to produce 31-day ensemble forecasts. Data from this activity, promoted by the Italian National Civil Protection Agency, have contributed to the S2S database since the beginning of November 2015. In this work, about one year (58 weeks) of 2 m temperature probabilistic forecasts has been evaluated against ECMWF ERA-Interim reanalyses to obtain some preliminary indications of the performance of the forecasting system.
The RPSS analysis indicates that 2 m temperature predictability beyond week 2 is low and influenced by soil properties over some regions, with greater predictability in equatorial regions and some desert areas. Reliability diagrams show that, in general, the probability forecasts of warm events (above-normal observed temperature) are more reliable than those of cold events. Beyond week 2, even if a general lack of resolution is observed, low-probability forecasts of cold events appear to be reliable.
The results presented here are closely related to the climatological properties of the examined limited period, which, for instance, featured a strong El Niño event (Xue and Kumar, 2016). However, this work could be a useful reference for model verification and comparison.

Figure 2. Reliability diagrams computed from 58 forecasts of 2 m temperature and averaged over the extratropical Northern Hemisphere for week 1 (a), week 2 (b), week 3 (c) and week 4 (d). Diagrams of the forecast probability frequency are shown in the top-left (upper tercile) and bottom-right (lower tercile) corners of each panel.