Skip to main content

Official Journal of the Asia Oceania Geosciences Society (AOGS)

Ensemble size versus bias correction effects in subseasonal-to-seasonal (S2S) forecasts

Abstract

This study explores the ensemble size effect on subseasonal-to-seasonal (S2S) forecasts of the European Center for Medium-Range Weather Forecasts (ECMWF) model. The ensemble forecast skill and its sensitivity to the ensemble size are assessed for the troposphere and stratosphere, and compared with theoretical estimates under the perfect model assumption. The degree of skill improvement in ensemble-mean forecasts with increasing ensemble size agrees well with theoretical estimates in the troposphere. However, in the stratosphere, increasing the ensemble size does not yield as much of the skill improvement as expected. Decomposition of the mean square skill score reveals that the weak ensemble size effect in the stratosphere is primarily caused by a large unconditional bias, which exhibits no apparent decrease with increasing ensemble size. Removing such bias significantly improves the S2S forecast skill and ensemble size effect, suggesting that bias correction is crucial for S2S forecasts, especially in the stratosphere.

Introduction

Subseasonal-to-seasonal (S2S) forecasts target time scales ranging from two weeks to two months, bridging the gap between weather and climate predictions (Robertson and Vitart 2018). Despite the importance and growing demand for S2S forecasts, the forecast skill within this time range remains lower than that of weather and climate because the S2S time scale is too long to preserve the memory of the initial conditions but is too short to be strongly influenced by the variability of the surface boundary conditions (Vitart et al. 2017).

Constructing an ensemble prediction system (EPS) is one of the straightforward and conventional methods to improve the S2S forecast skill. It is well known both theoretically and practically that increasing the number of ensemble members leads to model forecast skill improvement, referred to as the ensemble size effect. After Leith (1974), who theoretically showed that a reasonable accuracy of Monte Carlo forecasts can be obtained with an ensemble size as small as eight, several studies have examined a reasonable ensemble size (e.g., Déqué 1997; Kumar et al. 2001). Following these studies, most operational EPSs adopt ensemble sizes ranging from 15 to 51 members by considering the balance between the computational cost and performance (Leutbecher 2019).

Previous studies on the ensemble size effect have mostly been based on the perfect model assumption, in which the model is assumed to perfectly simulate the evolution of the real atmosphere when the true state is given as an initial condition (Murphy 1988b). This assumption considers only the uncertainty in the initial conditions and ignores the model uncertainty. It is difficult to directly apply this assumption to operational forecasts because the model itself has systematic errors. Model errors play an important role in limiting the model prediction skill and predictability (Schneider et al. 2003; Stan and Kirtman 2008). Branković et al. (1990) showed that systematic model errors considerably reduce the skill gain from ensemble forecasting.

This study revisits the theoretical estimate of the ensemble-mean forecast skill and compares it to the operational forecast skill derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) real-time S2S EPS, which has a large ensemble size (Vitart et al. 2017). In particular, the differences in forecast skill and ensemble size effect between the stratosphere and troposphere and those between the tropics and extratropics are explored. The reason for their differences is analyzed via skill score decomposition proposed by Murphy and Epstein (1989).

Data and methods

Data

The ECMWF real-time S2S forecast dataset is used in this study due to its large ensemble size (51 members). The ECMWF EPS simulates initial uncertainties using an ensemble of data assimilations (EDA) and singular vectors and model uncertainties using stochastic schemes (e.g., Buizza et al. 2008; Leutbecher et al. 2017). Forecasts are initialized twice a week and integrated for 46 days. All forecasts initialized from June 2015 to May 2018 (312 forecasts) are analyzed, but only the results of forecasts initialized during the December–January–February (DJF; 77 forecasts) and June–July–August (JJA; 80 forecasts) periods are presented in this study.

The ECMWF S2S prediction model uses a Variable Resolution EPS (VAREPS; Buizza et al. 2007; Vitart et al. 2008), in which the model horizontal resolution is changed during model integration. The atmospheric horizontal resolution is approximately 32 km up to forecast day 10 and approximately 64 km from day 10 onwards before the resolution upgrade in the Integrated Forecasting System (IFS) cycle 41r2 on 8 March 2016. Afterward, the resolution is set to approximately 18 km for the first 15 days and approximately 36 km thereafter. ECMWF Interim reanalysis (ERA-Interim; Dee et al. 2011) data, which are used as initial conditions, are employed as reference data for model verification and climatology calculation. In all analyses, the pentad moving-averaged data are used, and the climatology is calculated for the period 1981–2010. Verifications are conducted for 50- and 500-hPa geopotential height forecasts, which represent the stratospheric and tropospheric forecast skills, respectively, in the Northern Hemisphere extratropics (30°–90°N), tropics (30°S–30°N), and Southern Hemisphere extratropics (30°–90°S). The bias-corrected forecasts are obtained by subtracting the mean bias from the forecasts in a cross-validated way, wherein the year for which the bias is calculated is left out (Polkova et al. 2022).

Evaluation metrics

This study evaluates forecasts with the mean square skill score (MSSS), which is a diagnostic measure used in the standardized verification system for deterministic long-range forecasts (World Meteorological Organization 2006). The MSSS is defined as one minus the ratio of the mean square error (MSE) of forecasts (MSEf) to the MSE of reference data (MSEc):

$$\mathrm{MSSS}\left(\tau ,{N}_{e}\right)=1-\frac{{\mathrm{MSE}}_{f}\left(\tau ,{N}_{e}\right)}{{\mathrm{MSE}}_{c}\left(\tau \right)},$$
(1)

where

$${\text{MSE}}_{f} \left( {\tau ,N_{e} } \right) = \left\langle {\left\{ {\hat{f}_{{ij}} \left( {\tau ,N_{e} } \right) - o_{{ij}} \left( \tau \right)} \right\}^{2} } \right\rangle ,$$
$${\widehat{f}}_{ij}\left(\tau ,{N}_{e}\right)=\frac{1}{{N}_{e}}{\sum }_{k=1}^{{N}_{e}}{f}_{ijk}\left(\tau \right),$$
$${\text{MSE}}_{c} \left( \tau \right) = \left\langle {\left\{ {c_{{ij}} \left( \tau \right) - o_{{ij}} \left( \tau \right)} \right\}^{2} } \right\rangle .$$

Here, f is the forecast; o is the observation; c is the daily climatology of the ERA-Interim data; τ is the forecast day; and subscripts i, j, and k denote the forecast initialization time, grid point, and ensemble member, respectively. The angle bracket denotes an area-weighted average over all grid points within the region of interest (Ng) and over all initialization dates (Nf) (e.g., MSEf  =  \(\frac{1}{{N}_{f}{\sum }_{j=1}^{{N}_{g}}\mathrm{cos}{\theta }_{j}}{\sum }_{i=1}^{{N}_{f}}{\sum }_{j=1}^{{N}_{g}}{\left\{{\widehat{f}}_{ij}-{o}_{ij}\right\}}^{2}\mathrm{cos}{\theta }_{j}\), where θj is the latitude in degrees). \(\widehat{f}\) denotes the average of the first Ne ensemble members, where Ne ranges from 1 to 51 for the ECMWF real-time S2S forecasts. The order of ensemble members follows the numbering provided by the S2S dataset. Note that MSEc is the climatological variance of a given variable, representing the natural variability. The MSSS equals one for perfect predictions, but typically decreases with time as forecast errors grow. When the error of the prediction system (MSEf) is larger than the natural variability, the MSSS becomes negative, which is considered that the system loses its predictability.

Skill decomposition

Murphy and Epstein (1989) proposed a method to identify the source of the forecast skill through MSSS decomposition. The MSSS is decomposed into four terms involving the anomaly correlation coefficient (AC), conditional bias (CB), unconditional bias (UB), and difference between the mean sample and historical climatologies (CV) as follows:

$$\mathrm{MSSS}\left(\tau ,{N}_{e}\right)=\frac{{\mathrm{AC}}^{2}\left(\tau ,{N}_{e}\right)-{\mathrm{CB}}^{2}\left(\tau ,{N}_{e}\right)-{\mathrm{UB}}^{2}\left(\tau ,{N}_{e}\right)+{\mathrm{CV}}^{2}\left(\tau \right)}{1+{\mathrm{CV}}^{2}\left(\tau \right)},$$
(2)

where

$${\text{AC}}^{2} \left( {\tau ,N_{e} } \right) = \frac{{\left\langle {\left\{ {\widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right) - \langle \widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right)\rangle } \right\}\left\{ {o\prime _{{ij}} \left( \tau \right) - \langle o\prime _{{ij}} \left( \tau \right)\rangle } \right\}} \right\rangle ^{2} }}{{\left\langle {\left\{ {\widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right) - \langle \widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right)\rangle } \right\}^{2} \rangle \langle \left\{ {o\prime _{{ij}} \left( \tau \right) - \langle o\prime _{{ij}} \left( \tau \right)\rangle } \right\}^{2} } \right\rangle }},$$
$${\text{CB}}^{2} \left( {\tau ,N_{e} } \right) = \left\{ {{\text{AC}}\left( {\tau ,N_{e} } \right) - \frac{{\sqrt {\left\langle {\left\{ {\widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right) - \langle \widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right)\rangle } \right\}^{2} } \right\rangle } }}{{\sqrt {\left\langle {\left\{ {o\prime _{{ij}} \left( \tau \right) - \langle o\prime _{{ij}} \left( \tau \right)\rangle } \right\}^{2} } \right\rangle } }}} \right\}^{2} ,$$
$${\text{UB}}^{2} \left( {\tau ,N_{e} } \right) = \frac{{\left\{ {\left\langle {\widehat{{f\prime }}_{{ij}} \left( {\tau ,N_{e} } \right) - o\prime _{{ij}} \left( \tau \right)} \right\rangle } \right\}^{2} }}{{\left\langle {\left\{ {o\prime _{{ij}} \left( \tau \right) - \left\langle {o\prime _{{ij}} \left( \tau \right)} \right\rangle } \right\}^{2} } \right\rangle }},$$
$${\text{CV}}^{2} \left( \tau \right) = \frac{{\left\langle {o\prime _{{ij}} \left( \tau \right)} \right\rangle ^{2} }}{{\left\langle {\left\{ {o\prime _{{ij}} \left( \tau \right) - \langle o\prime _{{ij}} \left( \tau \right)\rangle } \right\}^{2} } \right\rangle }}.$$

Here, the prime denotes the anomaly with respect to the long-term climatology calculated from the ERA-Interim data. For instance, \({f{\prime}}_{ij}={f}_{ij}-{c}_{ij}\) is the anomaly in the forecast field generated by the i-th initialization at j-th grid point. In this study, long-term climatology of reanalysis is taken to be the reference for both observations and forecasts (Murphy and Epstein 1989) to decompose the MSSS into the above four terms. Note that the third and fourth terms on right hand side of Eq. (2) (i.e., UB2 and CV2) become zero when observed and forecasted anomalies are defined against their respective climatologies which are calculated over the same period as the forecasts to be assessed (Goddard et al. 2013).

The term AC2 in Eq. (2) is the square of the anomaly correlation coefficient and is considered as a measure of the potential skill that the prediction system can have when the biases are eliminated (Murphy 1988a; Murphy and Epstein 1989). This term ranges from zero to one, and the prediction system generally has a higher skill when this value is closer to one. The term CB2, which is the squared difference between the anomaly correlation coefficient and the ratio of the standard deviations of the forecast and observed anomalies, is a measure of the conditional bias. This term vanishes only when the slope of the regression line, in which the observed anomalies are regressed on the forecast anomalies, is equal to one. The term UB2 is the square of the ratio of the mean model bias to the standard deviation of the observed anomalies. This term is a measure of the unconditional bias and vanishes only for unbiased forecasts. Note that even if the model bias is small, UB2 can be large if the variance of observation is sufficiently small. The difference between the conditional bias and unconditional bias is well illustrated in Fig. 5 of Bradley et al. (2019). The term CV2 is the square of the ratio of the mean of the observed anomalies to the standard deviation of the observed anomalies (i.e., the inverse of the coefficient of variation of the observed anomalies) and is related to how much the observed state deviates from the climatology. For more information, see Murphy and Epstein (1989).

Theoretical ensemble size effect

Under the perfect model assumption, the forecast skill of the ensemble-mean forecast, determined by the MSE, can be expressed in terms of the ensemble size (Ne) and the average skill of individual ensemble members \(\overline{{\mathrm{MSE} }_{f}\left(\tau \right)}\) (Murphy 1988b):

$${\mathrm{MSE}}_{f}\left(\tau ,{N}_{e}\right)=\frac{{N}_{e}+1}{2{N}_{e}}\overline{{\mathrm{MSE} }_{f}\left(\tau \right)}.$$
(3)

See the Appendix for a detailed derivation of Eq. (3). By substituting Eq. (3) into Eq. (1), the theoretical MSSS of the ensemble-mean forecast can also be expressed in terms of the ensemble size and the mean score of individual member forecasts \(\overline{\mathrm{MSSS }\left(\tau \right)}\) as follows:

$$\mathrm{MSSS}\left(\tau ,{N}_{e}\right)=1-\frac{{N}_{e}+1}{2{N}_{e}}\left\{1-\overline{\mathrm{MSSS }\left(\tau \right)}\right\}.$$
(4)

The above two equations indicate an improved forecast skill with increasing ensemble size.

Figure 1a shows how the theoretical MSSS increases with the ensemble size for a given \(\overline{\mathrm{MSSS} }\). Adding the first few ensemble members dramatically improves the forecast skill of EPS, especially when the skill of individual ensemble members is lower. However, with increasing ensemble size, the ensemble size effect becomes saturated. The theoretical MSSS of the ensemble-mean forecast for an infinite ensemble size is

$$\mathrm{MSSS}\left(\tau ,\infty \right)=\frac{1}{2}\left\{1+\overline{\mathrm{MSSS }\left(\tau \right)}\right\},$$
(5)

which is the maximum MSSS that the EPS can have. When the average MSSS of individual ensemble members, \(\overline{\mathrm{MSSS} }\), is lower than −1, the MSSS of the ensemble-mean forecast becomes negative, indicating a loss of forecast skill even with an infinite ensemble size.

Fig. 1
figure 1

a Theoretical MSSS of the ensemble-mean forecast for a given mean score of individual ensemble members (\(\overline{\mathrm{MSSS }\left(\tau \right)}\) = −1, −0.6, −0.2, 0.2 and 0.6; represented by different colors) and b SIR multiplied by 100 (%) with varying ensemble sizes

To measure the degree of skill improvement achieved by the ensemble-mean forecast, we define the skill improvement ratio (SIR) as the ratio of the increase in the MSSS attained using the ensemble-mean forecast to its theoretical maximum value with an infinite ensemble size:

$$\mathrm{SIR}\left({N}_{e}\right)=\frac{\mathrm{MSSS}\left(\tau ,{N}_{e}\right)-\overline{\mathrm{MSSS }(\tau )}}{\mathrm{MSSS}\left(\tau ,\infty \right)-\overline{\mathrm{MSSS }(\tau )}}.$$
(6)

Under the perfect model assumption, the SIR is given by (Ne–1)/Ne, which is a function of the ensemble size only and independent of the skill of individual ensemble members (Fig. 1b). For example, 10 and 51 ensemble members lead to an increase in the MSSS equivalent to 90% and 98%, respectively, of its maximum value. Again, it is clear from Fig. 1b that the ensemble size effect becomes saturated with increasing number of ensemble members.

Results

Figure 2 shows the temporal evolution of the MSSSs of individual ensemble members calculated from Eq. (1) (solid gray lines) and their mean (solid black line) as well as the MSSSs of the ensemble mean with varying ensemble sizes (Ne = 2, 10, and 51; solid colored lines) for 50- and 500-hPa geopotential height forecasts initialized during the DJF 2015–2018 period in the Northern Hemisphere extratropics, tropics, and Southern Hemisphere extratropics, and the corresponding theoretical MSSSs of the ensemble-mean forecast calculated from Eq. (4) (dashed colored lines). At both the 50- and 500-hPa levels in all regions, a high degree of forecast skill is maintained over the first few forecast days, but the skill sharply decreases afterward. The stratospheric forecast skill decreases slower and, therefore, remains higher than the tropospheric skill on earlier forecast days. However, it decreases continuously with time during the integration period, in contrast to the tropospheric forecast skill which becomes nearly constant after approximately 30 forecast days. The stratospheric MSSSs eventually become much lower than the tropospheric ones on later forecast days (solid colored lines in Fig. 2). The forecast skill in the tropics is higher than that in the extratropics, which can be partly explained by the strong modulation of the tropical atmosphere by the underlying sea surface temperature (Shukla 1998).

Fig. 2
figure 2

Temporal evolution of the MSSSs for (left) 50- and (right) 500-hPa geopotential height forecasts initialized during the DJF 2015–2018 period in the (top) Northern Hemisphere extratropics, (middle) tropics, and (bottom) Southern Hemisphere extratropics of individual ensemble members (solid gray lines) and their mean (solid black line) as well as the MSSSs of the ensemble mean with varying ensemble sizes (Ne = 2, 10, and 51; solid colored lines). The corresponding theoretical MSSSs of the ensemble-mean forecast calculated from Eq. (4) with varying ensemble sizes are shown as dashed colored lines

While the operational forecast skill is in good agreement with the theoretical estimate in the troposphere (compare the solid and dashed colored lines in the right column), there is a large discrepancy between them in the stratosphere (left column). The stratospheric MSSSs are substantially lower than the corresponding theoretical values except after 40 forecast days with Ne = 2 in the Northern Hemisphere extratropics (Fig. 2a). This is particularly true in the tropics and summer hemisphere (Fig. 2b and c). The tropical stratospheric skill of the ensemble-mean forecast with Ne = 2 is even lower than the average skill of individual ensemble members (compare the solid dark cyan and black lines in Fig. 2b). The skill improvement is also relatively minor in the tropics and summer hemisphere when the ensemble size is increased from Ne = 2 to Ne = 51 (compare the solid dark cyan and dark blue lines in Fig. 2b and c). Similar results are also obtained with the JJA forecasts (Additional file 1: Fig. S1). These results indicate that the ensemble size effect is different depending on the region.

The discrepancy between the estimated and operational forecast skills implies that increasing the ensemble size does not necessarily ensure as much of an improvement in forecast skill as expected. This also indicates that the perfect model assumption does not hold for operational forecasts, especially in the stratosphere. In this study, the source of the weak ensemble size effect in the stratosphere is identified with the skill decomposition method described in the Data and Method section.

Figures 3 and 4 show the results of MSSS decomposition, i.e., AC2, CB2, UB2, and CV2, for the 50- and 500-hPa geopotential height forecasts initialized during the boreal winter in the Northern Hemisphere extratropics and tropics, respectively. The overall features of the Southern Hemisphere extratropics are similar to those of the Northern Hemisphere extratropics and are therefore not presented here. AC2 decreases more slowly with the forecast days in the stratosphere than in the troposphere. Therefore, its values in the stratosphere are higher than those in the troposphere (first row in Figs. 3 and 4). In particular, in the tropical stratosphere, the forecasts maintain relatively high AC2 values during the forecast period. This implies that the model possesses a higher potential forecast skill in the stratosphere than in the troposphere, especially in the tropics. AC2 increases with increasing ensemble size, as shown in previous studies (e.g., Leith 1974; Murphy 1988b; Branković et al. 1990), in both the stratosphere and troposphere.

Fig. 3
figure 3

Temporal evolution of the four terms (from top to bottom: AC2, CB2, UB2, and CV2) of MSSS decomposition for (left) 50- and (right) 500-hPa geopotential height forecasts initialized during the DJF 2015–2018 period in the Northern Hemisphere extratropics of individual ensemble members (gray lines) and their mean (black line) as well as those of the ensemble mean with varying ensemble sizes (Ne = 2, 10, and 51; colored lines). Note that CV2, which is independent of the forecasts, is indicated with a green line

Fig. 4
figure 4

Same as Fig. 3, but in the tropics

The CB2 tends to increase with the forecast days, but the increase rate decreases and even becomes negative on later forecast days (second row in Figs. 3 and 4). In the extratropics, its values in the stratosphere are lower than those in the troposphere on earlier forecast days but higher on later forecast days (Fig. 3b and f). However, in the tropics, CB2 is much smaller in the stratosphere than that in the troposphere throughout the forecast period (Fig. 4b and f). The conditional bias, which represents the linear slope between the forecast and observed states, can be understood as errors in the amplitude of eddies deviating from the mean state. Son et al. (2020) who examined the extratropical prediction skill of the S2S prediction models showed that eddy amplitude error is higher in the extratropical troposphere than in the stratosphere on earlier forecast days, and this may be attributable to the failure to predict the amplitude of synoptic scale eddies in the troposphere. CB2 effectively decreases with increasing ensemble size. Even with 10 ensemble members (Ne = 10), CB2 becomes close to zero in the tropics.

While both AC2 and CB2 make the MSSS increase with the ensemble size, UB2 exhibits no obvious decrease with increasing ensemble size (Fig. 4c). Note that UB2 in the tropical stratosphere with Ne = 2 is larger than that of individual forecasts. Hence, UB2 inhibits the ensemble size effect. In the stratosphere, UB2 increases more dramatically with the forecast days in the tropics than in the extratropics (Figs. 3c and 4c). In the troposphere where the model bias is small (Lawrence et al. 2022), UB2 is close to zero throughout the forecast period (Figs. 3g and 4g). Note that despite the large model bias related to the wintertime polar vortices (Lawrence et al. 2022), UB2 in the extratropical stratosphere is relatively small because the variance of observation is large there (Additional file 1: Fig. S2). In contrast, in the tropical stratosphere where the variance of observation is small, a considerably large UB2 is observed. Note that UB can be understood as a failure to predict the mean state. Son et al. (2020) showed that the model errors associated with the zonal-mean flow grow rapidly in the stratosphere. The tropical stratospheric bias might be related to the quasi-biennial oscillation (QBO) which dominates the circulation of the tropical stratosphere (Lawrence et al. 2022). The CV2, which is independent of the forecasts, is large in the tropics because of a small variance of observation, while it is negligible in the extratropics.

Because UB2 inhibits the ensemble size effect, one of the simplest ways to enlarge the gain obtained from ensemble forecasts is to subtract the model mean bias from the forecast, referred to as bias correction. Figure 5 shows the temporal evolution of the MSSSs of the forecasts after bias correction. Compared with the results in Fig. 2, it is apparent that the MSSS increases considerably in the stratosphere. In particular, in the tropical stratosphere, even the individual forecasts after bias correction maintain MSSSs greater than 0.5 until the end of the forecast period. Note that lower degrees of freedom could be the physical reason why the geopotential height forecasts in the tropical stratosphere requires a smaller ensemble size than other regions to reach the saturation of skills. More importantly, the discrepancy between the estimated and operational skills in the stratosphere substantially decreases. As shown in Additional file 1: Fig. S3, the operational skill after bias correction becomes almost the same as the theoretical skill during the JJA period. Therefore, the expected forecast skill improvement with increasing ensemble size is attained after bias correction even in the stratosphere. Similar results are also obtained in temperature forecasts as shown in Additional file 1: Figs. S4 and S5. These changes in the operational forecast skill and ensemble size effect are more pronounced at the S2S time scale.

Fig. 5
figure 5

Same as Fig. 2, but after bias correction

Here, it should be stated that the above analyses aim not to evaluate the skill of the real-time forecasts, but to quantify the importance of the unconditional bias and its relationship to the theoretical estimate. For this purpose, the model mean bias is corrected with the real-time forecasts (see Data and Methods). However, in the operational forecasts, the bias correction is conducted with the long-term reforecasts. By considering the application of this study to the operational forecasts, the same analyses are repeated by correcting the model mean bias with the long-term reforecasts over the past 20 years which are available from the S2S prediction project website. Although there is a difference in the degree to which skill is improved, similar results are obtained (Additional file 1: Fig. S6). This result indicates that the ensemble size effect closer to the theoretical estimate can be obtained by removing the unconditional bias regardless of the details of the bias correction method.

Summary

This study explores the forecast skill and its dependency on the ensemble size in ECMWF real-time S2S forecasts. The results are particularly compared with theoretical estimates obtained under the perfect model assumption. In the troposphere, a high degree of skill is maintained for the first few days, after which the forecast skill rapidly declines and then stabilizes, and exhibit a good agreement with the theoretical estimate. The forecast skill in the stratosphere is higher than that in the troposphere on earlier forecast days, but decreases continuously, eventually becoming lower than the tropospheric skill on later forecast days. The stratospheric skill decreases much faster than the theoretical estimate, especially in the tropics.

The discrepancy between the estimated and operational skills in the stratosphere is mostly attributed to the unconditional bias resulting from model drift. By applying bias correction, the stratospheric forecast skill is substantially increased, becoming comparable to the theoretical estimate. The skill improvement with increasing ensemble size also follows the theoretical estimate. This result does not mean that the model can be assumed to be perfect, but should be understood in the limited sense that it is possible to expect an ensemble size effect obtained under the perfect model assumption when the unconditional bias is negligible. The results obtained in this study are not unique to the ECMWF model. Although not shown, essentially the same results have been found in analyses of the Centre National de Recherches Météorologiques (CNRM) model of Météo-France.

While bias correction can improve the ensemble size effect, the extent to which it can do so practically depends on various factors such as the specific bias correction method, the quality of reforecasts or climatology used for bias correction, and the target variable. The importance of bias correction for operational forecasts may not be surprising (e.g., Son et al. 2020). However, this study quantifies its importance and relationship to theoretical estimates. The optimal ensemble size for operational S2S forecasts is also estimated. For example, it is revealed that an ensemble size of 10 to 20 yields a forecast skill that is approximately 90% to 95% of the theoretical maximum skill.

Availability of data and materials

The ECMWF S2S real-time forecast data (http://apps.ecmwf.int/datasets/data/s2s) and ERA-Interim data (https://apps.ecmwf.int/datasets/data/interim_full_daily) are available online.

Abbreviations

AC:

Anomaly correlation coefficient

CB:

Conditional bias

CNRM:

Centre National de Recherches Météorologiques

CV:

Difference between the mean sample and historical climatology

DJF:

December–January–February

ECMWF:

European Center for Medium-Range Weather Forecasts

EPS:

Ensemble prediction system

ERA-Interim:

ECMWF Interim reanalysis

IFS:

Integrated Forecasting System

JJA:

June–July–August

MSE:

Mean square error

MSSS:

Mean square skill score

QBO:

Quasi-Biennial oscillation

S2S:

Subseasonal-to-seasonal

SIR:

Skill improvement ratio

UB:

Unconditional bias

VAREPS:

Variable resolution EPS

References

  • Bradley AA, Demargne J, Franz KJ (2019) Attributes of forecast quality. In Handbook of hydrometeorological ensemble forecating Springer, Berlin

    Book  Google Scholar 

  • Branković Č, Palmer TN, Molteni F, Tibaldi S, Cubasch U (1990) Extended-range predictions with ECMWF models: time-lagged ensemble forecasting. Q J R Meteorol Soc 116:867–912

    Article  Google Scholar 

  • Buizza R, Bidlot J-R, Wedi N, Fuentes M, Hamrud M, Holt G, Vitart F (2007) The new ECMWF VAREPS (variable resolution ensemble prediction system). Q J R Meteorol Soc 133:681–695

    Article  Google Scholar 

  • Buizza R, Leutbecher M, Isaksen L (2008) Potential use of an ensemble of analysises in the ECMWF ensemble prediction system. Q J R Meteorol Soc 134:2051–2066

    Article  Google Scholar 

  • Dee DP, Uppala SM, Simmons AJ, Berrisford P, Poli P, Kobayashi S, Andrae U, Balmaseda MA, Balsamo G, Bauer P, Bechtold P, Beijaars ACM, van de Berg L, Bidlot J, Bormann N, Delsol C, Dragani R, Fuentes M, Geer AJ, Haimberger L, Healy SB, Hersbach H, Hólm EV, Isaksen L, Kållberg P, Köhler M, Matricardi M, McNally AP, Monge-Sanz BM, Morcrette J-J, Park B-K, Peubey C, de Rosnay P, Tavolato C, Thépaut J-N, Vitart F (2011) The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q J R Meteorol Soc 137:553–597

    Article  Google Scholar 

  • Déqué M (1997) Ensemble size for numerical seasonal forecasts. Tellus 49A:74–86

    Article  Google Scholar 

  • Goddard L, Kumar A, Solomon A, Smith D, Boer G, Gonzalez P, Kharin V, Merryfield W, Deser C, Mason SJ, Kirtman BP, Msadek R, Sutton R, Hawkins E, Fricker T, Hegerl G, Ferro CAT, Stephenson DB, Meecl GA, Stockdale T, Burgman R, Greene AM, Kushnir Y, Newman M, Carton J, Fukumori I, Delworth T (2013) A verification framework for interannual-to-decadal predictions experiments. Clim Dyn 40:235–272

    Article  Google Scholar 

  • Kumar A, Barnston AG, Hoerling MP (2001) Seasonal predictions, probabilistic verifications, and ensemble size. J Clim 14:1671–1676

    Article  Google Scholar 

  • Lawrence ZD, Abalos M, Ayazagüena B, Barriopedro D, Butler AH, Calvo N, de la Cámara A, Chalton-Perez A, Domeisen DIV, Dunn-Sigouin E, Carciá-Serrano J, Garfinkel CI, Hindley NP, Jia L, Jucker M, Karpechko AY, Kim H, Lang AL, Lee SH, Lin P, Osman M, Palmeiro FM, Perlwitz J, Polichtchouk I, Richter JH, Schwartz C, Son S-W, Statnaia I, Taguchi M, Tyrrell NL, Wright CJ, Wu RW-Y (2022) Quantifying stratosphere biases and identifying their potential sources in subseasonal forecast systems. Weather and Climate Dynamics 3:977–1001

    Article  Google Scholar 

  • Leith CE (1974) Theoretical skill of Monte Carlo forecasts. Mon Weather Rev 102:409–418

    Article  Google Scholar 

  • Leutbecher M (2019) Ensemble size: How suboptimal is less than infinity? Q J R Meteorol Soc 145:107–128

    Article  Google Scholar 

  • Leutbecher M, Lock S-J, Ollinaho P, Lang STK, Balsamo G, Bechtold P, Bonavita M, Christensen HM, Diamantakis M, Dutra E, English S, Fisher M, Forbes RM, Goddard J, Haiden T, Hogan RJ, Juricke S, Lawrence H, MacLeod D, Magnusson L, Malardel S, Massart S, Sandu I, Smolarkiewicz PK, Subramanian A, Vitart F, Wedi N, Weisheimer A (2017) Stochastic representations of model uncertainties at ECMWF: state of the art and future vision. Q J R Meteorol Soc 143:2315–2339

    Article  Google Scholar 

  • Murphy AH (1988a) Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon Weather Rev 116:2417–2424

    Article  Google Scholar 

  • Murphy JM (1988b) The impact of ensemble forecasts on predictablity. Q J R Meteorol Soc 114:463–493

    Article  Google Scholar 

  • Murphy AH, Epstein ES (1989) Skill scores and correlation coefficients in model verification. Mon Weather Rev 117:572–581

    Article  Google Scholar 

  • Polkova I, Schaffer L, Aarnes Ø, Baehr J (2022) Seasonal climate predictions for marine risk assessment in the Barents sea. Climate Serices 26:100291

    Article  Google Scholar 

  • Roberts CD, Senan R, Molteni F, Boussetta S, Mayer M, Keely SPE (2018) Climate model configurations of the ECMWF integrated forecasting system (ECMWF-IFS cycle 43r1) for HighResMIP. Geosci Model Dev 11:368–372

    Article  Google Scholar 

  • Robertson AW, Vitart F (2018) Sub-seasonal to seasonal prediction: the gap between weather and climate forecasting. Elsevier, Amsterdam

    Google Scholar 

  • Schneider EK, Dewitt D, Rosati A, Kirtman BP, Ji L, Tribbia JJ (2003) Retrospective ENSO forecasts: sensitivity to atmospheric model and ocean resolution. Mon Weather Rev 131:3038–3060

    Article  Google Scholar 

  • Shukla J (1998) Predictability in the midst of chaos: a scientific basis for climate forecasting. Sicence 282:728–731

    Article  Google Scholar 

  • Son S-W, Kim H, Song K, Kim S-W, Martineau P, Hyun Y-K, Kim Y (2020) Extratropical prediction skill of the subseasonal-to-seasonal (S2S) prediction models. J Geophys Res Atmos 125:e2019JD031273

    Article  Google Scholar 

  • Stan C, Kirtman BP (2008) The influence of atmospheric noise and uncertainty in ocean initial conditions on the limit of predictability in a coupled GCM. J Clim 21:3487–3503

    Article  Google Scholar 

  • Vitart F, Buizza R, Balmaseda MA, Balsamo G, Bidlot J-R, Bonet A, Fuentes M, Hofstadler A, Molteni F, Palmer TN (2008) The new VarEPS-monthly forecasting system: a first step towards seamless prediction. Q J R Meteorol Soc 134:1789–1799

    Article  Google Scholar 

  • Vitart F, Ardilouze C, Bonet A, Brookshaw A, Chen M, Codorean C, Déqué M, Ferranti L, Fucile E, Fuentes M, Hendon H, Hodgson J, Kang H-S, Kumar A, Lin H, Liu G, Liu X, Malguzzi P, Mallas I, Manoussakis M, Mastrangelo D, MacLachlan C, McLean P, Minami A, Mladek R, Nakazawa T, Najm S, Nie Y, Rixen M, Robertson AW, Ruti P, Sun C, Takaya Y, Tolstykh M, Venuti F, Waliser D, Woolnough S, Wu T, Won D-J, Xiao H, Zaripov R, Zhang L (2017) The subseasonal to seasonal (S2S) prediction project databasse. Bull Am Meteorol Soc 98:163–173

    Article  Google Scholar 

  • World Meteorological Organization (WMO) (2006) Standardized verification system (SVS) for long-range forecasts (LRF) (attachment II.8) in the manual on the global data-processing and forecasting system (WMO-No. 485)

Download references

Funding

This work was carried out through the R&D project “Development of a Next-Generation Numerical Weather Prediction Model by the Korea Institute of Atmospheric Prediction Systems (KIAPS)”, funded by the Korea Meteorological Administration (KMA2020-02212). This work was also supported by Korea Environment Industry and Technology Institute (KEITI) through “Climate Change R&D Project for New Climate Regime”, funded by Korea Ministry of Environment (MOE) (2022003560004).

Author information

Authors and Affiliations

Authors

Contributions

JYH organized, wrote, and revised the manuscript. SWK collected and analyzed data and prepared the first draft of the manuscript. CHP contributed to data collection and analysis. SWS performed supervision and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Seok-Woo Son.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

40562_2023_292_MOESM1_ESM.docx

Additional file 1: Fig. S1 Same as Fig. 2, but for the JJA forecasts. Fig. S2 Global distribution of the standard deviation of observation for (a) 50- and (b) 500-hPa geopotential heights. Fig. S3 Same as Fig. 5, but for the JJA forecasts. Fig. S4 Same as Fig. 2, but for temperature forecasts. Fig. S5 Same as Fig. 5, but for temperature forecasts. Fig. S6 Same as Fig. 5, but with the long-term reforecasts.

Appendix

Appendix

A detailed derivation of Eq. (3) is as follows:

$${\text{MSE}}_{f} \left( {\tau ,N_{e} } \right) = \left\langle {\left\{ {\hat{f}_{{ij}} \left( {\tau ,N_{e} } \right) - o_{{ij}} \left( \tau \right)} \right\}^{2} } \right\rangle$$
$$= \,\left\langle {\left\{ {\hat{f}_{{ij}} \left( {\tau ,N_{e} } \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}^{2} + \left\{ {o_{{ij}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}^{2} - 2\left\{ {\hat{f}_{{ij}} \left( {\tau ,N_{e} } \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}\left\{ {o_{{ij}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}} \right\rangle$$
$$= \,\left\langle {\frac{1}{{N_{e}^{2} }}\sum\limits_{{k = 1}}^{{N_{e} }} {\left\{ {f_{{ijk}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}^{2} } + \frac{2}{{N_{e}^{2} }}\sum\limits_{{k = 1}}^{{N_{e} }} {\sum\limits_{{l = k + 1}}^{{N_{e} }} {\left\{ {f_{{ijk}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}} } \left\{ {f_{{ijl}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\} + \left\{ {o_{{ij}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}^{2} - 2\left\{ {\hat{f}_{{ij}} \left( {\tau ,N_{e} } \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}\left\{ {o_{{ij}} \left( \tau \right) - \bar{f}_{{ij}} \left( \tau \right)} \right\}} \right\rangle ,$$

where an overbar denotes an average over a forecast probability density function (p.d.f), and the second and last terms, i.e., averages of the sum of random covariances, are zero. Then, by denoting the variance of a forecast p.d.f by D,

$${\mathrm{MSE}}_{f}\left(\tau , {N}_{e}\right)=\frac{1}{{N}_{e}}\langle D\rangle +\langle {d}_{o}\rangle \approx \left\{\left(1+{N}_{e}\right)/{N}_{e}\right\}\langle D\rangle ,$$

where do = \({\left\{{o}_{ij}\left(\tau \right)-{\overline{f} }_{ij}\left(\tau \right)\right\}}^{2}\), and \(\langle {d}_{o}\rangle =\langle D\rangle\) is assumed for simplicity. Thus, using that MSEf(τ, 1) = 2\(\langle D\rangle\) for Ne = 1, we obtain the following equation indicating that an ensemble mean forecast is superior to an individual forecast:

$${\mathrm{MSE}}_{f}\left(\tau , {N}_{e}\right)=\left\{\left(1+{N}_{e}\right)/{2N}_{e}\right\}{\mathrm{MSE}}_{f}\left(\tau , 1\right),$$

which corresponds to Eq. (5) in Murphy (1988b).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Han, JY., Kim, SW., Park, CH. et al. Ensemble size versus bias correction effects in subseasonal-to-seasonal (S2S) forecasts. Geosci. Lett. 10, 37 (2023). https://doi.org/10.1186/s40562-023-00292-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40562-023-00292-9

Keywords