Predicting temperature and precipitation during the flood season based on teleconnection

In recent years, damage from abnormal hydrometeorological conditions has increased substantially worldwide owing to climate variability and change. In Korea in particular, severe flood damage has occurred during the flood season almost every year. For example, localized heavy rainfall persisted for 54 days during the flood season of 2020, causing extensive property damage and loss of life. Improving the predictive power of seasonal forecasts spanning one to several months is therefore needed for damage reduction and prevention. In this regard, this study aims to provide predictions of climate variables at target sites several months ahead using a statistical method based on teleconnection with global climatic conditions. Herein, a framework for predicting temperature and precipitation in the Geum River basin in Korea is presented. The purposes of the study are (1) to analyse the characteristics of summer temperature and precipitation according to the occurrence of El Niño/La Niña and (2) to suggest a seasonal prediction model that considers the effects of El Niño and La Niña during the flood season. The model is constructed by classifying the data period into El Niño, La Niña, and neutral states. The results show that the prediction model improves the predictive power for climate variables such as temperature and precipitation at mid-latitude stations, where Korea is located. This study therefore demonstrates the possibility of improving temperature and precipitation forecasts with a prediction model that considers climate variability.


Introduction
The frequency and magnitude of extreme events (e.g., droughts, floods) are increasing worldwide. Seasonal forecasting, which considers changes in climate variables on a scale spanning one to several months, can be more effective for managing extreme events than long-term forecasting (Wood and Lettenmaier 2006). Previous studies have statistically predicted the regional climate (e.g., temperature, precipitation) of a target site based on global-scale climate variables, such as sea surface temperature (SST) and geopotential height (GPH), through teleconnection. Statistical analysis methods, such as multiple regression analysis and machine learning techniques, have been mainly applied to predict climate variables such as temperature and precipitation based on teleconnection (Asong et al. 2018; Cho et al. 2016; Kim et al. 2018; Lee et al. 2018; Sittichok et al. 2018). In recent years, studies on the El Niño-Southern Oscillation (ENSO), a climate variability factor that affects the global climate, have also been conducted (Amarasekera et al. 1997; Bonsal et al. 2001; Broman et al. 2020; Denise et al. 2017; Feng et al. 2020; Korecha and Sorteberg 2013; Mamalakis et al. 2018; Meißner et al. 2017; O'Reily et al. 2018; Seibert et al. 2017; Shabbar and Yu 2012; Shabbar and Khandekar 1996; Silva et al. 2019).
However, the accuracy and reliability of seasonal forecasting techniques are still inadequate. This is because the seasonal climate, especially in the mid-latitudes, is affected by various climate factors such as air currents in the tropical ocean and the Arctic Oscillation (Cai et al. 2011; Cho et al. 2016; Cao et al. 2017; Lee 2015; Lee et al. 2016; Gerlitz et al. 2016; He and Wang 2013; He et al. 2017; Kim and Ahn 2012; Nur'utami and Hidayat 2016; Ouyang et al. 2014; Park and Ahn 2016; Qiu et al. 2014; Singhrattna et al. 2005).
The Korean Peninsula is located on the western border of the North Pacific Ocean and is influenced by the El Niño and La Niña phenomena. Therefore, there is large seasonal variability in precipitation, and studies have been conducted on the teleconnection between precipitation in Korea and ENSO events. For example, Cha et al. (1999) analysed the relationship between ENSO and the climate in Korea, showing that El Niño tends to modulate temperature. Kim et al. (2008a, b) analysed the effects of ENSO on the frequency and spatial distribution characteristics of rainfall in Korea. Lee et al. (2016) identified the climatic teleconnections between ENSO and mid-latitude precipitation over South Korea. Previous studies have either analysed the relationship between ENSO and the climate or made seasonal predictions using climate variables; however, there is yet to be a study on the seasonal prediction of climate variables that considers climate variability, such as El Niño and La Niña.
Therefore, this study aims to perform seasonal prediction of climate variables considering the effects of El Niño/La Niña on the climate in Korea. Particularly, seasonal predictions of the flood season temperature and precipitation using lagged teleconnection with SST and GPH were performed. Section "Materials and methods" briefly describes the El Niño/La Niña phenomenon and the seasonal prediction method based on lagged teleconnection. In addition, the evaluation indicators used to assess the predictive power of the seasonal prediction model are described. In "Results" section, the temperature and precipitation characteristics according to the occurrence of El Niño/La Niña are analysed. "Conclusions" section provides a discussion of the results and conclusions.

Study area and data collection

Study area
In this study, the Geum River basin in Korea with an area of 9645.5 km² and river length of 384.8 km was selected for the analysis (Fig. 1). The Geum River is the third largest river in South Korea and flows from the central inland to the West Sea (Kim 2012). The elevation of the Geum River is not as high as the other rivers, but its river length is considerable with a smooth river slope and wide plain developed in the downstream region. The total area of the basin is composed of 62% forests, 15% rice paddies, and 11% fields, while the rest of the area is composed of urbanized areas, grasslands, and bare lands (Ahn et al. 2013).

Hydrometeorological data
The study was performed based on data collected from six weather stations (Cheongju, Daejeon, Chupungryeong, Boeun, Buyeo, and Geumsan). For the analyses, the monthly mean temperature and precipitation data from the Meteorological Administration's Automated Surface Observing System (ASOS) were collected. Table 1 lists the observation periods and specifications of each weather station. Figure 2 shows the monthly mean temperatures at the six weather stations, demonstrating a similar pattern with relatively equal values. Figure 3 shows the monthly precipitation values at the six weather stations, showing relatively large monthly precipitations in Cheongju and Buyeo in August 1995, and in Boeun and Daejeon in August 1998. In addition, less rainfall was recorded during the flood season (June to September) in 2013 and 2015.

Climate data
The SST and GPH data provided by NCEP/NCAR were collected as the global climate data. These data are available for download from the IRI/LDEO Climate Data Library. The SST anomaly was used to examine the correlation with the weather data at the individual weather stations, and the GPH data at 850 hPa, which exhibit a high correlation with weather changes, were also used. The ranges of the SST and GPH data are listed in Table 2.
The analysis data used were obtained from 1993 to 2016, which is the available range from the weather stations. As this study aims to obtain seasonal prediction during the flood season, the data from June to September within the period of analysis was used. The data from June 1993 to September 2012 were used to build the model and the data from June 2013 to September 2016 were used for the verification.

El Niño and La Niña phenomena
El Niño refers to a phenomenon in which the SST in the tropical Pacific Ocean is higher than its usual value, lasting for several months or longer. This occurs due to the interaction between the ocean and atmosphere in the tropical Pacific Ocean, particularly the weakening of the east-west trade winds at the equator. When the trade winds are weakened, the regions with strong convective activity in the Western Pacific expand and move to the mid-Pacific region. As the seawater moves to the east due to the changes in atmospheric circulation, the thermocline in the Eastern Pacific region deepens and SST rises, leading to a change in the atmosphere. In contrast, the opposite occurs for La Niña, in which SST in the mid-Pacific region is lower than its usual value.
When the 3-month running mean of the SST anomaly is +0.5 °C or higher for five consecutive periods in the tropical Pacific Niño 3.4 region (5° S to 5° N, 170° W to 120° W), which is the most commonly used region for monitoring El Niño and La Niña, the first month is deemed the beginning of El Niño. Conversely, when the 3-month running mean of the SST anomaly is −0.5 °C or lower for five consecutive periods, the first month is regarded as the beginning of La Niña. El Niño and La Niña phenomena occur in the tropical Pacific region, but they affect global weather and climate, including the global temperature and precipitation, through the teleconnection of the atmosphere and ocean.
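The onset criterion described above can be illustrated with a short sketch. It assumes a plain list of monthly Niño 3.4 SST anomalies in °C; the function names, the alignment of each 3-month window to its first month, and the sample values are hypothetical choices made for illustration and are not taken from the study.

```python
def running_mean_3(anomalies):
    """3-month running means; element k averages months k, k+1 and k+2."""
    return [sum(anomalies[k:k + 3]) / 3.0 for k in range(len(anomalies) - 2)]


def classify_enso(anomalies, threshold=0.5, min_run=5):
    """Label each 3-month window as 'el_nino', 'la_nina' or 'neutral'.

    A window is labelled warm (cold) only if it belongs to a run of at
    least `min_run` consecutive windows at or above +threshold
    (at or below -threshold).
    """
    means = running_mean_3(anomalies)
    # Provisional sign of each window: +1 warm, -1 cold, 0 otherwise.
    signs = [1 if m >= threshold else -1 if m <= -threshold else 0 for m in means]
    labels = ["neutral"] * len(means)
    start = 0
    while start < len(signs):
        end = start
        while end < len(signs) and signs[end] == signs[start]:
            end += 1
        # Keep only warm/cold runs that persist for at least min_run windows.
        if signs[start] != 0 and end - start >= min_run:
            name = "el_nino" if signs[start] == 1 else "la_nina"
            for k in range(start, end):
                labels[k] = name
        start = end
    return labels


# Hypothetical anomalies (deg C), for illustration only.
sample = [0.1, 0.2, 0.6, 0.8, 0.9, 1.0, 1.1, 0.9, 0.7, 0.2, -0.1, -0.2]
labels = classify_enso(sample)
onset_index = labels.index("el_nino")  # first window of the warm episode
```

In this sketch, a warm or cold spell shorter than five consecutive windows is left as neutral, which mirrors the persistence requirement in the definition.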

Seasonal prediction model for temperature and precipitation based on lagged teleconnection
Global climate models (GCMs) are known to reproduce atmospheric fluctuations on a global scale more accurately than detailed climate distributions on a regional scale. Therefore, indirectly predicting the long-term climate of a region by identifying the factors that directly or indirectly affect the regional climate using a GCM is expected to perform better. The method used to estimate the local climate from the correlation between the regional climate and the oceanic and atmospheric circulation over a wide area is called "statistical downscaling". Statistical downscaling is an indirect downscaling method that predicts precipitation and temperature in a target basin from observed climate factors acquired at present or in the recent past. This is done by considering the lag time that may exist between the global-scale climate pattern and precipitation or temperature in the target basin. The predictors are calculated from past observations and are only considered when the lag time between the climate factor and the dependent variable is greater than the lead time (Choi and Moon 2013; Kim et al. 2008a, b; Kim and Park 2010; Kim et al. 2007; Schepen et al. 2012; Wang et al. 2008).
In this study, the SST and GPH data provided by the National Centers for Environmental Prediction and National Center for Atmospheric Research (NCEP/NCAR) were used as the observed climate factors to predict the monthly mean temperatures and precipitation. The delayed teleconnection of 1-6 months between the dependent variables and the global climate factors was considered. The time series of the most correlated grid was extracted to construct the prediction model, as shown in Fig. 4.
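The extraction of the most correlated grid for a given lag can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes continuous monthly series (so a lag is a simple index shift), ranks grids by the absolute value of the Pearson correlation, and uses hypothetical grid identifiers and values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(field, target, lag):
    """Correlate target[t] with field[t - lag] (lag in months, lag >= 1)."""
    return pearson(field[:len(field) - lag], target[lag:])


def best_grid(grids, target, lag):
    """Return (grid_id, r) for the grid with the largest |r| at this lag."""
    scores = {gid: lagged_correlation(series, target, lag)
              for gid, series in grids.items()}
    gid = max(scores, key=lambda g: abs(scores[g]))
    return gid, scores[gid]


# Hypothetical data: grid "A" leads the target by exactly two months.
target = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
grids = {
    "A": [2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 0.0, 0.0],
    "B": [5.0, 1.0, 4.0, 2.0, 3.0, 3.0, 1.0, 2.0],
}
grid_id, r = best_grid(grids, target, lag=2)
```

Repeating the search for lags of 1 to 6 months would yield one candidate predictor series per lag and per climate field (SST anomaly and GPH).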
In this study, the data period for constructing the prediction model was classified as El Niño, La Niña, and neutral status. The observational temperature and precipitation data, SST anomaly, and GPH data were classified for each period. Seasonal prediction was performed by constructing the model based on the teleconnection analysis according to the lag time.

Predictive power evaluation indices
In this study, the correlation coefficient, normalized root-mean-square error (NRMSE), and mean absolute percentage error (MAPE) were used as indices to evaluate the predictive power of the model. The correlation coefficient is an index ranging from −1 to 1 that shows the degree of linear relationship between the prediction and observation data. Correlation coefficients of 1 and −1 represent perfect positive and negative correlation between the prediction and observation, respectively. NRMSE is obtained by dividing the RMSE by the range of observation (maximum − minimum); the closer the NRMSE is to 0 (%), the smaller the difference between the prediction and observation data. MAPE is the mean absolute error expressed as a percentage of the observation; the closer the MAPE is to 0 (%), the smaller the difference between the prediction and observation data. These indices are calculated as follows:

$$r = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}$$

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}}{\max y_i - \min y_i} \times 100\,(\%)$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| (\%)$$

where $y_i$ denotes the observation, $\hat{y}_i$ denotes the prediction, $\bar{y}$ and $\bar{\hat{y}}$ denote their respective means, $n$ denotes the number of data, and $\max y_i$ and $\min y_i$ denote the maximum and minimum observations, respectively.

Fig. 4 Prediction model for temperature and precipitation based on the teleconnection between precipitation and temperature. SST is the sea surface temperature and GPH is the geopotential height. The prediction model was constructed by extracting the time series of the grid with the highest correlation. T and P denote the monthly mean temperature and precipitation, respectively; and a, b, c, a', b', and c' are the regression coefficients for the time series of the grid with the highest correlation in the SST anomaly and GPH data
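The evaluation indices described in this section can be computed as in the following sketch. The function names and the sample observation and prediction values are hypothetical, and the standard textbook definitions are assumed (NRMSE normalized by the observed range, MAPE relative to the observation).

```python
from math import sqrt


def correlation(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / sqrt(var_o * var_p)


def nrmse(obs, pred):
    """RMSE divided by the observed range (max - min), in percent."""
    n = len(obs)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    return 100.0 * rmse / (max(obs) - min(obs))


def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be non-zero)."""
    n = len(obs)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(obs, pred))


# Hypothetical monthly precipitation (mm), for illustration only.
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 330.0, 360.0]
scores = {"r": correlation(obs, pred), "NRMSE": nrmse(obs, pred), "MAPE": mape(obs, pred)}
```

Because MAPE divides by the observation, months with zero observed precipitation would need separate handling; the sketch does not address that case.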

Analysis of weather characteristics during the flood season according to the occurrence of El Niño and La Niña
To identify the effects of ocean and atmospheric fluctuations on the summer temperature and precipitation in Korea considering the El Niño and La Niña phenomena, the data were classified into El Niño, neutral, and La Niña. The characteristics of the temperature and precipitation during each period were compared and analysed. The results were obtained by comparing the distribution of the monthly mean temperature and precipitation data for each state during the flood period from June 1993 to September 2016.
From the distribution of the monthly mean temperature in the form of a box plot based on the classification of the periods (Fig. 5), the monthly mean temperature during the El Niño period was 19-28.1 °C and the mean value was lower with a wider range of distribution compared with the neutral and La Niña periods. The deviation range of the quartile values for each classification was less than 1 °C and there were no significant characteristics for all studied periods (Table 3). From the distribution of the monthly precipitation in a box plot for each classification (Fig. 6), the largest mean and range were noted in the La Niña period, followed by the neutral and El Niño periods. During the El Niño  period, the mean monthly precipitation was 181.7 mm with a median of 145.7 mm. Meanwhile, the mean was 263.7 mm with a median of 216.7 mm during the La Niña period, indicating a wider range of distribution. In the case of the neutral period, the mean was 232.1 mm and the median was 209.7 mm, demonstrating values between those of the El Niño and La Niña periods. For the distribution of the monthly precipitation (Table 3), there were differences for each classified period, suggesting that the climate factors of the El Niño and La Niña periods directly or indirectly influenced the monthly precipitation during the flood season.
One-way analysis of variance (ANOVA) was performed to quantitatively analyse the statistical significance of the difference in the distribution of each classification. This analysis method compares the variation between and within three or more groups to determine the significance of the difference between them. The results of the analysis of the monthly precipitation and mean temperature as a function of the classification of the period are listed in Table 4. The F-ratio is the ratio of the mean squares between the groups to the mean squares within the groups. If the P value was below the 5% significance level, the difference between the groups was considered significant. For the monthly mean temperature, the P value was 0.25, indicating that the difference between the groups was not significant. In contrast, the P value for the monthly precipitation was 0.02, suggesting a statistically significant difference. Therefore, the data characteristics for the period of each status should be reflected to predict the monthly precipitation.
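The ratio of between-group to within-group mean squares used in the one-way ANOVA can be sketched as follows. The function name and the three small samples are hypothetical and serve only to show the arithmetic; they are not the station data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical samples for the three states, for illustration only.
el_nino = [1.0, 2.0, 3.0]
neutral = [2.0, 3.0, 4.0]
la_nina = [5.0, 6.0, 7.0]
f_ratio, df_b, df_w = one_way_anova_f([el_nino, neutral, la_nina])
```

The P value then follows from the F distribution with the two degrees of freedom; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both the statistic and the P value directly.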
Precipitation during the flood season is known to be affected by the El Niño and La Niña phenomena. Every summer, the warm and humid North Pacific anticyclone from the southwest of the Korean Peninsula rises to the north, and the cold and humid Okhotsk Sea anticyclone descends from the northeast and forms a seasonal rain front. If El Niño occurs during this period, the trade winds, which blow from east to west, are weakened, and warm water spreads from the Western to the Eastern Pacific. Correspondingly, the North Pacific anticyclone, which develops in the east of Japan, fails to receive sufficient water vapor, thereby considerably inhibiting its development. As the force of the North Pacific anticyclone is weakened, the seasonal rain front cannot be formed or cannot move to the north, thereby affecting the southern region only. Therefore, the occurrence of El Niño during the rainy season results in a "dry rainy season" with considerably lower rainfall. Accordingly, the precipitation decreases compared to the same period in a normal year and the number of typhoons also decreases. This further results in a decrease in the precipitation during the late summer (late August to early September). In contrast, the opposite occurs due to the La Niña phenomenon. Particularly, when La Niña occurs in the summer, the number of typhoons in the Korean Peninsula increases and the precipitation during the second rainy season tends to increase. This identifies the impacts of El Niño and La Niña on precipitation during the flood season in Korea. Therefore, developing a model to predict monthly precipitation in the summer considering the classification of the period (El Niño/La Niña/neutral) would increase the predictive power.

Seasonal climate prediction based on lagged teleconnection

General seasonal prediction based on teleconnection
To identify the climate factors that will be used as independent variables for the prediction model of temperature and precipitation in the target basin, the correlation between the meteorological data and the global-scale ocean and atmospheric climate data was analysed. The monthly mean temperature and precipitation data observed at the six weather stations were used as the representative values. The global-scale SST anomaly and GPH data from the same period were also used. The correlation coefficient was calculated for each grid in the GCM considering the lag time (1-6 months). The conceptual diagram is shown in Fig. 1. A linear regression model was constructed to predict temperature and precipitation using the SST anomaly and GPH data with the highest correlation calculated for each lag time. The time series data of the grid with the highest correlation coefficient (for a lag time greater than the lead time) was used as the independent variable of the prediction model. For example, to achieve accurate predictions one month ahead, all teleconnection climate factors selected with lag times of 1-6 months can be used as independent variables. However, for predictions two months ahead, the teleconnection climate factors with a lag time of one month were excluded. Table 5 lists the equations of the prediction model for temperature and precipitation based on teleconnection as a function of lag time. In the model, the SST anomaly and GPH were expressed as SST and GPH, respectively, and the numbers in the parentheses indicate the lag time.
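The selection of predictors by lead time and the fitting of a linear regression can be sketched as below. This is an illustrative outline under stated assumptions and not the model in Table 5: following the worked example in the text, a lag is treated as usable when it is at least as long as the lead time; the regression is ordinary least squares solved through the normal equations; and the function names and sample values are hypothetical.

```python
def eligible_lags(lead, max_lag=6):
    """Lags (months) usable at a given lead time: lag >= lead (assumption)."""
    return [lag for lag in range(1, max_lag + 1) if lag >= lead]


def fit_linear(X, y):
    """Ordinary least squares with intercept via the normal equations.

    X holds one row of predictor values per month.
    Returns [intercept, coef_1, ..., coef_p].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(p):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coefs, row):
    """Evaluate the fitted regression for one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical predictors, e.g. columns SST(lag) and GPH(lag); the target
# is constructed as 2 + 3*x1 - 1*x2 so the fit can be checked by eye.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
y = [1.0, 5.0, 4.0, 7.0, 0.0]
coefs = fit_linear(X, y)
lags_for_lead_2 = eligible_lags(2)
```

For real data with many correlated predictors, a library least-squares routine would be numerically safer than the hand-written elimination used here.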
The results obtained by the prediction model for the monthly mean temperature are shown in Fig. 7 and Table 6. As shown in the graphs, the prediction values were slightly higher than the observation results obtained in September 2014 for lead times in the range of 1-4 months and temperature was generally underestimated from July to August 2013. Nonetheless, the trend of the observation matched well with the general prediction for all lead times.
The prediction model was evaluated using the correlation coefficient between the predicted and observed monthly mean temperatures and precipitation for each lead time. In Table 8, the correlation coefficient was 0.6 or higher for the monthly mean temperature for all lead times. With the exception of the predicted results obtained for a lead time of six months, outstanding results with a correlation coefficient of 0.7 were obtained. For the predicted monthly precipitation, the overall correlation coefficient was less than 0.3 regardless of the lead time. Therefore, the prediction model based on lagged teleconnection was applicable to the monthly mean temperature but not to the monthly precipitation.

Development and application of the prediction model considering the occurrence of El Niño and La Niña phenomena
In the previous section, the prediction obtained by the general prediction model for monthly precipitation did not simulate the observation well for the flood season of 2015 when El Niño occurred. Therefore, in this study, a model for predicting the monthly mean temperature and precipitation was presented considering the effects of El Niño and La Niña.
To consider the effects of the El Niño and La Niña phenomena, the training period from June 1993 to September 2012 was divided into three states: El Niño, La Niña, and neutral. The observed temperature and precipitation data, and the SST anomaly and GPH data, were grouped according to this classification. A regression model was constructed for each state based on the analysis of the teleconnection as a function of the lag time. After constructing the three state-specific prediction models, the monthly mean temperature and precipitation were predicted using the corresponding model during the verification period of June 2013 to September 2016.
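The idea of fitting one model per ENSO state and predicting with the model that matches the state of the target month can be sketched as follows. For brevity the sketch assumes a single predictor per model, whereas the study uses several lagged SST and GPH predictors; the function names, state labels, and sample values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares line through (x, y); returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def fit_by_state(samples):
    """samples: (state, predictor, observed) triples -> {state: (intercept, slope)}."""
    grouped = {}
    for state, x, y in samples:
        xs, ys = grouped.setdefault(state, ([], []))
        xs.append(x)
        ys.append(y)
    return {state: fit_line(xs, ys) for state, (xs, ys) in grouped.items()}


def predict_by_state(models, state, x):
    """Predict with the model fitted for the given ENSO state."""
    intercept, slope = models[state]
    return intercept + slope * x


# Hypothetical training samples (state, predictor value, precipitation in mm).
training = [
    ("el_nino", 0.0, 100.0), ("el_nino", 1.0, 90.0), ("el_nino", 2.0, 80.0),
    ("neutral", 0.0, 200.0), ("neutral", 1.0, 210.0), ("neutral", 2.0, 220.0),
    ("la_nina", 0.0, 250.0), ("la_nina", 1.0, 280.0), ("la_nina", 2.0, 310.0),
]
models = fit_by_state(training)
forecast = predict_by_state(models, "la_nina", 3.0)
```

A practical consequence of this design is that each state-specific model is trained on fewer months than a single pooled model, so the number of samples per state limits how many predictors can be fitted reliably.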
The prediction method for the monthly mean temperature and precipitation that accounts for the effects of El Niño and La Niña is referred to as the "modified method" to distinguish it from the general method. The results of the prediction model for the monthly mean temperature using the modified method are shown in Fig. 9 and Table 9. According to the results, the prediction reflected the tendency of the observation data except for the prediction with a lead time of 6 months. In addition, the temperature predictions in June 2015 were overestimated for all lead times. The results of the prediction model for the monthly precipitation using the modified method are shown in Fig. 10 and Table 10. In the resulting graphs, the precipitation was overestimated, but the predicted precipitation was reduced in 2015 because the drought during this period was taken into account (Fig. 11).
The modified prediction model was evaluated using the correlation coefficient between the predicted and observed monthly mean temperatures and precipitation for each lead time. As shown in Table 11, the predicted monthly mean temperature demonstrated a correlation coefficient ≤ 0.6 for all lead times and the predicted monthly precipitation demonstrated a correlation coefficient in the range of 0.43-0.65 depending on the lead time.

Comparison of the results of the seasonal prediction models
In this section, a comparative analysis of the predicted results obtained using the modified method and general method was performed, and the applicability of the modified method was evaluated. The correlation coefficients of the predicted results obtained using the general method and modified methods with the observed results were obtained as a function of the lead time. The NRMSE and MAPE values are listed in Tables 12 and 13. In comparison of the results obtained using the two methods, lower NRMSE and MAPE values denote better predictive power.
Fig. 9 Results of the modified temperature prediction model
Comparing Figs. 7 and 9, the general method accurately predicted the temperature, which corresponds with the observed values, while the modified method overestimated the temperature in some periods. Comparing the evaluation indicators in Table 12, the predicted results obtained by applying the general model were more accurate than the modified method. For the monthly precipitation, the general method overestimated the values during 2015, which was likely caused by the effects of El Niño on precipitation. As a result of the application of the modified method, which considered the effects of El Niño, the prediction errors of the 2015 results were reduced. The results in Table 13 suggest that the modified method achieved outstanding results, except for the NRMSE results with lead times of 5 and 6 months. However, the NRMSE results obtained by the general method and modified method were relatively similar. Therefore, it is more advantageous to apply the general model to predict the average monthly temperature data and the modified model to predict the monthly precipitation data.
Fig. 10 Results of the modified precipitation prediction model

Discussion
To explain the teleconnection between each climate factor, and the monthly mean temperature and precipitations in the target basin, the global-scale GPH and SST anomaly data are presented in Appendix 1 starting from 1993, 2003, and 2002). Among the three years with the highest total precipitation, La Niña occurred during the flood season in 1998 and 2011. In 2003, the SST anomaly data suggested a lower SST in the El Niño and La Niña monitoring regions, which is attributed to the strengthened east-west trade wind. For these years, SST increased in the Western Pacific waters during the flood season from June to September. Conversely, in the years with the lowest total precipitation, El Niño occurred in 2015 and 1994. While the SST in the sea area near the Korean Peninsula was higher than the surrounding area during the flood season in 1994 and 2001, the SST in the Philippine Sea was lower than that in the surrounding area. In 2015, the SST in all Western Pacific waters was consistently low during the flood season. Contrary to the years with high levels of precipitation, the east-west trade wind was weakened and the hot water from the Western Pacific region spread wider toward the Eastern Pacific region.
Meanwhile, high GPH anomaly data were noted from the mid-Pacific to the Korean regions during the years with the highest total precipitation in the order of 1998, 2011, and 2003. In the years with the lowest total precipitation, low GPH values, in the order of 2015, 1994, and 2001, were observed in the coastal waters of Peru in the Eastern Pacific region from April to May; these migrated in the northwest direction and eventually passed the Korean Peninsula during the flood season. Therefore, the results here indicate that the monthly GPH distribution affected the precipitation during the flood season.
The correlation between the monthly mean temperature and SST anomaly data suggested that the monthly mean temperature is closely associated with the SST in the Korean seas. In other words, regardless of the SST in the surrounding sea areas, the monthly mean temperature varies according to the SSTs in the Korean Seas. The high and low monthly mean temperatures were predicted when the GPH near the Korean Peninsula was high and low, respectively.  In summary, precipitation was affected by the occurrence of El Niño and La Niña, change in the SST of the seawater, which moved from the Pacific equator to the Western Pacific and Indian Ocean, and change in the GPH, which moved from the Eastern Pacific region in the northwest direction. The monthly mean temperature was only affected by the SST and GPH near the Korean Peninsula, indicating that the El Niño or La Niña phenomena have no significant effects on the teleconnection.
The grids with the highest teleconnection coefficient were used as the predictors of the general model and modified method according to the lag time (Appendix 2 and 3). From Appendix 2, it is difficult to determine significant characteristics by looking at the position of the grid displayed in the SST anomaly data as a function of lag time. However, by observing the position of the grid displayed in the GPH data, the grids of the mid-latitudes are characterized by a high positive correlation.
In the case of the delayed teleconnection coefficients for the El Niño, La Niña, and neutral states in Appendix 3, the grid with a high correlation coefficient of the SST anomaly with the monthly precipitation during the El Niño and La Niña periods moved toward the West Pacific region from the equator as the lag time decreased from six months to one month (Figs. 26,28). From the correlation coefficients of the GPH with the monthly precipitation during the El Niño and La Niña periods, positive correlations gradually appeared in the northwest direction across the Pacific Ocean in the waters near Peru during the La Niña period and negative correlations were observed during the El Niño period with high positive correlations at its edge (Figs. 26,28). In the neutral period, a grid with a position similar to that of the general model in Appendix 2 was selected for the monthly precipitation, and a mid-latitude grid similar to that in Appendix 2 was selected for the monthly mean temperature (Fig. 27). Therefore, the model improved by taking into consideration the climate factors that affect the actual precipitation using the modified method with the classified period. The monthly mean temperature was not directly affected by the occurrence of El Niño or La Niña phenomena, as indicated by the insignificant difference in the results obtained using the general model and modified method.

Conclusions
This study attempted to achieve seasonal predictions using delayed teleconnection, considering the effects of the El Niño and La Niña phenomena on the temperature and precipitation during the flood season. A comparison of the results obtained by the general and modified methods showed that the predictive power for precipitation improved using the modified method. In contrast, no significant difference was found in terms of the predictive power for the temperature. Thus, it is more appropriate to use the general method to predict temperature. This was also confirmed by the global-scale GPH and SST anomaly data of the years with the highest and lowest monthly mean temperatures and precipitations, and the positions of the grids used as the predictors for the study model in the Appendices. Therefore, this study improved the predictive power for precipitation at mid-latitude points that are heavily affected by climate variability, including El Niño and La Niña. Based on the results, future studies could improve the predictive performance for seasonal climate by considering more climate factors associated with teleconnection.