Estimating hourly surface shortwave radiation over northeast of the Tibetan Plateau by assimilating Himawari-8 cloud optical thickness

To reduce the uncertainty estimation of clouds and improve the forecast of surface shortwave radiation (SSR) over the Tibetan Plateau, a new cloud assimilation system is proposed which is the first attempt to directly apply the four-dimensional local ensemble transform Kalman filter method to assimilate the cloud optical thickness (COT). The high-resolution spatial and temporal data assimilated from the next-generation geostationary satellite Hima-wari-8, with the high-assimilation frequency, realized an accurate estimation of the clouds and radiation forecasting. The COT and SSR were significantly improved after the assimilation by independent verification. The correlation coefficient (CORR) of the SSR was increased by 11.3%, and the root-mean-square error (RMSE) and mean bias error (MBE) were decreased by 28.5% and 58.9%, respectively. The 2-h cycle assimilation forecast results show that the overestimation of SSR has been effectively reduced using the assimilation system. These findings demonstrate the high potential of this assimilation technique in forecasting of SSR in numerical weather prediction. The ultimate goal that to improve the model forecast through the assimilation of cloud properties requires further studies to achieve.


Introduction
The vigorous and rapid cloud convective activities and intense solar radiation over the Tibetan Plateau have challenged the accurate estimation of cloud and surface shortwave radiation (SSR) (Yang et al. 2008;Wang et al. 2015;Zhong et al. 2019).The accurate simulation of clouds is difficult using numerical weather models, but they directly affect SSR, especially in summer (Sharma et al. 2016;Hines et al. 2019).Cloud assimilation techniques provide the best optimal estimations of cloud parameter information for improving simulation results (Errico et al. 2007;Lopez 2007;Jones and Stensrud 2015;Benjamin et al. 2021).However, challenges remain in selecting suitable assimilation data and the deficiency of available assimilation techniques (Chen et al. 2015;Kurzrock et al. 2018;Kotsuki et al. 2020).
The data assimilated from cloud observations mainly come from ground-based radar and satellite sources (Errico et al. 2007;Jones et al. 2016).Although assimilation from radar data can effectively improve the analysis field in cloud-resolving models, these are deficient in having only a limited spatial range.It is also difficult to apply radar data in areas where ground observation stations are scarce, such as on the Tibetan Plateau.Satellites are an important source for data assimilation because of their strong spatiotemporal continuity and wild range (Thépaut 2003;Benedetti and Janisková, 2008;Bauer et al. 2011;Geer et al. 2018).Compared with polarorbiting satellites, geostationary satellites can obtain a broad range of spatiotemporally continuous, characteristic cloud-parameter information (Kurzrock et al. 2018).The Himawari-8 next-generation satellite provides highprecision cloud-parameter products that enable highfrequency data assimilation and verification (Yang et al. 2020(Yang et al. , 2022)).The main assimilation methods currently used in the geophysical modeling community include three-dimensional variational assimilation (3D-Var), four-dimensional variational assimilation (4D-Var), and ensemble Kalman filter (EnKF) (Kurzrock et al. 2018).The 3D-Var can only be used at the beginning of the simulation to improve the initial conditions.The 4D-Var increases the assimilation frequency compared with the 3D-Var.Benedetti (2008) used the 4D-Var to assimilate the cloud optical thickness (COT) from the Moderate Resolution Imaging Spectroradiometer, although the temporal resolution of polar-orbiting satellite data is relatively insufficient.Yang et al. (2009) compared the assimilation effects of the EnKF and 4D-Var methods and found that the assimilation effect was better with the EnKF by correcting the errors of the background field.Dai et al. (2019) used the four-dimensional local ensemble transform Kalman filter (4D-LETKF) to make the hourly aerosol assimilation of Himawari-8 aerosol optical thickness (AOT) which obviously improved the simulation effect of the AOT.The direct assimilation techniques used presently, based on satellite emissivity or brightness temperature, have a poor assimilation effect in cloudy regions because they cannot contain the information of cloud thickness (Liu et al. 2012).The COT represents the extinction effect of clouds, which is suitable in assimilation among the many parameters of cloud products (Otsuka et al. 2021).In general, the assimilation effect of existing assimilation schemes is unsatisfactory.There is an urgent need for high-frequency and high-accuracy assimilation for clouds assimilation.
The primary purpose of enhancing assimilation impact is to provide a more accurate initial field, as the initial field is a crucial driver of models and influences the accuracy of forecast results.Therefore, the main goal of this work is to utilize assimilation techniques to improve cloud simulation and ultimately advance the forecast effect of surface shortwave radiation.In this study, we attempted to utilize high-precision satellite data to obtain a high-precision estimation of cloud cover and solar radiation in an area where there were few ground observation stations--the Tibetan Plateau.A new cloud assimilation system was developed by firstly implementing the 4D-LETKF method in the mesoscale Weather Research and Forecasting (WRF) numerical model to assimilate the COT directly.Observation data and multiple hydrometeor variables were used to establish the covariance relationship between the two-dimensional (2D) COT and the 3D multivariate variables.The advantage of this method is that it can increase the assimilation frequency without simultaneously costing more in computational resources.The high-frequency satellite observational data and the 4D-LETKF make assimilation possible on an hourly basis.The assimilation and forecast experiments were conducted respectively.The sections in this paper are organized as follows: Sect."Methods" briefly outlines the methodology and the experimental setup; Sect."Results" gives the results, and Sect."Summary and Discussion" summarizes the key findings and provides a discussion of the study.

Data and assimilation system
The next-generation Himawari-8 satellite started its official operations on 7 July 2015, covering the Asia-Oceania regions from 80°E to 160°W between 60°S and 60°N.The equipped airborne hyperspectral imager sensor covers 16 observation bands, from visible light through near infrared.The Himawai-8 satellite not only provides information on cloud microphysical parameters, but also includes the radiation parameters at the surface, which has been used for studies of cloud temporal variations such as diurnal and cloud tracking (Yang et al. 2020(Yang et al. , 2022)).The Himawari-8 cloud optical properties are freely available at the website of the Japan Aerospace Exploration Agency Himawari Monitor (http:// www.eorc.jaxa.jp/ ptree/ index.html).The most outstanding feature of the Himawai-8 satellite data is that it not only has the advantages of a traditional geostationary satellite, such as wide space coverage and good time continuity, but also overcomes the shortcomings of traditional geostationary satellites, such as low spatial resolution and poor accuracy, having a spatial resolution of 0.05 degrees and a temporal resolution of 10 min.In this study, the COT from Himawari-8 was used for data assimilation and the SSR was used for verification.
The LETKF is an update of the traditional EnKF (Hunt et al. 2007).The 4D-LETKF is a more efficient method developed from the traditional EnKF that avoids frequent switching between models and data in the assimilation process (Cheng et al. 2019;Dai et al. 2019).In this study, we extended this method to the assimilation of COT to obtain an optimal estimation of cloud properties for model simulations.The 4D-LETKF assimilation scheme optimizes the model state through the following five formulas: where the suffixes a and b represent the analysis and background fields, respectively; the state vector x repre- sents the ensemble mean results; the matrix X represents the perturbations; the weight matrix w a specifies a linear combination of the background ensemble perturbations to the background mean to obtain the analysis ensemble mean; the matrix Y b is the background ensemble perturbation matrix in the observation space or the background observation ensemble perturbation matrix; the matrix R is the observation error covariance matrix; the matrix I is the identity matrix; the vector y o denotes the assimilated observations (i.e., in this study, Himawari-8 COT); the vector y b represents the ensemble mean background observations; and the matrix Y b represents ensemble background observation perturbations.
We implemented a cloud assimilation system into the WRF version 4.2 to realize the assimilation of hydrogenic particles.Figure 1(a) depicts the flowchart of the 4D-LETKF cloud assimilation system.The background field was constructed using the initial and boundary conditions, which came from the operational Global Data Assimilation System (GDAS) final analysis from the National Centre for Environmental Prediction (NCEP).The used data in our research from NCEP have a spatial resolution of 0.25º (NCEP,2015).The dynamic framework and microphysical settings are described in Sect."Model Description and Experimental Setup".Before the assimilation, the model had to be fully spined for at least 12 h, then the forecast result was output.The COT is a diagnosed variable that is output from the "solar diagnose" section, calculated from the cloud water and water vapor mixing ratios, cloud effective radius, among other things. (1) The satellite COT data from Himawari-8 were used for the true value.Before the assimilation, the quality control was undertaken, the observation data were interpolated to the WRF grid point.Only the values within the range of three standard deviations of the mean value were averaged to ensure the accuracy and authenticity of the observed values.Then, the covariance relationship was established between the COT and the 3D cloud water and water vapor mixing ratios.After the assimilation, the analysis result is used as the input data for the next cycle of the calculation or can be used in forecast research.

Model description and experimental setup
The WRF model was developed by the National Centre for Atmospheric Research and the National Oceanic and Atmospheric Administration (Powers et al. 2017).The advantage of the WRF model is that it contains several microphysical parameterization and dynamical schemes; version 4.2 was used in this study.As COT is a diagnostic quantity, during the calculation, the cloud water path was first obtained by integrating the cloud water mixing ratio in the vertical direction, obtained from the quotient of the cloud water path and the effective radius of cloud droplet particles.
The simulation time ran from 2020.8.14 UTC 12:00 to 2020.8.16 UTC 12:00, the simulation region ranged from 33°N to 43°N and 94°E to 104°E, with a spatial resolution of 10 km horizontally and 30 vertical levels, and with a time step of 30 s.The schematic diagram of the simulated The following items were adopted: the new Thompson microphysical scheme (Thompson et al. 2008), the Grell 3D scheme for convective dynamic parameterization, and the rapid radiation transfer model for general circulation models longwave and shortwave radiation scheme.The initial and boundary conditions were from Global Forecast System data.Ten ensemble prediction members were set in the simulation.EnKF calculates the background error covariance matrix through ensemble prediction.It is necessary to introduce certain variable perturbations to build the differences among the ensemble members.WRF model contains different types of perturbations schemes.As temperature, wind, and humidity play an important role in cloud generation and development, so we choose stochastically perturbed physical tendencies (SPPT) scheme, which carries out random disturbance on potential temperature, wind speed and humidity (Christensen et al. 2017).In terms of disturbance setting, 15 km was selected in the range of random perturbation length scale, to make the disturbance change more drastic and make the characteristics of small and medium scale changes between different members of the cloud more obvious.The standard deviation of the random perturbation field at each grid point was 0.5, which is the default setting, to make the perturbation amplitude of the model moderate, and the calculation stability.The observation error standard deviation of the retrieved COT was set to 5%.The assimilation was carried out hourly.The first 12 h were used to fully spin up.From 2020.8.16 UTC 00:00 to UTC 12:00, a forecast experiment was conducted.

Results
To evaluate the effects of the Himawari-8 COT assimilation, the statistical criteria, including the mean bias error (MBE), the root-mean-square error (RMSE), and the correlation coefficient (CORR) were calculated.Figure 2 shows the spatial distribution of the COT and the CORR at 2020.8.15 UTC 06:00, 08:00, and 10:00.In (a), (b), and (c), warm colors indicate that the COT is the largest, cold colors indicate that the COT is small, and blue is the smallest, and in (d), (e), deep red indicates that the simulated results are in the best agreement with the observed results at this point.The cloud coverage changed little over the three times slots, being mainly distributed in the north and southeast of the simulated area.However, the area where the COT had large values increased, indicating that the clouds were in a stage of vigorous development.The COT in the cloudiest area reached 150 at UTC 10:00.Thick clouds affect both the development of deep convective clouds and SSR.Before assimilation, the spatial distribution of the clouds was close to the observed data, although the large-value area of COT was not consistent in the southeast of the study area.At UTC 10:00, the COT in the southeast region had even decreased compared with UTC 08:00.After assimilation, the spatial distribution of the COT agreed with the observational data [see Fig. 2(c)], especially in the area where the COT was underestimated.Figure 2(d, e) shows that the CORR of the COT at all three moments increased after assimilation, and at UTC 10:00, the MBE, RMSE, and CORR had all improved.
The accurate estimation of the SSR had a positive impact on the solar energy forecast and surface temperature.The cloud water and water vapor mixing ratios directly affect the SSR.In Fig. 3 (a), (b), and (c), warm colors indicate that the SSR is the largest, cold colors indicate that the SSR is small, and blue is the smallest, and in (d), (e), deep red indicates that the simulated results are in the best agreement with the observed results at this point.Figure 3(a) gives the value of the SSR from Himawari-8 at UTC 06:00 as more than 1,000 W/m 2 in the cloud-free region and less than 300 W/m 2 in the cloudy region, indicating that the cloud extinction effect was very strong.Figure 3(b, c) shows the 1-min simulation results for the SSR before and after assimilation.Under clear skies, the SSR reached 900 W/m 2 at UTC 06:00, which is comparatively consistent with the observational data, whereas the value of the SSR in the cloudy region was seriously overestimated.After assimilation, the overestimation phenomenon in the cloudy region was significantly improved, although the value was still higher than the observational data.
In the validation section, we utilized two kinds of data, including the Moderate Resolution Imaging Spectroradiometer (MODIS) 1 km resolution surface shortwave radiation product named MCD18C1 (https://search.earthdata.nasa.gov/search/granules?p=C2484081120-LPCLOUD&pg[0][v]=f&pg[0][gsk]=-start_date&q=M CD18&tl=1694317323.841!3!!) and the 5 km resolution Himawari-8 geostationary satellite surface shortwave radiation product.Since the MODIS data have a 3-hourly time resolution, the data start from UTC 0:00, we used it to validate the SSR results at UTC 06:00.At UTC 08:00 and UTC 10:00, the SSR from Himawari-8 was applied for validation.In addition, we compared the correlation between Himawari-8 and MODIS surface shortwave radiation at UTC 06:00, as shown in Additional file 1: Figure S3.The CORR was reached 0.89, indicating a high correlation from these two kinds of data.As can be seen in Fig. 3(d, e), the MBE, RMSE, and CORR were all improved at the three moments after assimilation, the simulation and observational results sharing good consistency, especially at UTC 06:00, where the MBE was reduced from 194.86 W/m 2 to 51.45 W/m 2 , indicating that the improvement in SSR after cloud assimilation was most obvious for intense solar radiation.
Due to the absorption and scattering effects of clouds on shortwave radiation, the SSR is reduced under cloud property.The water vapor can also reduce the SSR through absorbing shortwave radiation.Therefore, clouds water and water vapor are important factors affecting SSR.Additional file 1: Figures S1 and S2 show the vertical profile of meridional mean cloud water mixing ratio and water vapor mixing ratio, as well as the increments after the assimilation, respectively.It can be seen from the figures that both the cloud water and water vapor increased after the assimilation in most area compared to before assimilation.This may explain the reason of assimilation improved the overestimation of surface shortwave radiation.As the spatial resolution of CERES data was one degree, therefore, CERES data were used to verify the meridional mean of surface shortwave radiation in Fig. 4. Figure 4 shows the meridional mean of the SSR for 2020.8.15 UTC 06:00, 08:00, and 10:00.In this section, The Clouds and the Earth's Radiant Energy System (CERES) 1 degree resolution hourly surface shortwave radiation data was used for validation.(https:// cerestool.larc.nasa.gov/ ord-tool/ jsp/ SYN1d egEd4 1Sele ction.jsp).Before assimilation, the overestimation of the SSR was most significant at 98ºE at UTC 06:00 and 100ºE at UTC 08:00, indicating that the simulation effect worsened with greater cloud extinction.After assimilation, the SSR corresponded well with the satellite observation data, although the assimilation effect was weak at 94ºE and 104ºE, which may be because the model did not simulate the cloud water mixing ratio at the boundary region where the cloud existed, and the disturbance error was zero in the cloudless region that could not be assimilated in the 4D-LETKF model.This difficulty could be solved by enlarging the scope of the simulation region and nesting in the future.In general, the assimilation effect at UTC 06:00 and UTC 08:00 was better than that at UTC 10:00.This is because the weaker SSR at UTC 10:00 made the extinction effect of thick clouds act like that of thin clouds.In general, the assimilation effect is more obvious with intense solar radiation and stronger cloud extinction.
To better evaluate the assimilation effect over a longer period of time, the hourly results from a 12-h run for 2020.08.15 UTC 0:00-11:00 were chosen for analysis, the daily mean results of the COT and SSR being provided in Fig. 5.In Fig. 5, the color represents of (a)-(g) is similar to Figs. 2 and 3 on the same kind.Figure 5(a) shows that the COT was relatively large in the north, but especially in the southeast of the simulated region where the value was more than 80. Figure 5(b) gives the results from before assimilation.Here, the cloudy area is similar to what was observed, while the range and intensity of the COT in the north and southeast was underestimated.This indicates that the cloud simulation was relatively weak throughout the day.Figure 5(c) shows that the 4D-LETKF scheme effectively ameliorated the underestimation, with both the area of cloud and the intensity of the COT being stronger than before assimilation, and the cloudy region with COT values greater than 40 being more consistent with the observed data, which directly affected the extinction effect and the SSR at the surface.In Fig. 5(d-f ), the overall overestimation of the SSR in the simulated region can be seen to have been eliminated.Figure 5(i, j) shows that the SSR was overestimated in a range of greater than 450 W/m 2 before assimilation, this defect having been mitigated after the assimilation.Overall, the daily average results indicated that the simulation accuracy of the COT and SSR were effectively improved.The CORR of the COT increased by 34.7%, whereas the RMSE and MBE decreased by 6.3% and 22.5%, respectively.The CORR of the SSR increased by 11.3%, whereas the RMSE and MBE decreased by 28.5% and 58.9%, respectively.
The purpose of this assimilation technique is to provide better initial conditions for the model, thereby  improving the forecast result.Therefore, we conducted forecast and validation experiments from 2020.8.16 UTC 0:00 to 12:00, as shown in Fig. 6.The red line represents the CERES satellite hourly surface shortwave radiation observation data averaged over the simulation domain, the black line is a combination of 2-h forecasts result, the blue line is the forecast result with assimilation performed every two hours, i.e., each time after assimilation, a two-hour forecast is conducted.Since cumulus clouds have short lifecycles and change rapidly, we focused on the short-term forecast skill and obtained the temporal evolution of surface shortwave radiation over 12 h through cyclic assimilation and forecast.It can be seen from Additional file 1: Table S1 that the average overestimation of surface shortwave radiation was reduced from 34% to 19.9% through the cycling assimilation and forecast.As shown in the Fig. 6, after assimilation, the overestimation of surface shortwave radiation was effectively improved, which was possibly due to the effective improvement in the spatial distributions of cloud water and water vapor by assimilating these two hydrometeors that have a greater impact on surface shortwave radiation, thereby ameliorating the overestimation of surface shortwave radiation.

Summary and discussion
In this study, a new cloud assimilation system was constructed by coupling the 4D-LETKF into the new version of the WRF model.The covariance relationship between 2D COT and multiparameter 3D hydrometeors variables was established to obtain the best estimate of regional cloud cover.The ultimate purpose of this study is to improve the short-term forecast accuracy of SSR by assimilating cloud properties using the new assimilation technique.The approach to achieve this goal is to utilize high-resolution cloud optical thickness data to provide more accurate initial conditions of cloud water and water vapor in the model through assimilation, and thus improving the short-term forecast skill of surface shortwave radiation.There were three major innovations in this work.First, the use of high-resolution temporal and spatial data from the Himawari-8 next-generation geostationary satellite gave reliable accuracy, which made it possible to increase the time frequency of assimilation to the hourly level.Second, the assimilation of cloud water and water vapor mixing ratios which provided a more coordinated initial field.Third, the model framework utilized the 4D-LETKF scheme which was demonstrated as greatly enhancing the computational efficiency by Dai et al. (2019), making it possible to conduct high-frequency cyclic assimilation experiments.In this study, the single-time and daily average COT and SSR results verified the assimilation effect.The results showed that the phenomenon of the underestimation of COT and overestimation of SSR were significantly improved through applying the assimilation system, especially in the region of large COT and at the time with the strongest solar radiation.The daily mean CORR of the COT was increased by 34.7%, whereas the RMSE and MBE were decreased by 6.3% and 22.5%, respectively.The daily mean CORR of the SSR was increased by 11.3%, whereas the RMSE and MBE were decreased by 28.5% and 58.9%, respectively.The 2-h cyclic forecast results showed that the model successfully reduced the overestimation of SSR.As opposed to previous studies, we attempted to use new assimilation models, new satellite data, and direct assimilation methods to investigate the COT and SSR over the Tibetan Plateau, which has been little studied.However, the model forecast experiments is quite limited, to truly realize the long-term, high-precision assimilation of all-sky clouds, further research is needed, and this study is the first step of this research that requires further studies to achieve.Firstly, only the central region of the Tibetan Plateau was investigated as the simulation region, and not the whole plateau due to the consideration of selecting areas and time where clouds developed vigorously.In the future, we plan to extend the simulation area and test the applicability of this assimilation system in other regions.Secondly, the assimilation of ice particles was not considered.We only assimilated the water clouds; future research will be extended to assimilating multiple cloud particles.Besides, the sensitivity tests are planned to conduct to study the dependence of assimilation effects on different types of rainfall clouds and the setting of observation errors, to evaluate the persistence of assimilation effects on longer time scales, and the influence of different weather events on assimilation effects and to explore the influence of different microphysical schemes on assimilation effects, to provide a basis for the coupling of model and assimilation system.

Fig. 1 a
Fig. 1 a The Flowchart of the Ensemble Kalman Filter Cloud assimilation system coupled with WRF.b The schematic diagram between the simulated region and the Tibetan Plateau

Fig. 5
Fig. 5 Daily mean results for 2020.8.15. a COT from Himawari-8; b COT from first guess; c COT after assimilation; d SSR from Himawari-8; e SSR from first guess; f SSR after assimilation; g scatter plots of Himawari-8 and WRF COTs from first guess; h scatter plots of Himawari-8 and WRF COTs after assimilation; i scatter plots of Himawari-8 and WRF SSRs from first guess; and j scatter plots of Himawari-8 and WRF SSRs after assimilation