
Official Journal of the Asia Oceania Geosciences Society (AOGS)

Spatial comparison of inland water observations from CYGNSS, MODIS, Landsat, and commercial satellite imagery

Abstract

Accurate and timely data on the extent and location of inland waterbodies are foundational to a variety of hydrological applications and to water resources management. Recently, the Cyclone Global Navigation Satellite System (CYGNSS) has emerged as a promising tool for delineating inland water because surface reflectivity differs distinctly over dry versus wet land. These differences are observable by CYGNSS’s eight microsatellites, whose passive bistatic radars acquire reflected L-band signals from the Global Positioning System (GPS) (i.e., signals of opportunity). This study conducts a baseline 1-km comparison of water masks for the contiguous United States between latitudes of 24°N–37°N for 2019 using three Earth observation systems: CYGNSS (i.e., our baseline water mask data), the Moderate Resolution Imaging Spectroradiometer (MODIS) (i.e., land water mask data), and the Landsat Global Surface Water product (i.e., Pekel data). Spatial performance of the 1-km comparison water mask was assessed using confusion matrix statistics and optical high-resolution commercial satellite imagery. When a mosaic of binary thresholds for 8 sub-basins was employed for the CYGNSS data, confusion matrix statistics improved, including an increase in F1-score of up to 34%. Further, a performance metric of the ratio of inland water to catchment area showed that inland water area estimates from CYGNSS, MODIS, and Landsat were within 2.3% of each other regardless of the sub-basin observed. Overall, this study provides valuable insight into the spatial similarities and discrepancies of inland water masks derived from optical (visible) versus radar (Global Navigation Satellite System Reflectometry, GNSS-R) based satellite Earth observations.

Introduction

Inland waterbodies (defined as lakes, rivers, streams, reservoirs, and wetlands for purposes of this study) play a critical role in terrestrial water storage and hydrological processes (Brönmark and Hansson 2002; Bullock and Acreman 2003). The extent and location of inland waterbodies are key inputs for hydrological models that inform water resources management for a variety of agricultural, industrial, and climate applications, as well as for algorithm development for soil moisture retrievals (Papa et al. 2010; Vörösmarty et al. 2022). Water masks convey this information by classifying inland areas as either water or non-water (land, vegetation, impervious surface, etc.). While water masks may be derived from fieldwork, drone observations, or aerial surveillance, these methods tend to be labor-intensive, time-consuming, and difficult to replicate at frequent timescales for continuous monitoring.

Space-based Earth observations have emerged as a reasonable method for remotely generating inland water masks (Asadzadeh Jarihani et al. 2013; Palmer et al. 2015; Soman and Indu 2022). This has been demonstrated by two widely accepted global inland water mask products: (1) the Landsat (Pekel) water mask, which aggregated 3 million optical Landsat images to categorize water occurrence from 1984 to 2020 at 30 m spatial resolution (Pekel et al. 2016), and (2) the water mask from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument onboard the NASA Terra and Aqua satellites, derived through land cover classifications from 2001 to 2021 at 250 m spatial resolution (Sulla-Menashe et al. 2019). However, these water masks are constrained by their dependence on optical sensors, which are impeded by cloud cover and by limited temporal revisit intervals, such as one day for MODIS and over 10 days for Landsat, contingent on latitude. For example, King et al. (2013) estimate the MODIS-observed cloud fraction over land to be ~55%. This is concerning because the maximum inland waterbody extent likely occurs during rainy/cloudy conditions. Furthermore, these water masks are only available at annual timescales because a year’s worth of data are required to obtain sufficient cloud-free observations at a global scale (Pekel et al. 2016; Sulla-Menashe et al. 2019).

Recently, the Cyclone Global Navigation Satellite System (CYGNSS) has proven to be a useful Earth observation system for delineating inland waterbodies. This constellation of microsatellites, developed by the University of Michigan and the Southwest Research Institute, was launched by the National Aeronautics and Space Administration (NASA) for the primary research objective of monitoring tropical cyclone intensification. Its eight microsatellites use passive bistatic radars to observe signals of opportunity from reflected Global Positioning System (GPS) L-band signals (Ruf et al. 2018; see Additional file 1: SI 4 for a summary table of CYGNSS).

Several approaches have been proposed for detecting inland waterbodies with CYGNSS via Global Navigation Satellite System Reflectometry (GNSS-R). These approaches exploit coherent surface reflectivity, which is greater over inland water than over land (Al-Khaldi et al. 2021; Gerlein-Safdi and Ruf 2019; Ruf et al. 2021). They include binary thresholding prediction (Al-Khaldi et al. 2021; Morris et al. 2019; Wan et al. 2019), forward modeling (Chew and Small 2020), random walker algorithms (Gerlein-Safdi and Ruf 2019; Wang et al. 2022), and machine learning (Ghasemigoudarzi et al. 2022).

Three relevant studies which compared a CYGNSS-derived water mask to either Landsat or MODIS products are described as follows. First, Gerlein-Safdi and Ruf (2019) used a random walker algorithm to delineate inland water based on the standard deviation of CYGNSS surface reflectivity data. The method performed well when compared with MODIS-derived water masks and hand-drawn water masks for select regions. The authors identified a need to develop and validate a reliable long-term CYGNSS-based water mask, such as the annual map demonstrated in this study, to serve as a basemap against which CYGNSS data could then be used to identify anomalous variations in inland waterbody extent at sub-annual temporal scales. Second, Al-Khaldi et al. (2021) used binary signal-to-noise ratio (SNR) thresholding to delineate inland waterbodies within the maximum CYGNSS spatial coverage. A comparison with the Landsat (Pekel) water mask identified regional uncertainties that arise from relying on a single SNR threshold at a global scale, such as missed waterbodies which were obstructed by vegetation. Third, Wang et al. (2022) followed a method similar to that of Gerlein-Safdi and Ruf (2019), using a random walker algorithm to delineate inland water based on the power ratio of CYGNSS data. The accuracy of the method was high when compared with Landsat-derived water masks for the Congo Basin and Amazon Basin.

Currently, there is no standard method (i.e., a procedure which is widely used and accepted) for generating a CYGNSS-based inland water mask, as shown by the variety of methods used in previous studies: binary thresholding prediction, forward modeling, random walker algorithms, and machine learning. The novelty of this study is to provide a foundational comparison of CYGNSS-based water masks to Landsat and MODIS water masks to improve understanding of the spatial agreement and disagreement of these products, specifically by employing a mosaic of surface reflectivity SNR thresholds at the sub-basin level. This is a necessary step toward achieving a standardized CYGNSS water mask.

This study aims to improve understanding of the extent to which three water masks independently derived from Landsat, MODIS, and CYGNSS agree or disagree on the extent and location of inland water. Specifically, the main research goals of this study are to:

  1. Derive a 1-km comparison water mask for Landsat, MODIS, and CYGNSS data for 2019 over the contiguous United States between latitudes of 24°N–37°N.

  2. Compare the regional performance of CYGNSS, MODIS, and Landsat water masks at a watershed level via quantifiable statistics derived from confusion matrices.

  3. Assess the 1-km comparison water mask performance using high-resolution optical commercial satellite imagery collected in 2019 for diverse locations within the study area.

Results from this study will serve as a foundational reference for future studies by improving understanding of the relative utility and robustness of CYGNSS, MODIS, and Landsat-based inland water classifications. This contributes valuable insight into the spatial and regional strengths and limitations of each observation system which is important to understand prior to applying these data to real-world hydrological applications.

Data and methods

Three satellite-based Earth observation systems were used to derive a single comparison water mask: (1) CYGNSS, (2) MODIS, and (3) Landsat. Each product was pre-processed, as described in the corresponding Sections (“Cyclone global navigation satellite system (CYGNSS) data” through “Landsat (Pekel) data”), to derive a bivariate water mask where each pixel was classified as either inland water or non-inland water. The bivariate water mask was re-gridded to a common spatial resolution and projection, the 1-km National Snow and Ice Data Center (NSIDC) Equal-Area Scalable Earth (EASE) Grid 2.0, using nearest neighbor interpolation, which was selected because it maintains the original data values (0 and 1) and performs well for categorical data (Brodzik et al. 2012). Due to uncertainty in inland water classification of coastal areas, data collected within 25 km of coastlines from all three observation systems were excluded from the analyses. A comparison of the three bivariate water masks was then performed to generate a single comparison water mask where each pixel (1) is classified as inland water or non-water, and (2) indicates which of the three observation systems classified it as such (i.e., Landsat, MODIS, or CYGNSS; see “Confusion matrices and related statistics” Section for further details). Examples of high-resolution satellite imagery were overlaid on the comparison water mask to investigate performance. A flowchart of this methodology is provided in Supporting Information 1 (Additional file 1: SI 1).
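The nearest neighbor step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the fine and coarse grids are already aligned in the same projection and differ by an integer factor; a real re-gridding to EASE Grid 2.0 would also involve a map projection, which is omitted here. The function name and the toy grid are hypothetical.

```python
def nearest_neighbor_downsample(mask, factor):
    """Sample a binary mask onto a coarser grid.

    For each coarse cell, the value of the fine cell at (or just past)
    the centre of the corresponding factor-by-factor block is copied.
    Assumes aligned grids and an integer resolution ratio.
    """
    rows, cols = len(mask), len(mask[0])
    offset = factor // 2
    return [
        [mask[r + offset][c + offset]
         for c in range(0, cols - factor + 1, factor)]
        for r in range(0, rows - factor + 1, factor)
    ]


# Toy 4 x 8 fine grid (1 = inland water), downsampled by a factor of 4.
fine = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
coarse = nearest_neighbor_downsample(fine, 4)
print(coarse)
```

Because values are copied rather than averaged, the output contains only 0 and 1. The toy grid also shows the trade-off: the sampled cell is not necessarily the majority class of its block.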

The study area was defined as the contiguous United States between approximate latitudes of 24°N to 37°N and was determined by the spatial coverage of CYGNSS (Fig. 1A). A single annual timestep of 2019 CYGNSS data was used to match the annual temporal resolution of the MODIS and Landsat water mask products. These data were taken from the pre-released CYGNSS v3.2 ocean/land merged L1 data, which are publicly available upon request to the CYGNSS Science Team.

Fig. 1

CYGNSS observations and bivariate water mask products for the contiguous United States between latitudes of approximately 24–37°N for 2019. A Spatial plot of CYGNSS 1-km 50th percentile surface reflectivity signal-to-noise-ratio (SNR) data. B CYGNSS bivariate water mask at 1-km spatial resolution derived from basin-specific binary thresholding of SNR values. C MODIS bivariate water mask at 1-km spatial resolution derived from the Land Water Mask (MCD12Q1) for 2019. D Landsat bivariate water mask at 1-km spatial resolution derived from the Landsat Global Surface Water product (commonly referred to as the Pekel water mask) for 2019

Cyclone Global Navigation Satellite System (CYGNSS) data

CYGNSS, an eight-microsatellite constellation, uses passive bistatic radars to observe reflected GPS signals within L-band frequencies, achieving a revisit time between observations of 2.8 h (median) and 7.2 h (mean) with a spatial coverage of ±38° latitude (Ruf et al. 2018). For this study, the area of interest was covered by 348 days (95% of the year) of usable CYGNSS data in 2019, which were obtained from 8 microsatellites, each recording 4 delay Doppler maps simultaneously.

A sensitivity analysis determined that the 50th percentile of CYGNSS observations was optimal, in that it maximized the F1-score relative to a reference dataset (see Additional file 1 for further details). Additionally, sub-basins of the study area were individually considered to determine the optimal surface reflectivity signal-to-noise ratio (SNR) threshold within a given sub-basin, defined as the threshold which maximized F1-score, a balance between precision and recall, relative to the reference dataset. The F1-score is useful when there is an imbalance of classes within the dataset, such as many land pixels and few water pixels across the total study area, because it takes both false positive and false negative errors into account. Small changes in the threshold window change both precision and recall, resulting in either an increased or decreased F1-score. A higher F1-score indicates a better balance between precision and recall, meaning the predicted water mask makes fewer false positive and false negative classifications. However, a low F1-score may also indicate low confidence in the reference dataset, as there is high disagreement between the products. For further details on the sensitivity analysis and a table summarizing SNR thresholds used for each sub-basin, please see Additional file 1.
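The threshold selection described above can be sketched as a small grid search. This is an illustrative sketch only, not the procedure documented in Additional file 1: the candidate window edges, the SNR values, and the function names are hypothetical, and the window bounds are assumed to be inclusive.

```python
from itertools import combinations


def f1_score(predicted, reference):
    """F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_snr_window(snr, reference, candidates):
    """Return ((low, high), f1) for the candidate window with the highest F1."""
    best_window, best_f1 = None, -1.0
    for low, high in combinations(sorted(candidates), 2):
        predicted = [1 if low <= s <= high else 0 for s in snr]
        score = f1_score(predicted, reference)
        if score > best_f1:
            best_window, best_f1 = (low, high), score
    return best_window, best_f1


# Hypothetical 50th percentile SNR values (dB) for one sub-basin and the
# matching reference labels (1 = inland water).
snr = [145, 148, 150, 156, 158, 160, 170]
reference = [0, 0, 0, 1, 1, 1, 0]
candidates = [140, 155, 165, 180]

print(best_snr_window(snr, reference, candidates))
```

Running the search once per sub-basin would yield one window per basin, which is the idea behind the mosaic of thresholds.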

To derive the CYGNSS 1-km water mask, the 50th percentile CYGNSS SNR data were resampled to a 1-km NSIDC EASE Grid 2.0 and then classified as either inland water or non-inland water using the sub-basin SNR thresholds (Additional file 1: SI 7). SNR values within the threshold window were classified as inland water, whereas SNR values above or below the window were classified as non-inland water.
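A minimal sketch of this classification step is given below. The SNR windows are placeholders (the values used in the study are listed in Additional file 1: SI 7), the basin assignment of each pixel is assumed to be known, and inclusive window bounds are an assumption.

```python
from statistics import median

# Hypothetical SNR windows in dB per sub-basin; not the study's values.
WINDOWS = {"B03": (155.0, 165.0), "B13": (160.0, 170.0)}


def cygnss_water_mask(obs_per_pixel, basin_ids, windows):
    """Classify each pixel from the median (50th percentile) of its SNR
    observations, using the window of the sub-basin the pixel falls in."""
    mask = []
    for observations, basin in zip(obs_per_pixel, basin_ids):
        low, high = windows[basin]
        p50 = median(observations)
        mask.append(1 if low <= p50 <= high else 0)
    return mask


# Four toy pixels, each with a year's worth (here: a handful) of SNR values.
obs = [[148, 150, 149], [157, 159, 158, 190], [157, 159, 158], [200, 210, 205]]
basins = ["B03", "B03", "B13", "B13"]
print(cygnss_water_mask(obs, basins, WINDOWS))
```

The second and third pixels have nearly the same median SNR but are classified differently because they lie in different sub-basins, which illustrates why a single study-wide threshold can behave differently from a mosaic.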

Additional information and data from CYGNSS can be accessed here in 2024: https://podaac.jpl.nasa.gov/CYGNSS.

Moderate resolution imaging spectroradiometer (MODIS) data

MODIS is a sensor onboard NASA’s Terra and Aqua satellites in sun-synchronous polar orbit. It captures 36 spectral bands ranging from the visible (0.4 μm) to thermal infrared (14.4 μm) regions of the electromagnetic spectrum and images the Earth’s surface every 1 to 2 days (Sulla-Menashe et al. 2019). In this study, the 250 m Land Water Mask (MCD12Q1) for 2019 was used. This product defines inland water as follows using the International Geosphere-Biosphere Program (IGBP) classification scheme: (1) permanent wetlands (30–60% water cover and greater than 10% vegetation cover); (2) permanent snow and ice (at least 60% of the area covered by snow and ice for at least 10 months of the year); and (3) waterbodies (at least 60% of the area covered by permanent water) (Friedl et al. 2010). To derive the MODIS 1-km water mask, the Land Water Mask was resampled to a 1-km EASE Grid 2.0. All pixels classified as inland water using the IGBP classification scheme were classified as inland water in the MODIS 1-km water mask. Otherwise, the pixels were classified as non-inland water.
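The reclassification described above amounts to a lookup from land cover class to a binary value. The sketch below assumes the standard IGBP legend numbering (11 = permanent wetlands, 15 = permanent snow and ice, 17 = water bodies); the function name and the toy grid are illustrative.

```python
# IGBP classes treated as inland water in this study, assuming the
# standard IGBP legend numbering.
IGBP_WATER_CLASSES = {11, 15, 17}


def modis_water_mask(land_cover):
    """Map a 2D grid of IGBP class codes to a binary mask (1 = inland water)."""
    return [
        [1 if code in IGBP_WATER_CLASSES else 0 for code in row]
        for row in land_cover
    ]


# Toy grid: grassland (10), wetland (11), water (17), barren (16),
# snow and ice (15), cropland (12).
print(modis_water_mask([[10, 11], [17, 16], [15, 12]]))
```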

Additional information and data from MODIS can be accessed here in 2024: https://lpdaac.usgs.gov/products/mcd12q1v006/.

Landsat (Pekel) data

The Landsat Global Surface Water product (commonly referred to as the Pekel water mask) shows surface water occurrence since 1984 at a spatial resolution of 30 m using three million archival image scenes from the Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper-plus (ETM+), and Landsat 8 Operational Land Imager (OLI), which have a temporal resolution of over 10 days (Pekel et al. 2016). In this study, the Pekel Seasonality Map was used, which classifies both permanent water (areas inundated for 12 months) and seasonal water (areas inundated for less than 12 months), so long as the inland water is open to the sky, larger than 30 m, and unobstructed by vegetation (Sulla-Menashe et al. 2019). To derive the Landsat 1-km water mask, the Pekel Seasonality Map was resampled to a 1-km EASE Grid 2.0, and pixels which were classified as inland water by the Pekel Seasonality Map were also classified as inland water in the Landsat 1-km water mask. Otherwise, the pixels were classified as non-inland water.
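As a rough sketch of this step, the snippet below assumes the seasonality layer stores the number of months (0 to 12) in which water was detected, so that both seasonal (1 to 11 months) and permanent (12 months) water map to 1. No-data handling is omitted, and the function name is illustrative.

```python
def landsat_water_mask(seasonality_months):
    """Map months-of-water values to a binary mask (1 = inland water).

    Assumes each cell holds the number of months with water (0-12).
    """
    return [
        [1 if 1 <= months <= 12 else 0 for months in row]
        for row in seasonality_months
    ]


# Toy grid: land (0), seasonal water (3 and 7 months), permanent water (12).
print(landsat_water_mask([[0, 3, 12], [0, 0, 7]]))
```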

Additional information and data from the Landsat Global Surface Water product can be accessed here in 2024: https://global-surface-water.appspot.com/#data.

Confusion matrices and related statistics

To quantitatively compare the three bivariate water masks, confusion matrices and related statistics were used, defining the CYGNSS water mask as the predicted water mask and the other products (either Landsat, MODIS, or a combination of Landsat and MODIS) as the reference water mask. Additionally, Landsat and MODIS were directly compared by treating each in turn as the reference and as the predicted water mask. For each comparison, confusion matrix values were calculated for the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The following statistics were then calculated for each sub-basin and results were visualized as heatmaps: precision (P, Eq. 1), recall (R, Eq. 2), specificity (SP, Eq. 3), miss rate (M, Eq. 4), false detection rate (FDR, Eq. 5), F1-score (F1, Eq. 6), and accuracy (A, Eq. 7):

$${\text{P}}=\frac{TP}{TP+FP}, \tag{1}$$

$${\text{R}}=\frac{TP}{TP+FN}, \tag{2}$$

$${\text{SP}}=\frac{TN}{TN+FP}, \tag{3}$$

$${\text{M}}=\frac{FN}{FN+TN}, \tag{4}$$

$${\text{FDR}}=\frac{FP}{FP+TP}, \tag{5}$$

$${\text{F}}1=2\left(\frac{\frac{TP}{TP+FP}\cdot\frac{TP}{TP+FN}}{\frac{TP}{TP+FP}+\frac{TP}{TP+FN}}\right)=2\left(\frac{P\cdot R}{P+R}\right), \tag{6}$$

$${\text{A}}=\frac{TP+TN}{TP+TN+FP+FN}. \tag{7}$$
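Equations 1 through 7 can be sketched in code as follows. The function names are illustrative, each statistic follows the corresponding equation exactly as written above, and undefined ratios (zero denominators) are returned as NaN, which is a choice made for this sketch.

```python
def confusion_counts(predicted, reference):
    """Count TP, TN, FP, FN for two equally long sequences of 0/1 labels."""
    tp = tn = fp = fn = 0
    for p, r in zip(predicted, reference):
        if p == 1 and r == 1:
            tp += 1
        elif p == 0 and r == 0:
            tn += 1
        elif p == 1 and r == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mask_statistics(tp, tn, fp, fn):
    """Evaluate Eqs. 1-7; a zero denominator yields NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "P": precision,                                           # Eq. 1
        "R": recall,                                              # Eq. 2
        "SP": ratio(tn, tn + fp),                                 # Eq. 3
        "M": ratio(fn, fn + tn),                                  # Eq. 4, as written
        "FDR": ratio(fp, fp + tp),                                # Eq. 5
        "F1": ratio(2 * precision * recall, precision + recall),  # Eq. 6
        "A": ratio(tp + tn, tp + tn + fp + fn),                   # Eq. 7
    }


# Toy example: 8 pixels, predicted mask versus reference mask.
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 1, 0, 1, 0, 0, 0]
counts = confusion_counts(predicted, reference)
stats = mask_statistics(*counts)
print(counts, stats)
```

For the toy masks, the counts are TP = 2, TN = 4, FP = 1, and FN = 1, giving P = R = F1 = 2/3, SP = 0.8, M = 0.2, FDR = 1/3, and A = 0.75.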

High-resolution optical commercial satellite data

Commercial imagery obtained from Planet Labs, Inc. and DigitalGlobe (a subsidiary of Maxar Technologies) was used for visual assessment of the comparison water mask for select locations. Multispectral observations collected by Planet Labs were obtained from the Dove R and Dove Classic satellite constellations (spatial resolution of approximately 3 m and temporal revisit period of 1 day). Multispectral observations collected by DigitalGlobe were obtained from GeoEye-1 (spatial resolution of approximately 1.65 m and temporal revisit period of 1–3 days) and WorldView-3 (spatial resolution of 0.31–30 m and temporal revisit period of 1–4.5 days).

Additional information of the commercial satellite imagery can be accessed here in 2024: https://www.planet.com/ and https://www.maxar.com/.

Results and discussion

CYGNSS observations and bivariate water masks

In Fig. 1A, the 50th percentile CYGNSS surface reflectivity SNR values for 2019 are spatially plotted; they varied between 138 and 223 dB with a mean of 150 dB, a median of 149 dB, and a standard deviation of 6.65 dB. A CYGNSS bivariate water mask (Fig. 1B) was derived using the sub-basin SNR thresholds defined in Sect. “Cyclone global navigation satellite system (CYGNSS) data”. The bivariate water masks for MODIS and Landsat are shown in Fig. 1C and D, respectively. Overall, MODIS classified the smallest percentage of the study area as inland water at just 1.1%. CYGNSS classified 1.3% of the study area as inland water. Lastly, Landsat classified the greatest percentage of the study area as inland water at 1.4%. To improve understanding of these differences in inland water area estimates, an evaluation indicator of the ratio of inland water to catchment area is provided for sub-basins within the study area. This reveals the spatial variability of the CYGNSS, MODIS, and Landsat water masks, which had the highest disagreement in regions of wetlands and branching inland waterways (see Sect. “Confusion matrix statistics of bivariate water masks” for further details). Additionally, the number of daily 1-km pixel observations by CYGNSS, MODIS, and Landsat was calculated for 2019 within the study area (Additional file 1: SI 8). Fewer MODIS and Landsat observations occurred from May to October, likely due to seasonal precipitation and cloud cover. CYGNSS daily observation counts were relatively consistent throughout the year due to its use of GNSS-R.

Comparison water mask

The three bivariate water masks from CYGNSS, MODIS, and Landsat were compared at a pixel-by-pixel level to generate a 1-km comparison water mask. A given pixel was classified as non-inland water in the comparison water mask only if all three bivariate water masks concurred that the pixel was land. If one or more of the bivariate water masks classified a given pixel as water, the comparison water mask designated the pixel as inland water and the observation system(s) that classified it as such were indicated: Landsat only, MODIS only, CYGNSS only, MODIS and Landsat, CYGNSS and MODIS, CYGNSS and Landsat, or all three systems (CYGNSS, MODIS, and Landsat). Because the purpose of the comparison water mask is to spatially investigate the extent by which the three water mask products agree or disagree on the classification of inland water, it is important to note that the comparison water mask is not intended to serve as a stand-alone water mask itself.
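The pixel-by-pixel combination described above can be sketched with a simple bit code, where each observation system contributes one bit. The encoding (CYGNSS = 1, MODIS = 2, Landsat = 4) and the names below are assumptions made for this illustration, not the encoding used to produce Fig. 2.

```python
# Hypothetical bit code: CYGNSS = 1, MODIS = 2, Landsat = 4.
LABELS = {
    0: "non-inland water",
    1: "CYGNSS only",
    2: "MODIS only",
    3: "CYGNSS and MODIS",
    4: "Landsat only",
    5: "CYGNSS and Landsat",
    6: "MODIS and Landsat",
    7: "all three systems",
}


def comparison_mask(cygnss, modis, landsat):
    """Combine three binary masks into one coded comparison mask.

    A pixel is coded 0 only when all three masks agree it is not water.
    """
    return [c | (m << 1) | (l << 2) for c, m, l in zip(cygnss, modis, landsat)]


codes = comparison_mask([0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1])
print([LABELS[code] for code in codes])
```

Keeping one bit per system means that any of the seven water categories, or the per-system totals, can be recovered from the coded mask without storing the three input masks separately.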

As shown in the comparison water mask (Fig. 2A), the observation systems collectively classified 2.3% of the study area as inland water. Of the pixels classified as inland water, 14.2% were classified by all three observation systems. A further 30.6% were classified by exactly two observation systems: 18.2% by Landsat and MODIS, 6.5% by Landsat and CYGNSS, and 5.9% by CYGNSS and MODIS. The remaining 55.2% of inland water pixels were classified by only one system: 29.0% by CYGNSS, 19.4% by Landsat, and 6.8% by MODIS. While this indicates a relatively high level of false positives and thus disagreement between the three inland water masks, it is important to note that the disagreements vary spatially across the study area. This is further discussed in “Confusion matrix statistics of bivariate water masks” Section.
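As a quick arithmetic check using only the percentages reported above, the two-system and single-system shares, together with the three-system share, partition the inland water pixels:

$$\underbrace{18.2+6.5+5.9}_{\text{two systems}}=30.6,\qquad \underbrace{29.0+19.4+6.8}_{\text{one system}}=55.2,\qquad 14.2+30.6+55.2=100.0\ (\%).$$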

Fig. 2

Comparison Water Mask derived from CYGNSS, Landsat, and MODIS for the contiguous United States between latitudes of approximately 24–37°N for 2019. A 1-km Comparison Water Mask across the study area for 2019 derived from the CYGNSS, Landsat, and MODIS bivariate water masks. B The 1-km Comparison Water Mask subdivided into USGS Hydrological Unit Code-02 Watersheds. The watersheds are referred to as the South Atlantic Gulf basin (B03), the Tennessee basin (B06), the Lower Mississippi basin (B08), the Arkansas-White River basin (B11), the Texas-Gulf basin (B12), the Rio Grande basin (B13), the Lower Colorado (B15), and the California basin (B18)

In the comparison water mask, red pixels indicate locations where CYGNSS did not classify inland water which both Landsat and MODIS agreed were water (18.2%). High concentrations of red pixels are observed along outlets of waterways to the Gulf of Mexico and Atlantic Ocean and may be explained by the lower dielectric constant of brackish water than freshwater due to its salt content (Lang et al. 2016). Additionally, red pixels are in the middle of expansive lakes, such as Lake Okeechobee in Florida, even though CYGNSS tends to correctly classify the boundary between the lake and land (i.e., the perimeter of the lake). Green pixels indicate locations where CYGNSS and Landsat classified inland water while MODIS did not (6.5%). These instances are primarily concentrated within the Mississippi River basin. Orange pixels indicate locations where CYGNSS and MODIS classified inland water while Landsat did not (5.9%). The highest concentration of orange pixels can be found in wetland regions of southern Louisiana and Florida.

Pixels classified as inland water by only CYGNSS are represented as dark purple (29.0%) and are scattered across the study region but are particularly prevalent in the Mississippi River basin and the Western USA. The abundance of false positive classifications in dry and densely vegetated areas may be due to the high SNR of CYGNSS in these regions. Dry soil or sand could have erroneously high SNR because the individual grains reflect a significant amount of GPS signals due to their rough, irregular surfaces. Additionally, the spaces between the grains allow L-band signals to penetrate and reflect off the underlying surface, which further contributes to the coherent scattering. In densely vegetated areas, water on the canopy can reflect signals and increase the SNR, so the CYGNSS SNR thresholds set in the present study may not be suitable for detecting waterbodies there. As a result, it may be necessary to consider alternative approaches to detect waterbodies in these areas in future studies. For instance, other proxies such as soil moisture or vegetation indices from other microwave satellite systems may be used to classify these pixels more accurately, assuming that independence from Landsat and MODIS land cover products is not required. By combining thresholds for these proxies with the CYGNSS SNR, dry and densely vegetated areas could be masked out in the future.

Instances of inland water classification solely by MODIS are represented in yellow (6.8%) and are primarily concentrated in the wetlands of Louisiana and southern Florida. Lastly, Landsat-only inland water classifications are depicted in pink (19.4%) and tend to be located within networks of branching waterways. In these instances, CYGNSS frequently captured portions of the waterways but was discontinuous, which led to classifications made only by Landsat.

Confusion matrix statistics of bivariate water masks

To further investigate region-specific variability in the comparison water mask performance, the study area was subdivided into 8 smaller regions via the United States Geological Survey (USGS) Hydrologic Unit Code-02 (HUC-02) watershed boundaries (Fig. 2B). For each region, the results of the confusion matrix statistics (Eqs. 1–7) were visualized as heatmaps, which are shown in Fig. 3. F1-score was determined to be the most applicable to this study (Fig. 3G). A high F1-score indicates a high level of agreement between the predicted and reference water mask. A low F1-score may indicate low confidence in the reference dataset, as there is high disagreement between the products. The remaining confusion matrix statistics (R, P, SP, M, FDR, and A) are discussed in detail in Additional file 1.

Fig. 3

CYGNSS, Landsat, and MODIS bivariate water mask confusion matrix statistics visualized as heat maps for A Recall (R), B Precision (P), C Specificity (SP), D Miss Rate (M), E False Detection Rate (FDR), F Accuracy, and G F1-score (F1). For each, the total study area (AOI) or sub-basin of interest (USGS HUC-02) is indicated. The assumed reference water mask is indicated as either Landsat or MODIS. The predicted water mask is indicated as either CYGNSS, MODIS, Landsat, a combination of MODIS and CYGNSS, or a combination of Landsat and CYGNSS

A higher F1-score was generally obtained when the SNR threshold was tailored for each basin as opposed to using a single threshold for the entire study area. This indicates high variability in inland water surface reflectivity due to geographical differences such as vegetation and topography, meaning that a single threshold to define inland water over a large study area introduces bias. The lowest F1-scores were observed in the Rio Grande basin (B13), the Lower Colorado basin (B15), and the California basin (B18), indicating a high level of disagreement between all three datasets. CYGNSS obtained the highest F1-scores in B03, B06, and B08, which had up to a 34% increase in F1-score compared to the total study area (Fig. 3G). In most scenarios, the F1-score improved when CYGNSS was combined with either Landsat or MODIS.

Lastly, a ratio of the inland water area to catchment area was calculated for each sub-basin (Additional file 1: SI 10). These results demonstrated that the ratio of inland water to sub-basin area varies across the study area, with the highest ratio in the Lower Mississippi basin (B08) and the lowest ratio in the Rio Grande basin (B13). MODIS tended to estimate the lowest ratio; however, all three datasets had comparable ratios within 2.3% of each other regardless of the sub-basin observed.
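A minimal sketch of this ratio is shown below. It relies on the equal-area property of the EASE Grid 2.0, so that pixel counts are proportional to area; the function name, basin labels, and toy values are illustrative only.

```python
from collections import defaultdict


def water_to_catchment_ratio(mask, basin_ids):
    """Percentage of each sub-basin's pixels classified as inland water.

    On an equal-area grid every pixel covers the same area, so the pixel
    ratio equals the area ratio.
    """
    water = defaultdict(int)
    total = defaultdict(int)
    for is_water, basin in zip(mask, basin_ids):
        total[basin] += 1
        water[basin] += is_water
    return {basin: 100.0 * water[basin] / total[basin] for basin in total}


# Toy example with two sub-basins of four pixels each.
mask = [1, 1, 0, 0, 0, 0, 0, 1]
basin_ids = ["B08"] * 4 + ["B13"] * 4
ratios = water_to_catchment_ratio(mask, basin_ids)
print(ratios)
```

Computing this dictionary once per observation system and differencing the values per basin gives the kind of between-product spread summarized in the text.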

Comparison water mask with high-resolution commercial satellite imagery

To assess our confidence in the water mask product, the comparison water mask was overlaid with high-resolution commercial satellite imagery for select locations including manmade reservoirs, natural lakes, wetlands, and rivers. A qualitative assessment of the comparison water mask compared to the commercial satellite imagery is shown in Fig. 4. A quantitative assessment of the percentage of each commercial image scene classified by each observation system is provided in Additional file 1: SI 12.

Fig. 4

Comparison water mask overlaid onto high-resolution commercial satellite imagery from Planet Labs, Inc. and DigitalGlobe for select locations: A Salton Sea, CA, B Lake Maurepas and Pontchartrain, LA, C Lake Hartwell, GA/SC, D Tennessee River, TN, E Lake Kissimmee, FL, F Sam Rayburn Reservoir, TX. G Reference locations for the commercial images are provided on the comparison water mask. Hand-drawn water masks are displayed in light blue for visual purposes. For additional information on the high-resolution imagery, including date of acquisition and image identification number(s), see Additional file 1: SI 11

The highest disagreement between classification systems was often observed along the shorelines, particularly since CYGNSS tended to estimate a wider lake extent which encompassed surrounding vegetated areas. Additionally, CYGNSS did not continuously classify large lakes, which may be explained by salinity, such as for the Salton Sea (Fig. 4A) and Lakes Maurepas and Pontchartrain (Fig. 4B), and/or by their expansive nature, such as for the freshwater Lake Kissimmee (Fig. 4E), which concurs with the results of Al-Khaldi et al. (2021).

In Fig. 4A and B, MODIS and CYGNSS concurred on classifications of wetlands, exposed salt deposits, and sediment along lakebeds, whereas Landsat did not. This is likely because these areas violate the conditions that the Landsat product requires to classify inland water: the area must be open to the sky, larger than 30 m, and unobstructed by vegetation. Lake Hartwell (Fig. 4C) and the Tennessee River (Fig. 4D) are two examples with no pixels classified as water by only MODIS and Landsat (which would be indicated by red pixels). For these locations, a noticeable enhancement in the waterbody continuity was observed when CYGNSS was combined with both Landsat and MODIS.

Future research and limitations

As the utility of CYGNSS data for land-based applications is increasingly realized, it is important to understand its relative strengths and limitations compared to existing Earth observation systems. The comparison water mask reveals various levels of agreement/disagreement across the study area between the observation products of CYGNSS, Landsat, and MODIS. Caution should be exercised when applying these data across varying geographic regions for inland waterbody identification. Additionally, the optical high-resolution commercial satellite imagery revealed numerous instances of FP occurrence over waterbodies. Thus, FP classifications should not be dismissed as land but rather treated as indications of disagreement between the systems. Further, the CYGNSS water mask resulted in discontinuous waterways when portions of the waterbody had SNR values outside of the basin’s defined threshold. Gap-filling, random walkers, or other algorithms may be appropriate methods to improve continuity. Additionally, data collected within 25 km of coastlines were excluded from this study due to the uncertainty of classifying this interface using the MODIS, Landsat, and CYGNSS products. In combination with other data products, fuzzy logic could be used to decrease the uncertainty of CYGNSS-based inland water classification of coastal regions (Demir et al. 2016). Lastly, the study was limited to a single annual timestep (2019). Future research should explore the capability of CYGNSS to reliably observe sub-annual inland waterbody dynamics, which are challenging to observe using MODIS and Landsat due to their reliance on cloud-free observations.

In addition to inland waterbody delineation (Al-Khaldi et al. 2021; Gerlein-Safdi and Ruf 2019; Ghasemigoudarzi et al. 2022; Loria et al. 2020; Ruf et al. 2021), CYGNSS has proven to be a useful observation system for other land-based applications including but not limited to soil moisture retrievals (Kim and Lakshmi 2018), enhancement of soil moisture estimates from land surface models through data assimilation (Kim et al. 2021), flooding (Chew et al. 2018; Ghasemigoudarzi et al. 2020; Rajabi et al. 2020; Wan et al. 2019), lake height estimates (Li et al. 2018), and wetland dynamics (Downs et al. 2021; Morris et al. 2019). Temporally and spatially accurate inland waterbody mapping using CYGNSS will support future research in these CYGNSS land-based application areas as well.

Conclusions

A 1-km CYGNSS-based bivariate water mask was compared with two widely accepted Earth observation water masks derived from MODIS and Landsat for 2019 over the contiguous United States between latitudes of approximately 24–38°N. A mosaic of binary thresholds using sub-basins defined by the USGS HUC-02 codes was used to classify inland water with CYGNSS SNR values. This approach accounted for the varying thresholds required in different regions, such as the dry areas of the Midwest USA versus the wet Southeast USA, and performed better than a single binary threshold for the entire study area, increasing the F1-score by up to 34% for sub-basins within the study area. Confusion matrices and related statistics revealed that the performance of the comparison water mask varied regionally, with particularly high disagreements along the Lower Mississippi basin (B08), brackish or saltwater regions, extensive lakes, and wetlands. Additionally, the performance metric of ratio of inland water to catchment area revealed that CYGNSS, MODIS, and Landsat were within 2.3% of each other regardless of the sub-basin observed.
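The difference between a single threshold and a mosaic of per-basin thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's processing chain: the SNR values, basin labels, and threshold values are made up; a pixel is assumed to be water when its SNR is at or above the basin's threshold; the statistics use standard confusion matrix definitions; and the water-to-catchment ratio assumes equal-area pixels.

```python
def classify_water(snr, basin_ids, thresholds):
    """Label a pixel as water (1) when its SNR meets its basin's threshold."""
    return [1 if s >= thresholds[b] else 0 for s, b in zip(snr, basin_ids)]


def confusion_stats(predicted, reference):
    """Standard confusion matrix statistics for two binary masks."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        "false_detection_rate": fp / (fp + tp) if (fp + tp) else 0.0,
        "miss_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


def water_fraction(mask):
    """Ratio of water pixels to all pixels, assuming equal-area pixels."""
    return sum(mask) / len(mask)


# Toy data (made-up values): three pixels in each of two sub-basins.
snr = [2.0, 3.5, 1.0, 4.0, 2.5, 5.0]
basins = ["B03", "B03", "B03", "B08", "B08", "B08"]
reference = [1, 1, 0, 0, 0, 1]  # stand-in for an optical water mask

mosaic_thresholds = {"B03": 2.0, "B08": 4.5}  # hypothetical per-basin values
single_threshold = {"B03": 3.0, "B08": 3.0}   # one value for the whole area

mosaic_mask = classify_water(snr, basins, mosaic_thresholds)
single_mask = classify_water(snr, basins, single_threshold)

mosaic_stats = confusion_stats(mosaic_mask, reference)
single_stats = confusion_stats(single_mask, reference)
print("mosaic F1:", mosaic_stats["f1"])
print("single F1:", single_stats["f1"])
print("mosaic water fraction:", water_fraction(mosaic_mask))
```

In this toy case the per-basin thresholds recover the reference mask exactly, while the single threshold misses one water pixel in the first basin and adds one false detection in the second. Real thresholds would have to be calibrated per basin against reference data.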

To assess performance of the comparison water mask, high-resolution optical satellite imagery from commercial companies (Planet and DigitalGlobe) was used. This improved understanding of each water mask product’s performance over natural lakes, manmade reservoirs, wetlands, and rivers. In multiple instances, CYGNSS successfully identified inland water that both MODIS and Landsat failed to classify.

Overall, this study contributes a valuable foundational comparison of CYGNSS versus optical-sensor-based inland water masks. It provides a straightforward method for spatially comparing water masks derived from Earth observations, particularly in conjunction with optical high-resolution commercial satellite imagery. This work can guide future exploration of algorithms and data processing techniques to continually improve the performance of inland waterbody delineation using CYGNSS to support water resources management and a variety of hydrological applications.

Availability of data and materials

Hydrology data used in this study are publicly accessible through NASA, the CYGNSS Science Team, and Copernicus. The high-resolution commercial satellite imagery was accessed via the NASA Commercial Satellite Data Acquisition Program (CSDA) and the Cooperative Research and Development Agreement (CRADA) between the National Geospatial-Intelligence Agency and the University of Virginia.

Abbreviations

A: Accuracy, confusion matrix statistic

CYGNSS: Cyclone Global Navigation Satellite System

F1: F1-score, confusion matrix statistic

FDR: False detection rate, confusion matrix statistic

GNSS-R: Global Navigation Satellite System Reflectometry

IGBP: International Geosphere-Biosphere Program

M: Miss rate, confusion matrix statistic

MODIS: Moderate Resolution Imaging Spectroradiometer

NASA: National Aeronautics and Space Administration

NSIDC EASE-Grid: National Snow and Ice Data Center Equal-Area Scalable Earth Grids

P: Precision, confusion matrix statistic

R: Recall, confusion matrix statistic

SNR: Signal-to-noise ratio

USGS HUC-02 Code: United States Geological Survey Hydrologic Unit Code

Acknowledgements

We sincerely thank the CYGNSS Science Team for their support in sharing the pre-released v3.2 CYGNSS ocean/land merged L1 data for 2019 with us. Daily commercial imagery from Planet Labs, Inc., was accessed via the NASA Commercial Satellite Data Acquisition program (https://www.earthdata.nasa.gov/esds/csda). DigitalGlobe imagery was accessed through a Cooperative Research and Development Agreement (CRADA) between the National Geospatial-Intelligence Agency (NGA) and the University of Virginia via the NextView License. This paper is based on work supported by the National Science Foundation Graduate Research Fellowship Program (NSF GRFP) under Grant No. 182490 and the National Science Foundation Research Traineeship (NSF NRT) program under Grant No. 1829004. Any opinions, findings, conclusions, or recommendations expressed in this work are those of the author(s) and do not necessarily reflect the view of the University of Virginia or the National Science Foundation.

Funding

This paper is based on work supported by the National Science Foundation Graduate Research Fellowship Program (NSF GRFP) under Grant No. 182490 and the National Science Foundation Research Traineeship (NSF NRT) program under Grant No. 1829004.

Author information

Contributions

GKP and HK designed the research and discussed the results; BF assisted with pre-processing of data for analysis; GKP conducted the analysis and wrote the manuscript; VL provided feedback throughout. All authors contributed to and approved the final manuscript.

Corresponding author

Correspondence to G. K. Pavur.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

SI 11 Summary Table of Commercial High-Resolution Satellite Imagery. Information for the commercial satellite data used in this study is provided in the table below. This includes the name of the location, the commercial company provider, the date of acquisition of the image scenes, and the image identification number(s). For additional details of the commercial data, please consult the metadata associated with the image identification number(s) via the respective commercial data provider.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pavur, G.K., Kim, H., Fang, B. et al. Spatial comparison of inland water observations from CYGNSS, MODIS, Landsat, and commercial satellite imagery. Geosci. Lett. 11, 12 (2024). https://doi.org/10.1186/s40562-024-00321-1

  • DOI: https://doi.org/10.1186/s40562-024-00321-1

Keywords