- Review
- Open Access
Copula–entropy theory for multivariate stochastic modeling in water engineering
- Vijay P. Singh†^{1, 2} and
- Lan Zhang†^{1}
https://doi.org/10.1186/s40562-018-0105-z
© The Author(s) 2018
- Received: 7 September 2017
- Accepted: 5 February 2018
- Published: 24 February 2018
Abstract
The copula–entropy theory combines the entropy theory and the copula theory. The entropy theory has been extensively applied to derive the most probable univariate distribution subject to specified constraints by applying the principle of maximum entropy. With the flexibility to model nonlinear dependence structure, parametric copulas (e.g., Archimedean, extreme value, meta-elliptical, etc.) have been applied to multivariate modeling in water engineering. This study evaluates the copula–entropy theory using a sample dataset with known population information and a flood dataset from the experimental watershed at the Walnut Gulch, Arizona. The study finds the following: (1) both univariate and joint distributions can be derived using the entropy theory. (2) The parametric copula fits the true copula better using empirical marginals than using fitted parametric/entropy-based marginals. This suggests that marginals and copula may be identified separately in which the copula is investigated with empirical marginals. (3) For a given set of constraints, the most entropic canonical copula (MECC) is unique and independent of the marginals. This allows the universal solution for the proposed analysis. (4) The MECC successfully models the joint distribution of bivariate random variables. (5) Using the “AND” case return period analysis as an example, the derived MECC captures the change of return period resulting from different marginals.
Keywords
- Copula theory
- Entropy theory
- Multivariate stochastic modeling
- Probability density function
- Most entropic canonical copula
- Return period
Introduction
A multitude of processes in water engineering involve more than one random variable. For example, floods are characterized by peak, duration, volume, and inter-arrival time, which are all random in nature. Droughts are described by their severity, duration, inter-arrival time, and areal extent, which are also random. Extreme precipitation events are represented by their intensity, amount, duration, and inter-arrival time, which are all random. Inter-basin water transfer involves transfer of excess water from one basin (say, donor) to a water deficient basin (say, recipient). The transfer involves the volume of water, availability of water in both donor and recipient basins, duration of transfer, rate of transfer, and time interval between water transfers which are all random variables. Water quality entails pollutant load, duration for which the load is higher than the protection limits, and peak pollutant concentration, which are all random variables. Likewise, erosion in a basin may be characterized by sediment yield, number of erosion events, duration of events, intensity of events, and time interval between two consecutive events. These are all random variables. Flooding in a coastal watershed may be caused by the simultaneous occurrence of high precipitation and high tides where both precipitation and tide are random variables. Examples of processes involving more than one random variable abound in hydrologic, hydraulic, environmental, and water resources engineering. There usually exists some degree of dependence among the random variables or at least among some of the variables. Often we are concerned with multivariate stochastic modeling and risk analysis of the systems and processes that involve the derivation of probability distributions of the random variables considering the dependence structure among them. 
Nowadays, these stochastic processes can be modeled with the copula–entropy theory that has proven to be more flexible and accurate than the traditional approaches. The objective of this paper therefore is to reflect on some recent advances made in the application of the copula–entropy theory and future challenges.
Methods
Copula–entropy theory
The copula–entropy theory (CET) is an amalgam of the copula theory and the entropy theory. These two theories are now discussed.
Entropy theory
The principle of maximum entropy (POME), propounded by Jaynes (1957), states that of all the distributions that satisfy the given constraints, the distribution yielding the maximum entropy is the least-biased distribution and should hence be preferred. If there are no constraints then POME says that the resulting distribution would be a uniform distribution, which is consistent with the Laplacian principle of insufficient reason.
The theorem of concentration states that POME yields the best constrained probability distribution and is the preferred method for inferring this distribution, and this distribution best represents our state of knowledge about the behavior of the system. This is a consequence of Shannon’s inequality and the relation between entropy and Chi square statistic.
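As a small numerical illustration of POME in the unconstrained case, the sketch below compares the Shannon entropy of a uniform discrete distribution with that of two non-uniform alternatives. The four-outcome setting, the candidate probabilities, and the helper name `shannon_entropy` are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # outcomes with zero probability contribute nothing
    return float(-np.sum(p * np.log(p)))

# Hypothetical candidate distributions over four outcomes (no constraints imposed).
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "mild skew": [0.4, 0.3, 0.2, 0.1],
    "strong skew": [0.85, 0.05, 0.05, 0.05],
}
entropies = {name: shannon_entropy(p) for name, p in candidates.items()}
for name, h in entropies.items():
    print(f"{name:12s} H = {h:.4f}")
```

With no constraints beyond normalization, the uniform candidate attains the largest entropy, ln 4, which is the behavior POME predicts.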
Copula theory
The foundation of the copula theory is the Sklar theorem (Sklar 1959). The theorem states that the joint (multivariate) probability distribution of two or more random variables is a function of the probability distributions of individual variables (also referred to as marginal distributions which are one-dimensional). In other words, the multivariate distribution is coupled to its marginal distributions. It is implied that these random variables are not independent of each other. The copula theory does not specify the way to derive the marginal distributions and does not lead to a unique copula. There are different ways to construct copulas and different ways to select the best copula.
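In symbols, Sklar's theorem for continuous marginals is commonly written as follows. The notation (H for the joint CDF, h for the joint density, F_i and f_i for the marginal CDFs and densities, u_i for the transformed variables) is assumed here to match the copula properties listed later, since the original equation is not reproduced in this text.

```latex
H(x_{1}, \ldots, x_{d}) = C\bigl(F_{1}(x_{1}), \ldots, F_{d}(x_{d})\bigr) = C(u_{1}, \ldots, u_{d}),
\qquad u_{i} = F_{i}(x_{i}),

h(x_{1}, \ldots, x_{d}) = c(u_{1}, \ldots, u_{d}) \prod_{i=1}^{d} f_{i}(x_{i}),
\qquad c = \frac{\partial^{d} C}{\partial u_{1} \cdots \partial u_{d}}.
```

The second line shows why marginals and dependence can be handled separately: the joint density factors into the copula density and the product of the marginal densities.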
Methodology for application of copula–entropy theory
The copula–entropy theory can be applied in different ways: (1) the marginal distributions are derived using the entropy theory and the joint distribution using the copula theory (e.g., Hao and Singh 2012; Zhang and Singh 2012). Since there can be more than one joint distribution fitted to the multivariate random variables, the best distribution is then selected from either visual goodness-of-fit plot (e.g. Q–Q plot) or formal goodness-of-fit test statistics (Genest et al. 2009). (2) With the marginal distributions derived using the entropy theory, the best copula is selected as the copula function yielding the maximum entropy. (3) Both marginal and joint distributions are derived using the entropy theory (e.g., Chu 2011; Chen et al. 2013; Aghakouchak 2014). The methodology for application of the copula–entropy theory will depend on the way it is applied. Each of the three ways is now outlined. First, the methodology for application of the entropy theory is outlined, since entropy is needed in all three ways.
Methodology for application of entropy theory
Equation (8) shows that C_{ i }, \( i = 1,{ 2}, \ldots ,m, \) are functions of \( \lambda_{ 1} , \, \lambda_{ 2} , \ldots , \, \lambda_{m} \).
Equation (12) shows that maximum entropy is a function of Lagrange multipliers and constraints, such that H_{max} is a concave function. Equation (12) also shows that Lagrange multipliers, \( \lambda_{ 1} , \, \lambda_{ 2} , \ldots ,\lambda_{m} , \) are partial derivatives of H_{max} with respect to constraints C_{ i }, \( i = 1,{ 2}, \ldots ,m, \) respectively.
With the Chi square distribution as the limiting distribution, it is shown that 2NΔH is Chi square distributed. Hence, the Chi square statistic may be applied to assess if the fitted parametric distribution is close to the POME-based distribution (i.e., the reference distribution of random variable).
Methodology for application of copula theory
Definition and main properties for copula
- 1.
\( 0\; \le \;C(u_{1} , \ldots ,u_{d} )\; \le \;1 \);
- 2.
if any \( u_{i} = 0 \), then \( C(u_{1} , \ldots ,u_{d} ) = 0 \);
- 3.
if all \( u_{j} = 1, \, j = 1, \ldots ,d{\text{ and }}j \ne i \); then \( C(1, \ldots ,u_{i} , \ldots ,1) = u_{i} \);
- 4.
C is bounded by the Fréchet–Hoeffding bounds as
$$ W \le C \le M; \quad W = \max \left( 1 - d + \sum\limits_{i = 1}^{d} u_{i} ,\, 0 \right), \quad M = \min (u_{1} , \ldots ,u_{d} ) $$(16)
In Eq. (16), W represents perfect negative dependence, while M represents perfect positive dependence. For independent random variables, the corresponding copula function is simply given as \( \varPi = u_{1} u_{2} \cdots u_{d} = F_{1} (x_{1} )F_{2} (x_{2} ) \cdots F_{d} (x_{d} ) \); and
- 5.
C is d-increasing, that is, the \( C\left( {u_{ 1} , \ldots , u_{d} } \right) \) volume for any given d-dimensional interval is non-negative.
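The properties listed above can be checked numerically for a concrete copula. The sketch below does so for the bivariate (d = 2) Gumbel–Hougaard copula on a regular grid. The parameter value θ = 2, the grid resolution, and the tolerance are illustrative assumptions.

```python
import numpy as np

def gumbel_hougaard(u, v, theta):
    """Bivariate Gumbel-Hougaard copula C(u, v) for theta >= 1."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended at the edges
        s = (-np.log(u)) ** theta + (-np.log(v)) ** theta
    return np.exp(-s ** (1.0 / theta))

theta = 2.0                      # illustrative dependence parameter
grid = np.linspace(0.0, 1.0, 21)
U, V = np.meshgrid(grid, grid, indexing="ij")
C = gumbel_hougaard(U, V, theta)
tol = 1e-12

# Properties 1 and 4: Frechet-Hoeffding bounds W <= C <= M (which imply 0 <= C <= 1).
W = np.maximum(U + V - 1.0, 0.0)
M = np.minimum(U, V)
bounds_ok = bool(np.all(C >= W - tol) and np.all(C <= M + tol))

# Property 2: C = 0 whenever one argument is 0.
grounded = bool(np.allclose(C[0, :], 0.0) and np.allclose(C[:, 0], 0.0))

# Property 3: uniform margins, C(u, 1) = u and C(1, v) = v.
margins_ok = bool(np.allclose(C[:, -1], grid) and np.allclose(C[-1, :], grid))

# Property 5 (d = 2): every grid rectangle has non-negative C-volume.
volume = C[1:, 1:] - C[1:, :-1] - C[:-1, 1:] + C[:-1, :-1]
two_increasing = bool(np.all(volume >= -tol))

print(bounds_ok, grounded, margins_ok, two_increasing)
```

A grid check of this kind is only a sanity test, not a proof, but it is a quick way to catch an incorrectly coded copula function.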
Copula families and parameter estimation
Meta-elliptical copulas (Fang et al. 2002), as the name suggests, are derived from elliptical joint distributions. The most widely applied meta-elliptical copulas are the meta-Gaussian and meta-Student t copulas. Unlike the Archimedean copulas, the meta-elliptical copulas can model the entire range of dependence structure and can be easily applied to high-dimensional multivariate modeling. Comparing these two copulas, the meta-Student t copula has symmetric tail dependence, while the meta-Gaussian copula has no tail dependence (e.g. Genest et al. 2007; Song and Singh 2010).
In Eq. (18), C denotes the extreme value copula, and C_{ F } denotes that the copula fulfills the limiting relation.
In Eq. (19), A denotes the Pickands dependence function (Pickands 1981; Falk and Reiss 2005) that is convex as \( {A:}\;[0,1] \to [1/2, 1 ] {\text{ and max}}\,(t, \, 1 - t) \le A(t) \le 1{\text{ for }}t \in [0, 1 ] \).
The Gumbel–Hougaard copula (Archimedean copula family) is the only Archimedean copula that belongs to the extreme value family. Hence, the Gumbel–Hougaard copula has been popularly applied in bivariate flood frequency analysis, storm analysis, drought analysis, etc.
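The link between the Gumbel–Hougaard copula and the extreme value family can be illustrated numerically. The sketch below assumes the standard Pickands representation C(u, v) = exp{ln(uv) A(ln v / ln(uv))} with A(t) = (t^θ + (1 − t)^θ)^{1/θ} for the Gumbel–Hougaard case. The argument convention of Eq. (19) is not reproduced in this text and may differ, although the Gumbel–Hougaard copula is symmetric, so the result is unaffected. The value θ = 2.5 is an illustrative assumption.

```python
import numpy as np

def pickands_gh(t, theta):
    """Pickands dependence function of the Gumbel-Hougaard copula."""
    t = np.asarray(t, dtype=float)
    return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)

def ev_copula(u, v, dep_fn):
    """Bivariate extreme value copula built from a Pickands function."""
    log_uv = np.log(u) + np.log(v)
    return np.exp(log_uv * dep_fn(np.log(v) / log_uv))

def gumbel_hougaard(u, v, theta):
    """Closed-form Gumbel-Hougaard copula, for comparison."""
    s = (-np.log(u)) ** theta + (-np.log(v)) ** theta
    return np.exp(-s ** (1.0 / theta))

theta = 2.5  # illustrative value
t = np.linspace(0.0, 1.0, 101)
a = pickands_gh(t, theta)

# A(t) must lie between max(t, 1 - t) and 1.
pickands_bounds_ok = bool(
    np.all(a >= np.maximum(t, 1.0 - t) - 1e-12) and np.all(a <= 1.0 + 1e-12)
)

# Both constructions should agree on interior points of the unit square.
g = np.linspace(0.05, 0.95, 19)
u, v = np.meshgrid(g, g)
same_copula = bool(
    np.allclose(ev_copula(u, v, lambda s: pickands_gh(s, theta)),
                gumbel_hougaard(u, v, theta))
)
print(pickands_bounds_ok, same_copula)
```

Setting θ = 1 gives A(t) = 1, which recovers the independence copula.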
In Eq. (20), c denotes the copula density function. As seen in Eq. (20), the vine copula is very flexible, since the bivariate copula is applied at all the levels. The vine copula has also been applied in high-dimensional hydrological frequency analysis (e.g., Pham et al. 2016; Arya and Zhang 2017; Vernieuwe et al. 2015).
- (i)
Full-Maximum Likelihood Estimation (Full-MLE): In this method, the parameters of the marginal distributions and copula functions are estimated simultaneously.
- (ii)
Two-Stage Maximum Likelihood Estimation (Two-Stage MLE): In this method, one first estimates the parameters of marginal distributions and then the parameters of the copula function are estimated using MLE with the marginals computed from the previously fitted marginal distributions.
- (iii)
Semi-Parametric (or Pseudo) Maximum Likelihood Estimation (Pseudo-MLE): In this method, the parameters of the copula function are estimated from the empirical marginals (i.e., empirical CDF computed from the plotting position formula or kernel density function).
Of the three estimation methods for parametric copula functions, the Pseudo-MLE is considered least impacted by the possible misidentification of marginal distributions. The advantage of Pseudo-MLE is the separate parameter estimation of marginal distributions and the copula function.
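A minimal sketch of the Pseudo-MLE idea follows. For ease of simulation it uses the Clayton copula rather than the Gumbel–Hougaard copula, and the marginals (gamma and lognormal with arbitrary parameters), the sample size, the seed, and the true parameter θ = 2 are all illustrative assumptions. The pseudo-observations use the Weibull plotting position rank/(n + 1).

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def sample_clayton(n, theta, rng):
    """Draw (u, v) from a Clayton copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def clayton_loglik(theta, u, v):
    """Log-likelihood of the Clayton copula density at (u, v)."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return np.sum(
        np.log1p(theta)
        - (1.0 + theta) * (np.log(u) + np.log(v))
        - (2.0 + 1.0 / theta) * np.log(s)
    )

def pseudo_observations(x):
    """Empirical marginals from the Weibull plotting position rank/(n + 1)."""
    return stats.rankdata(x) / (len(x) + 1.0)

rng = np.random.default_rng(0)
true_theta = 2.0
u0, v0 = sample_clayton(2000, true_theta, rng)

# Hypothetical marginals: the copula fit below never uses their parametric form.
x = stats.gamma.ppf(u0, a=2.0, scale=3.0)
y = stats.lognorm.ppf(v0, s=0.5)

u_hat, v_hat = pseudo_observations(x), pseudo_observations(y)
result = minimize_scalar(
    lambda th: -clayton_loglik(th, u_hat, v_hat),
    bounds=(0.01, 20.0),
    method="bounded",
)
theta_hat = result.x
print(f"true theta = {true_theta}, pseudo-MLE estimate = {theta_hat:.3f}")
```

Because only the ranks of x and y enter the likelihood, a misidentified marginal model cannot contaminate the copula parameter estimate, which is the advantage noted above.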
According to the information theory, the mutual information [i.e., \( I({X{;}Y}) \)] is a measure of the total correlation between random variables, that is, the mutual dependence between random variables X and Y. From the copula theory [e.g., Eq. (22) for bivariate random variables], the copula density [i.e., \( c(u,v) \)] also denotes the mutual dependence between variables X and Y. Thus, the information maintained in the copula function is the mutual information (i.e., total correlation) between X and Y which results in the copula entropy being negative. In other words, a higher absolute value of the copula entropy represents higher mutual dependence (or total correlation) among the random variables.
In Eq. (27), Spearman’s rho is commonly applied as the constraint to measure the dependence with \( a_{j} (u,v) = uv \Rightarrow E(uv) = \frac{{\rho_{s} + 3}}{12} \). One can also apply other dependence measures discussed in Nelsen (2006) and Chu (2011).
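The relation between Spearman's rho and the constraint E(UV) follows from ρ_s = 12E(UV) − 3 for uniform marginals. The sketch below converts between the two and checks the limiting cases of perfect positive dependence (V = U) and perfect negative dependence (V = 1 − U) by midpoint quadrature. The function names and the grid size are illustrative assumptions.

```python
import numpy as np

def euv_from_spearman(rho_s):
    """Constraint value E(UV) implied by Spearman's rho."""
    return (rho_s + 3.0) / 12.0

def spearman_from_euv(e_uv):
    """Inverse relation: rho_s = 12 E(UV) - 3."""
    return 12.0 * e_uv - 3.0

# Midpoint grid on [0, 1] to approximate expectations under U ~ Uniform(0, 1).
n = 100_000
u = (np.arange(n) + 0.5) / n

e_uv_comonotone = np.mean(u * u)                # V = U,     rho_s = +1
e_uv_countermonotone = np.mean(u * (1.0 - u))   # V = 1 - U, rho_s = -1

print(euv_from_spearman(1.0), e_uv_comonotone)        # both close to 1/3
print(euv_from_spearman(-1.0), e_uv_countermonotone)  # both close to 1/6
print(euv_from_spearman(0.0))                         # independence: 1/4
```

The constraint therefore ranges over [1/6, 1/3], with 1/4 corresponding to ρ_s = 0.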
In Eq. (28), \( \lambda_{0} , \ldots ,\lambda_{n} ,\gamma_{1} , \ldots ,\gamma_{n} ,\lambda_{n + 1} , \ldots ,\lambda_{n + k} \) are the Lagrange multipliers. More specifically for MECC, \( \lambda_{r} = \gamma_{r} , \, r = 1, \ldots ,n \). The Lagrange multipliers \( \lambda_{n + 1} , \ldots ,\lambda_{n + k} \) are pertaining to the constraints in relation to the rank-based dependence measures.
In Eq. (30), \( \varLambda = \left[ {\lambda_{1} , \ldots ,\lambda_{n} ,\gamma_{1} , \ldots ,\gamma_{n} ,\lambda_{n + 1} , \ldots ,\lambda_{n + k} } \right] \).
In Eq. (31), b is a generic constant, \( \tilde{c}(u,v) \) is the given reference copula. It is seen that the MECC is obtained by setting b = 0. In what follows, we will focus on the application of MECC for bivariate cases through examples.
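To make the MECC construction concrete, the following sketch fits a bivariate MECC with the constraints E(U), E(U²), E(V), E(V²), and E(UV) by minimizing the dual objective λ₀(Λ) + Σ λᵢ Cᵢ, where λ₀ is the log of the normalizing integral. It assumes the density form c(u, v) = exp(−λ₀ − λ₁u − λ₂u² − γ₁v − γ₂v² − λ₃uv), which matches the sign pattern of the multipliers tabulated later in the paper. The midpoint-grid quadrature, the BFGS optimizer, and the value ρ_s = 0.5 are illustrative assumptions and do not reproduce the paper's computation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_mecc(rho_s, n_grid=100):
    """Fit a bivariate MECC on a midpoint grid; returns multipliers and moments."""
    g = (np.arange(n_grid) + 0.5) / n_grid
    u, v = np.meshgrid(g, g, indexing="ij")
    # Constraint functions a_j(u, v): u, u^2, v, v^2, uv.
    feats = np.stack([u, u**2, v, v**2, u * v]).reshape(5, -1)
    # Targets: uniform-marginal moments 1/(r + 1) and E(UV) from Spearman's rho.
    targets = np.array([1 / 2, 1 / 3, 1 / 2, 1 / 3, (rho_s + 3.0) / 12.0])
    log_cell = -2.0 * np.log(n_grid)  # log area of one grid cell

    def cell_weights(lam):
        expo = -(lam @ feats)
        lam0 = logsumexp(expo) + log_cell      # lambda_0 = ln of the partition integral
        return lam0, np.exp(expo + log_cell - lam0)

    def dual(lam):
        lam0, w = cell_weights(lam)
        # Dual objective and its gradient (target moments minus model moments).
        return lam0 + lam @ targets, targets - feats @ w

    res = minimize(dual, np.zeros(5), jac=True, method="BFGS")
    lam0, w = cell_weights(res.x)
    return lam0, res.x, feats @ w, targets

lam0, lam, fitted_moments, target_moments = fit_mecc(rho_s=0.5)
print("lambda_0:", round(float(lam0), 4))
print("lambda_1, lambda_2, gamma_1, gamma_2, lambda_3:", np.round(lam, 4))
print("fitted moments:", np.round(fitted_moments, 5))
```

At the minimum of the dual the gradient vanishes, so the fitted moments equal the constraints. With only two moment constraints per marginal, the marginals of the fitted density are uniform only approximately.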
Copula–entropy for multivariate modeling
- (i)
The marginal distributions are derived using the entropy theory, while the joint distribution (i.e., copula function) is modeled through the parametric copula function with its parameter estimated using the Full-MLE, Two-Stage MLE, or Pseudo-MLE. In this approach, the goodness-of-fit of the copula function may be assessed either graphically through the K–K plot or statistically with the formal goodness-of-fit test statistics (Genest et al. 2009).
- (ii)
The difference of this second approach from (i) above is that the parametric copula function is selected such that it yields the maximum entropy among all copula candidates.
- (iii)
The approach (iii) takes full advantage of the entropy theory. Both marginal and joint distributions are derived using the entropy theory. The Lagrange multipliers are estimated by maximizing entropy or minimizing the corresponding objective function which is the dual problem of maximizing entropy. The Lagrange multipliers of the MECC (joint distribution) may be optimized from the fitted POME-based marginal distributions or from the empirical marginal distribution. The approach (iii) is further adopted for the applications.
Application to multivariate data of known population
Study of univariate variates
In Singh (1998), it was shown that \( E[X], \;{\text{and}} \;E[\ln (X)] \) should be applied as constraints to derive the POME-based gamma distribution; while \( E\left[ {\ln \left( x \right)} \right] \;{\text{and}}\; E\left[ {\left( {\ln x} \right)^{2} } \right] \) are the constraints to derive the POME-based lognormal distribution. Following Singh (1998), we have the following:
Gamma distribution
Lognormal distribution
In Eq. (33d), \( y = \ln (x) \) and \( s_{y}^{2} \) represents the sample variance of y.
Lagrange multipliers estimated from sample dataset and the true population
Source | X ~ gamma: λ_{0} | X ~ gamma: λ_{1} | X ~ gamma: λ_{2} | Y ~ lognormal: λ_{0} | Y ~ lognormal: λ_{1} | Y ~ lognormal: λ_{2} |
---|---|---|---|---|---|---|
Sample | 7.0881 | 0.0505 | −1.3190 | 16.9052 | −6.8616 | 0.9810 |
Population | 12.2919 | 0.0952 | −3.3000 | 17.4612 | −7.1633 | 1.0204 |
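The gamma multipliers can be related to the usual shape and scale parameters. Writing the POME-based density as f(x) = exp(−λ₀ − λ₁x − λ₂ ln x) and matching it to the gamma density gives λ₁ = 1/scale, λ₂ = 1 − shape, and λ₀ = ln Γ(shape) + shape · ln(scale). The sketch below evaluates these expressions. The values shape = 4.3 and scale = 10.5 are inferred here from the population row of the table and are an assumption, since the population parameters are not stated in this text.

```python
import math

def gamma_lagrange_multipliers(shape, scale):
    """Lagrange multipliers of f(x) = exp(-lam0 - lam1*x - lam2*ln x) for a gamma law."""
    lam1 = 1.0 / scale
    lam2 = 1.0 - shape
    lam0 = math.lgamma(shape) + shape * math.log(scale)
    return lam0, lam1, lam2

# Assumed population parameters (inferred from the tabulated multipliers).
lam0, lam1, lam2 = gamma_lagrange_multipliers(shape=4.3, scale=10.5)
print(f"lambda_0 = {lam0:.4f}, lambda_1 = {lam1:.4f}, lambda_2 = {lam2:.4f}")
```

Under this assumption the computed multipliers agree with the tabulated population values for X to the four decimals shown.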
Lagrange multipliers estimated using the first four moments about origin
Variable | λ_{0} | λ_{1} | λ_{2} | λ_{3} | λ_{4} |
---|---|---|---|---|---|
X_{s} | 1.6011 | −29.2444 | 101.8716 | −125.7947 | 57.5913 |
Y_{s} | −1.2604 | −11.6222 | 103.6613 | −182.3986 | 101.5606 |
Chi square univariate goodness-of-fit results (comparing to the population parameters)
Type | X: S^{a} | X: Cri.^{b} | X: P value | X: df | Y: S | Y: Cri. | Y: P value | Y: df |
---|---|---|---|---|---|---|---|---|
Fitted to sample^{c} | 3.74 | 15.51 | 0.88 | 8 | 0.92 | 15.51 | 1.00 | 8 |
Moments about origin^{d} | 5.94 | 12.59 | 0.43 | 6 | 6.54 | 12.59 | 0.37 | 6 |
Study of dependence
As previously discussed, one may apply three different approaches to study the dependence using the copula–entropy theory. Hereafter, each approach is evaluated. Within the objective of the study, the Gumbel–Hougaard, Clayton, Frank and meta-t copulas (Nelsen 2006) were applied as parametric copulas. The MECC copula was derived with the constraints of \( E\left( U \right),E\left( {U^{2} } \right),E\left( V \right),E\left( {V^{2} } \right) \) and \( E\left( {UV} \right) \). According to the discussion in “Univariate analysis of peak discharge and flood volume” section for univariate analysis, we will simply apply the POME-based distribution derived using the moments about the origin with the use of scaled variables.
POME-based marginals with parametric copulas
Parameters, LogL, and entropy estimated from parametric copula
Marginals | Quantity | GH | Clayton | Frank | T |
---|---|---|---|---|---|
POME marginals | Parameter | 4.8534 | 4.2251 | 17.2725 | 0.9474; ν = 4.4479 |
POME marginals | LogL | 1098.3 | 712.6209 | 995.3770 | 1061.7 |
POME marginals | Entropy^{a} | −1.0983 | −0.7126 | −0.9954 | −1.0617 |
Empirical marginals | Parameter | 4.5732 | 3.1897 | 16.0426 | 0.9356; ν = 4.1301 |
Empirical marginals | LogL | 1106.2 | 653.8323 | 973.6298 | 1040.8 |
Empirical marginals | Entropy | −1.1062 | −0.6538 | −0.9736 | −1.0408 |
POME-based marginals with parametric copulas selected based on the entropy
The computed entropy is also listed in Table 4. From the computed entropy using Eq. (40), it is seen that the Gumbel–Hougaard copula yielded the highest mutual information (the absolute value of the copula entropy) among all the copula candidates.
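The tabulated entropies are consistent with estimating the copula entropy as the negative average log copula density, i.e. −LogL/N. The sketch below applies this to the POME-marginal log-likelihoods of Table 4 and selects the copula with the largest mutual information. The sample size N = 1000 is inferred from the ratio of the tabulated LogL and entropy values and is an assumption, since Eq. (40) is not reproduced in this text.

```python
# Log-likelihoods for POME marginals, copied from Table 4.
log_likelihoods = {
    "GH": 1098.3,
    "Clayton": 712.6209,
    "Frank": 995.3770,
    "T": 1061.7,
}
n_obs = 1000  # assumed sample size, inferred from the table

def copula_entropy_from_loglik(loglik, n):
    """Sample estimate of copula entropy: minus the mean log copula density."""
    return -loglik / n

entropies = {
    name: copula_entropy_from_loglik(ll, n_obs) for name, ll in log_likelihoods.items()
}
# Mutual information is the absolute value of the (negative) copula entropy.
best = max(entropies, key=lambda name: abs(entropies[name]))
for name, h in entropies.items():
    print(f"{name:8s} entropy = {h:.4f}")
print("largest mutual information:", best)
```

Under this estimator, ranking the candidates by mutual information is equivalent to ranking them by log-likelihood on the same sample.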
Parametric copulas estimated using Pseudo-MLE
In this approach, the parameters of the copula were directly estimated using the empirical distribution (e.g., the empirical distribution from the Weibull plotting position formula); the results are listed in Table 4. It is seen that with the Pseudo-MLE, the Gumbel–Hougaard copula again yielded the largest log-likelihood and the highest mutual information.
Most entropic canonical copula with POME-based marginals (or empirical marginals)
Parameters estimated for MECC copula
Constraint | Marginal | λ_{0} | λ_{1} | λ_{2} | γ_{1} | γ_{2} | λ_{3} |
---|---|---|---|---|---|---|---|
Sample Spearman’s rho | POME marginal | −1.7581 | 1.2443 | 35.7275 | 1.2443 | 35.7275 | −73.9435 |
Sample Spearman’s rho | Empirical marginal | −1.7581 | 1.2443 | 35.7275 | 1.2443 | 35.7275 | −73.9435 |
True Spearman’s rho (from the true GH-copula) |  | −1.7628 | 1.2356 | 36.3731 | 1.2356 | 36.3731 | −75.2173 |
- 1.
Generate random variables [U_{1},U_{2}] with sample size N from the derived MECC, where N is greater than the sample size of the observed dataset.
- 2.
Approximate [\( K_{\varvec{\lambda}} (t) \)] using:
Overall, from the bivariate analysis of sample data, MECC may be directly applied to model the dependence structure of the random variables. In the case of the MECC application, the impact of the marginal distributions is eliminated. In the next section, we will use the real watershed data as a case study to further illustrate the copula–entropy theory as well as risk analysis.
Case study with real watershed data
Collected from Flume 1 at Walnut Gulch Watershed in Arizona, the annual maximum flood data [i.e., peak discharge (Q) and flood volume (V)] from 1957 to 2012 were considered for the case study. Based on the findings from analysis of sample data, the case study proceeded as follows: (i) the POME-based univariate distribution was applied to model the univariate peak discharge and flood volume; and (ii) the MECC was applied to model the dependence between peak discharge and flood volume.
Univariate analysis of peak discharge and flood volume
Sample statistics for scaled peak discharge and flood volume
Variable | E(X) | E(X^{2}) | E(X^{3}) | E(X^{4}) | T | P |
---|---|---|---|---|---|---|
Peak discharge | 0.1499 | 0.1712 | 2.5921 | 12.0061 | 12.7843 | ≪ 0.05 |
Flood volume | 0.2004 | 0.1988 | 1.4922 | 5.8259 | 5.0802 | ≪ 0.05 |
In Eqs. (44a)–(44c), \( \gamma_{2} \) and \( \gamma_{2}^{ex} \) denote the sample kurtosis and excess kurtosis; n is the sample size; SEK is the standard error of kurtosis; and T is the test statistic, which follows the standard normal distribution.
Lagrange multipliers for POME-based univariate distribution
Variable | λ _{0} | λ _{1} | λ _{2} | λ _{3} | λ _{4} |
---|---|---|---|---|---|
Peak discharge | − 1.9340 | 5.8624 | 8.3878 | − 10.5178 | 0.0004 |
Flood volume | − 1.6668 | 5.8557 | − 1.7289 | 0.2827 | 0.0003 |
Bivariate flood frequency analysis with MECC
Let U and V represent the univariate marginals for peak discharge and flood volume. The same constraints used to construct the MECC for the sample data [i.e., \( E\left( U \right),E\left( {U^{2} } \right),E\left( V \right),E\left( {V^{2} } \right), E(UV) \)] were applied to model the dependence of peak discharge and flood volume. The Lagrange multipliers were optimized by minimizing the objective function of Eq. (31a) with b = 0.
Risk analysis
Joint CDF and T_{and} estimated from the empirical copula and MECC
C(u,v) | P (volume) | P (discharge) = 0.8 | P (discharge) = 0.9 | P (discharge) = 0.96 | P (discharge) = 0.98 |
---|---|---|---|---|---|
Empirical | 0.8 | 0.7500 | 0.7857 | 0.8036 | 0.8036 |
Empirical | 0.9 | 0.8036 | 0.8750 | 0.9107 | 0.9107 |
Empirical | 0.96 | 0.8036 | 0.9107 | 0.9464 | 0.9643 |
Empirical | 0.98 | 0.8036 | 0.9107 | 0.9643 | 0.9821 |
MECC | 0.8 | 0.7503 | 0.7861 | 0.7940 | 0.7953 |
MECC | 0.9 | 0.7861 | 0.8649 | 0.8946 | 0.9013 |
MECC | 0.96 | 0.7940 | 0.8946 | 0.9429 | 0.9557 |
MECC | 0.98 | 0.7953 | 0.9013 | 0.9557 | 0.9709 |
T_{and} (years) | Volume (m^{3}) | Discharge = 73.20 cms | Discharge = 109.90 cms | Discharge = 170.66 cms | Discharge = 230.59 cms |
---|---|---|---|---|---|
Empirical | 3.18 × 10^{5} | 6.6667 | 11.6667 | 22.9508 | 42.4242 |
Empirical | 4.60 × 10^{5} | 9.6552 | 13.3333 | 19.7183 | 32.5581 |
Empirical | 6.44 × 10^{5} | 22.9508 | 19.7183 | 37.8378 | 41.1765 |
Empirical | 7.71 × 10^{5} | 42.4242 | 32.5581 | 41.1765 | 45.1613 |
MECC | 3.18 × 10^{5} | 6.6535 | 11.6146 | 29.4142 | 65.3461 |
MECC | 4.60 × 10^{5} | 11.6146 | 15.3998 | 28.9062 | 46.8990 |
MECC | 6.44 × 10^{5} | 29.4142 | 28.9062 | 43.6433 | 63.4992 |
MECC | 7.71 × 10^{5} | 65.3462 | 46.8990 | 63.4992 | 91.8599 |
- (i)
There was a small difference between the joint CDFs computed from empirical copula and the MECC. The absolute relative difference was in the range from 0.96% for C(0.8,0.8) to 2.17% for C(0.8,0.9). Thus, in regard to the joint CDF, the differences were insignificant.
- (ii)
Though the differences in the estimated joint CDF may not be significant, they resulted in larger differences in the “AND” case return period. As the marginal probability increased, the discrepancy between the T_{and} estimated from the empirical copula and the MECC also increased.
- (iii)
There was an interesting finding that held for T_{and} estimated from both the empirical copula and the MECC. Using volume = 6.44 × 10^{5} m^{3} corresponding to P = 0.96 as an example, the joint return period computed from the smaller peak discharge (e.g., Q = 73.2 cms corresponding to P = 0.8; 22.95 years from the empirical copula and 29.41 years from the MECC) was greater than that computed with the larger peak discharge (e.g., Q = 109.9 cms; 19.72 and 28.91 years, respectively). This was true in reality, since it was more likely for (Q ≥ 109.9 cms and V ≥ 6.44 × 10^{5} m^{3}) to occur simultaneously compared to that for (Q ≥ 73.2 cms and V ≥ 6.44 × 10^{5} m^{3}). This finding also agrees with the observation that higher discharge is most likely associated with higher flood volume. The same pattern occurred for large flood volume with relatively low peak discharge and vice versa.
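The tabulated “AND” case return periods are consistent with the usual expression T_and = μ / [1 − u − v + C(u, v)], where μ is the mean inter-arrival time. The sketch below evaluates this for a few entries of the joint CDF table. Taking μ = 1 year (annual maximum series) is an assumption inferred from the tabulated values, as the corresponding equation is not reproduced in this text.

```python
def and_return_period(u, v, c_uv, mu=1.0):
    """'AND' case return period: mu / P(U > u, V > v)."""
    joint_exceedance = 1.0 - u - v + c_uv
    return mu / joint_exceedance

# Joint CDF values copied from the table (marginal probabilities 0.8 and 0.9).
t_emp_08_08 = and_return_period(0.8, 0.8, 0.7500)   # empirical copula
t_mecc_08_08 = and_return_period(0.8, 0.8, 0.7503)  # MECC
t_emp_09_08 = and_return_period(0.9, 0.8, 0.7857)   # empirical copula

print(f"empirical, P = (0.8, 0.8): {t_emp_08_08:.4f} years")
print(f"MECC,      P = (0.8, 0.8): {t_mecc_08_08:.4f} years")
print(f"empirical, P = (0.9, 0.8): {t_emp_09_08:.4f} years")
```

Because the joint exceedance probability in the denominator is small at high marginal probabilities, a difference of about 0.01 in C(u, v) can change T_and by tens of years, which explains why small discrepancies in the joint CDF grow into large discrepancies in the return period.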
Discussion and conclusions
In this study, we investigate the copula–entropy theory in bivariate analysis. Using the sample data with the known univariate populations (i.e., gamma and lognormal) and known dependence (Gumbel–Hougaard), it is concluded that the POME-based distribution models the univariate distribution well. There is minimal difference between the POME-based distribution derived from the moments of the observed variable and that derived from the scaled variable (i.e., scaling the observed variable to [0,1]). To avoid improper integrals, the scaled variable is suggested for deriving the POME-based distribution. Compared with the true Gumbel–Hougaard copula, the MECC derived using the constraints of E(U), E(U^{2}), E(V), E(V^{2}), and E(UV) properly models the dependence structure of the sample data. The MECC constructed fulfills the fundamental properties of the copula, i.e., C(u,1) = u; C(1,v) = v. In addition, the derived MECC represents well the true dependence structure given by the Gumbel–Hougaard copula.
Using the real watershed data (i.e., Flume 1 at Walnut Gulch, Arizona), the case study shows the appropriateness of the POME-based univariate distribution of the scaled variable for modeling the observed variates. With the constraints E(U), E(U^{2}), E(V), and E(V^{2}) converging to the population moments of the uniformly distributed variables as \( E(U^{i} ) = E(V^{i} ) = 1/(i + 1) \), the MECC constructed depends only on the rank-based dependence measure (in this case, Spearman’s rho). The derived MECC properly models the dependence of annual peak discharge and flood volume, which is independent of the marginal distributions (non-parametric or parametric). The evaluation of the flood risk (using “AND” case return period) indicates that the MECC copula reasonably represents the change of the return period of “AND” case.
- (i)
For the bivariate random variables investigated, the MECC may be easily and efficiently applied to model the dependence structure. Unlike other copulas, the MECC is uniquely defined for a given set of constraints. Its uniqueness allows one universal solution for the proposed frequency analysis.
- (ii)
Similar to other copula families (e.g., Archimedean copulas, meta-elliptical copulas, vine copulas, etc.), the MECC may be applied for multivariate analysis in hydrology and water engineering, including multivariate rainfall analysis, multivariate drought analysis, spatial analysis of drainage networks, and spatial analysis of water quality, to name a few examples.
- (iii)
The bivariate MECC copula may be easily extended to higher dimensions. For example, for the d-dimensional variables \( \left[ {X_{ 1} ,X_{ 2} , \ldots ,X_{d} } \right] \) with the marginals of \( U_{i} = F_{i} (X_{i} ),i = 1,2, \ldots ,d \); the MECC may be constructed using the set of constraints, i.e., marginal \( E(U_{i}^{r} ) = 1/(r + 1),\quad i = 1,2, \ldots ,d \) and pair-wise \( E(U_{i} U_{j} );\quad i,j \in [1,d],i \ne j \) estimated from rank-based Spearman’s coefficient of correlation. The same optimization procedure applied for the bivariate case may be applied to construct the MECC for dependence structure in higher dimensions.
Notes
Declarations
Authors’ contributions
VPS conceptualized the paper, helped with data interpretation and crafting the manuscript. LZ did analysis, processed the data, constructed all the graphs and wrote the first draft. Both authors read and approved the final manuscript.
Acknowledgements
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The data used is in public domain and is available for anyone to use.
Consent for publication
We consent for publication.
Ethics approval and consent to participate
Not applicable.
Funding
No funding source is available.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Aas K, Czado C, Frigessi A, Bakken H (2007) Pair-copula constructions of multiple dependence. Insur Math Econ. https://doi.org/10.1016/j.insmatheco.2007.02.001
- Aghakouchak A (2014) Entropy–copula in hydrology and climatology. J Hydrometeorol 15:2176–2189. https://doi.org/10.1175/jhm-d-13-0207.1
- Arya FK, Zhang L (2017) Copula-based Markov process for forecasting and analyzing risk of water quality time series. J Hydrol Eng 22(6):04017005. https://doi.org/10.1061/(asce)he.1943-5584.00001494
- Chen L, Singh VP, Guo S (2013) Measure of correlation between river flows using entropy–copula theory. J Hydrol Eng 18(12):1591–1608. https://doi.org/10.1061/(asce)he.1943-5584.0000714
- Chu B (2011) Recovering copulas from limited information and an application to asset allocation. J Bank Finance 35:1824–1842. https://doi.org/10.1016/j.jbankfin.2010.12.011
- Cobb L, Koppstein P, Chen NH (1983) Estimation and moment recursion relations for multimodal distributions of the exponential family. J Am Stat Assoc 78:124–130
- Falk M, Reiss R-D (2005) On Pickands coordinates in arbitrary dimensions. J Multivar Anal 92:426–453
- Fang HB, Fang KT, Kotz S (2002) The meta-elliptical distributions with given marginals. J Multivar Anal 82:1–16
- Genest C, Favre A-C, Béliveau J, Jacques C (2007) Metaelliptical copulas and their use in frequency analysis of multivariate hydrological data. Water Resour Res 43:W09401. https://doi.org/10.1029/2006wr005275
- Genest C, Remillard B, Beaudoin D (2009) Goodness-of-fit tests for copulas: a review and a power study. Insur Math Econ 44(2):199–213. https://doi.org/10.1016/j.insmatheco.2007.10.005
- Gudendorf G, Segers J (2009) Extreme-value copulas. arXiv:0911.1015v2
- Hao Z, Singh VP (2012) Entropy–copula method for single-site monthly streamflow simulation. Water Resour Res 48:W06604. https://doi.org/10.1029/wr011419
- Jaynes ET (1957) Information theory and statistical mechanics. Phys Rev 106:620–630
- Joe H (2014) Dependence modeling with copulas. CRC Press, Boca Raton
- Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
- Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer Science + Business Media, Inc., Berlin
- Pham MT, Vernieuwe H, Baets BD, Willems P, Verhoest NEC (2016) Stochastic simulation of precipitation-consistent daily reference evapotranspiration using vine copulas. Stoch Environ Res Risk Assess 30:2197–2214. https://doi.org/10.1007/s00477-015-1181-7
- Pickands J (1981) Multivariate extreme value distribution. Bull Int Stat Inst 49:859–878
- Renyi A (1951) On measure of entropy and information. In: Proceedings, 4th Berkeley symposium, mathematics, statistics, and probability, Berkeley, California, pp 547–561
- Requena AI, Chebana F, Mediero L (2016a) A complete procedure for multivariate index-flood model application. J Hydrol 535:559–580. https://doi.org/10.1016/j.jhydrol.2016.02.004
- Requena AI, Flores I, Mediero L, Garrote L (2016b) Extension of observed flood series by combining a distributed hydro-meteorological model and a copula-based model. Stoch Environ Res Risk Assess 30:1363–1378. https://doi.org/10.1007/s00477-015-1138-x
- Salvadori G, Michele CD (2015) Multivariate real-time assessment of droughts via copula-based multi-site hazard trajectories and fans. J Hydrol 526:101–115. https://doi.org/10.1016/j.jhydrol.2014.11.056
- Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
- Singh VP (1998) Entropy-based parameter estimation in hydrology. Springer, Dordrecht
- Singh VP, Rajagopal AK (1986) A new method of parameter estimation for hydrologic frequency analysis. Hydrol Sci Technol 2(3):33–40
- Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Université Paris, Paris, p 8
- Song S, Singh VP (2010) Meta-elliptical copulas for drought frequency analysis of periodic hydrologic data. Stoch Environ Res Risk Assess 24(3):425–444. https://doi.org/10.1007/s00477-009-0331-1
- Sraj M, Bezak N, Brilly M (2015) Bivariate flood frequency analysis using the copula function: a case study of the Litija station on the Sava River. Hydrol Process 29:225–238. https://doi.org/10.1002/hyp.10145
- Tsallis C (1988) Possible generalizations of Boltzmann–Gibbs statistics. J Stat Phys 52(1/2):479–487
- Vernieuwe H, Vandenberghe S, Baets BD, Verhoest NEC (2015) A continuous rainfall model based on vine copulas. Hydrol Earth Syst Sci 19:2685–2699. https://doi.org/10.5194/hess-19-2685-2015
- Zellner A, Highfield RA (1988) Calculation of maximum entropy distribution and approximation of marginal posterior distributions. J Econom 37:195–209
- Zhang L, Singh VP (2012) Bivariate rainfall and runoff analysis using entropy and copula theories. Entropy 14:1784–1812. https://doi.org/10.3390/e14091784
- Zhang L, Singh VP (2014) Joint conditional probability distributions of runoff depth and peak discharge using entropy theory. J Hydrol Eng 19(6):1150–1159