
Development of a Statistical Predictive Model for Daily Water Table Depth and Important Variables Selection for Inference

Alokesh Manna1, Devendra M Amatya2, Sushant Mehan3

1- University of Connecticut, Department of Statistics, Storrs, CT 06279; 2- Center for Forest Watershed Research, Southern Research Station, USDA Forest Service, Cordesville, SC 29434; 3- Department of Agricultural and Biosystems Engineering, South Dakota State University, Brookings, SD 57006.

Contacts: Alokesh Manna (alokesh.manna@uconn.edu), Devendra Amatya (devendra.m.amatya@usda.gov), and Sushant Mehan (Sushant.mehan@sdstate.edu). Code: <https://github.com/alokesh17/Water_Table_Depth->. arXiv paper: <https://arxiv.org/abs/2410.01001>.

Acknowledgements: This research was partly supported by an appointment with the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program. This program is administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and NSF. ORISE is managed for DOE by ORAU. All opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NSF, ORAU/ORISE, or DOE. The authors also acknowledge Andy Harrison, Hydrology Technician at the US Forest Service Santee Experimental Forest, for providing related data for the South Carolina sites, and Weyerhaeuser for providing the data for the D1 site in North Carolina.

Abstract: Accurately predicting water table dynamics is vital for sustaining groundwater resources that support ecological functions and anthropogenic activities. This study evaluates a statistical model (BigVAR) with three major flexibilities for estimating daily water table depth: a) it predicts under a sparsity assumption on the model coefficients, b) it uses a time series autoregression framework, and c) it allows lags in both the dependent and independent variables. The model was applied to daily hydroclimatic data from the USDA Forest Service Santee Experimental Forest (SC) and a site in NC. Data from 2006–2019 (SC) and 1988–2008 (NC) were used, with key predictors including soil and air temperature, precipitation, wind, and radiation. For WS80, the RMSE was 10.09 cm during the dormant season and 14.94 cm for daily predictions over the testing phase. The model achieved an R² of 0.93 for 2019 (a dry year) and 0.96 for 2016 (a wet year). Solar radiation, rainfall, and wind direction were among the most influential variables. This predictive model can help forest managers and hydrologists use water table depth to assess wetland hydrology and related ecosystem functions in management decisions.

Fig. 1. Location map of study watersheds WS77 and WS80, with satellite stations Met 5 and Met 25, respectively; the WS78 (Turkey Creek (TC)) watershed (green boundary) with TC Met, a complete weather station; and another complete weather station (SanteeHQ Met) at the Santee headquarters (HQ) office.

Fig. 2. Location map of the study watershed D1, with the two adjacent watersheds D2 and D3 and the hydrometeorological stations at the Carteret site in coastal NC.

OBJECTIVES:

  1. Develop a statistical model to predict daily water table depth at groundwater wells across four experimental watersheds—WS77, WS78, WS80 (in South Carolina, Fig. 1), and D1 (in North Carolina, Fig. 2)—using daily hydrometeorological data. Predictions will be assessed at daily, monthly, growing, and dormant season scales, using variables from rain gauges, weather stations, and streamflow gauging stations located near the wells (see Methods for more details).
  2. Perform statistical inference to identify key variables influencing water table depth across different temporal scales.
  3. Evaluate the impact of excluding daily streamflow data, which may be unavailable in certain regions, on the accuracy of water table depth predictions.

Conclusions:

1. BigVAR is an R package for modeling sparse, high-dimensional multivariate time series. Its VARX models (vector autoregressions with exogenous variables) can help predict water table depth.

2. BigVAR can be used as a predictive model for water table depth with reasonably good performance, as shown by the plots in Figs. 6, 7, and 8, the residual-analysis statistics in Tables 1 and 2, and the interpretability of the variable selection. Additional field testing is nonetheless highly recommended.

3. A combination of multiple publicly available climate variables can describe water table dynamics, with the lag structure in the predictive model playing a key role, particularly the most recent previous water table depth values. Incorporating important weather variables with their lag structure, such as rainfall, solar radiation, net radiation, and likely wind direction, further enhances predictive performance.

4. When the water table is above the ground surface, predictions are highly uncertain, even for process-based models such as DRAINMOD and MIKE SHE. We therefore focused mainly on conditions in which the water table is below the ground surface.

Fig. 4. Photos of hydrometeorological measuring equipment on the study watersheds. Met 5 and Met 25 are satellite stations measuring air temperature, soil temperature, and precipitation on WS77 and WS80, respectively. TC Met is the complete weather station on the WS78 watershed. SHQ is another complete weather station at the Santee headquarters office.

Final model selection and implementation:

The analysis requires a time series model with an autoregressive process in the presence of covariates and their lag structure. The model incorporates p lags of the response variable y (water table depth) and s lags of each hydroclimatic variable x. The hydroclimatic variables are streamflow (mm), water table depth (cm), precipitation (mm), air and soil temperature (°C), relative humidity (%), solar and net radiation (MJ/m²/day), wind speed (m/s), wind direction (deg), and vapor pressure deficit (kPa). The model also includes an additive Gaussian error term. We selected p and s using the Bayesian Information Criterion (BIC).
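One way to write this model and the BIC used to choose p and s is shown below. The symbols (intercept ν, autoregressive coefficients φ_ℓ, exogenous coefficient vectors β_j, number of exogenous variables m, number of usable days n) are notation introduced here for illustration, not the authors' own. Exogenous lags are assumed to start at 1, consistent with the lag-1 and lag-2 (C1, C2) coefficient notation in this poster, and the BIC is one common Gaussian form, written up to an additive constant.

```latex
y_t = \nu + \sum_{\ell=1}^{p} \phi_{\ell}\, y_{t-\ell}
      + \sum_{j=1}^{s} \boldsymbol{\beta}_{j}^{\top} \mathbf{x}_{t-j} + \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)

\mathrm{BIC}(p, s) = n \ln \hat{\sigma}^2_{p,s} + (1 + p + s\,m) \ln n,
\qquad \hat{\sigma}^2_{p,s} = \frac{1}{n} \sum_{t} \left( y_t - \hat{y}_t \right)^2
```

Under this form, the (p, s) pair with the smallest BIC is kept; the ln n term penalizes each extra lag, which favors short lag structures unless they clearly reduce the residual variance.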

Variable selection: Which variables are important? We used penalized regression with an elastic net penalty, a combination of the L1 (lasso) and L2 (ridge) penalties. The penalty shrinks all coefficients toward zero, but the coefficients of the most important variables shrink less than those of less important variables, and the L1 component can set unimportant coefficients exactly to zero.
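The shrinkage idea above can be sketched in plain Python. This is a minimal illustration, not the authors' BigVAR (R) workflow: it assumes the glmnet-style elastic net objective (1/2n)‖y − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖²], solved by cyclic coordinate descent on standardized lagged predictors. The data are synthetic, with a hypothetical "rain" driver and an irrelevant "noise" series; the helper names and the values of λ and α are illustrative assumptions.

```python
import random


def make_lagged_design(y, xs, p, s):
    """Rows are [y_{t-1..t-p}, x1_{t-1..t-s}, x2_{t-1..t-s}, ...]; target is y_t."""
    start = max(p, s)
    rows, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - l] for l in range(1, p + 1)]
        for x in xs:
            row.extend(x[t - j] for j in range(1, s + 1))
        rows.append(row)
        targets.append(y[t])
    return rows, targets


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: moves z toward zero by gamma, clipping at zero."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=100):
    """Cyclic coordinate descent; returns coefficients on the standardized scale."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    # Population standard deviation; constant columns fall back to 1.0.
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
           for j in range(k)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # centering y absorbs the intercept
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            # Standardized columns have mean square 1, so the update is closed-form.
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
    return beta


# Synthetic example: y depends only on its own lag 1 and on lag-1 "rain".
random.seed(0)
T = 500
rain = [random.gauss(0, 1) for _ in range(T)]
noise = [random.gauss(0, 1) for _ in range(T)]
y = [0.0]
for t in range(1, T):
    y.append(0.6 * y[t - 1] + 0.8 * rain[t - 1] + random.gauss(0, 0.1))

X, target = make_lagged_design(y, [rain, noise], p=2, s=2)
beta = elastic_net(X, target, lam=0.05, alpha=0.9)
names = ["yL1", "yL2", "rainL1", "rainL2", "noiseL1", "noiseL2"]
for name, b in sorted(zip(names, beta), key=lambda nb: -abs(nb[1])):
    print(f"{name:8s} {b:+.3f}")
```

Because the predictors are standardized, coefficient magnitudes can be compared directly as a rough importance ranking; raising λ drives the weakest lagged predictors to exactly zero first.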

Fig. 3. Correlation plots for different hydroclimatic variables for WS77, WS78, and WS80

Fig. 6. Time series of predicted versus measured daily water table depths, with their 95% confidence intervals (hatched area), for the wells at (a) WS77, (b) WS78, and (c) WS80 for the 2016–2019 testing period, and (d) D1 in NC for the 2004–2008 testing period.

Fig. 7. Scatter plots of predicted versus measured daily water table depths (cm) for watersheds WS77, WS78, and WS80 in SC and D1 in NC.

DATA ANALYSES and RESULTS

  • Model: water table depth can be explained by its four previous lagged values (lags 1 to 4).
  • Important exogenous variables: rainfall, solar radiation, net radiation, and wind direction, at lags of up to 2 previous days, help explain water table depth.
  • When daily streamflow data were excluded, model performance (based on the evaluation statistics) did not change significantly, possibly because rainfall, the dominant variable, was still included.
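The evaluation statistics used to compare model variants (RMSE, and the R² values quoted in the abstract) can be computed as sketched below. This assumes R² is the coefficient of determination, 1 − SSE/SST; some hydrologic studies instead report the squared Pearson correlation, so the exact definition used in the study should be checked. The arrays are made-up toy values for two hypothetical model variants, not study data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the inputs (e.g., cm)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Toy comparison of two hypothetical variants (made-up depths in cm).
observed = [40.0, 35.0, 20.0, 10.0, 25.0]
with_flow = [38.0, 36.0, 22.0, 12.0, 24.0]
without_flow = [37.0, 36.0, 23.0, 12.0, 23.0]
for name, pred in [("with flow", with_flow), ("without flow", without_flow)]:
    print(f"{name:13s} RMSE={rmse(observed, pred):.2f} cm  R2={r_squared(observed, pred):.3f}")
```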

Fig. 8. Time series of predicted versus measured daily water table depths (cm) at D1 in NC for the dry year 2007 and the wet year 2005.

Fig. 5. Histograms of daily water table depth (cm) for watershed D1 in NC (average of 2 wells) and for the well locations in watersheds WS77, WS78, and WS80 in SC.

Y1L1, Y1L2, Y1L3, and Y1L4 denote the coefficients for lags 1 to 4 of the dependent variable (water table depth). Air Temp C1 is the coefficient for air temperature at time lag 1, and Air Temp C2 is the coefficient for air temperature at time lag 2. The same notation applies to each of the other variables.
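The coefficient notation above can be generated programmatically, which is convenient when ranking fitted coefficients by magnitude. A small sketch, assuming the coefficients are ordered with the response lags first and then each exogenous variable's lags in turn; the helper names `coefficient_labels` and `rank_by_magnitude` are hypothetical.

```python
def coefficient_labels(p, exog_names, s):
    """Labels in this notation: Y1L<lag> for response lags, '<name> C<lag>' for exogenous lags."""
    labels = [f"Y1L{lag}" for lag in range(1, p + 1)]
    for name in exog_names:
        labels.extend(f"{name} C{lag}" for lag in range(1, s + 1))
    return labels


def rank_by_magnitude(labels, coefs):
    """Sort (label, coefficient) pairs from largest to smallest absolute value."""
    return sorted(zip(labels, coefs), key=lambda pair: abs(pair[1]), reverse=True)


labels = coefficient_labels(4, ["Air Temp", "Rainfall"], 2)
print(labels)
# ['Y1L1', 'Y1L2', 'Y1L3', 'Y1L4', 'Air Temp C1', 'Air Temp C2', 'Rainfall C1', 'Rainfall C2']
```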
