Development of a Statistical Predictive Model for Daily Water Table Depth and Selection of Important Variables for Inference
Alokesh Manna1, Devendra M. Amatya2, Sushant Mehan3
1-University of Connecticut, Department of Statistics, Storrs, CT 06279; 2-Center for Forest Watershed Research, Southern Research Station, USDA Forest Service, Cordesville, SC 29434; 3-Department of Agricultural and Biosystems Engineering, South Dakota State University, Brookings, SD 57006
Contacts: Alokesh Manna (alokesh.manna@uconn.edu), Devendra Amatya (devendra.m.amatya@usda.gov), and Sushant Mehan (Sushant.mehan@sdstate.edu). Code: <https://github.com/alokesh17/Water_Table_Depth->. arXiv paper: <https://arxiv.org/abs/2410.01001>.
Acknowledgements: This research was partly supported by an appointment with the National Science Foundation (NSF) Mathematical Sciences Graduate Internship (MSGI) Program. This program is administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and NSF. ORISE is managed for DOE by ORAU. All opinions expressed in this paper are the authors' and do not necessarily reflect the policies and views of NSF, ORAU/ORISE, or DOE. The authors also acknowledge Andy Harrison, Hydrology Technician at the US Forest Service Santee Experimental Forest, for providing related data for the South Carolina sites, and Weyerhaeuser for providing the data for the D1 site in North Carolina.
Abstract: Accurately predicting water table dynamics is vital for sustaining the groundwater resources that support ecological functions and anthropogenic activities. This study evaluates a statistical model (BigVAR) for estimating daily water table depth from daily hydroclimatic data collected at the USDA Forest Service Santee Experimental Forest (SC) and a site in NC. The model offers three major flexibilities: a) it predicts under a sparsity assumption on the model coefficients, b) it uses a time series autoregression framework, and c) it allows lags of both the dependent and the independent variables. Data from 2006–2019 (SC) and 1988–2008 (NC) were used, with key predictors including soil and air temperature, precipitation, wind, and radiation. For WS80, the RMSE during the dormant season was 10.09 cm, and the daily RMSE over the testing phase was 14.94 cm. The model achieved an R² of 0.93 for 2019 (a dry year) and 0.96 for 2016 (a wet year). Solar radiation, rainfall, and wind direction were among the most influential variables. This predictive model can help forest managers and hydrologists use water table depth to assess wetland hydrology and related ecosystem functions in management decisions.
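The RMSE and R² values reported above can be computed from paired measured and predicted daily depths as sketched below. This is a minimal illustration, not the authors' code: the function names `rmse` and `r_squared` are hypothetical, and R² is assumed here to be the coefficient of determination 1 - SS_res/SS_tot, since the poster does not say whether it instead reports the squared Pearson correlation.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error, in the units of the data (cm for water table depth)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def r_squared(observed, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot (assumed definition;
    some studies report the squared Pearson correlation instead)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


print(rmse([10.0, 20.0], [13.0, 16.0]))  # sqrt((9 + 16) / 2), about 3.54
```

Because RMSE keeps the units of the data, a value such as 14.94 cm can be read directly as a typical daily prediction error, whereas R² is unitless and depends on how variable the measured series is in the chosen year.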
Fig. 1. Location map of study watersheds WS77 and WS80 with their satellite stations Met 5 and Met 25, respectively; the WS78 (Turkey Creek, TC) watershed (green boundary) with TC Met, a complete weather station; and another complete weather station (SanteeHQ Met) at the Santee headquarters (HQ) office.
Fig. 2. Location map of study watershed D1, shown with the two adjacent watersheds D2 and D3 (not studied), and the hydrometeorological stations at the Carteret site in coastal NC.
OBJECTIVES:
Conclusions:
1. BigVAR is a tool for modeling sparse, high-dimensional multivariate time series. VARX (vector autoregression with exogenous variables) models are one technical approach that can help predict water table depth.
2. BigVAR can be used as a predictive model for water table depth with reasonably good performance, as shown by the graphical plots in Figs. 6, 7, and 8, the statistics of the residual analyses in Tables 1 and 2, and the interpretability of its variable selection; however, additional field testing is highly recommended.
3. A combination of multiple publicly available climate variables can describe water table dynamics, with the lag structure of the predictive model playing a key role, particularly the most recent previous water table depth estimates. Incorporating important weather variables with their lag structure, such as rainfall, solar radiation, net radiation, and likely wind direction, further enhances predictive performance.
4. When the water table is above the ground surface, predictions are highly uncertain, even with process-based models such as DRAINMOD and MIKE SHE. We therefore focused mainly on conditions in which the water table is below the ground surface.
Fig. 4. Photos of various hydrometeorological measuring equipment on the study watersheds. Met 5 and Met 25 are satellite stations measuring air temperature, soil temperature, and precipitation on WS77 and WS80, respectively. TC Met is the complete weather station on the WS78 watershed, and SHQ is another full weather station at the Santee headquarters office.
Final model selection and implementation:
The analysis requires a time series model that captures an autoregressive process in the presence of multiple covariates, each with its own lag structure. The model incorporates p lags of the response variable y (water table depth) and s lags of each hydroclimatic variable (x). The variables and their units are streamflow (mm), water table depth (cm), precipitation (mm), air and soil temperature (°C), relative humidity (%), solar and net radiation (MJ/m²/day), wind speed (m s⁻¹), wind direction (deg), and vapor pressure deficit (kPa). An additive Gaussian error term is also included in the model. We selected p and s using the Bayesian Information Criterion (BIC).
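Written out, the model described above can be sketched as a single-response VARX(p, s) specification. This is an illustrative form under stated assumptions: the covariate lags are assumed to start at 1 (consistent with the C1, C2, ... coefficient labels), and the symbols ν, φ_ℓ, β_j, K (number of covariates), and n (number of usable days) are notation introduced here, not taken from the poster. The BIC line assumes the Gaussian-likelihood form up to an additive constant and counts every coefficient as a parameter; the exact BIC variant used in the study is not stated.

```latex
$$
y_t = \nu + \sum_{\ell=1}^{p} \phi_\ell \, y_{t-\ell}
      + \sum_{j=1}^{s} \boldsymbol{\beta}_j^{\top} \mathbf{x}_{t-j} + u_t,
\qquad u_t \sim \mathcal{N}(0, \sigma^2), \quad \mathbf{x}_t \in \mathbb{R}^{K}
$$

$$
\mathrm{BIC}(p, s) = n \ln \hat{\sigma}^2_{p,s} + (1 + p + sK) \ln n,
\qquad \hat{\sigma}^2_{p,s} = \frac{1}{n} \sum_{t} \left( y_t - \hat{y}_t \right)^2
$$
```

The pair (p, s) with the smallest BIC is preferred: the ln n penalty grows with every extra lag of every covariate, which guards against adding lags that reduce the residual variance only marginally.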
Variable selection: which variables are important? We used penalized regression with a combination of L1 (lasso) and L2 (ridge) penalties, known as the elastic net, which shrinks all coefficients toward zero. The coefficients of the most important variables, which contribute most to explaining the response, shrink less than those of less important variables, and the L1 part of the penalty can set the coefficients of unimportant variables exactly to zero.
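The elastic-net selection on a lagged design can be sketched in plain NumPy as follows. This is a minimal illustration, not the authors' implementation (the study used BigVAR): `lag_matrix` and `elastic_net` are hypothetical helpers, the penalty uses a glmnet-style parameterization with mixing weight `alpha` and overall strength `lam`, and the data are synthetic stand-ins rather than the study data.

```python
import numpy as np


def lag_matrix(y, X, p, s):
    """Hypothetical helper: design matrix with p lags of y and s lags of each column of X.
    Rows start at t = max(p, s) so every lag is observed. Column order:
    y lags 1..p, then lags 1..s of X[:, 0], then lags 1..s of X[:, 1], ..."""
    m, T = max(p, s), len(y)
    cols = [y[m - l:T - l] for l in range(1, p + 1)]
    for k in range(X.shape[1]):
        cols += [X[m - j:T - j, k] for j in range(1, s + 1)]
    return np.column_stack(cols), y[m:]


def elastic_net(Z, y, lam, alpha=0.5, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for
    (1/2n)||y - a - Z b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    The intercept a is left unpenalised by centring Z and y."""
    n, k = Z.shape
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    col_ss = (Zc ** 2).sum(axis=0) / n
    b = np.zeros(k)
    r = yc.copy()  # current residual yc - Zc @ b
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(k):
            if col_ss[j] == 0.0:
                continue  # constant column: keep its coefficient at zero
            rho = Zc[:, j] @ r / n + col_ss[j] * b[j]
            # lasso part: soft-threshold rho; ridge part: inflate the denominator
            new = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0) / (col_ss[j] + lam * (1.0 - alpha))
            r -= Zc[:, j] * (new - b[j])
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
        if max_step < tol:
            break
    return y_mean - z_mean @ b, b


# Synthetic data: depth depends on its own lag 1 and on rainfall at lag 1;
# temperature is irrelevant by construction.
rng = np.random.default_rng(0)
T = 400
rain = rng.gamma(0.5, 10.0, T)
temp = rng.normal(20.0, 5.0, T)
wtd = np.zeros(T)
for t in range(1, T):
    wtd[t] = 0.8 * wtd[t - 1] + 0.5 * rain[t - 1] + rng.normal(0.0, 1.0)

Z, target = lag_matrix(wtd, np.column_stack([rain, temp]), p=2, s=2)
intercept, b = elastic_net(Z, target, lam=0.1)
print(np.round(b, 2))  # columns: y L1, y L2, rain L1, rain L2, temp L1, temp L2
```

In this synthetic run the coefficients on the first lag of depth and on rainfall at lag 1 should stay close to their generating values (0.8 and 0.5) while the temperature terms stay near zero, and a very large `lam` drives every coefficient to exactly zero. In practice `lam` and `alpha` would be tuned, for example by time-ordered (rolling) validation, rather than fixed as here.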
Fig. 3. Correlation plots for different hydroclimatic variables for WS77, WS78, and WS80
Fig. 6. Time series of predicted versus measured daily water table depths, with 95% confidence intervals (hatched area), for the wells at (a) WS77, (b) WS78, and (c) WS80 for the 2016-2019 testing period, and (d) D1 in NC for the 2004-2008 testing period.
Fig. 7. Scatter plots of predicted versus measured daily water table depths (cm) for watersheds WS77, WS78, and WS80 in SC and D1 in NC.
DATA ANALYSES and RESULTS
Fig. 8. Time series plots of predicted versus measured daily water table depths (cm) at D1 in NC for the dry year 2007 and the wet year 2005.
Fig. 5. Histograms of daily water table depth (cm) for watershed D1 in NC (average of two wells) and for the three well locations in watersheds WS77, WS78, and WS80 in SC.
Y1L1, Y1L2, Y1L3, and Y1L4 denote lags 1 to 4 of the dependent variable (water table depth). Air Temp C1 is the coefficient for air temperature at time lag 1, and Air Temp C2 is the coefficient for air temperature at time lag 2. The same notation applies to all other variables.
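The labeling convention above, and the idea of reading importance from the size of the shrunken coefficients, can be sketched as follows. Both functions are hypothetical helpers written for illustration, and the coefficient values in the example are invented placeholders, not fitted results from the study.

```python
def coefficient_labels(p, s, x_names):
    """Hypothetical helper reproducing the labels: Y1L1..Y1Lp for lags of the
    dependent variable, then '<name> C1'..'<name> Cs' for each covariate."""
    labels = [f"Y1L{l}" for l in range(1, p + 1)]
    labels += [f"{name} C{j}" for name in x_names for j in range(1, s + 1)]
    return labels


def rank_by_magnitude(coefs, labels):
    """Sort (label, coefficient) pairs by absolute value, largest first."""
    return sorted(zip(labels, coefs), key=lambda pair: abs(pair[1]), reverse=True)


labels = coefficient_labels(p=4, s=2, x_names=["Air Temp", "Rainfall"])
print(labels)
# ['Y1L1', 'Y1L2', 'Y1L3', 'Y1L4', 'Air Temp C1', 'Air Temp C2', 'Rainfall C1', 'Rainfall C2']

# Placeholder coefficients for illustration only (not the study's estimates).
coefs = [0.9, -0.05, 0.0, 0.01, 0.0, 0.0, 0.3, 0.02]
print(rank_by_magnitude(coefs, labels)[:2])
# [('Y1L1', 0.9), ('Rainfall C1', 0.3)]
```

Ranking by absolute value is only meaningful when the predictors are on comparable scales (for example, standardized before fitting), because a coefficient's size depends on the units of its variable.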