Recurrent Neural Networks (RNNs) for Time Series Prediction#
Recurrent neural networks are machine learning models that were developed in the context of natural language processing and work well for sequential data. In the context of environmental data sets, this means that they can work well for predicting time-series data sets.
In this tutorial, we will look at training an RNN, LSTM, and GRU model to predict discharge from streams using the CAMELS data set (Newman et al. 2014). The CAMELS data set provides 35 years of daily meteorological forcings and discharge observations from 671 basins across the contiguous United States. As input to the model, we will use meteorological data (total daily precipitation, daily min/max temperature, average solar radiation, and vapor pressure), and we will predict the total discharge from the basin.
This tutorial is based on several recent papers that examined the potential of LSTMs for rainfall-runoff modeling.
This notebook follows a similar example by Frederik Kratzert, implemented in PyTorch, which is based on the following sources:
[1] Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005-6022, https://doi.org/10.5194/hess-22-6005-2018, 2018a.
[2] Kratzert, F., Klotz, D., Herrnegger, M., and Hochreiter, S.: A glimpse into the Unobserved: Runoff simulation for ungauged catchments with LSTMs, Workshop on Modeling and Decision-Making in the Spatiotemporal Domain, 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. https://openreview.net/forum?id=Bylhm72oKX, 2018b.
[3] A. Newman; K. Sampson; M. P. Clark; A. Bock; R. J. Viger; D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D.
# Imports
from pathlib import Path
from typing import Tuple, List
import gcsfs
import matplotlib.pyplot as plt
from numba import njit
import numpy as np
import pandas as pd
# Globals
FILE_SYSTEM = gcsfs.core.GCSFileSystem(requester_pays=True)
CAMELS_ROOT = Path('pangeo-ncar-camels/basin_dataset_public_v1p2')
import warnings
warnings.filterwarnings('ignore')
This is a helper function that loads the meteorological data for a specific basin from the CAMELS data set. From the header of the forcing file, we also extract the catchment area, which is later used to normalize the discharge to mm/day.
def load_forcing(basin: str) -> Tuple[pd.DataFrame, int]:
    """Load the meteorological forcing data of a specific basin.

    :param basin: 8-digit code of basin as string.
    :return: pd.DataFrame containing the meteorological forcing data and the
        area of the basin as integer.
    """
    # root directory of meteorological forcings
    forcing_path = CAMELS_ROOT / 'basin_mean_forcing' / 'daymet'
    # get path of forcing file
    files = list(FILE_SYSTEM.glob(f"{str(forcing_path)}/**/{basin}_*.txt"))
    if len(files) == 0:
        raise RuntimeError(f'No forcing file found for Basin {basin}')
    else:
        file_path = files[0]
    # read-in data and convert date to datetime index
    with FILE_SYSTEM.open(file_path) as fp:
        df = pd.read_csv(fp, sep=r'\s+', header=3)
    dates = (df.Year.map(str) + "/" + df.Mnth.map(str) + "/"
             + df.Day.map(str))
    df.index = pd.to_datetime(dates, format="%Y/%m/%d")
    # load area from header (third line of the forcing file)
    with FILE_SYSTEM.open(file_path) as fp:
        content = fp.readlines()
        area = int(content[2])
    return df, area
This is a helper function that loads in the discharge time series for a specific streamflow basin.
def load_discharge(basin: str, area: int) -> pd.Series:
    """Load the discharge time series for a specific basin.

    :param basin: 8-digit code of basin as string.
    :param area: int, area of the catchment in square meters
    :return: A pd.Series containing the catchment normalized discharge.
    """
    # root directory of the streamflow data
    discharge_path = CAMELS_ROOT / 'usgs_streamflow'
    # get path of streamflow file
    files = list(FILE_SYSTEM.glob(f"{str(discharge_path)}/**/{basin}_*.txt"))
    if len(files) == 0:
        raise RuntimeError(f'No discharge file found for Basin {basin}')
    else:
        file_path = files[0]
    # read-in data and convert date to datetime index
    col_names = ['basin', 'Year', 'Mnth', 'Day', 'QObs', 'flag']
    with FILE_SYSTEM.open(file_path) as fp:
        df = pd.read_csv(fp, sep=r'\s+', header=None, names=col_names)
    dates = (df.Year.map(str) + "/" + df.Mnth.map(str) + "/"
             + df.Day.map(str))
    df.index = pd.to_datetime(dates, format="%Y/%m/%d")
    # normalize discharge from cubic feet per second to mm per day
    df.QObs = 28316846.592 * df.QObs * 86400 / (area * 10 ** 6)
    return df.QObs
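To make the unit conversion in load_discharge easier to follow, here is a minimal standalone sketch of the same arithmetic. The function name cfs_to_mm_per_day and the example basin (500 km^2 draining 100 cubic feet per second) are hypothetical and are not part of the CAMELS data set.

```python
# Conversion constants (standard unit definitions).
MM3_PER_FT3 = 28316846.592   # cubic millimeters in one cubic foot
SECONDS_PER_DAY = 86400
MM2_PER_M2 = 10 ** 6         # square millimeters in one square meter


def cfs_to_mm_per_day(q_cfs: float, area_m2: float) -> float:
    """Convert discharge in ft^3/s to a basin-averaged depth in mm/day."""
    # volume of water leaving the basin in one day, in cubic millimeters
    volume_mm3_per_day = q_cfs * MM3_PER_FT3 * SECONDS_PER_DAY
    # spread that volume over the catchment area (in square millimeters)
    return volume_mm3_per_day / (area_m2 * MM2_PER_M2)


# Hypothetical basin: 500 km^2 (5e8 m^2) with a discharge of 100 ft^3/s.
example = cfs_to_mm_per_day(q_cfs=100.0, area_m2=500e6)
print(f"{example:.3f} mm/day")  # 0.489 mm/day
```

Expressing discharge as a depth per day puts it in the same units as precipitation, which makes basins of different sizes comparable.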
We’ll load in the data for a single basin and visualize the time series.
basin = '01022500'
df, area = load_forcing(basin)
df['QObs(mm/d)'] = load_discharge(basin, area)
df.head()
df.columns
df.info()
import matplotlib.pyplot as plt
df["QObs(mm/d)"].plot(grid=True,marker=".",figsize = (8,3.5))
plt.ylabel("Discharge (mm/day)")
plt.xlabel("Year")
plt.show()
df["1995":"2005"]["QObs(mm/d)"].plot(grid=True,marker=".",figsize = (8,3.5))
plt.ylabel("Discharge (mm/day)")
plt.xlabel("Year")
plt.show()
Preprocess data sets for machine learning#
We want to predict the stream discharge rate given the past meteorological data sets. We will first preprocess the data using the StandardScaler from scikit-learn.
targets = df[["QObs(mm/d)"]].copy()
features = df.drop(columns = ['Year', 'Mnth', 'Day', 'Hr', 'dayl(s)','swe(mm)','QObs(mm/d)']).copy()
features.hist()
targets.hist()
from sklearn.preprocessing import StandardScaler, MinMaxScaler
target_scaler = StandardScaler()
feature_scaler = StandardScaler()
scaled_targets = target_scaler.fit_transform(targets)
scaled_features = feature_scaler.fit_transform(features)
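For intuition about what the scaler does, the following NumPy-only sketch reproduces the standardization by hand on a few made-up discharge values (the array x is purely illustrative). It assumes the default StandardScaler behavior of subtracting the mean and dividing by the population standard deviation.

```python
import numpy as np

# Hypothetical discharge values in mm/day, stored as one column like `targets`.
x = np.array([[0.5], [1.0], [1.5], [4.0], [8.0]])

mean = x.mean(axis=0)
std = x.std(axis=0)          # population standard deviation (ddof=0)

z = (x - mean) / std         # forward transform: zero mean, unit variance
x_back = z * std + mean      # inverse transform multiplies by the std, not the variance

print(z.mean(axis=0), z.std(axis=0))
print(np.allclose(x_back, x))
```

The inverse transform matters later, when scaled model predictions need to be converted back to mm/day.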
plt.hist(targets)
plt.hist(scaled_targets)
fig, axs = plt.subplots(2, 3, figsize=(8, 6))
ax = axs.ravel()
for i in range(0, 5):
    ax[i].hist(scaled_features[:, i])
    ax[i].set_ylabel(features.columns[i])
plt.tight_layout()
Reshaping data for RNN training#
We next need to reshape the data so that it is in the correct format to train an RNN, LSTM, or GRU model. Recurrent neural networks expect sequential input of the shape (sequence length, number of features). We want to train a model to predict a single day of discharge from n days of previous meteorological observations. For example, if n = 365, then a single training sample should be of the shape (365, number of features). Here we use 5 input features, so the shape would be (365, 5).
However, the time series data is currently stored in a matrix, where the number of rows corresponds to the total number of days in the data set, and the number of columns is the number of features. We need to slide over this matrix and cut out short windows to act as training samples for the RNN models that we are going to train. Keras and TensorFlow have a couple of different utility functions that can help us with this task.
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
my_series = [0, 1, 2, 3, 4, 5]
my_dataset = tf.keras.utils.timeseries_dataset_from_array(
my_series,
targets=my_series[3:], # the targets are 3 steps into the future
sequence_length=3,
batch_size=2
)
list(my_dataset)
my_dataset
for window_dataset in tf.data.Dataset.range(6).window(4, shift=1):
    for element in window_dataset:
        print(f"{element}", end=" ")
    print()
dataset = tf.data.Dataset.range(6).window(4, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window_dataset: window_dataset.batch(4))
for window_tensor in dataset:
    print(f"{window_tensor}")
def to_windows(dataset, length):
    dataset = dataset.window(length, shift=1, drop_remainder=True)
    return dataset.flat_map(lambda window_ds: window_ds.batch(length))
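As a cross-check on what these windowing utilities produce, here is a small NumPy-only sketch of the same sliding-window idea. It is not the approach used in this notebook; the array sizes (10 days, 5 features, window length 4) are made up so that the result is easy to inspect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_days, n_features, n = 10, 5, 4   # toy sizes; the tutorial uses n = 365

# Rows are days, columns are features, like the scaled feature matrix.
data = np.arange(n_days * n_features, dtype=float).reshape(n_days, n_features)

# sliding_window_view appends the window axis last: (samples, features, n)
windows = sliding_window_view(data, window_shape=n, axis=0)
# reorder to the (samples, sequence length, features) layout RNN layers expect
windows = windows.transpose(0, 2, 1)

print(windows.shape)  # (7, 4, 5)
```

A matrix with N rows yields N - n + 1 overlapping windows, each shifted by one day relative to the previous one.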
Split data into training, validation, and test data#
In addition to setting up the sequence data sets for training the model, we need to designate part of our time series for training, part for validation, and part for testing. We will use October 1980 through September 1995 for training, October 1995 through September 2000 for validation, and October 2000 through September 2010 as our independent test data set.
trainmask = (df.index >="1980-10-01") & (df.index <="1995-09-30")
valmask = (df.index >="1995-10-01") & (df.index <="2000-09-30")
testmask = (df.index >="2000-10-01") & (df.index <="2010-09-30")
trainidx = np.where(trainmask)[0]
validx = np.where(valmask)[0]
testidx = np.where(testmask)[0]
plt.plot(scaled_targets,color="k")
plt.plot(trainidx,scaled_targets[trainidx],color="g",label="train")
plt.plot(validx,scaled_targets[validx],color="r",label="val")
plt.plot(testidx,scaled_targets[testidx],color="b",label="test")
plt.legend()
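We will use an entire year (365 days) of meteorological data as input to predict the discharge on the last day of that window. Slicing the targets with [sequence_length - 1:] aligns each window with the discharge observed on its final day.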
sequence_length = 365 # Length of the meteorological record provided to the network
tf.random.set_seed(42) # ensures reproducibility
train_ds = tf.keras.utils.timeseries_dataset_from_array(
scaled_features[trainidx],
targets=scaled_targets[trainidx][sequence_length - 1:],
sequence_length=sequence_length,
batch_size=256,
shuffle=True,
seed=42
)
valid_ds = tf.keras.utils.timeseries_dataset_from_array(
scaled_features[validx],
targets=scaled_targets[validx][sequence_length - 1:],
sequence_length=sequence_length,
batch_size=2048
)
test_ds = tf.keras.utils.timeseries_dataset_from_array(
scaled_features[testidx],
targets=scaled_targets[testidx][sequence_length - 1:],
sequence_length=sequence_length,
batch_size=len(testidx)
)
for x, y in train_ds.take(1):
    print("Input shape:", x.shape)
    print("Target shape:", y.shape)
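To see how the target slicing pairs windows with targets, the toy sketch below mimics the pairing rule documented for timeseries_dataset_from_array, where targets[i] belongs to the window that starts at index i. The arrays days and toy_targets are hypothetical stand-ins, with each target set to 10 times its day index so the alignment is easy to read.

```python
import numpy as np

n = 3                                   # sequence length (365 in the tutorial)
days = np.arange(8)                     # day indices 0..7 stand in for the features
toy_targets = np.arange(8) * 10         # the target for day d is 10 * d

aligned_targets = toy_targets[n - 1:]   # same slicing as used for the data sets above
n_samples = len(days) - n + 1

# Pair window i (days i .. i+n-1) with aligned_targets[i].
pairs = [(days[i:i + n].tolist(), int(aligned_targets[i]))
         for i in range(n_samples)]
for window, target in pairs:
    print(window, "->", target)
```

Each window is matched with the target of its final day, for example the window [0, 1, 2] with the target of day 2.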
Train a Simple RNN#
We’ll first try training an RNN model.
import os
cwd = os.getcwd()
model_path = os.path.join(cwd,'saved_model')
os.makedirs(model_path, exist_ok=True) # make sure the output directory exists
# set some hyperparameters
n_hidden = 10
patience = 20
epochs = 100
learning_rate = 1e-3
This code creates the RNN model using tf.keras.Sequential.
tf.random.set_seed(42) # ensures reproducibility
rnn_model = tf.keras.Sequential([
tf.keras.layers.SimpleRNN(n_hidden, input_shape=[None, 5]),
tf.keras.layers.Dense(1)
])
rnn_model.summary()
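For intuition about what this model computes, here is a rough NumPy sketch of a simple recurrent layer followed by a dense output, assuming the default tanh activation. The weights are random placeholders rather than trained values, and the names W_x, W_h, b, w_out, and b_out are our own labels, not Keras attribute names.

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, n_features, n_hidden = 365, 5, 10

# Randomly initialized weights stand in for trained ones (illustration only).
W_x = rng.normal(scale=0.1, size=(n_features, n_hidden))   # input-to-hidden
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))     # hidden-to-hidden
b = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=(n_hidden, 1))          # dense output layer
b_out = np.zeros(1)

x = rng.normal(size=(seq_len, n_features))   # one sample: a year of forcings

# Step through the sequence, updating the hidden state one day at a time.
h = np.zeros(n_hidden)
for x_t in x:
    h = np.tanh(x_t @ W_x + h @ W_h + b)

# Only the final hidden state is passed to the dense layer.
y_hat = h @ w_out + b_out

n_params = W_x.size + W_h.size + b.size + w_out.size + b_out.size
print(h.shape, y_hat.shape, n_params)  # (10,) (1,) 171
```

With 5 features and 10 hidden units, this gives 160 recurrent parameters plus 11 in the dense layer, which can be compared against the model summary.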
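We’ll include a custom metric, the Nash-Sutcliffe efficiency (NSE), which is widely used in hydrology to assess how well a model predicts the observed data. An NSE of 1 indicates a perfect fit, while an NSE of 0 means the model is no better than always predicting the mean of the observations.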
class NashSutcliffeEfficiency(tf.keras.metrics.Metric):
    def __init__(self, name='nse', scaler=None, **kwargs):
        super().__init__(name=name, **kwargs)
        self.sse = self.add_weight(name='sse', initializer='zeros')
        self.sst = self.add_weight(name='sst', initializer='zeros')
        self.scaler = scaler

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        if self.scaler is not None:
            # undo the StandardScaler transform: x = z * std + mean
            u = tf.constant(self.scaler.mean_, dtype=tf.float32)
            s = tf.constant(self.scaler.scale_, dtype=tf.float32)
            y_true = y_true * s + u
            y_pred = y_pred * s + u
        # sums are accumulated batch by batch, using each batch's mean
        sse = tf.reduce_sum(tf.square(y_true - y_pred))
        sst = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
        self.sse.assign_add(sse)
        self.sst.assign_add(sst)

    def result(self):
        return 1.0 - self.sse / self.sst

    def reset_state(self):
        self.sse.assign(0.0)
        self.sst.assign(0.0)
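The metric itself is easy to check outside of Keras. The sketch below is a plain NumPy version of the standard NSE definition, 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), applied to a handful of made-up values. The function name nse and the sample arrays are illustrative only. Unlike the Keras metric above, it uses the mean of the full series instead of accumulating per-batch sums.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated values against observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    sse = np.sum((obs - sim) ** 2)           # squared error of the model
    sst = np.sum((obs - obs.mean()) ** 2)    # squared error of the mean benchmark
    return 1.0 - sse / sst


obs = np.array([1.0, 2.0, 4.0, 3.0, 0.5])    # made-up observations
sim = np.array([1.2, 1.8, 3.5, 3.1, 0.9])    # made-up predictions

perfect = nse(obs, obs)                              # 1.0
baseline = nse(obs, np.full_like(obs, obs.mean()))   # 0.0
score = nse(obs, sim)

# Rescaling observations and predictions with the same factor and offset
# leaves the NSE unchanged.
rescaled = nse(2.5 * obs + 1.0, 2.5 * sim + 1.0)
print(perfect, baseline, np.isclose(score, rescaled))
```

Because the NSE is unchanged when observations and predictions are rescaled together, it gives the same value in scaled units and in mm/day.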
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
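The patience setting can be illustrated with a simplified pure-Python sketch of the early-stopping rule. This is not the Keras implementation and it ignores options such as min_delta. The helper early_stopping_epoch and the list of validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # never ran out of patience: training ends with the final epoch
    return len(val_losses) - 1, best_epoch


# Made-up validation losses: the best value occurs at epoch 2.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stopped, best = early_stopping_epoch(losses, patience=3)
print(stopped, best)  # 5 2
```

With restore_best_weights=True, the model would be rolled back to the weights from the best epoch (epoch 2 in this toy example) rather than keeping those from the epoch where training stopped.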
rnn_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history = rnn_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
rnn_model.save(os.path.join(model_path,'RNN_timeseries_model.keras'))
valid_loss, valid_nse = rnn_model.evaluate(valid_ds)
valid_nse
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss')
plt.legend()
out = rnn_model.predict(test_ds)
out.shape
for x, y in test_ds.take(1):
    yvals = y.numpy()
plt.plot(yvals,label="true")
plt.plot(out[:,0],label="prediction")
plt.legend()
Train an LSTM model#
from tensorflow.keras.initializers import Orthogonal
dropout_rate = 0.0
tf.random.set_seed(42) # ensures reproducibility
lstm_model = tf.keras.models.Sequential([
tf.keras.layers.LSTM(n_hidden, input_shape=[None, 5], return_sequences=False),
tf.keras.layers.Dropout(dropout_rate),
tf.keras.layers.Dense(1)
])
lstm_model.summary()
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
lstm_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history_lstm = lstm_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
lstm_model.save(os.path.join(model_path,'LSTM_timeseries_model.keras'))
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss - RNN')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss - RNN')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['loss']),label='Train Loss - LSTM')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['val_loss']),label = 'Val loss - LSTM')
plt.legend()
out_lstm = lstm_model.predict(test_ds)
plt.plot(yvals,label="true")
plt.plot(out_lstm[:,0],label="prediction - LSTM")
plt.legend()
Train a GRU model#
tf.random.set_seed(42) # ensures reproducibility
gru_model = tf.keras.Sequential([
tf.keras.layers.GRU(n_hidden, return_sequences=False, input_shape=[None, 5]),
tf.keras.layers.Dense(1)
])
gru_model.summary()
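The three model summaries can be compared with a quick back-of-the-envelope parameter count. The helper functions below are our own illustrations of the usual formulas and are not part of Keras. The GRU count assumes the Keras default reset_after=True, which uses two bias vectors per gate, so treat it as an assumption to verify against the summary output.

```python
def simple_rnn_params(m: int, n: int) -> int:
    """One set of input weights, recurrent weights, and biases."""
    return n * (m + n + 1)


def lstm_params(m: int, n: int) -> int:
    """Four gates/candidates, each with input weights, recurrent weights, and biases."""
    return 4 * n * (m + n + 1)


def gru_params(m: int, n: int, reset_after: bool = True) -> int:
    """Three gates/candidates; reset_after=True adds a second bias vector per gate."""
    n_bias = 2 if reset_after else 1
    return 3 * (n * (m + n) + n_bias * n)


m, n = 5, 10            # 5 input features, n_hidden = 10 as in this tutorial
dense = n + 1           # final Dense(1) layer: n weights plus one bias

counts = {
    "RNN": simple_rnn_params(m, n) + dense,
    "LSTM": lstm_params(m, n) + dense,
    "GRU": gru_params(m, n) + dense,
}
print(counts)  # {'RNN': 171, 'LSTM': 651, 'GRU': 521}
```

The LSTM has roughly four times as many recurrent parameters as the simple RNN and the GRU roughly three times as many, which is worth keeping in mind when comparing their training curves.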
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
gru_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history_gru = gru_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
gru_model.save(os.path.join(model_path,'GRU_timeseries_model.keras'))
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss - RNN')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss - RNN')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['loss']),label='Train Loss - LSTM')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['val_loss']),label = 'Val loss - LSTM')
plt.plot(history_gru.epoch, np.array(history_gru.history['loss']),label='Train Loss - GRU')
plt.plot(history_gru.epoch, np.array(history_gru.history['val_loss']),label = 'Val loss - GRU')
plt.legend()
out_gru = gru_model.predict(test_ds)
plt.plot(yvals,label="true")
plt.plot(out_lstm[:,0],label="prediction - LSTM")
plt.plot(out_gru[:,0],label="prediction - GRU")
plt.plot(out[:,0],label="prediction - RNN")
plt.legend()