Recurrent Neural Networks (RNNs) for Time Series Prediction

Recurrent neural networks are machine learning models designed for sequential data and have been used extensively in natural language processing. Because environmental observations are often recorded as time series, RNNs can also work well for predicting environmental time-series data sets.

In this tutorial, we will train simple RNN, LSTM, and GRU models to predict streamflow (discharge) using the CAMELS data set (Newman et al. 2014). The CAMELS data set provides 35 years of daily meteorological forcings and discharge observations from 671 basins across the contiguous United States. As model inputs, we use five daily meteorological variables (total precipitation, minimum and maximum temperature, average solar radiation, and vapor pressure), and we predict the daily discharge from the basin, normalized by catchment area to mm/day.

This tutorial builds on several recent papers that examined the potential of LSTMs for rainfall-runoff modeling.

This notebook follows a similar example by Frederik Kratzert, implemented in PyTorch, which is based on the following sources:

[1] Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005-6022, https://doi.org/10.5194/hess-22-6005-2018, 2018a.

[2] Kratzert, F., Klotz, D., Herrnegger, M., and Hochreiter, S.: A glimpse into the Unobserved: Runoff simulation for ungauged catchments with LSTMs, Workshop on Modeling and Decision-Making in the Spatiotemporal Domain, 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, https://openreview.net/forum?id=Bylhm72oKX, 2018b.

[3] Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R. J., and Blodgett, D.: A large-sample watershed-scale hydrometeorological dataset for the contiguous USA, UCAR/NCAR, Boulder, CO, https://dx.doi.org/10.5065/D6MW2F4D, 2014.

# Imports
# Requires gcsfs (to read CAMELS from Google Cloud Storage), matplotlib, numpy,
# pandas, scikit-learn, and tensorflow
from pathlib import Path
from typing import Tuple, List

import gcsfs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

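# Globals
# The CAMELS bucket is "requester pays": reading it requires Google Cloud
# credentials, and data-access charges are billed to your project
FILE_SYSTEM = gcsfs.core.GCSFileSystem(requester_pays=True)
CAMELS_ROOT = Path('pangeo-ncar-camels/basin_dataset_public_v1p2')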

import warnings
warnings.filterwarnings('ignore')

The helper function below loads the Daymet meteorological forcing data for a given basin from the CAMELS data set. The header of the forcing file also contains the catchment area (in m²), which we use later to normalize discharge to mm/day.
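To make the parsing concrete, the sketch below applies the same pandas calls used in load_forcing to a small in-memory file. The file contents are made up: they only mimic the layout the function assumes (two metadata lines, the catchment area on the third line, and the column names on the fourth), and parse_forcing_text is a hypothetical helper, not part of the notebook.

```python
import io

import pandas as pd

# Made-up contents that mimic the layout load_forcing() expects:
# two metadata lines, the catchment area (m^2) on line 3, column names on line 4.
SYNTHETIC_FORCING = """\
 44.6
 92
 573600000
Year Mnth Day Hr dayl(s) prcp(mm/day) srad(W/m2) swe(mm) tmax(C) tmin(C) vp(Pa)
1980 01 01 12 30000.0 0.00 150.0 0.0 -6.5 -16.3 171.7
1980 01 02 12 30050.0 1.20 120.0 0.0 -4.0 -12.1 220.4
"""


def parse_forcing_text(text: str):
    """Hypothetical helper: same parsing steps as load_forcing(), on a string."""
    # header=3: the 4th line supplies the column names; earlier lines are skipped
    df = pd.read_csv(io.StringIO(text), sep=r'\s+', header=3)
    dates = df.Year.map(str) + "/" + df.Mnth.map(str) + "/" + df.Day.map(str)
    df.index = pd.to_datetime(dates, format="%Y/%m/%d")
    # the catchment area sits on the 3rd line (index 2) of the header
    area = int(text.splitlines()[2])
    return df, area


df_demo, area_demo = parse_forcing_text(SYNTHETIC_FORCING)
print(area_demo)  # 573600000
print(df_demo[["prcp(mm/day)", "tmax(C)"]])
```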

def load_forcing(basin: str) -> Tuple[pd.DataFrame, int]:
    """Load the meteorological forcing data of a specific basin.

    :param basin: 8-digit code of basin as string.
    
    :return: pd.DataFrame containing the meteorological forcing data and the
        area of the basin as integer.
    """
    # root directory of meteorological forcings
    forcing_path = CAMELS_ROOT / 'basin_mean_forcing' / 'daymet'
    
    # get path of forcing file
    files = list(FILE_SYSTEM.glob(f"{str(forcing_path)}/**/{basin}_*.txt"))
    if len(files) == 0:
        raise RuntimeError(f'No forcing file found for Basin {basin}')
    else:
        file_path = files[0]
    
    # read-in data and convert date to datetime index
    with FILE_SYSTEM.open(file_path) as fp:
        df = pd.read_csv(fp, sep=r'\s+', header=3)
    dates = (df.Year.map(str) + "/" + df.Mnth.map(str) + "/"
             + df.Day.map(str))
    df.index = pd.to_datetime(dates, format="%Y/%m/%d")

    # load area from header
    with FILE_SYSTEM.open(file_path) as fp:
        content = fp.readlines()
        area = int(content[2])

    return df, area

The next helper function loads the USGS streamflow time series for a given basin and converts discharge from cubic feet per second to mm/day by dividing by the catchment area.
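The constant 28316846.592 in load_discharge is the number of cubic millimeters in a cubic foot (304.8^3, since 1 ft = 304.8 mm). Multiplying a flow in ft³/s by that constant and by 86,400 seconds per day gives a daily volume in mm³; dividing by the catchment area in mm² (area in m² times 10^6) gives a depth in mm/day. A quick check with made-up numbers (cfs_to_mm_per_day is a hypothetical helper):

```python
MM3_PER_FT3 = 304.8 ** 3  # 1 ft = 304.8 mm, so 1 ft^3 = 28316846.592 mm^3
SECONDS_PER_DAY = 86400


def cfs_to_mm_per_day(q_cfs: float, area_m2: float) -> float:
    """Spread a volumetric flow (ft^3/s) uniformly over a catchment (m^2)."""
    volume_mm3_per_day = q_cfs * MM3_PER_FT3 * SECONDS_PER_DAY
    area_mm2 = area_m2 * 10 ** 6  # 1 m^2 = 10^6 mm^2
    return volume_mm3_per_day / area_mm2


# Made-up example: 100 ft^3/s draining a 1000 km^2 (1e9 m^2) catchment
print(round(cfs_to_mm_per_day(100.0, 1e9), 4))  # 0.2447
```

This is the same arithmetic as `28316846.592 * df.QObs * 86400 / (area * 10 ** 6)`, just with the unit steps named.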

def load_discharge(basin: str, area: int) -> pd.Series:
    """Load the discharge time series for a specific basin.

    :param basin: 8-digit code of basin as string.
    :param area: int, area of the catchment in square meters
    
    :return: A pd.Series containing the catchment normalized discharge.
    """
    # root directory of the streamflow data
    discharge_path = CAMELS_ROOT / 'usgs_streamflow'
    
    # get path of streamflow file
    files = list(FILE_SYSTEM.glob(f"{str(discharge_path)}/**/{basin}_*.txt"))
    if len(files) == 0:
        raise RuntimeError(f'No discharge file found for Basin {basin}')
    else:
        file_path = files[0]

    # read-in data and convert date to datetime index
    col_names = ['basin', 'Year', 'Mnth', 'Day', 'QObs', 'flag']
    with FILE_SYSTEM.open(file_path) as fp:
        df = pd.read_csv(fp, sep=r'\s+', header=None, names=col_names)
    dates = (df.Year.map(str) + "/" + df.Mnth.map(str) + "/"
             + df.Day.map(str))
    df.index = pd.to_datetime(dates, format="%Y/%m/%d")

    # normalize discharge from cubic feet per second to mm per day
    df.QObs = 28316846.592 * df.QObs * 86400 / (area * 10 ** 6)

    return df.QObs

We’ll load in the data for a single basin and visualize the time series.

basin = '01022500'

df, area = load_forcing(basin)
df['QObs(mm/d)'] = load_discharge(basin, area)

df.head()

df.columns

df.info()

import matplotlib.pyplot as plt

df["QObs(mm/d)"].plot(grid=True,marker=".",figsize = (8,3.5))
plt.ylabel("Discharge (mm/day)")
plt.xlabel("Year")
plt.show()

df.loc["1995":"2005", "QObs(mm/d)"].plot(grid=True, marker=".", figsize=(8, 3.5))
plt.ylabel("Discharge (mm/day)")
plt.xlabel("Year")
plt.show()

Preprocess data sets for machine learning

We want to predict daily stream discharge from the preceding meteorological record. First, we standardize the targets and features with scikit-learn's StandardScaler, which rescales each column to zero mean and unit variance.
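StandardScaler computes a per-column mean μ and standard deviation σ (population std, ddof=0) and maps each value x to z = (x - μ)/σ. The cells below fit the scalers on the full record, including the validation and test years. A stricter option (our suggestion, not what this notebook does) is to compute μ and σ from the training period only, so that no test-period statistics leak into preprocessing. The numpy sketch below illustrates that variant on synthetic data; fit_standardizer, standardize, and unstandardize are hypothetical names.

```python
import numpy as np


def fit_standardizer(x_train):
    """Column-wise mean and std, computed the way StandardScaler does (ddof=0)."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: leave unscaled instead of dividing by 0
    return mean, std


def standardize(x, mean, std):
    return (x - mean) / std


def unstandardize(z, mean, std):
    # undo the scaling with the standard deviation (StandardScaler.scale_), not the variance
    return z * std + mean


rng = np.random.default_rng(0)
demo_data = rng.gamma(shape=2.0, scale=3.0, size=(1000, 2))  # synthetic, skewed values
demo_train_rows = np.arange(600)                              # hypothetical training period

# statistics come from the training rows only, then are applied to every row
demo_mean, demo_std = fit_standardizer(demo_data[demo_train_rows])
demo_z = standardize(demo_data, demo_mean, demo_std)

print(demo_z[demo_train_rows].mean(axis=0))  # close to 0 on the fitting rows
print(demo_z[600:].mean(axis=0))             # later rows need not average exactly 0
```

With scikit-learn, the equivalent is to call `fit` on the training rows and `transform` on all rows.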

targets = df[["QObs(mm/d)"]].copy()
features = df.drop(columns = ['Year', 'Mnth', 'Day', 'Hr', 'dayl(s)','swe(mm)','QObs(mm/d)']).copy()

features.hist()

targets.hist()

from sklearn.preprocessing import StandardScaler

target_scaler = StandardScaler()
feature_scaler = StandardScaler()

scaled_targets = target_scaler.fit_transform(targets)
scaled_features = feature_scaler.fit_transform(features)

plt.hist(targets)
plt.hist(scaled_targets)

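fig, axs = plt.subplots(2, 3, figsize=(8, 6))

ax = axs.ravel()
# one histogram per scaled input feature (5 features, so the 6th panel is unused)
for i in range(scaled_features.shape[1]):
    ax[i].hist(scaled_features[:, i])
    ax[i].set_xlabel(features.columns[i])
ax[-1].axis("off")
plt.tight_layout()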

Reshaping data for RNN training

We next need to reshape the data into the format expected by RNN, LSTM, and GRU layers. Keras recurrent layers take sequential input of shape (batch size, sequence length, number of features). We want to train a model that predicts a single day of discharge from the n most recent days of meteorological observations. For example, if n = 365, then a single training sample has the shape (365, number of features). Here we use 5 input features, so each sample has the shape `(365, 5)` and a batch of samples has the shape `(batch size, 365, 5)`.

However, the time series data is currently stored in a matrix whose rows correspond to the days in the data set and whose columns correspond to the features. We need to slide a window over this matrix and cut out overlapping samples to train the RNN models. Keras and TensorFlow provide several utility functions that help with this task.
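Before turning to the Keras utility, here is a plain-Python sketch of the slicing that `tf.keras.utils.timeseries_dataset_from_array` performs in the next cell: window i is `series[i : i + sequence_length]`, its target is `targets[i]`, and consecutive pairs are grouped into batches. The helper make_windows is hypothetical and ignores the shuffle, sampling_rate, and sequence_stride options.

```python
def make_windows(series, targets, sequence_length, batch_size):
    """Hypothetical sketch of how windows, targets, and batches line up.

    Window i is series[i : i + sequence_length] with target targets[i]; the
    number of windows is capped by both the series length and len(targets).
    """
    n_windows = min(len(series) - sequence_length + 1, len(targets))
    pairs = [(series[i:i + sequence_length], targets[i]) for i in range(n_windows)]
    # group consecutive (window, target) pairs into batches; the last one may be smaller
    return [pairs[j:j + batch_size] for j in range(0, n_windows, batch_size)]


my_series = [0, 1, 2, 3, 4, 5]
batches = make_windows(my_series, my_series[3:], sequence_length=3, batch_size=2)
for batch in batches:
    print(batch)
# [([0, 1, 2], 3), ([1, 2, 3], 4)]
# [([2, 3, 4], 5)]
```

Only three windows are produced even though four windows of length 3 fit in the series, because `my_series[3:]` supplies just three targets.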

import tensorflow as tf
from tensorflow.keras.models import Model, load_model

my_series = [0, 1, 2, 3, 4, 5]
my_dataset = tf.keras.utils.timeseries_dataset_from_array(
    my_series,
    targets=my_series[3:],  # each window of 3 values is paired with the value right after it
    sequence_length=3,
    batch_size=2
)
list(my_dataset)

my_dataset

for window_dataset in tf.data.Dataset.range(6).window(4, shift=1):
    for element in window_dataset:
        print(f"{element}", end=" ")
    print()

dataset = tf.data.Dataset.range(6).window(4, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window_dataset: window_dataset.batch(4))
for window_tensor in dataset:
    print(f"{window_tensor}")

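def to_windows(dataset, length):
    # overlapping windows of `length` elements (stride 1); drop short windows at the end
    dataset = dataset.window(length, shift=1, drop_remainder=True)
    # each window is itself a small dataset; batch(length) turns it into one tensor
    return dataset.flat_map(lambda window_ds: window_ds.batch(length))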
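to_windows relies on `Dataset.window`: with `shift=1` each window starts one element after the previous one, and `drop_remainder=True` discards the shorter windows at the end of the series. The plain-Python function below (window_list is a hypothetical helper, not a TensorFlow API) mimics `window` followed by `flat_map(... .batch(size))` on `Dataset.range(n)`, which makes it easy to see what the shift and drop_remainder arguments change.

```python
def window_list(n, size, shift=1, drop_remainder=False):
    """Hypothetical analogue of Dataset.range(n).window(size, shift, drop_remainder)
    followed by flat_map(lambda w: w.batch(size))."""
    windows = []
    for start in range(0, n, shift):
        window = list(range(start, min(start + size, n)))
        if drop_remainder and len(window) < size:
            break  # every later window is even shorter
        windows.append(window)
    return windows


print(window_list(6, 4))
# [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5]]
print(window_list(6, 4, drop_remainder=True))
# [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
print(window_list(6, 4, shift=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5]]
```

With `shift=1` and `drop_remainder=True`, as in to_windows, a series of n elements yields n - length + 1 windows.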

Split data into training, validation, and test data

In addition to setting up the sequence data sets, we need to designate part of the time series for training, part for validation, and part for testing. We split by hydrological (water) years, which run from October 1 to September 30: October 1980 through September 1995 for training, October 1995 through September 2000 for validation, and October 2000 through September 2010 as the independent test data set.

trainmask = (df.index >="1980-10-01") & (df.index <="1995-09-30")
valmask = (df.index >="1995-10-01") & (df.index <="2000-09-30")
testmask = (df.index >="2000-10-01") & (df.index <="2010-09-30")

trainidx = np.where(trainmask)[0]
validx = np.where(valmask)[0]
testidx = np.where(testmask)[0]

plt.plot(scaled_targets,color="k")
plt.plot(trainidx,scaled_targets[trainidx],color="g",label="train")
plt.plot(validx,scaled_targets[validx],color="r",label="val")
plt.plot(testidx,scaled_targets[testidx],color="b",label="test")
plt.legend()

We will use an entire year (365 days) of meteorological data as input to predict the discharge on the last day of that window.
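In the cell below, each feature window is paired with `scaled_targets[...][sequence_length - 1:]`. Because `timeseries_dataset_from_array` pairs window i with `targets[i]`, this offset makes every window's target the discharge on the window's last day, and a period of N days yields N - sequence_length + 1 samples (the first sequence_length - 1 days of each period receive no prediction). A small numpy check of that alignment, using synthetic arrays and a short stand-in sequence length (all `demo_` names are hypothetical):

```python
import numpy as np

demo_seq_len = 4  # short stand-in for sequence_length = 365
n_days, n_features = 10, 5
demo_features = np.arange(n_days * n_features, dtype=float).reshape(n_days, n_features)
demo_targets = (100 + np.arange(n_days, dtype=float)).reshape(n_days, 1)  # day d -> 100 + d

# same offset as the notebook: drop the first demo_seq_len - 1 targets
aligned_targets = demo_targets[demo_seq_len - 1:]
n_windows = min(n_days - demo_seq_len + 1, len(aligned_targets))
demo_windows = np.stack([demo_features[i:i + demo_seq_len] for i in range(n_windows)])

print(demo_windows.shape)  # (7, 4, 5): (samples, sequence length, features)
for i in range(n_windows):
    last_day = i + demo_seq_len - 1
    # window i ends on last_day, and its target belongs to that same day
    assert np.array_equal(demo_windows[i, -1], demo_features[last_day])
    assert aligned_targets[i, 0] == demo_targets[last_day, 0]
```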

sequence_length = 365 # Length of the meteorological record provided to the network

tf.random.set_seed(42)  # ensures reproducibility

train_ds = tf.keras.utils.timeseries_dataset_from_array(
    scaled_features[trainidx],
    targets=scaled_targets[trainidx][sequence_length - 1:],
    sequence_length=sequence_length,
    batch_size=256,
    shuffle=True,
    seed=42
)

valid_ds = tf.keras.utils.timeseries_dataset_from_array(
    scaled_features[validx],
    targets=scaled_targets[validx][sequence_length - 1:],
    sequence_length=sequence_length,
    batch_size=2048
)
test_ds = tf.keras.utils.timeseries_dataset_from_array(
    scaled_features[testidx],
    targets=scaled_targets[testidx][sequence_length - 1:],
    sequence_length=sequence_length,
    batch_size=len(testidx)
)

for x, y in train_ds.take(1):
    print("Input shape:", x.shape) 
    print("Target shape:", y.shape) 

Train a Simple RNN

We’ll first try training an RNN model.

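import os

cwd = os.getcwd()

# directory for the trained models; create it so that model.save() does not fail
model_path = os.path.join(cwd, 'saved_model')
os.makedirs(model_path, exist_ok=True)

# set some hyperparameters
n_hidden = 10         # number of hidden units in the recurrent layer
patience = 20         # epochs without val_loss improvement before early stopping
epochs = 100          # maximum number of training epochs
learning_rate = 1e-3  # Adam learning rate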

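The code below builds the RNN with tf.keras.Sequential: a SimpleRNN layer with n_hidden units reads each 365-day input window and passes its final hidden state to a Dense layer that outputs a single (scaled) discharge value.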
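Conceptually, a SimpleRNN layer applies the same weights at every time step, updating a hidden state h_t = tanh(x_t W + h_{t-1} U + b), and with `return_sequences=False` only the final state is passed to the Dense layer. The numpy sketch below spells out that forward pass with random weights standing in for trained ones; simple_rnn_forward is a hypothetical helper illustrating the recurrence, not Keras's implementation.

```python
import numpy as np


def simple_rnn_forward(x, W, U, b):
    """Return the last hidden state, as SimpleRNN(return_sequences=False) would.

    x: (timesteps, n_features); W: (n_features, n_hidden);
    U: (n_hidden, n_hidden); b: (n_hidden,)
    """
    h = np.zeros(U.shape[0])
    for x_t in x:                         # one step per day in the input window
        h = np.tanh(x_t @ W + h @ U + b)  # the same weights are reused at every step
    return h


rng = np.random.default_rng(42)
n_features, n_hidden, timesteps = 5, 10, 365             # mirrors the notebook's shapes
W = rng.normal(scale=0.1, size=(n_features, n_hidden))   # random stand-ins for trained weights
U = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
w_out, b_out = rng.normal(size=n_hidden), 0.0            # Dense(1) output head

demo_window = rng.normal(size=(timesteps, n_features))   # one synthetic input window
h_last = simple_rnn_forward(demo_window, W, U, b)
y_hat = h_last @ w_out + b_out                           # one scalar prediction (scaled units)
print(h_last.shape, float(y_hat))
```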

tf.random.set_seed(42)  # ensures reproducibility
rnn_model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(n_hidden, input_shape=[None, 5]),
    tf.keras.layers.Dense(1)
])

rnn_model.summary()
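The parameter counts reported by summary() can be checked by hand. Assuming Keras defaults (in particular `reset_after=True` for GRU, which adds a second bias vector), 5 input features, and n_hidden = 10, the three recurrent models used in this notebook should have the sizes below; the helper functions are hypothetical.

```python
def simple_rnn_params(units, input_dim):
    # kernel (input_dim x units) + recurrent kernel (units x units) + bias (units)
    return units * (input_dim + units + 1)


def lstm_params(units, input_dim):
    # four gates (input, forget, candidate, output), each sized like a SimpleRNN
    return 4 * units * (input_dim + units + 1)


def gru_params(units, input_dim, reset_after=True):
    # three gates (update, reset, candidate); reset_after=True adds a recurrent bias
    n_bias = 2 if reset_after else 1
    return 3 * units * (input_dim + units + n_bias)


def dense_params(input_dim, units=1):
    return input_dim * units + units


n_hidden, n_features = 10, 5
for name, rec in [("SimpleRNN", simple_rnn_params(n_hidden, n_features)),
                  ("LSTM", lstm_params(n_hidden, n_features)),
                  ("GRU", gru_params(n_hidden, n_features))]:
    head = dense_params(n_hidden)
    print(f"{name}: {rec} recurrent + {head} dense = {rec + head}")
# SimpleRNN: 160 recurrent + 11 dense = 171
# LSTM: 640 recurrent + 11 dense = 651
# GRU: 510 recurrent + 11 dense = 521
```

The Dropout layer in the LSTM model has no trainable parameters, so it does not change these totals.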

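We’ll include a custom metric, the Nash-Sutcliffe efficiency (NSE), which is widely used in hydrology to assess how well a model reproduces observed discharge. An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than always predicting the mean of the observations.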
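Written out, NSE = 1 - Σ(obs_t - sim_t)² / Σ(obs_t - mean(obs))². When the metric is accumulated over many batches, the mean in the denominator should be the mean of all observations, not of a single batch. The numpy version below (nash_sutcliffe is a hypothetical helper) shows the two reference values and why standardizing both series does not change the score.

```python
import numpy as np


def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    sse = np.sum((obs - sim) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - sse / sst


obs = np.array([1.0, 3.0, 2.0, 6.0])                # made-up discharge values (mm/day)
print(nash_sutcliffe(obs, obs))                     # 1.0 (perfect simulation)
print(nash_sutcliffe(obs, np.full(4, obs.mean())))  # 0.0 (no better than the mean)
# Applying the same linear rescaling a*x + c (a != 0) to obs and sim multiplies
# both sums of squares by a^2, so standardization leaves NSE unchanged.
print(np.isclose(nash_sutcliffe(2 * obs + 1, 2 * (obs + 0.5) + 1),
                 nash_sutcliffe(obs, obs + 0.5)))   # True
```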

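class NashSutcliffeEfficiency(tf.keras.metrics.Metric):
    """Nash-Sutcliffe efficiency accumulated over all batches since the last reset.

    NSE = 1 - sum((obs - sim)**2) / sum((obs - mean(obs))**2)
    """
    def __init__(self, name='nse', scaler=None, **kwargs):
        super().__init__(name=name, **kwargs)
        # running sums, so the denominator uses the mean of *all* observations
        # rather than the mean of each individual batch
        self.sse = self.add_weight(name='sse', initializer='zeros')
        self.sum_y = self.add_weight(name='sum_y', initializer='zeros')
        self.sum_y2 = self.add_weight(name='sum_y2', initializer='zeros')
        self.n_obs = self.add_weight(name='n_obs', initializer='zeros')
        self.scaler = scaler

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        if self.scaler is not None:
            # undo StandardScaler (x = z * std + mean); scale_ is the std, var_ the variance
            u = float(self.scaler.mean_[0])
            s = float(self.scaler.scale_[0])
            y_true = y_true * s + u
            y_pred = y_pred * s + u

        self.sse.assign_add(tf.reduce_sum(tf.square(y_true - y_pred)))
        self.sum_y.assign_add(tf.reduce_sum(y_true))
        self.sum_y2.assign_add(tf.reduce_sum(tf.square(y_true)))
        self.n_obs.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        # total sum of squares about the overall mean: sum(y^2) - (sum(y))^2 / n
        sst = self.sum_y2 - tf.square(self.sum_y) / self.n_obs
        return 1.0 - self.sse / sst

    def reset_state(self):
        for variable in (self.sse, self.sum_y, self.sum_y2, self.n_obs):
            variable.assign(0.0)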

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
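`EarlyStopping(monitor="val_loss", patience=patience, restore_best_weights=True)` halts training once the validation loss has failed to improve for `patience` consecutive epochs and, when it stops, restores the weights from the best epoch. The plain-Python sketch below reproduces that bookkeeping on a made-up list of validation losses; early_stopping_epochs is a hypothetical helper that ignores options such as min_delta and is not the Keras implementation.

```python
def early_stopping_epochs(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; weights from best_epoch would be restored
    return len(val_losses) - 1, best_epoch  # ran out of epochs before patience was exhausted


# made-up validation losses: the best value is reached at epoch 2
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]
print(early_stopping_epochs(losses, patience=3))  # (5, 2)
```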

rnn_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history = rnn_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
rnn_model.save(os.path.join(model_path,'RNN_timeseries_model.keras'))

# evaluate() returns the loss (MSE) followed by the compiled metrics, here the NSE
valid_loss, valid_nse = rnn_model.evaluate(valid_ds)
valid_nse

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss')
plt.legend()

out = rnn_model.predict(test_ds)

out.shape

for x, y in test_ds.take(1):
    yvals = y.numpy()

plt.plot(yvals,label="true")
plt.plot(out[:,0],label="prediction")
plt.legend()

Train an LSTM model
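An LSTM replaces the single tanh update of the simple RNN with gates and a separate cell state c_t. The forget gate f and input gate i control an additive update c_t = f * c_{t-1} + i * g, which helps information persist across long windows such as the 365-day inputs used here, and the output gate o sets h_t = o * tanh(c_t). The numpy sketch below uses random stand-in weights and assumes the stacked weight matrices are ordered input, forget, candidate, output; lstm_forward is a hypothetical helper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, W, U, b):
    """Last hidden state of one LSTM layer (return_sequences=False).

    x: (timesteps, n_features); W: (n_features, 4*n); U: (n, 4*n); b: (4*n,)
    """
    n = U.shape[0]
    h = np.zeros(n)
    c = np.zeros(n)                  # cell state: the long-term memory
    for x_t in x:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:n])           # how much new information to write
        f = sigmoid(z[n:2 * n])      # how much of the old cell state to keep
        g = np.tanh(z[2 * n:3 * n])  # candidate values
        o = sigmoid(z[3 * n:])       # how much of the cell state to expose
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


rng = np.random.default_rng(0)
n_features, n_hidden = 5, 10
W = rng.normal(scale=0.1, size=(n_features, 4 * n_hidden))  # random stand-ins
U = rng.normal(scale=0.1, size=(n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)
h_last = lstm_forward(rng.normal(size=(365, n_features)), W, U, b)
print(h_last.shape)  # (10,)
```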

dropout_rate = 0.0  # a rate of 0.0 leaves the Dropout layer inactive
tf.random.set_seed(42)  # ensures reproducibility
lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(n_hidden, input_shape=[None, 5], return_sequences=False),
    tf.keras.layers.Dropout(dropout_rate),
    tf.keras.layers.Dense(1)
])

lstm_model.summary()

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)

lstm_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history_lstm = lstm_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
lstm_model.save(os.path.join(model_path,'LSTM_timeseries_model.keras'))

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss - RNN')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss - RNN')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['loss']),label='Train Loss - LSTM')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['val_loss']),label = 'Val loss - LSTM')
plt.legend()

out_lstm = lstm_model.predict(test_ds)

plt.plot(yvals,label="true")
plt.plot(out_lstm[:,0],label="prediction - LSTM")
plt.legend()

Train a GRU model
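A GRU merges the LSTM's cell and hidden state and uses two gates: an update gate z that blends the previous state with a candidate state, and a reset gate r that controls how much of the previous state feeds the candidate. The numpy sketch below follows the `reset_after=True` variant (the Keras default), in which r is applied after the recurrent matrix product; the weights are random stand-ins, the update/reset/candidate ordering of the stacked weights is an assumption of the sketch, and gru_forward is a hypothetical helper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_forward(x, W, U, b_in, b_rec):
    """Last hidden state of one GRU layer, reset_after=True variant.

    x: (timesteps, n_features); W: (n_features, 3*n); U: (n, 3*n);
    b_in, b_rec: (3*n,) input and recurrent biases
    """
    n = U.shape[0]
    h = np.zeros(n)
    for x_t in x:
        xw = x_t @ W + b_in
        hu = h @ U + b_rec
        z = sigmoid(xw[:n] + hu[:n])                   # update gate
        r = sigmoid(xw[n:2 * n] + hu[n:2 * n])         # reset gate
        h_cand = np.tanh(xw[2 * n:] + r * hu[2 * n:])  # reset applied after the matmul
        h = z * h + (1.0 - z) * h_cand                 # blend old state and candidate
    return h


rng = np.random.default_rng(1)
n_features, n_hidden = 5, 10
W = rng.normal(scale=0.1, size=(n_features, 3 * n_hidden))  # random stand-ins
U = rng.normal(scale=0.1, size=(n_hidden, 3 * n_hidden))
b_in, b_rec = np.zeros(3 * n_hidden), np.zeros(3 * n_hidden)
h_last = gru_forward(rng.normal(size=(365, n_features)), W, U, b_in, b_rec)
print(h_last.shape)  # (10,)
```

The two bias vectors are why the GRU layer has 3 * n * (input_dim + n + 2) parameters rather than 3 * n * (input_dim + n + 1).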

tf.random.set_seed(42)  # ensures reproducibility
gru_model = tf.keras.Sequential([
    tf.keras.layers.GRU(n_hidden, return_sequences=False, input_shape=[None, 5]),
    tf.keras.layers.Dense(1)
])

gru_model.summary()

early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=patience, restore_best_weights=True)
opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)

gru_model.compile(loss='mse', optimizer=opt, metrics=[NashSutcliffeEfficiency(scaler=target_scaler)])
history_gru = gru_model.fit(train_ds, validation_data=valid_ds, epochs=epochs,callbacks=[early_stopping_cb])
gru_model.save(os.path.join(model_path,'GRU_timeseries_model.keras'))

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('Mean squared error')
plt.plot(history.epoch, np.array(history.history['loss']),label='Train Loss - RNN')
plt.plot(history.epoch, np.array(history.history['val_loss']),label = 'Val loss - RNN')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['loss']),label='Train Loss - LSTM')
plt.plot(history_lstm.epoch, np.array(history_lstm.history['val_loss']),label = 'Val loss - LSTM')
plt.plot(history_gru.epoch, np.array(history_gru.history['loss']),label='Train Loss - GRU')
plt.plot(history_gru.epoch, np.array(history_gru.history['val_loss']),label = 'Val loss - GRU')
plt.legend()

out_gru = gru_model.predict(test_ds)

plt.plot(yvals,label="true")

plt.plot(out_lstm[:,0],label="prediction - LSTM")
plt.plot(out_gru[:,0],label="prediction - GRU")
plt.plot(out[:,0],label="prediction - RNN")
plt.legend()
