Assignment 4: Flood Risk Prediction with Neural Network Regression
For this assignment you will develop a machine learning model to predict the probability of a flood given both environmental and social factors. The data set we will use for this assignment comes from the Kaggle Flood Prediction Dataset.
# To facilitate downloading data from Kaggle, we can install this python package
!pip install kagglehub
Requirement already satisfied: kagglehub in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (0.2.9)
Requirement already satisfied: packaging in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from kagglehub) (24.1)
Requirement already satisfied: requests in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from kagglehub) (2.32.3)
Requirement already satisfied: tqdm in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from kagglehub) (4.66.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from requests->kagglehub) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from requests->kagglehub) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from requests->kagglehub) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/envs/ML4Climate2025/lib/python3.8/site-packages (from requests->kagglehub) (2024.8.30)
import kagglehub
# Download latest version
path = kagglehub.dataset_download("naiyakhalid/flood-prediction-dataset")
print("Path to dataset files:", path)
Warning: Looks like you're using an outdated `kagglehub` version, please consider updating (latest version: 0.3.12)
Path to dataset files: /Users/karalamb/.cache/kagglehub/datasets/naiyakhalid/flood-prediction-dataset/versions/1
import os
os.listdir(path)
['flood.csv']
Part 1: Load and Preprocess the Flood Data Set
Load in the flood data set using pandas.
Make a histogram plot of the different numerical variables in the dataset.
Create the feature matrix and the targets vector. The target will be the flood probability, and the predictors will be all of the other variables in the data frame. Put the flood probability into a numpy array called y and the other variables into a numpy array called X.
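The steps above can be sketched as follows. This is a minimal illustration, not the reference solution: it builds a small synthetic data frame so it runs on its own, and the column names (including the target name FloodProbability) are assumptions that should be checked against the columns of the real flood.csv.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for flood.csv so this sketch runs without the download.
# With the real data, something like this should work instead:
#   df = pd.read_csv(os.path.join(path, "flood.csv"))
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 10, size=(100, 3)),
    columns=["FeatureA", "FeatureB", "FeatureC"],  # hypothetical predictor names
)
df["FloodProbability"] = rng.uniform(0.3, 0.7, size=100)

TARGET = "FloodProbability"  # assumed name of the target column

# Histograms of every numerical column (requires matplotlib):
# df.hist(bins=30, figsize=(12, 8))

# Target vector y and feature matrix X as numpy arrays
y = df[TARGET].to_numpy()
X = df.drop(columns=[TARGET]).to_numpy()

print("X shape:", X.shape)
print("y shape:", y.shape)
```

Dropping the target column by name, rather than by position, keeps the split correct even if the column order in the file changes.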
Part 2: Preprocessing
Use scikit-learn's StandardScaler to scale the numerical variables in the X and y arrays.
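One possible way to do the scaling is shown below, using random stand-in arrays in place of the real X and y. The names x_scaler and y_scaler are illustrative. Two details are worth noting: StandardScaler expects 2D input, so a 1D y has to be reshaped into a single column, and keeping a separate scaler for y makes it possible to undo the scaling later.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-ins for the real arrays (assumption: 20 features, 1D target)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 20))
y = rng.uniform(0.3, 0.7, size=200)

# One scaler per array, so each can be inverted independently
x_scaler = StandardScaler()
y_scaler = StandardScaler()

X_scaled = x_scaler.fit_transform(X)
y_scaled = y_scaler.fit_transform(y.reshape(-1, 1))  # shape (n_samples, 1)

# After scaling, each column has mean ~0 and standard deviation ~1
print("X column means:", X_scaled.mean(axis=0).round(3)[:3])
print("y mean and std:", y_scaled.mean().round(3), y_scaled.std().round(3))

# The fitted scaler can map scaled values back to the original units
y_back = y_scaler.inverse_transform(y_scaled).ravel()
```

This follows the order of the assignment and fits the scalers on the full data set. A common alternative in practice is to fit them on the training portion only and then apply transform to the validation and test portions.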
Part 3: Training, validation, and test splits
Split the data into training, validation, and test data sets, with 80% of the data used for training, and 10% each for validation and testing.
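An 80/10/10 split can be obtained by calling scikit-learn's train_test_split twice: first hold out 20% of the data, then divide that held-out part in half. The sketch below uses random stand-in arrays, and the random_state value is an arbitrary choice made only for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-ins for the scaled arrays from the previous part
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(1000, 20))
y_scaled = rng.normal(size=(1000, 1))

# Step 1: 80% training, 20% held out
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X_scaled, y_scaled, test_size=0.2, random_state=42
)

# Step 2: split the held-out 20% evenly into validation and test (10% each)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42
)

print("train:", X_train.shape, "val:", X_val.shape, "test:", X_test.shape)
```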
Part 4: Train a Neural Network
Using tensorflow, create a fully connected neural network that takes as input a feature vector of size 20 and has 3 dense layers with 100 neurons each and ReLU activation, followed by a final dense layer (with no activation) that produces a single output value.
Print off a summary of the model. How many total trainable parameters does it have?
Compile the model with MSE loss and the Adam optimizer. As a metric, include the Root Mean Squared Error.
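A minimal sketch of the architecture and compile step described above, using the Keras Sequential API. The helper name build_model and the metric name "rmse" are illustrative choices, and Adam is used with its default learning rate since the assignment does not specify one.

```python
import tensorflow as tf

N_FEATURES = 20  # number of predictor columns, as stated in the assignment


def build_model(n_features=N_FEATURES):
    """Three hidden dense layers (100 units, ReLU) and a linear output unit."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dense(1),  # no activation: plain regression output
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="mse",
        metrics=[tf.keras.metrics.RootMeanSquaredError(name="rmse")],
    )
    return model


model = build_model()
model.summary()  # prints the layers and the parameter counts
```

To cross-check the total reported by the summary, recall that a dense layer with n_in inputs and n_out units has (n_in + 1) * n_out trainable parameters: one weight per input-unit pair plus one bias per unit.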
Train and Evaluate the Model
Train the model for 30 epochs on the training data set you created earlier, using the validation data set to validate the model.
Make a plot of the training and validation loss vs. epoch number.
Save the trained model.
Evaluate the model on the test data set (i.e. print off the MSE loss and the Root Mean Squared Error).
Get the model predictions for the test data set.
Unscale both y_test and the model predictions for the flood probabilities by applying the inverse transformation of the StandardScaler.
Make a scatter plot of the predicted flood probabilities compared with the true flood probabilities for the test data set.
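The training and evaluation steps above could look roughly like the following. This is a sketch under stated assumptions, not the reference solution: the data are synthetic so the block runs on its own, the split is done by simple slicing, and the file name flood_model.keras is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: 20 features, target near 0.5)
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(500, 20))
y_raw = 0.5 + 0.01 * X_raw.sum(axis=1, keepdims=True)  # shape (500, 1)

x_scaler, y_scaler = StandardScaler(), StandardScaler()
X = x_scaler.fit_transform(X_raw)
y = y_scaler.fit_transform(y_raw)

# 80/10/10 split by slicing (the synthetic rows are already in random order)
X_train, X_val, X_test = X[:400], X[400:450], X[450:]
y_train, y_val, y_test = y[:400], y[400:450], y[450:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(
    optimizer="adam",
    loss="mse",
    metrics=[tf.keras.metrics.RootMeanSquaredError(name="rmse")],
)

# Train for 30 epochs, validating after each epoch
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=30,
    verbose=0,
)

# Training and validation loss vs. epoch number
fig, ax = plt.subplots()
ax.plot(history.history["loss"], label="training")
ax.plot(history.history["val_loss"], label="validation")
ax.set_xlabel("Epoch")
ax.set_ylabel("MSE loss")
ax.legend()

# Save the trained model (hypothetical file name)
model.save("flood_model.keras")

# Evaluate on the test set: returns [loss, rmse] in scaled units
test_loss, test_rmse = model.evaluate(X_test, y_test, verbose=0)
print("Test MSE (scaled):", test_loss)
print("Test RMSE (scaled):", test_rmse)

# Predictions, then undo the scaling with the scaler fitted on y
y_pred_scaled = model.predict(X_test, verbose=0)
y_pred = y_scaler.inverse_transform(y_pred_scaled)
y_true = y_scaler.inverse_transform(y_test)

# Predicted vs. true values, with a 1:1 reference line
fig, ax = plt.subplots()
ax.scatter(y_true, y_pred, s=10)
lims = [y_true.min(), y_true.max()]
ax.plot(lims, lims, "k--", label="1:1 line")
ax.set_xlabel("True flood probability")
ax.set_ylabel("Predicted flood probability")
ax.legend()
plt.show()
```

Note that the loss and RMSE reported by evaluate are in scaled units, because the model was trained on the scaled target. Only the values passed through inverse_transform are flood probabilities in the original units.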