Time Series Forecasting: LSTM-based model for metro traffic prediction (PyTorch)

Maria Zorkaltseva··19 min read

This tutorial contains brief overview of statistical and machine learning methods for time series forecasting, experiments and comparative analysis of Long short-term memory (LSTM) based architectures for solving above mentioned problem. Single layer, two layer and bidirectional single layer LSTM cases are considered. Metro Interstate Traffic Volume Data Set from UCI Machine Learning Repository and PyTorch deep learning framework are used for analysis.

• • •

Time series forecasting methods overview

A time series is a set of observations, each one is being recorded at the specific time tt. It can be weather observations, for example, a records of the temperature for a month, it can be observations of currency quotes during the day or any other process aggregated by time. Time series forecasting can be determened as the act of predicting the future by understanding the past. The model for forecasting can rely on one variable and this is a univariate case or when more than one variable taken into consideration it will be multivariate case.

Stochastic Linear Models:

  • Autoregressive (AR);
  • Moving Average (MA);
  • Autoregressive Moving Average (ARMA);
  • Autoregressive Integrated Moving Average (ARIMA);
  • Seasonal ARIMA (SARIMA).

For above family of models, the stationarity condition must be satisfied. Loosely speaking, a stochastic process is stationary, if its statistical properties do not change with time.

Stochastic Non-Linear Models:

  • nonlinear autoregressive exogenous models (NARX);
  • autoregressive conditional heteroskedasticity (ARCH);
  • generalized autoregressive conditional heteroskedasticity (GARCH).

Machine Learning Models

  • Hidden Markov Model;
  • Least-square SVM (LS-SVM);
  • Dynamic Least-square SVM (DLS-SVM);
  • Feed Forward Network (FNN);
  • Time Lagged Neural Network (TLNN);
  • Seasonal Artificial Neural Network (SANN);
  • Recurrent Neural Networks (RNN).

Dataset

We will use Metro Interstate Traffic Volume Data Set from UC Irvine Machine Learning Repository, which contains large number of datasets for various tasks in machine learning. We will investigate how weather and holiday features influence the metro traffic in US.

Attribute Information:

holiday — Categorical US National holidays plus regional holiday, Minnesota State;
temp — Numeric, average temp in kelvin;
rain_1h — Numeric, amount in mm of rain that occurred in the hour;
snow_1h — Numeric, amount in mm of snow that occurred in the hour;
clouds_all — Numeric, percentage of cloud cover;
weather_main — Categorical, short textual description of the current weather;
weather_description — Categorical, longer textual description of the current weather;
date_time — DateTime, hour of the data collected in local CST time;
traffic_volume — Numeric, hourly I-94 ATR 301 reported westbound traffic volume.

Our target variable will be traffic_volume. Here we will consider multivariate multi-step prediction case with LSTM-based recurrent neural network architecture.

Metro Traffic Prediction using LSTM-based recurrent neural network

Open In Colab

I used Google Colab Notebooks to calculate experiments. Here, for convenience, I mounted my Google Drive where I stored the files.

In [1]:python
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Out [1]:
Mounted at /content/drive
In [2]:python
import os
import copy
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
from torch.autograd import Variable

import warnings
warnings.filterwarnings('ignore')
In [3]:python
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed(0)
torch.backends.cudnn.deterministic = True
In [4]:python
data = pd.read_csv(
  '/content/drive/My Drive/time series prediction/Metro_Interstate_Traffic_Volume.csv',
  parse_dates=True
)

Exploratory Data Analysis (EDA) and Scaling

In [5]:python
data.head()
Out [6]:
holiday temp rain_1h snow_1h clouds_all weather_main weather_description date_time traffic_volume
0 None 288.28 0.0 0.0 40 Clouds scattered clouds 2012-10-02 09:00:00 5545
1 None 289.36 0.0 0.0 75 Clouds broken clouds 2012-10-02 10:00:00 4516
2 None 289.58 0.0 0.0 90 Clouds overcast clouds 2012-10-02 11:00:00 4767
3 None 290.13 0.0 0.0 90 Clouds overcast clouds 2012-10-02 12:00:00 5026
4 None 291.14 0.0 0.0 75 Clouds broken clouds 2012-10-02 13:00:00 4918

Сategorical features: holiday, weather_main, weather_description.
Continious features: temp, rain_1h, show_1h, clouds_all.
Target variable: traffic_volume

In [7]:python
data.info()
Out [7]:
  <class 'pandas.core.frame.DataFrame'>
  RangeIndex: 48204 entries, 0 to 48203
  Data columns (total 9 columns):
   #   Column               Non-Null Count  Dtype
  ---  ------               --------------  -----
   0   holiday              48204 non-null  object
   1   temp                 48204 non-null  float64
   2   rain_1h              48204 non-null  float64
   3   snow_1h              48204 non-null  float64
   4   clouds_all           48204 non-null  int64
   5   weather_main         48204 non-null  object
   6   weather_description  48204 non-null  object
   7   date_time            48204 non-null  object
   8   traffic_volume       48204 non-null  int64
  dtypes: float64(3), int64(2), object(4)
  memory usage: 3.3+ MB

Checking for Nan values

In [8]:python
for column in data.columns:
  print(column, sum(data[column].isna()))
Out [8]:
  holiday 0
  temp 0
  rain_1h 0
  snow_1h 0
  clouds_all 0
  weather_main 0
  weather_description 0
  date_time 0
  traffic_volume 0

Here I take into consideration two categorical variables, such as holiday and weather_main and then one-hot encode them.

In [9]:python
data = pd.get_dummies(data, columns = ['holiday', 'weather_main'], drop_first=True)
In [10]:python
data.head()
Out [10]:
temp rain_1h snow_1h clouds_all weather_description date_time traffic_volume holiday_Columbus Day holiday_Independence Day holiday_Labor Day holiday_Martin Luther King Jr Day holiday_Memorial Day holiday_New Years Day holiday_None holiday_State Fair holiday_Thanksgiving Day holiday_Veterans Day holiday_Washingtons Birthday weather_main_Clouds weather_main_Drizzle weather_main_Fog weather_main_Haze weather_main_Mist weather_main_Rain weather_main_Smoke weather_main_Snow weather_main_Squall weather_main_Thunderstorm
0 288.28 0.0 0.0 40 scattered clouds 2012-10-02 09:00:00 5545 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
1 289.36 0.0 0.0 75 broken clouds 2012-10-02 10:00:00 4516 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
2 289.58 0.0 0.0 90 overcast clouds 2012-10-02 11:00:00 4767 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 290.13 0.0 0.0 90 overcast clouds 2012-10-02 12:00:00 5026 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
4 291.14 0.0 0.0 75 broken clouds 2012-10-02 13:00:00 4918 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
In [11]:python
data.shape
Out [11]:

    (48204, 28)

Here we can see outliers in temp variable, lets filter outliers with Interquartile range (IQR) method

In [12]:python
data['temp'].plot()
plt.show()
data['traffic_volume'].plot()
plt.show()
Out [12]:
Temperature plotTraffic volume plot
In [13]:python
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1
print(IQR)
Out [13]:
temp                                   19.646
rain_1h                                 0.000
snow_1h                                 0.000
clouds_all                             89.000
traffic_volume                       3740.000
holiday_Columbus Day                    0.000
holiday_Independence Day                0.000
holiday_Labor Day                       0.000
holiday_Martin Luther King Jr Day       0.000
holiday_Memorial Day                    0.000
holiday_New Years Day                   0.000
holiday_None                            0.000
holiday_State Fair                      0.000
holiday_Thanksgiving Day                0.000
holiday_Veterans Day                    0.000
holiday_Washingtons Birthday            0.000
weather_main_Clouds                     1.000
weather_main_Drizzle                    0.000
weather_main_Fog                        0.000
weather_main_Haze                       0.000
weather_main_Mist                       0.000
weather_main_Rain                       0.000
weather_main_Smoke                      0.000
weather_main_Snow                       0.000
weather_main_Squall                     0.000
weather_main_Thunderstorm               0.000
dtype: float64
In [14]:python
data = data[~((data['temp'] < (Q1['temp'] - 1.5 * IQR['temp'])) |(data['temp'] > (Q3['temp'] + 1.5 * IQR['temp'])))]
In [15]:python
data['temp'].plot()
plt.show()
data['traffic_volume'].plot()
plt.show()
Out [15]:
Temperature plot after outlier removalTraffic volume plot after outlier removal
In [16]:python
data.drop(columns=['rain_1h', 'snow_1h', 'weather_description', 'date_time'], inplace=True)

Normalizing the features

In [17]:python
features_to_norm = ['temp', 'clouds_all', 'traffic_volume']
In [18]:python
TRAIN_SPLIT = 40000
STEP = 6

past_history = 720
future_target = 32
target_index = 2

tmp = data[features_to_norm].values
data_mean = tmp[:TRAIN_SPLIT].mean(axis=0)
data_std = tmp[:TRAIN_SPLIT].std(axis=0)
In [19]:python
data[features_to_norm] = (data[features_to_norm]-data_mean)/data_std
In [20]:python
data
Out [20]:
Indextempclouds_alltraffic_volumeholiday_Columbus Day...
00.581321-0.2616921.1453830...
10.6687890.6387490.6287290...
..................
482030.0824321.024652-1.1597300...

48194 rows × 24 columns

Preparing training dataset and Visualizations

Here we will consider multiple future points prediction case given a past history.

In [21]:python
def multivariate_data(dataset, target, start_index, end_index, history_size,
                    target_size, step, single_step=False):
  data = []
  labels = []

  start_index = start_index + history_size
  if end_index is None:
      end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
      indices = range(i-history_size, i, step)
      data.append(dataset[indices])

      if single_step:
          labels.append(target[i+target_size])
      else:
          labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)
In [22]:python
trainX, trainY = multivariate_data(data.values, data.iloc[:, target_index].values, 0,
                                 TRAIN_SPLIT, past_history,
                                 future_target, STEP)

valX, valY = multivariate_data(data.values, data.iloc[:, target_index].values,
                             TRAIN_SPLIT, None, past_history,
                             future_target, STEP)
In [23]:python
testX, testY = valX[:len(valX)//4], valY[:len(valY)//4]
valX, valY = valX[len(valX)//4:], valY[len(valY)//4:]
In [24]:python
print('Train input features shape : {}'.format(trainX.shape))
print('\nTrain output shape : {}'.format(trainY.shape))
print('\nValidation input features shape : {}'.format(valX.shape))
print('\nValidation output shape : {}'.format(valY.shape))
print('\nTest input features shape : {}'.format(testX.shape))
print('\nTest output shape : {}'.format(testY.shape))
Out [24]:
Train input features shape : (39280, 120, 24)

Train output shape : (39280, 32)

Validation input features shape : (5582, 120, 24)

Validation output shape : (5582, 32)

Test input features shape : (1860, 120, 24)

Test output shape : (1860, 32)
In [25]:python
def create_time_steps(length):
  return list(range(-length, 0))

def multi_step_plot(history, true_future, prediction, pic_name):
  plt.figure(figsize=(12, 6))
  num_in = create_time_steps(len(history))
  num_out = len(true_future)

  plt.plot(num_in, np.array(history), label='History', linewidth=2.5)
  plt.plot(np.arange(num_out)/STEP, np.array(true_future), 'b', label='True Future', linewidth=2.5)
  if prediction.any():
      plt.plot(np.arange(num_out)/STEP, np.array(prediction), 'ro', label='Predicted Future', linewidth=2.5)
  plt.legend(loc='upper left')
  plt.title(pic_name)
  plt.show()

multi_step_plot(trainX[0,:,target_index], trainY[0,:], np.array([0]), 'Train case')
multi_step_plot(valX[0,:,target_index], valY[0,:], np.array([0]), 'Val case')
multi_step_plot(testX[0,:,target_index], testY[0,:], np.array([0]), 'Test case')
Out [25]:
Train caseVal caseTest case

LSTM Time Series Predictor Model

For more efficient storing of long sequences, here I chose the LSTM architecture. LSTM layer consists from cells which is shown below. Two vectors are inputs to the LSTM-cell: the new input vector from the data xtx_t and the vector of the hidden state ht1h_{t-1}, which is obtained from the hidden state of this cell in the previous time step.

LSTM Cell Architecture

Where:

  • iti_t - input gate
  • ftf_t - forget gate
  • oto_t - output gate
  • Ct~\tilde{C_t} - new candidate cell state
  • CtC_t - cell state
  • hth_t - block output
In [26]:python
class LSTMTimeSeriesPredictor(nn.Module):

  def __init__(self,
               num_classes,
               n_features,
               hidden_size,
               num_layers,
               bidirectional=False,
               dp_rate=0.5):

      super(LSTMTimeSeriesPredictor, self).__init__()

      self.num_layers = num_layers
      self.hidden_size = hidden_size
      self.bidirectional = bidirectional

      self.lstm = nn.LSTM(input_size=n_features,
                          hidden_size=hidden_size,
                          num_layers=num_layers,
                          dropout=dp_rate,
                          batch_first=True,
                          bidirectional=self.bidirectional)

      self.fc = nn.Linear(in_features=hidden_size, out_features=num_classes)

  def forward(self, x):
     # dim(x) = (batch, seq_len, input_size)
     # dim(h_0) = (num_layers *num_directions, batch, hidden_size)
      if self.bidirectional:
          h_0 = Variable(torch.zeros(self.num_layers*2, x.size(0), self.hidden_size))
          c_0 = Variable(torch.zeros(self.num_layers*2, x.size(0), self.hidden_size))
      else:
          h_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))
          c_0 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_size))

      o, (h_out,_) = self.lstm(x, (h_0, c_0))

      if self.bidirectional:
          h_out = h_out.view(self.num_layers, 2, x.size(0), self.hidden_size)
          # taking last hidden state
          # dim(h_out) = (num_directions, batch, hidden_size)
          h_out = h_out[-1]

      # in bidectional case we sum vectors over num_directions
      # in the case of multi-layer LSTM we sum over num_layers direction
      # dim(h_out) = (batch, hidden_size)
      h_out = h_out.sum(dim=0)

      # dim(out) = (batch, num_classes)
      out = self.fc(h_out)
      return out
In [27]:python
trainX = torch.from_numpy(trainX).float()
trainY = torch.from_numpy(trainY).float()
valX = torch.from_numpy(valX).float()
valY = torch.from_numpy(valY).float()
testX = torch.from_numpy(testX).float()
testY = torch.from_numpy(testY).float()
In [28]:python
train_data = torch.utils.data.TensorDataset(trainX, trainY)
val_data = torch.utils.data.TensorDataset(valX, valY)
test_data =  torch.utils.data.TensorDataset(testX, testY)

Train and Evaluate Helping Functions

In [29]:python
def train_model(model, train_data, loss_criterion, optimizer, batch_size):

  running_loss = 0.
  model.train()

  train_loader = torch.utils.data.DataLoader(train_data,
                                            batch_size=batch_size,
                                            shuffle=True,
                                            num_workers=0)

  for batch_x, batch_y in train_loader:
      optimizer.zero_grad()

      outputs = model(batch_x)

      loss = loss_criterion(outputs, batch_y)

      loss.backward()

      optimizer.step()

      running_loss += loss.item()

  epoch_loss = running_loss / (len(train_data) // batch_size)

  print("Epoch: %d, train loss: %1.5f" % (epoch+1, epoch_loss))

  return model, epoch_loss
In [30]:python
def evaluate_model(model, val_data, loss_criterion, optimizer, batch_size):
  running_loss = 0.
  model.eval()

  val_loader = torch.utils.data.DataLoader(val_data,
                                           batch_size=batch_size,
                                           shuffle=True,
                                           num_workers=0)

  for batch_x, batch_y in val_loader:

      optimizer.zero_grad()

      with torch.set_grad_enabled(False):
          outputs = model(batch_x)
          loss = loss_criterion(outputs, batch_y)

      running_loss += loss.item()

  epoch_loss = running_loss / (len(val_data) // batch_size)

  print("Epoch: %d, val loss: %1.5f" % (epoch+1, epoch_loss))

  return model, epoch_loss, best_model_wts

1-Layer LSTM model

In [31]:python
learning_rate = 0.001
input_size = trainX.shape[2] # number of input features, multivariate case
hidden_size = 20
num_layers = 1
num_classes = trainY.shape[1] # future time window length
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers, bidirectional=False)

config = {'batch_size': 128, 'num_epochs': 10, 'checkpoints_dir': '/content/drive/My Drive/time series prediction/', 'model_filename': '1-layer-lstm-best_model.pth.tar'}

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(lstm_model.parameters(), lr=learning_rate)

train_history, val_history = [], []

best_model_wts = copy.deepcopy(lstm_model.state_dict())

best_loss = 10e10  # for validation phase

if not os.path.exists(config['checkpoints_dir']):
  os.mkdir(config['checkpoints_dir'])

for epoch in range(config['num_epochs']):
  print("="*20 + str(epoch+1) + "="*20)
  _, train_loss = train_model(lstm_model, train_data, criterion, optimizer, config['batch_size'])
  train_history.append(train_loss)

  _, val_loss, best_model_wts = evaluate_model(lstm_model, val_data, criterion, optimizer, config['batch_size'])
  val_history.append(val_loss)

  if val_loss < best_loss:
      best_loss = val_loss
      print("Saving model for best loss")
      checkpoint = {
          'state_dict': best_model_wts
      }
      torch.save(checkpoint, config['checkpoints_dir'] + config['model_filename'])
      best_model_wts = copy.deepcopy(lstm_model.state_dict())
Out [31]:
====================1====================
Epoch: 1, train loss: 0.90311
Epoch: 1, val loss: 0.78379
Saving model for best loss
====================2====================
Epoch: 2, train loss: 0.77797
Epoch: 2, val loss: 0.73462
Saving model for best loss
====================3====================
Epoch: 3, train loss: 0.74900
Epoch: 3, val loss: 0.70943
Saving model for best loss
====================4====================
Epoch: 4, train loss: 0.73103
Epoch: 4, val loss: 0.69784
Saving model for best loss
====================5====================
Epoch: 5, train loss: 0.71739
Epoch: 5, val loss: 0.68612
Saving model for best loss
====================6====================
Epoch: 6, train loss: 0.70665
Epoch: 6, val loss: 0.67945
Saving model for best loss
====================7====================
Epoch: 7, train loss: 0.69868
Epoch: 7, val loss: 0.67938
Saving model for best loss
====================8====================
Epoch: 8, train loss: 0.69284
Epoch: 8, val loss: 0.67938
====================9====================
Epoch: 9, train loss: 0.68811
Epoch: 9, val loss: 0.66802
Saving model for best loss
====================10====================
Epoch: 10, train loss: 0.68397
Epoch: 10, val loss: 0.67317
In [32]:python
plt.plot(np.arange(config['num_epochs']), train_history, label='Train loss')
plt.plot(np.arange(config['num_epochs']), val_history, label='Val loss')
plt.xlabel("num of epochs")
plt.ylabel("train loss")
plt.show()
Out [32]:
Training and validation loss

Test

In [33]:python
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers)
lstm_model.load_state_dict(torch.load(os.path.join(config['checkpoints_dir'], config['model_filename']))['state_dict'])
lstm_model.eval()
for param in lstm_model.parameters():
  param.requires_grad = False
In [34]:python
i = 0
while i < len(testX):
  test_x = testX[i].unsqueeze(0)
  test_y = testY[i].unsqueeze(0)
  multi_step_plot(test_x.numpy()[0, :, target_index], test_y.detach().numpy()[0, :], lstm_model(test_x).detach().numpy()[0, :], 'Test case')
  i += 100
Out [34]:
Test case 0Test case 1Test case 2Test case 3Test case 4Test case 5Test case 6Test case 7Test case 8Test case 9Test case 10Test case 11Test case 12Test case 13Test case 14Test case 15Test case 16Test case 17Test case 18

2-Layer LSTM

In [35]:python
learning_rate = 0.001
input_size = trainX.shape[2] # number of input features, multivariate case
hidden_size = 20
num_layers = 2
num_classes = trainY.shape[1] # future time window length
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers, bidirectional=False)

config = {'batch_size': 128, 'num_epochs': 10, 'checkpoints_dir': '/content/drive/My Drive/time series prediction/', 'model_filename': '2-layer-lstm-best_model.pth.tar'}

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(lstm_model.parameters(), lr=learning_rate)

train_history, val_history = [], []

best_model_wts = copy.deepcopy(lstm_model.state_dict())

best_loss = 10e10

if not os.path.exists(config['checkpoints_dir']):
  os.mkdir(config['checkpoints_dir'])

for epoch in range(config['num_epochs']):
  print("="*20 + str(epoch+1) + "="*20)
  _, train_loss = train_model(lstm_model, train_data, criterion, optimizer, config['batch_size'])
  train_history.append(train_loss)

  _, val_loss, best_model_wts = evaluate_model(lstm_model, val_data, criterion, optimizer, config['batch_size'])
  val_history.append(val_loss)

  if val_loss < best_loss:
      best_loss = val_loss
      print("Saving model for best loss")
      checkpoint = {
          'state_dict': best_model_wts
      }
      torch.save(checkpoint, config['checkpoints_dir'] + config['model_filename'])
      best_model_wts = copy.deepcopy(lstm_model.state_dict())
Out [35]:
====================1====================
Epoch: 1, train loss: 0.89874
Epoch: 1, val loss: 0.77490
Saving model for best loss
====================2====================
Epoch: 2, train loss: 0.77985
Epoch: 2, val loss: 0.73741
Saving model for best loss
====================3====================
Epoch: 3, train loss: 0.75368
Epoch: 3, val loss: 0.71219
Saving model for best loss
====================4====================
Epoch: 4, train loss: 0.73283
Epoch: 4, val loss: 0.69299
Saving model for best loss
====================5====================
Epoch: 5, train loss: 0.71830
Epoch: 5, val loss: 0.68409
Saving model for best loss
====================6====================
Epoch: 6, train loss: 0.70771
Epoch: 6, val loss: 0.67732
Saving model for best loss
====================7====================
Epoch: 7, train loss: 0.69788
Epoch: 7, val loss: 0.67322
Saving model for best loss
====================8====================
Epoch: 8, train loss: 0.69050
Epoch: 8, val loss: 0.67194
Saving model for best loss
====================9====================
Epoch: 9, train loss: 0.68391
Epoch: 9, val loss: 0.66734
Saving model for best loss
====================10====================
Epoch: 10, train loss: 0.67748
Epoch: 10, val loss: 0.66578
Saving model for best loss
In [36]:python
plt.plot(np.arange(config['num_epochs']), train_history, label='Train loss')
plt.plot(np.arange(config['num_epochs']), val_history, label='Val loss')
plt.xlabel("num of epochs")
plt.ylabel("train loss")
plt.show()
Out [36]:
2-Layer LSTM training and validation loss

Test

In [37]:python
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers)
lstm_model.load_state_dict(torch.load(os.path.join(config['checkpoints_dir'], config['model_filename']))['state_dict'])
lstm_model.eval()
for param in lstm_model.parameters():
  param.requires_grad = False
In [38]:python
i = 0
while i < len(testX):
  test_x = testX[i].unsqueeze(0)
  test_y = testY[i].unsqueeze(0)
  multi_step_plot(test_x.numpy()[0, :, target_index], test_y.detach().numpy()[0, :], lstm_model(test_x).detach().numpy()[0, :], 'Test case')
  i += 100
Out [38]:
2-Layer LSTM test case 02-Layer LSTM test case 12-Layer LSTM test case 22-Layer LSTM test case 32-Layer LSTM test case 42-Layer LSTM test case 52-Layer LSTM test case 62-Layer LSTM test case 72-Layer LSTM test case 82-Layer LSTM test case 92-Layer LSTM test case 102-Layer LSTM test case 112-Layer LSTM test case 122-Layer LSTM test case 132-Layer LSTM test case 142-Layer LSTM test case 152-Layer LSTM test case 162-Layer LSTM test case 172-Layer LSTM test case 18

Bidirectional 1-Layer LSTM

In [39]:python
learning_rate = 0.001
input_size = trainX.shape[2] # number of input features, multivariate case
hidden_size = 20
num_layers = 1
num_classes = trainY.shape[1] # future time window length
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers, bidirectional=True)

config = {'batch_size': 128, 'num_epochs': 10, 'checkpoints_dir': '/content/drive/My Drive/time series prediction/', 'model_filename': 'bidir-1-layer-lstm-best_model.pth.tar'}

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(lstm_model.parameters(), lr=learning_rate)

train_history, val_history = [], []

best_model_wts = copy.deepcopy(lstm_model.state_dict())

best_loss = 10e10  # for validation phase

if not os.path.exists(config['checkpoints_dir']):
  os.mkdir(config['checkpoints_dir'])

for epoch in range(config['num_epochs']):
  print("="*20 + str(epoch+1) + "="*20)
  _, train_loss = train_model(lstm_model, train_data, criterion, optimizer, config['batch_size'])
  train_history.append(train_loss)

  _, val_loss, best_model_wts = evaluate_model(lstm_model, val_data, criterion, optimizer, config['batch_size'])
  val_history.append(val_loss)

  if val_loss < best_loss:
      best_loss = val_loss
      print("Saving model for best loss")
      checkpoint = {
          'state_dict': best_model_wts
      }
      torch.save(checkpoint, config['checkpoints_dir'] + config['model_filename'])
      best_model_wts = copy.deepcopy(lstm_model.state_dict())
Out [39]:
====================1====================
Epoch: 1, train loss: 0.92054
Epoch: 1, val loss: 0.78858
Saving model for best loss
====================2====================
Epoch: 2, train loss: 0.77843
Epoch: 2, val loss: 0.73070
Saving model for best loss
====================3====================
Epoch: 3, train loss: 0.75058
Epoch: 3, val loss: 0.71063
Saving model for best loss
====================4====================
Epoch: 4, train loss: 0.73297
Epoch: 4, val loss: 0.69804
Saving model for best loss
====================5====================
Epoch: 5, train loss: 0.71834
Epoch: 5, val loss: 0.68801
Saving model for best loss
====================6====================
Epoch: 6, train loss: 0.70628
Epoch: 6, val loss: 0.67993
Saving model for best loss
====================7====================
Epoch: 7, train loss: 0.69642
Epoch: 7, val loss: 0.67463
Saving model for best loss
====================8====================
Epoch: 8, train loss: 0.68780
Epoch: 8, val loss: 0.67683
====================9====================
Epoch: 9, train loss: 0.68003
Epoch: 9, val loss: 0.67213
Saving model for best loss
====================10====================
Epoch: 10, train loss: 0.67364
Epoch: 10, val loss: 0.66894
Saving model for best loss
In [40]:python
plt.plot(np.arange(config['num_epochs']), train_history, label='Train loss')
plt.plot(np.arange(config['num_epochs']), val_history, label='Val loss')
plt.xlabel("num of epochs")
plt.ylabel("train loss")
plt.show()
Out [40]:
Bidirectional LSTM training and validation loss

Test

In [41]:python
from collections import OrderedDict

state_dict = torch.load(os.path.join(config['checkpoints_dir'], config['model_filename']))['state_dict']

new_state_dict = OrderedDict()
for k,v in state_dict.items():
  if k not in ["lstm.weight_ih_l0_reverse", "lstm.weight_hh_l0_reverse", "lstm.bias_ih_l0_reverse", "lstm.bias_hh_l0_reverse"]:
      new_state_dict[k] = v
In [42]:python
lstm_model = LSTMTimeSeriesPredictor(num_classes, input_size, hidden_size, num_layers)
lstm_model.load_state_dict(new_state_dict)
lstm_model.eval()
for param in lstm_model.parameters():
  param.requires_grad = False
In [43]:python
i = 0
while i < len(testX):
  test_x = testX[i].unsqueeze(0)
  test_y = testY[i].unsqueeze(0)
  multi_step_plot(test_x.numpy()[0, :, target_index], test_y.detach().numpy()[0, :], lstm_model(test_x).detach().numpy()[0, :], 'Test case')
  i += 100
Out [43]:
Bidirectional LSTM test case 0Bidirectional LSTM test case 1Bidirectional LSTM test case 2Bidirectional LSTM test case 3Bidirectional LSTM test case 4Bidirectional LSTM test case 5Bidirectional LSTM test case 6Bidirectional LSTM test case 7Bidirectional LSTM test case 8Bidirectional LSTM test case 9Bidirectional LSTM test case 10Bidirectional LSTM test case 11Bidirectional LSTM test case 12Bidirectional LSTM test case 13Bidirectional LSTM test case 14Bidirectional LSTM test case 15Bidirectional LSTM test case 16Bidirectional LSTM test case 17Bidirectional LSTM test case 18

Experimental results on val data

ModelBest MSE loss
1-layer LSTM0.66802
2-layer LSTM0.66578
Bidirectional 1-layer LSTM0.66894

Summary

At the first glance, all three models based on LSTM showed approximately the same results, so you can choose a single-layer LSTM without loss of quality. In cases where computational efficiency is a crucial part you may chose GRU model, which has less parameters to optimize than LSTM model.

The ways you can improve existing deep learning model:

  • Work on data cleanliness, look for essential features, which are strongly related to the target variable
  • Manual and automatic feature engineering
  • Optimization of hyperparameters (length of hidden state vector, batch size, number of epochs, learning rate, number of layers)
  • Try different optimization algorithm
  • Try different loss function which will be differentiable and adequate to task you're solving
  • Use ensembles of prediction vectors
Share this article:TwitterLinkedIn