Deep learning examples


This notebook contains examples of using neural network models.


[1]:
import torch
import random

import pandas as pd
import numpy as np

from etna.datasets.tsdataset import TSDataset
from etna.pipeline import Pipeline
from etna.transforms import DateFlagsTransform
from etna.transforms import LagTransform
from etna.transforms import LinearTrendTransform
from etna.metrics import SMAPE, MAPE, MAE
from etna.analysis import plot_backtest
from etna.models import SeasonalMovingAverageModel

import warnings
warnings.filterwarnings("ignore")

1. Creating TSDataset

We are going to use a transformed version of the Household Electric Power Consumption dataset. Let’s load it and take a look.

[2]:
original_df = pd.read_csv("data/example_dataset.csv")
original_df.head()
[2]:
timestamp segment target
0 2019-01-01 segment_a 170
1 2019-01-02 segment_a 243
2 2019-01-03 segment_a 267
3 2019-01-04 segment_a 287
4 2019-01-05 segment_a 279

Our library works with the special data structure TSDataset. Let’s create it as was done in the “Get started” notebook.

[3]:
df = TSDataset.to_dataset(original_df)
ts = TSDataset(df, freq="D")
ts.head(5)
[3]:
segment segment_a segment_b segment_c segment_d
feature target target target target
timestamp
2019-01-01 170 102 92 238
2019-01-02 243 123 107 358
2019-01-03 267 130 103 366
2019-01-04 287 138 103 385
2019-01-05 279 137 104 384

2. Architecture

Our library uses PyTorch Forecasting to work with neural network models for time series. To integrate it into our architecture, we use the PytorchForecastingTransform class.

Let’s look at it closer.

[4]:
from etna.transforms import PytorchForecastingTransform
[5]:
?PytorchForecastingTransform
"""
Init signature:
PytorchForecastingTransform(
    max_encoder_length: int = 30,
    min_encoder_length: int = None,
    min_prediction_idx: int = None,
    min_prediction_length: int = None,
    max_prediction_length: int = 1,
    static_categoricals: List[str] = [],
    static_reals: List[str] = [],
    time_varying_known_categoricals: List[str] = [],
    time_varying_known_reals: List[str] = [],
    time_varying_unknown_categoricals: List[str] = [],
    time_varying_unknown_reals: List[str] = [],
    variable_groups: Dict[str, List[int]] = {},
    constant_fill_strategy: Dict[str, Union[str, float, int, bool]] = {},
    allow_missing_timesteps: bool = True,
    lags: Dict[str, List[int]] = {},
    add_relative_time_idx: bool = True,
    add_target_scales: bool = True,
    add_encoder_length: Union[bool, str] = True,
    target_normalizer: Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer, str, List[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]], Tuple[Union[pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.NaNLabelEncoder, pytorch_forecasting.data.encoders.EncoderNormalizer]]] = 'auto',
    categorical_encoders: Dict[str, pytorch_forecasting.data.encoders.NaNLabelEncoder] = None,
    scalers: Dict[str, Union[sklearn.preprocessing._data.StandardScaler, sklearn.preprocessing._data.RobustScaler, pytorch_forecasting.data.encoders.TorchNormalizer, pytorch_forecasting.data.encoders.EncoderNormalizer]] = {},
)
Docstring:      Transform for models from PytorchForecasting library.
Init docstring:
Parameters for TimeSeriesDataSet object.

Reference
---------
https://github.com/jdb78/pytorch-forecasting/blob/v0.8.5/pytorch_forecasting/data/timeseries.py#L117
"""

We can see a pretty scary signature, but don’t panic: we will only look at the most important parameters.

  • time_varying_known_reals — known real values that change over time (real regressors); currently it is necessary to add the “time_idx” variable to this list;

  • time_varying_unknown_reals — our real-valued target; set it to ["target"];

  • max_prediction_length — our horizon for forecasting;

  • max_encoder_length — length of past context to use;

  • static_categoricals — static categorical values; for example, when working with multiple segments these can be per-segment characteristics, including the identifier “segment”;

  • time_varying_known_categoricals — known categorical values that change over time (categorical regressors);

  • target_normalizer — class for normalizing the target across different segments.

Our library currently supports the following models:

  • DeepAR,

  • TFT.
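
To make the mapping of the parameters above concrete, here is a minimal illustrative sketch (not part of the original notebook) that configures a transform with just the key fields for a daily series and a 7-day horizon; the actual configurations used in this notebook follow in sections 3.1 and 3.2.

from pytorch_forecasting.data import GroupNormalizer

example_transform = PytorchForecastingTransform(
    max_encoder_length=14,                                  # length of past context fed to the encoder
    max_prediction_length=7,                                # forecasting horizon
    time_varying_known_reals=["time_idx"],                  # "time_idx" must be present in this list
    time_varying_unknown_reals=["target"],                  # the target itself
    static_categoricals=["segment"],                        # per-segment identifier
    target_normalizer=GroupNormalizer(groups=["segment"]),  # normalize the target per segment
)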

3. Testing models

In this section we will test these models on our example dataset.

3.1 DeepAR

Before training let’s fix seeds for reproducibility.

[6]:
torch.manual_seed(42)
random.seed(42)
np.random.seed(42)

Creating transforms for DeepAR.

[7]:
from pytorch_forecasting.data import GroupNormalizer

HORIZON = 7

transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False, out_column="dateflag")
num_lags = 10
transform_lag = LagTransform(in_column="target", lags=[HORIZON+i for i in range(num_lags)], out_column="target_lag")
lag_columns = [f"target_lag_{HORIZON+i}" for i in range(num_lags)]

transform_deepar = PytorchForecastingTransform(
    max_encoder_length=HORIZON,
    max_prediction_length=HORIZON,
    time_varying_known_reals=["time_idx"]+lag_columns,
    time_varying_unknown_reals=["target"],
    time_varying_known_categoricals=["dateflag_day_number_in_week"],
    target_normalizer=GroupNormalizer(groups=["segment"]),
)

Now we are going to run the backtest.

[8]:
from etna.models.nn import DeepARModel


model_deepar = DeepARModel(max_epochs=150, learning_rate=[0.01], gpus=0, batch_size=64)
metrics = [SMAPE(), MAPE(), MAE()]

pipeline_deepar = Pipeline(model=model_deepar,
                           horizon=HORIZON,
                           transforms=[transform_lag, transform_date, transform_deepar])
[9]:
metrics_deepar, forecast_deepar, fold_info_deepar = pipeline_deepar.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name                   | Type                   | Params
------------------------------------------------------------------
0 | loss                   | NormalDistributionLoss | 0
1 | logging_metrics        | ModuleList             | 0
2 | embeddings             | MultiEmbedding         | 35
3 | rnn                    | LSTM                   | 2.2 K
4 | distribution_projector | Linear                 | 22
------------------------------------------------------------------
2.3 K     Trainable params
0         Non-trainable params
2.3 K     Total params
0.009     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  1.9min remaining:    0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name                   | Type                   | Params
------------------------------------------------------------------
0 | loss                   | NormalDistributionLoss | 0
1 | logging_metrics        | ModuleList             | 0
2 | embeddings             | MultiEmbedding         | 35
3 | rnn                    | LSTM                   | 2.2 K
4 | distribution_projector | Linear                 | 22
------------------------------------------------------------------
2.3 K     Trainable params
0         Non-trainable params
2.3 K     Total params
0.009     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  3.9min remaining:    0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name                   | Type                   | Params
------------------------------------------------------------------
0 | loss                   | NormalDistributionLoss | 0
1 | logging_metrics        | ModuleList             | 0
2 | embeddings             | MultiEmbedding         | 35
3 | rnn                    | LSTM                   | 2.2 K
4 | distribution_projector | Linear                 | 22
------------------------------------------------------------------
2.3 K     Trainable params
0         Non-trainable params
2.3 K     Total params
0.009     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  6.0min remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  6.0min finished

Let’s compare results across different segments.

[10]:
metrics_deepar
[10]:
segment SMAPE MAPE MAE fold_number
0 segment_a 6.496923 6.262331 33.577118 0
0 segment_a 3.110433 3.115227 16.091409 1
0 segment_a 3.247720 3.195171 17.676841 2
3 segment_b 6.794060 6.527645 17.035167 0
3 segment_b 4.234161 4.282767 10.278190 1
3 segment_b 4.020539 4.129931 9.492033 2
1 segment_c 3.464639 3.436801 5.854686 0
1 segment_c 5.573871 5.410155 9.875741 1
1 segment_c 3.951570 3.798992 6.880014 2
2 segment_d 6.403644 6.308158 54.982370 0
2 segment_d 6.815848 6.975395 55.210536 1
2 segment_d 3.138582 3.075240 26.546858 2

To summarize the results, we will take the mean value of the SMAPE metric because it is scale-tolerant.
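
As a reminder, SMAPE is commonly defined as

$$\mathrm{SMAPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{2\,\lvert y_t - \hat{y}_t\rvert}{\lvert y_t\rvert + \lvert \hat{y}_t\rvert},$$

so the error at each point is divided by the magnitude of the values themselves, which is why segments with larger absolute targets do not dominate the average.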

[11]:
score = metrics_deepar["SMAPE"].mean()
print(f"Average SMAPE for DeepAR: {score:.3f}")
Average SMAPE for DeepAR: 4.771

Visualize results.

[12]:
plot_backtest(forecast_deepar, ts, history_len=20)
../_images/tutorials_NN_examples_30_0.png

3.2 TFT

Let’s move to the next model.

[13]:
torch.manual_seed(42)
random.seed(42)
np.random.seed(42)
[14]:
transform_date = DateFlagsTransform(day_number_in_week=True, day_number_in_month=False, out_column="dateflag")
num_lags = 10
transform_lag = LagTransform(in_column="target", lags=[HORIZON+i for i in range(num_lags)], out_column="target_lag")
lag_columns = [f"target_lag_{HORIZON+i}" for i in range(num_lags)]

transform_tft = PytorchForecastingTransform(
    max_encoder_length=HORIZON,
    max_prediction_length=HORIZON,
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["target"],
    time_varying_known_categoricals=["dateflag_day_number_in_week"],
    static_categoricals=["segment"],
    target_normalizer=GroupNormalizer(groups=["segment"]),
)
[15]:
from etna.models.nn import TFTModel

model_tft = TFTModel(max_epochs=200, learning_rate=[0.01], gpus=0, batch_size=64)

pipeline_tft = Pipeline(model=model_tft,
                        horizon=HORIZON,
                        transforms=[transform_lag, transform_date, transform_tft])
[16]:
metrics_tft, forecast_tft, fold_info_tft = pipeline_tft.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

   | Name                               | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                               | QuantileLoss                    | 0
1  | logging_metrics                    | ModuleList                      | 0
2  | input_embeddings                   | MultiEmbedding                  | 47
3  | prescalers                         | ModuleDict                      | 96
4  | static_variable_selection          | VariableSelectionNetwork        | 1.8 K
5  | encoder_variable_selection         | VariableSelectionNetwork        | 1.9 K
6  | decoder_variable_selection         | VariableSelectionNetwork        | 1.3 K
7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
11 | lstm_encoder                       | LSTM                            | 2.2 K
12 | lstm_decoder                       | LSTM                            | 2.2 K
13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm                | GateAddNorm                     | 576
18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
19 | pre_output_gate_norm               | GateAddNorm                     | 576
20 | output_layer                       | Linear                          | 119
----------------------------------------------------------------------------------------
18.9 K    Trainable params
0         Non-trainable params
18.9 K    Total params
0.075     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.7min remaining:    0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

   | Name                               | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                               | QuantileLoss                    | 0
1  | logging_metrics                    | ModuleList                      | 0
2  | input_embeddings                   | MultiEmbedding                  | 47
3  | prescalers                         | ModuleDict                      | 96
4  | static_variable_selection          | VariableSelectionNetwork        | 1.8 K
5  | encoder_variable_selection         | VariableSelectionNetwork        | 1.9 K
6  | decoder_variable_selection         | VariableSelectionNetwork        | 1.3 K
7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
11 | lstm_encoder                       | LSTM                            | 2.2 K
12 | lstm_decoder                       | LSTM                            | 2.2 K
13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm                | GateAddNorm                     | 576
18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
19 | pre_output_gate_norm               | GateAddNorm                     | 576
20 | output_layer                       | Linear                          | 119
----------------------------------------------------------------------------------------
18.9 K    Trainable params
0         Non-trainable params
18.9 K    Total params
0.075     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed: 10.0min remaining:    0.0s
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

   | Name                               | Type                            | Params
----------------------------------------------------------------------------------------
0  | loss                               | QuantileLoss                    | 0
1  | logging_metrics                    | ModuleList                      | 0
2  | input_embeddings                   | MultiEmbedding                  | 47
3  | prescalers                         | ModuleDict                      | 96
4  | static_variable_selection          | VariableSelectionNetwork        | 1.8 K
5  | encoder_variable_selection         | VariableSelectionNetwork        | 1.9 K
6  | decoder_variable_selection         | VariableSelectionNetwork        | 1.3 K
7  | static_context_variable_selection  | GatedResidualNetwork            | 1.1 K
8  | static_context_initial_hidden_lstm | GatedResidualNetwork            | 1.1 K
9  | static_context_initial_cell_lstm   | GatedResidualNetwork            | 1.1 K
10 | static_context_enrichment          | GatedResidualNetwork            | 1.1 K
11 | lstm_encoder                       | LSTM                            | 2.2 K
12 | lstm_decoder                       | LSTM                            | 2.2 K
13 | post_lstm_gate_encoder             | GatedLinearUnit                 | 544
14 | post_lstm_add_norm_encoder         | AddNorm                         | 32
15 | static_enrichment                  | GatedResidualNetwork            | 1.4 K
16 | multihead_attn                     | InterpretableMultiHeadAttention | 676
17 | post_attn_gate_norm                | GateAddNorm                     | 576
18 | pos_wise_ff                        | GatedResidualNetwork            | 1.1 K
19 | pre_output_gate_norm               | GateAddNorm                     | 576
20 | output_layer                       | Linear                          | 119
----------------------------------------------------------------------------------------
18.9 K    Trainable params
0         Non-trainable params
18.9 K    Total params
0.075     Total estimated model params size (MB)
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 14.7min remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed: 14.7min finished
[17]:
metrics_tft
[17]:
segment SMAPE MAPE MAE fold_number
0 segment_a 4.003736 3.939512 20.710637 0
0 segment_a 6.366790 6.099132 31.872040 1
0 segment_a 2.728972 2.656444 15.460903 2
3 segment_b 5.884539 5.649675 14.540667 0
3 segment_b 5.377515 5.408940 12.965875 1
3 segment_b 4.069312 4.167871 9.468142 2
1 segment_c 3.423244 3.487631 5.776851 0
1 segment_c 3.581810 3.489848 6.513208 1
1 segment_c 6.113726 6.112950 10.985452 2
2 segment_d 9.495453 9.456682 79.086417 0
2 segment_d 3.744241 3.850278 32.086225 1
2 segment_d 2.347024 2.286240 20.012992 2
[18]:
score = metrics_tft["SMAPE"].mean()
print(f"Average SMAPE for TFT: {score:.3f}")
Average SMAPE for TFT: 4.761
[19]:
plot_backtest(forecast_tft, ts, history_len=20)
../_images/tutorials_NN_examples_39_0.png

3.3 Simple model

For comparison, let’s train a much simpler model.

[20]:
model_sma = SeasonalMovingAverageModel(window=5, seasonality=7)
linear_trend_transform = LinearTrendTransform(in_column='target')

pipeline_sma = Pipeline(model=model_sma,
                        horizon=HORIZON,
                        transforms=[linear_trend_transform])
[21]:
metrics_sma, forecast_sma, fold_info_sma = pipeline_sma.backtest(ts, metrics=metrics, n_folds=3, n_jobs=1)
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.3s finished
[22]:
metrics_sma
[22]:
segment SMAPE MAPE MAE fold_number
0 segment_a 6.343943 6.124296 33.196532 0
0 segment_a 5.346946 5.192455 27.938101 1
0 segment_a 7.510347 7.189999 40.028565 2
3 segment_b 7.178822 6.920176 17.818102 0
3 segment_b 5.672504 5.554555 13.719200 1
3 segment_b 3.327846 3.359712 7.680919 2
1 segment_c 6.430429 6.200580 10.877718 0
1 segment_c 5.947090 5.727531 10.701336 1
1 segment_c 6.186545 5.943679 11.359563 2
2 segment_d 4.707899 4.644170 39.918646 0
2 segment_d 5.403426 5.600978 43.047332 1
2 segment_d 2.505279 2.543719 19.347565 2
[23]:
score = metrics_sma["SMAPE"].mean()
print(f"Average SMAPE for Seasonal MA: {score:.3f}")
Average SMAPE for Seasonal MA: 5.547
[24]:
plot_backtest(forecast_sma, ts, history_len=20)
../_images/tutorials_NN_examples_46_0.png

As we can see, the neural networks perform a bit better in this particular case.
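
If you want the three averages side by side, here is a minimal sketch that reuses the metrics DataFrames computed above to build one comparison table.

summary = pd.Series(
    {
        "DeepAR": metrics_deepar["SMAPE"].mean(),
        "TFT": metrics_tft["SMAPE"].mean(),
        "Seasonal MA": metrics_sma["SMAPE"].mean(),
    },
    name="mean SMAPE",
).sort_values()  # smaller SMAPE is better
print(summary)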