Wednesday, August 12, 2020

Deep Learning For Time Series Forecasting - Jason Brownlee


The aim of this post is to provide a review of the book "Deep Learning for Time Series Forecasting" by Jason Brownlee. This kind of post is useful for me as an online reminder of the key concepts of the book, so that I can quickly spot where to retrieve information related to a subject. It might also be useful for the anonymous reader to make up his/her mind about the book.
Reading this book gives you a sense of mastery, achievement and control. Practice and achievement go hand in hand, and in this book you get both.

Chapter 3: How to Develop a Skillful Forecasting Model


  • Given the iterative nature of modeling and evaluating performance, the forecasting method is applied only to a subset of the series.
  • Descriptive modeling = time series analysis
  • Predictive modeling = time series forecasting

Chapter 4: How to Transform Time Series to a Supervised Learning Problem

  • Sliding window method = lag method
  • Supervised learning is where you have input variables (X) and an output variable (y) and you can use an algorithm to learn the mapping function from the input to the output.
  • Multivariate and multi-step forecasting can be framed as supervised learning using the sliding window method (see the sketch after this list).
  • Multivariate time series data means data where there is more than one observation for each time step.
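
A minimal sketch of the sliding-window (lag) transform, assuming a plain univariate sequence as input; the helper name series_to_supervised and its signature are illustrative, not the book's exact listing:

    from numpy import array

    # frame a sequence as supervised learning: n_in lag values as input (X),
    # the next n_out values as output (y)
    def series_to_supervised(sequence, n_in, n_out=1):
        X, y = list(), list()
        for i in range(len(sequence)):
            end_ix = i + n_in
            out_end_ix = end_ix + n_out
            if out_end_ix > len(sequence):
                break
            X.append(sequence[i:end_ix])
            y.append(sequence[end_ix:out_end_ix])
        return array(X), array(y)

    # e.g. [1..6] with 3 lags: X = [[1,2,3],[2,3,4],[3,4,5]], y = [[4],[5],[6]]
    X, y = series_to_supervised([1, 2, 3, 4, 5, 6], n_in=3)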

Chapter 5: Review of Simple and Classical Forecasting Methods

  • ARIMA: AutoRegressive Integrated Moving Average. A model where the prediction is a weighted linear sum of recent past observations or lags.
  • SARIMA: Seasonal ARIMA
  • Exponential smoothing forecasting methods are similar to ARIMA in that a prediction is a weighted sum of past observations, but the model explicitly uses an exponentially decreasing weight for past observations. In other words, the more recent the observations the higher the associated weights.

Chapter 6: How to prepare Time Series Data for CNNs & LSTMs

  • shape[0] # refers to the number of rows
  • print(data[:5,:]) # prints the first 5 rows of an array with more than one column
  • numpy.reshape() # reshapes an array while keeping its data
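
As a minimal sketch of the three calls above (the 2-column array is illustrative), preparing data for a CNN/LSTM, which expects the shape [samples, timesteps, features]:

    from numpy import array

    data = array([[10, 15], [20, 25], [30, 35], [40, 45], [50, 55]])
    print(data.shape[0])  # number of rows: 5
    print(data[:5, :])    # first 5 rows of the 2-column array
    # add a third axis so the array matches [samples, timesteps, features]
    data = data.reshape((data.shape[0], data.shape[1], 1))
    print(data.shape)     # (5, 2, 1)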

Chapter 7: How to develop MLPs for Time Series forecasting

  • sample: one input/output pattern (a dataset comprises multiple samples)
  • The model will view each time step as a separate feature instead of as separate time steps (a minimal MLP sketch follows this list).
  • hstack() # stack arrays horizontally (column-wise)
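
A minimal MLP sketch for one-step univariate forecasting, assuming X and y come from a sliding-window transform like the Chapter 4 sketch above (data and layer sizes are illustrative):

    from numpy import array
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60]])
    y = array([40, 50, 60, 70])

    model = Sequential()
    model.add(Dense(100, activation='relu', input_dim=3))  # 3 lags seen as 3 features
    model.add(Dense(1))  # one-step forecast
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=1000, verbose=0)
    print(model.predict(array([[50, 60, 70]]), verbose=0))  # expect a value near 80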

Chapter 8: How to Develop CNNs for Time Series Forecasting

  • Univariate time series are datasets comprised of a single series of observations with a temporal ordering, and a model is required to learn from the series of past observations to predict the next value in the sequence (a minimal 1D CNN sketch follows).
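
A minimal 1D CNN sketch for this univariate framing, assuming the same sliding-window data as the MLP sketch above, reshaped to [samples, timesteps, features]:

    from numpy import array
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

    X = array([[10, 20, 30], [20, 30, 40], [30, 40, 50], [40, 50, 60]])
    y = array([40, 50, 60, 70])
    X = X.reshape((X.shape[0], X.shape[1], 1))  # [samples, timesteps, features]

    model = Sequential()
    model.add(Conv1D(64, kernel_size=2, activation='relu', input_shape=(3, 1)))
    model.add(MaxPooling1D(pool_size=2))  # consolidate the learned features
    model.add(Flatten())
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    model.fit(X, y, epochs=1000, verbose=0)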

Chapter 9: How to Develop LSTMs for Time Series Forecasting

  • "Key to LSTM is that they offer native support for sequences. Unlike a CNN that reads across the entire input vector, the LSTM model reads one time step of the sequence at a time and builds up an internal state representation that can be read as a learned context for making a prediction."
  • "The CNN can be very effective at automatically extracting and learning features from one-dimensional sequence such as univariate time series data."
  • "Encoder-Decoder Model: The model was designed for prediction problems where there are both input and output sequences, so-called sequence-to-sequence, or seq2seq problems, such as translating text from one language to another."

Chapter 10: Review of Top Methods For Univariate Time Series Forecasting

  • "Classical methods like Theta and ARIMA out-perform machine learning and deep learning methods for multi-step forecasting on univariate datasets."
  • "Machine learning and deep learning methods do not yet deliver on their promise for univariate time series forecasting and there is much work to do."

Chapter 11: How to develop simple methods for univariate forecasting

  • Median: preferred to the mean when the distribution of the observations is not Gaussian.
    # split a univariate dataset into train/test sets
    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

Chapter 12: How to develop ETS models for univariate Forecasting

  • ETS: Exponential Smoothing for Time Series
  • "Exponential smoothing is a time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component."

Chapter 13: How to develop SARIMA models for univariate forecasting
Chapter 14: How to develop MLPs, CNNs & LSTMs for univariate forecasting

  • Walk forward validation "is an approach where the model makes a forecast for each observation in the test dataset one at a time. After each forecast is made for a time step in the test dataset, the true observation for the forecast is added to the test dataset and made available to the model." (See the sketch after this list.)
  • batch size: "how often the weights are updated within each epoch"
  • RNN: "Recurrent Neural Network use an output of the network from a prior step as an input in attempt to automatically learn across sequence data. LSTM is a type of RNN."
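
A minimal sketch of walk-forward validation, with a naive persistence forecast standing in for the model; train_test_split is the Chapter 11 helper:

    from math import sqrt
    from sklearn.metrics import mean_squared_error

    def train_test_split(data, n_test):
        return data[:-n_test], data[-n_test:]

    data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    train, test = train_test_split(data, n_test=4)
    history, predictions = list(train), list()
    for t in range(len(test)):
        predictions.append(history[-1])  # persistence: forecast = last known value
        history.append(test[t])          # then reveal the true observation
    print(sqrt(mean_squared_error(test, predictions)))  # RMSE = 10.0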

Chapter 15: How to grid search Deep Learning Models for Univariate forecasting

  • Differencing is a transform of the data in which the value of a prior observation is subtracted from the current observation, removing trend or seasonality structure (see the sketch below).
  • In this chapter, the author uses the following time series (univariate, with trend and seasonality) and searches for the hyperparameters needed for the best forecast. This so-called "grid search" is applied to a naive persistence method, an MLP, a CNN and an LSTM.
A univariate time series with trend and seasonality


RMSE results with different models
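
A minimal sketch of the differencing transform defined above (interval=1 removes a trend; an interval equal to the seasonal period removes seasonality):

    # subtract the observation made `interval` steps earlier
    def difference(data, interval=1):
        return [data[i] - data[i - interval] for i in range(interval, len(data))]

    print(difference([10, 20, 30, 40, 50]))  # [10, 10, 10, 10]: trend removed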

Chapter 16: How to load and explore household energy usage data

  • Are the distributions of Gaussian type?
  • The distribution of active power appears to be bi-modal, meaning it looks like it has two main groups of observations:
Example of bi-modal distribution

Chapter 17: How to develop naive models for multi-step energy usage forecasting

  • "It is important to test naive forecast models on any new prediction problem. The result provides a baseline performance by which more sophisticated forecast methods can be evaluated"
Naive forecast strategies for household power forecasting 
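
As a minimal sketch, two naive strategies of this kind, assuming `history` is a list of past weeks, each a list of 7 daily power totals (the function names are illustrative):

    def daily_persistence(history):
        # repeat the last observed day's value for each of the next 7 days
        return [history[-1][-1]] * 7

    def weekly_persistence(history):
        # forecast the next week as a copy of the last observed week
        return list(history[-1])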

Chapter 18: How to develop ARIMA models for multi-step energy usage forecasting

  • "The Statsmodel library provides multiple ways of developing an AR model, such as using the AR, ARMA, ARIMA, SARIMAX classes."

Chapter 19: How to develop CNNs for multi-step energy usage forecasting

  • "Unlike other ML algorithms, convolutional neural networks are capable of automatically learning features from sequence data, support multivariate data, and can directly output a vector for multi-step forecasting."
Example of a multi-headed CNN model
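
A minimal sketch of the direct multi-step output mentioned in the quote: the final Dense layer emits a 7-value vector, one per forecast day (assumed input: 14 daily values, one feature):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

    model = Sequential()
    model.add(Conv1D(16, kernel_size=3, activation='relu', input_shape=(14, 1)))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(10, activation='relu'))
    model.add(Dense(7))  # direct vector output: the 7-day forecast
    model.compile(optimizer='adam', loss='mse')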

Chapter 20: How to develop LSTMs for multi-step energy usage forecasting

  • "The first step in any project is defining your problem."
  • "Perhaps the biggest opportunity for programmers is to put learning machine methods in the application you are developing."
  • "Machine learning methods address a specific decision problem."

Chapter 21: Review of deep learning models for Human Activity Recognition

  • HAR: Human Activity Recognition
  • Sliding window approach
  • "RNN and LSTM are recommended to recognize short activities that have natural order while CNN is better at inferring long term repetitive activities. The reason is that RNN could make use of the time-order relationship between sensor readings, and CNN is more capable of learning deep features contained in recursive patterns." Deep learning for Sensor-based activity recognition: A survey, 2018.

Chapter 22: How to load and explore human activity data

  • One of the strengths of the book is that the code of the examples comes along with it. If I had to write all the lines of code myself, that would take a huge amount of time and might be discouraging. Here you pick up the code and run the examples to see the results.

Chapter 23: How to develop ML models for Human Activity Recognition

  • A list of machine learning models is evaluated (a spot-check sketch follows the list):
    • Nonlinear algorithms
      • k-Nearest Neighbors
      • Classification and regression tree
      • Support Vector Machine
      • Naive Bayes
    • Ensemble algorithms
      • Bagged decision trees
      • Random Forest
      • Extra trees
      • Gradient Boosting Machine
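
A minimal sketch of spot-checking these models with scikit-learn; the synthetic dataset stands in for the engineered HAR features:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                                  ExtraTreesClassifier, GradientBoostingClassifier)
    from sklearn.metrics import accuracy_score

    # synthetic stand-in for the engineered feature dataset (6 activity classes)
    X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                               n_classes=6, random_state=1)
    trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.3,
                                                    random_state=1)
    models = {
        'kNN': KNeighborsClassifier(n_neighbors=7),
        'CART': DecisionTreeClassifier(),
        'SVM': SVC(),
        'Bayes': GaussianNB(),
        'Bagging': BaggingClassifier(n_estimators=100),
        'RF': RandomForestClassifier(n_estimators=100),
        'ET': ExtraTreesClassifier(n_estimators=100),
        'GBM': GradientBoostingClassifier(n_estimators=100),
    }
    for name, model in models.items():
        model.fit(trainX, trainy)
        print(name, accuracy_score(testy, model.predict(testX)))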

Chapter 24: How to develop CNNs for Human Activity Recognition

  • "Convolutional neural network models were developed for image classification problems, where the model learns an internal representation of a two-dimensional input, in a process referred to as feature learning. Although we refer to the model as 1D, it supports multiple dimensions of input as separate channels, like the color channels of an image (red, green and blue)."
  • "The benefits of using CNNs for sequence classification is that they can learn from the raw time series data directly, and in turn do not require domain expertise to manually engineer input features."
  • "We must define the CNN model using the Keras deep learning library."
  • "CNNs learn very quickly, so the dropout layer is intended to help slow down the learning process and hopefully result in a better final model. The pooling layer reduces the learned features to 1/4 their size, consolidating them to only the most essential elements. After the CNN and pooling, the learned features are flattened to one long vector and pass through a fully connected layer before the output layer used to make prediction."
  • "The feature maps are the number of times the input is processed or interpreted."
  • "The kernel size is the number of input time steps considered as the input sequence is read or processed onto the feature maps."
  • "The model is fit for a fixed number of epochs, in this case 10, and a batch size of 32 samples, where 32 windows of data will be exposed to the model before the weights of the model are updated."
  • Standardization refers to shifting the distribution of each variable such that it has a mean of zero and a standard deviation of 1. It really makes sense only if the distribution of each variable is Gaussian.
  • The scikit-learn StandardScaler will be used to perform the transform (see the sketch below).
1D CNN with and without standardization
  • CNN kernel size: a large kernel size means a less rigorous reading of the data, but may result in a more generalized snapshot of the input.
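
A minimal sketch of the standardization step with the scikit-learn StandardScaler (the 2-column array is illustrative):

    from numpy import array
    from sklearn.preprocessing import StandardScaler

    data = array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]])
    scaler = StandardScaler()
    scaled = scaler.fit_transform(data)  # per column: mean 0, std 1
    print(scaled.mean(axis=0), scaled.std(axis=0))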

Holidays


Abondance

Chapter 25: How to develop LSTMs for Human Activity Recognition

  • LSTM network models are a type of Recurrent Neural Network that are able to learn and remember over long sequences of input data.
Human Activity Recognition accuracy with different models

Conclusion

  • Yet another book from Jason Brownlee. This book is helpful for summarizing all the good practices learned so far: (i) the need for data preparation, with plenty of code examples on how to prepare the data; (ii) the mixing of different neural networks to achieve better results at time series forecasting; (iii) the process of reaching ever better performance by starting from a baseline built on traditional simple methods and then adding neural networks to go beyond it.
  • Although the book is rich in terms of code examples and good practices, almost all the examples are targeted at getting better performance on human activity prediction. So I am missing a last step here: what to do with all this? How can I use it for my own ideas? The previous book gave me more insights on how to use the examples on my own data (photo recognition).
  • Once again, a very big thanks to Jason Brownlee. All these didactic lines of code will certainly be helpful in the future.
  • Now let's start with the next one: "Long Short Term Memory with Python"

1 comment:

Jason Brownlee said…

Well done Dominique, an excellent write-up!