LSTM Layer

Specialization of the Recurrent Layer for Long Short-Term Memory (LSTM). Each input sequence entry \(\mathbf{x}^{t}\in\mathbb{R}^{n_{l-1}}\), \(t\in\{0,\dots,T\}\), is processed into the output state \(\mathbf{h}^{t}\in\mathbb{R}^{n_{l}}\) by

\[\begin{split}
\mathbf{f}_t &= \sigma_g(W_{f} \mathbf{x}^{t} + U_{f} \mathbf{h}^{t-1} + \mathbf{b}_f) \\
\mathbf{i}_t &= \sigma_g(W_{i} \mathbf{x}^{t} + U_{i} \mathbf{h}^{t-1} + \mathbf{b}_i) \\
\mathbf{o}_t &= \sigma_g(W_{o} \mathbf{x}^{t} + U_{o} \mathbf{h}^{t-1} + \mathbf{b}_o) \\
\tilde{\mathbf{c}}_t &= \sigma_c(W_{c} \mathbf{x}^{t} + U_{c} \mathbf{h}^{t-1} + \mathbf{b}_c) \\
\mathbf{c}_t &= \mathbf{f}_t \circ \mathbf{c}_{t-1} + \mathbf{i}_t \circ \tilde{\mathbf{c}}_t \\
\mathbf{h}_t &= \mathbf{o}_t \circ \sigma_h(\mathbf{c}_t)
\end{split}\]

We note that for \(t=0\) the required vectors \(\mathbf{h}_{t-1}\) and \(\mathbf{c}_{t-1}\) are zero vectors. From the input and the hidden state we compute the forget \(\mathbf{f}_t\), input \(\mathbf{i}_t\), and output \(\mathbf{o}_t\) activation vectors using the respective weights \(W_f,W_i,W_o,U_f,U_i,U_o\) and biases \(\mathbf{b}_f,\mathbf{b}_i,\mathbf{b}_o\). The cell activation vector \(\tilde{\mathbf{c}}_t\) is likewise computed from the input and hidden state via the weights \(W_c,U_c\) and bias \(\mathbf{b}_c\). The cell state \(\mathbf{c}_t\) is then obtained by combining the previous cell state \(\mathbf{c}_{t-1}\), gated by the forget activation, with the cell activation gated by the input activation. The output state \(\mathbf{h}_t\) is obtained by combining the cell state with the output activation vector. The component-wise non-linearities are the sigmoid function for \(\sigma_g\) and the hyperbolic tangent for \(\sigma_c\) and \(\sigma_h\). As an illustration we attach a visual representation of the data flow through an LSTM layer.
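To make the recurrence concrete, the following is a minimal NumPy sketch of a single LSTM step that follows the equations above; it is independent of Korali's implementation, and all names in it are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the weights and biases of the forget (f), input (i),
    # output (o), and cell (c) parts, as in the equations above.
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget activation
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input activation
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output activation
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # cell activation
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
    h_t = o_t * np.tanh(c_t)                                    # new output state
    return h_t, c_t

# For t = 0 the previous states h and c start as zero vectors, as noted above.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = {k: rng.normal(size=(n_out, n_in)) for k in "fioc"}
U = {k: rng.normal(size=(n_out, n_out)) for k in "fioc"}
b = {k: np.zeros(n_out) for k in "fioc"}
h, c = np.zeros(n_out), np.zeros(n_out)
for x_t in rng.normal(size=(5, n_in)):  # a sequence of T = 5 input entries
    h, c = lstm_step(x_t, h, c, W, U, b)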

Usage

e[...]["Type"] = "Layer/Recurrent/Lstm"
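As a rough usage sketch, the layer type is assigned on the experiment object; the exact key path used here (a "Hidden Layers" list under the solver's "Neural Network" block) is an assumption for illustration and may differ depending on the solver, while the "Type" string itself is the one documented on this page.

import korali

e = korali.Experiment()

# Assumed key path, for illustration only.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Type"] = "Layer/Recurrent/Lstm"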

Configuration

These are settings required by this module.

Depth
  • Usage: e[“Depth”] = unsigned integer

  • Description: The number of stacked copies of this layer. Using this setting performs better than defining several identical layers manually, since the stack is optimized by the underlying engine.

Output Channels
  • Usage: e[“Output Channels”] = unsigned integer

  • Description: Indicates the size of the output vector produced by the layer.

Weight Scaling
  • Usage: e[“Weight Scaling”] = float

  • Description: Factor that is multiplied by the layer's weights.
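The following sketch combines the three layer-level settings above (Depth, Output Channels, Weight Scaling), using the same assumed key path as in the usage sketch.

# Assumed key path, for illustration only; the setting names are documented above.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Type"] = "Layer/Recurrent/Lstm"
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Depth"] = 2             # two stacked copies, optimized by the engine
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Output Channels"] = 64  # size of the layer's output vector
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Weight Scaling"] = 0.5  # factor multiplied by the layer's weights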

Engine
  • Usage: e[“Engine”] = string

  • Description: Specifies which Neural Network backend engine to use.

  • Options:

    • “Korali”: Uses Korali’s lightweight NN support. (CPU Sequential - Does not require installing third-party software other than Eigen)

    • “OneDNN”: Uses oneDNN as NN support. (CPU Sequential/Parallel - Requires installing oneDNN)

    • “CuDNN”: Uses cuDNN as NN support. (GPU - Requires installing cuDNN)

Mode
  • Usage: e[“Mode”] = string

  • Description: Specifies the execution mode of the Neural Network.

  • Options:

    • “Training”: Use for training. Stores data during forward propagation and allows backward propagation.

    • “Inference”: Use for inference only. Runs only the forward propagation, which makes it faster.
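A sketch of selecting the backend engine and execution mode follows; the "Engine" and "Mode" keys are the ones documented above, the key path is an assumption as before, and some solvers may manage "Mode" internally.

# Assumed key path for the neural-network block, for illustration only.
e["Solver"]["Neural Network"]["Engine"] = "OneDNN"  # or "Korali" / "CuDNN"
e["Solver"]["Neural Network"]["Mode"] = "Training"  # use "Inference" for forward-only runs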

Layers
  • Usage: e[“Layers”] = knlohmann::json

  • Description: Complete description of the NN’s layers.

Timestep Count
  • Usage: e[“Timestep Count”] = unsigned integer

  • Description: Provides the sequence length for the input/output data.

Batch Sizes
  • Usage: e[“Batch Sizes”] = List of unsigned integer

  • Description: Specifies the batch sizes.
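A sketch of the sequence-related settings, again under the assumed neural-network key path:

# Assumed key path, for illustration only.
e["Solver"]["Neural Network"]["Timestep Count"] = 10     # sequence length of the input/output data
e["Solver"]["Neural Network"]["Batch Sizes"] = [32, 64]  # batch sizes the network should support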

Default Configuration

The following configuration will be assigned by default. Any settings defined by the user will override these defaults.

{
"Batch Sizes": [],
"Depth": 1,
"Engine": "Korali",
"Input Values": [],
"Output Channels": 0,
"Uniform Generator": {
    "Maximum": 1.0,
    "Minimum": -1.0,
    "Name": "Neural Network / Uniform Generator",
    "Type": "Univariate/Uniform"
    },
"Weight Scaling": 1.0
}
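As an example of overriding these defaults, the weight scaling and the bounds of the uniform generator used for weight initialization could be set explicitly; whether the "Uniform Generator" block is meant to be overridden by the user at this path is an assumption here, as is the key path itself.

# Assumed key path and assumed user-overridable "Uniform Generator" block.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Weight Scaling"] = 0.1
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Uniform Generator"]["Minimum"] = -0.05
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Uniform Generator"]["Maximum"] = 0.05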