LSTM Layer

Specialization of the Recurrent Layer for Long Short-Term Memory (LSTM). Each input sequence entry \(\mathbf{x}^{t}\in\mathbb{R}^{n_{l-1}}\), \(t\in\{0,\dots,T\}\), is processed into the output state \(\mathbf{h}^{t}\in\mathbb{R}^{n_{l}}\) by

\[\begin{split}
\mathbf{f}_t &= \sigma_g(W_{f} \mathbf{x}^{t} + U_{f} \mathbf{h}^{t-1} + \mathbf{b}_f) \\
\mathbf{i}_t &= \sigma_g(W_{i} \mathbf{x}^{t} + U_{i} \mathbf{h}^{t-1} + \mathbf{b}_i) \\
\mathbf{o}_t &= \sigma_g(W_{o} \mathbf{x}^{t} + U_{o} \mathbf{h}^{t-1} + \mathbf{b}_o) \\
\tilde{\mathbf{c}}_t &= \sigma_c(W_{c} \mathbf{x}^{t} + U_{c} \mathbf{h}^{t-1} + \mathbf{b}_c) \\
\mathbf{c}_t &= \mathbf{f}_t \circ \mathbf{c}_{t-1} + \mathbf{i}_t \circ \tilde{\mathbf{c}}_t \\
\mathbf{h}_t &= \mathbf{o}_t \circ \sigma_h(\mathbf{c}_t)
\end{split}\]

We note that for \(t=0\) the required vectors \(\mathbf{h}_{t-1}\) and \(\mathbf{c}_{t-1}\) are zero vectors. From the input and the hidden state we compute the forget \(\mathbf{f}_t\), input \(\mathbf{i}_t\), and output \(\mathbf{o}_t\) activation vectors using the respective weights \(W_f,W_i,W_o,U_f,U_i,U_o\) and biases \(\mathbf{b}_f,\mathbf{b}_i,\mathbf{b}_o\). The cell activation vector \(\tilde{\mathbf{c}}_t\) is likewise computed from the input and hidden state via the weights \(W_c,U_c\) and bias \(\mathbf{b}_c\). The cell state \(\mathbf{c}_t\) is then obtained by combining the previous cell state \(\mathbf{c}_{t-1}\), gated by the forget activation, with the cell activation gated by the input activation. The output state \(\mathbf{h}_t\) is obtained by combining the cell state with the output activation vector. The component-wise non-linearities are the sigmoid function for \(\sigma_g\) and the hyperbolic tangent for \(\sigma_c\) and \(\sigma_h\). As an illustration we attach a visual representation of the data flow through an LSTM layer.
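To make the recurrence concrete, the following is a minimal NumPy sketch of a single LSTM step that follows the equations above; it is independent of Korali's implementation, and all names in it are illustrative only.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold the weights and biases of the forget (f), input (i),
    # output (o), and cell (c) parts, as in the equations above.
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget activation
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input activation
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output activation
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # cell activation
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
    h_t = o_t * np.tanh(c_t)                                    # new output state
    return h_t, c_t

# For t = 0 the previous states h and c start as zero vectors, as noted above.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = {k: rng.normal(size=(n_out, n_in)) for k in "fioc"}
U = {k: rng.normal(size=(n_out, n_out)) for k in "fioc"}
b = {k: np.zeros(n_out) for k in "fioc"}
h, c = np.zeros(n_out), np.zeros(n_out)
for x_t in rng.normal(size=(5, n_in)):  # a sequence of T = 5 input entries
    h, c = lstm_step(x_t, h, c, W, U, b)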

Usage

e[...]["Type"] = "Layer/Recurrent/Lstm"
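As a rough usage sketch, the layer type is assigned on the experiment object; the exact key path used here (a "Hidden Layers" list under the solver's "Neural Network" block) is an assumption for illustration and may differ depending on the solver, while the "Type" string itself is the one documented on this page.

import korali

e = korali.Experiment()

# Assumed key path, for illustration only.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Type"] = "Layer/Recurrent/Lstm"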

Configuration

These are settings required by this module.

Depth
  • Usage: e[“Depth”] = unsigned integer

  • Description: The number of stacked copies of this layer. Using this setting performs better than defining several identical layers manually, since the stack is optimized by the underlying engine.

Output Channels
  • Usage: e[“Output Channels”] = unsigned integer

  • Description: Indicates the size of the output vector produced by the layer.

Weight Scaling
  • Usage: e[“Weight Scaling”] = float

  • Description: Factor that is multiplied by the layer's weights.
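The following sketch combines the three layer-level settings above (Depth, Output Channels, Weight Scaling), using the same assumed key path as in the usage sketch.

# Assumed key path, for illustration only; the setting names are documented above.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Type"] = "Layer/Recurrent/Lstm"
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Depth"] = 2             # two stacked copies, optimized by the engine
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Output Channels"] = 64  # size of the layer's output vector
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Weight Scaling"] = 0.5  # factor multiplied by the layer's weights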

Engine
  • Usage: e[“Engine”] = string

  • Description: Specifies which Neural Network backend engine to use.

  • Options:

    • “Korali”: Uses Korali’s lightweight NN support. (CPU Sequential - Does not require installing third-party software other than Eigen)

    • “OneDNN”: Uses oneDNN as NN support. (CPU Sequential/Parallel - Requires installing oneDNN)

    • “CuDNN”: Uses cuDNN as NN support. (GPU - Requires installing cuDNN)

Mode
  • Usage: e[“Mode”] = string

  • Description: Specifies the execution mode of the Neural Network.

  • Options:

    • “Training”: Use for training. Stores data during forward propagation and allows backward propagation.

    • “Inference”: Use for inference only. Runs only the forward propagation, which makes it faster.
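A sketch of selecting the backend engine and execution mode follows; the "Engine" and "Mode" keys are the ones documented above, the key path is an assumption as before, and some solvers may manage "Mode" internally.

# Assumed key path for the neural-network block, for illustration only.
e["Solver"]["Neural Network"]["Engine"] = "OneDNN"  # or "Korali" / "CuDNN"
e["Solver"]["Neural Network"]["Mode"] = "Training"  # use "Inference" for forward-only runs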

Layers
  • Usage: e[“Layers”] = knlohmann::json

  • Description: Complete description of the NN’s layers.

Timestep Count
  • Usage: e[“Timestep Count”] = unsigned integer

  • Description: Provides the sequence length for the input/output data.

Batch Sizes
  • Usage: e[“Batch Sizes”] = List of unsigned integer

  • Description: Specifies the batch sizes.
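A sketch of the sequence-related settings, again under the assumed neural-network key path:

# Assumed key path, for illustration only.
e["Solver"]["Neural Network"]["Timestep Count"] = 10     # sequence length of the input/output data
e["Solver"]["Neural Network"]["Batch Sizes"] = [32, 64]  # batch sizes the network should support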

Default Configuration

The following configuration will be assigned by default. Any settings defined by the user will override these defaults.

{
"Batch Sizes": [],
"Depth": 1,
"Engine": "Korali",
"Input Values": [],
"Output Channels": 0,
"Uniform Generator": {
    "Maximum": 1.0,
    "Minimum": -1.0,
    "Name": "Neural Network / Uniform Generator",
    "Type": "Univariate/Uniform"
    },
"Weight Scaling": 1.0
}
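As an example of overriding these defaults, the weight scaling and the bounds of the uniform generator used for weight initialization could be set explicitly; whether the "Uniform Generator" block is meant to be overridden by the user at this path is an assumption here, as is the key path itself.

# Assumed key path and assumed user-overridable "Uniform Generator" block.
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Weight Scaling"] = 0.1
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Uniform Generator"]["Minimum"] = -0.05
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Uniform Generator"]["Maximum"] = 0.05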