GRU Layer

Specialization of the Recurrent Layer for Gated Recurrent Units (GRU). The input sequence entry \(t\in\{0,\dots,T\}\), denoted by \(\mathbf{x}_t\in\mathbb{R}^{n_{l-1}}\), is processed to the output state \(\mathbf{h}_t\in\mathbb{R}^{n_{l}}\) by

\[\begin{split}\mathbf{z}_t &= \sigma_g(W_{z} \mathbf{x}_t + U_{z} \mathbf{h}_{t-1} + \mathbf{b}_z) \\ \mathbf{r}_t &= \sigma_g(W_{r} \mathbf{x}_t + U_{r} \mathbf{h}_{t-1} + \mathbf{b}_r) \\ \hat{\mathbf{h}}_t &= \phi_h(W_{h} \mathbf{x}_t + U_{h} (\mathbf{r}_t \odot \mathbf{h}_{t-1}) + \mathbf{b}_h) \\ \mathbf{h}_t &= (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \hat{\mathbf{h}}_t\end{split}\]

We note that for \(t=0\) the required vector \(\mathbf{h}_{t-1}\) is the zero vector. From the input and the previous hidden state we compute the update gate \(\mathbf{z}_t\) and the reset gate \(\mathbf{r}_t\) using the respective weights \(W_z,W_r,U_z,U_r\) and biases \(\mathbf{b}_z,\mathbf{b}_r\). These are then used to compute the output state via the weights \(W_h,U_h\) and bias \(\mathbf{b}_h\). The component-wise non-linearities are the sigmoid function for \(\sigma_g\) and the hyperbolic tangent for \(\phi_h\). As an illustration we attach a visual representation of the data-flow through a GRU layer.
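The update rule above can be sketched directly with NumPy (a minimal illustration of the equations, not Korali's actual implementation; all names and dimensions here are chosen for the example):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU update, term by term as in the equations above."""
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)             # update gate z_t
    r = sigmoid(Wr @ x + Ur @ h_prev + br)             # reset gate r_t
    h_hat = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1 - z) * h_prev + z * h_hat                # convex combination

# Process a short sequence; h_{-1} is the zero vector as noted above.
rng = np.random.default_rng(0)
n_in, n_out, T = 3, 4, 5
params = [rng.standard_normal(s) for s in
          [(n_out, n_in), (n_out, n_out), (n_out,)] * 3]
h = np.zeros(n_out)
for t in range(T):
    h = gru_step(rng.standard_normal(n_in), h, *params)
print(h.shape)  # (4,)
```

Because the new state is a convex combination of the previous state and a tanh output, every component of \(\mathbf{h}_t\) stays within \((-1,1)\).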

Usage

e["Type"] = "Layer/Recurrent/Gru"
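A hedged sketch of setting the keys documented below; plain nested dictionaries stand in for the Korali experiment object here, since the exact path of the layer inside an experiment (solver section, hidden-layer index) is not stated on this page:

```python
# Plain dict standing in for a Korali layer configuration object
# (assumption: the real object supports the same bracket-assignment syntax).
layer = {}
layer["Type"] = "Layer/Recurrent/Gru"  # module path from the Usage section
layer["Depth"] = 2                     # two stacked copies of this layer
layer["Output Channels"] = 32          # size of the output state h_t
layer["Weight Scaling"] = 1.0          # factor multiplied by the weights
print(layer["Output Channels"])
```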

Configuration

These are settings required by this module.

Depth
  • Usage: e[“Depth”] = unsigned integer

  • Description: The number of stacked copies of this layer. Setting Depth performs better than defining the same layers manually, since the underlying engine optimizes the stacked configuration.

Output Channels
  • Usage: e[“Output Channels”] = unsigned integer

  • Description: Indicates the size of the output vector produced by the layer.

Weight Scaling
  • Usage: e[“Weight Scaling”] = float

  • Description: Factor that is multiplied by the layer’s weights.

Engine
  • Usage: e[“Engine”] = string

  • Description: Specifies which Neural Network backend engine to use.

  • Options:

    • “Korali”: Uses Korali’s lightweight NN support. (CPU Sequential - Does not require installing third party software other than Eigen)

    • “OneDNN”: Uses oneDNN as NN support. (CPU Sequential/Parallel - Requires installing oneDNN)

    • “CuDNN”: Uses cuDNN as NN support. (GPU - Requires installing cuDNN)

Mode
  • Usage: e[“Mode”] = string

  • Description: Specifies the execution mode of the Neural Network.

  • Options:

    • “Training”: Use for training. Stores data during forward propagation and allows backward propagation.

    • “Inference”: Use for inference only. Only runs forward propagation. Faster for inference.

Layers
  • Usage: e[“Layers”] = knlohmann::json

  • Description: Complete description of the NN’s layers.

Timestep Count
  • Usage: e[“Timestep Count”] = unsigned integer

  • Description: Provides the sequence length for the input/output data.

Batch Sizes
  • Usage: e[“Batch Sizes”] = List of unsigned integer

  • Description: Specifies the batch sizes.

Default Configuration

The following configuration will be assigned by default. Any settings defined by the user will override these defaults.

{
    "Batch Sizes": [],
    "Depth": 1,
    "Engine": "Korali",
    "Input Values": [],
    "Output Channels": 0,
    "Uniform Generator": {
        "Maximum": 1.0,
        "Minimum": -1.0,
        "Name": "Neural Network / Uniform Generator",
        "Type": "Univariate/Uniform"
    },
    "Weight Scaling": 1.0
}
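The override semantics described above can be sketched as a recursive dictionary merge (a minimal illustration of the behavior; Korali performs this internally, and the `merge` helper below is hypothetical):

```python
def merge(defaults, user):
    """Return defaults overridden by user settings, recursing into nested dicts."""
    out = dict(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

defaults = {"Depth": 1, "Engine": "Korali", "Weight Scaling": 1.0}
user = {"Engine": "OneDNN", "Output Channels": 64}
config = merge(defaults, user)
print(config)
# {'Depth': 1, 'Engine': 'OneDNN', 'Weight Scaling': 1.0, 'Output Channels': 64}
```

User-supplied keys replace the matching defaults, while untouched defaults (such as `"Depth"`) are kept as given.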