Discrete VRACER
This solver implements a discrete version of VRACER (https://arxiv.org/abs/1807.05827).
Usage
e["Solver"]["Type"] = "Agent/Discrete/DVRACER"
Results
These are the results produced by this solver:
Variable-Specific Settings
These are settings required by this module that are added to each of the experiment’s variables when this module is selected.
Configuration
These are settings required by this module.
- Initial Inverse Temperature
Usage: e["Solver"]["Initial Inverse Temperature"] = float
Description: Initial inverse temperature of the softmax distribution. Larger values lead to a distribution that is more concentrated around the action with the highest Q-value estimate.
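For intuition, the discrete policy is a softmax over the Q-value estimates, with action probabilities proportional to exp(beta * q_a), where beta is the inverse temperature. A standalone sketch (plain Python/NumPy, independent of the Korali API):

import numpy as np

def softmax_policy(q_values, beta):
    # Larger beta concentrates probability on the highest-Q action;
    # beta -> 0 approaches a uniform distribution over actions.
    z = beta * np.asarray(q_values, dtype=float)
    z -= z.max()                 # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

print(softmax_policy([1.0, 0.5, 0.0], beta=1.0))   # mildly peaked
print(softmax_policy([1.0, 0.5, 0.0], beta=10.0))  # near-greedy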
- Mode
Usage: e["Solver"]["Mode"] = string
Description: Specifies the operation mode for the agent.
Options:
"Training": Learns a policy for the reinforcement learning problem.
"Testing": Tests a previously learned policy.
- Testing / Sample Ids
Usage: e["Solver"]["Testing"]["Sample Ids"] = List of unsigned integer
Description: A vector with the identifiers of the samples to test the hyperparameters with.
- Testing / Current Policies
Usage: e["Solver"]["Testing"]["Current Policies"] = knlohmann::json
Description: The current hyperparameters of the policies to test.
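Taken together, a testing run typically loads a stored training state and evaluates it over the given sample identifiers. A sketch, assuming a result directory produced by a previous training run (the checkpoint path and sample count are illustrative):

import korali

e = korali.Experiment()
e.loadState("_korali_result/latest")     # assumed training checkpoint path

e["Solver"]["Mode"] = "Testing"
e["Solver"]["Testing"]["Sample Ids"] = [0, 1, 2, 3, 4]  # five evaluation episodes

k = korali.Engine()
k.run(e)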
- Training / Average Depth
Usage: e["Solver"]["Training"]["Average Depth"] = unsigned integer
Description: Specifies the depth of the running training average to report.
- Concurrent Workers
Usage: e["Solver"]["Concurrent Workers"] = unsigned integer
Description: Indicates the number of concurrent environments used to collect experiences.
- Episodes Per Generation
Usage: e["Solver"]["Episodes Per Generation"] = unsigned integer
Description: Number of reinforcement learning episodes per Korali generation (checkpoints are generated between generations).
- Mini Batch / Size
Usage: e["Solver"]["Mini Batch"]["Size"] = unsigned integer
Description: The number of experiences randomly selected to train the neural network(s) with.
- Time Sequence Length
Usage: e["Solver"]["Time Sequence Length"] = unsigned integer
Description: Indicates the number of contiguous experiences to pass to the NN for learning. This is only relevant when using recurrent NNs.
- Learning Rate
Usage: e["Solver"]["Learning Rate"] = float
Description: The initial learning rate to use for the NN hyperparameter optimization.
- L2 Regularization / Enabled
Usage: e["Solver"]["L2 Regularization"]["Enabled"] = True/False
Description: Boolean to determine whether L2 regularization will be applied to the neural networks.
- L2 Regularization / Importance
Usage: e["Solver"]["L2 Regularization"]["Importance"] = float
Description: Coefficient for L2 regularization.
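As a combined sketch of the optimization-related settings above (the values are illustrative, not tuned recommendations):

e["Solver"]["Learning Rate"] = 1e-4
e["Solver"]["Mini Batch"]["Size"] = 256
e["Solver"]["Time Sequence Length"] = 1      # > 1 only matters for recurrent NNs
e["Solver"]["L2 Regularization"]["Enabled"] = True
e["Solver"]["L2 Regularization"]["Importance"] = 1e-4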
- Neural Network / Hidden Layers
Usage: e["Solver"]["Neural Network"]["Hidden Layers"] = knlohmann::json
Description: Indicates the configuration of the hidden neural network layers.
- Neural Network / Optimizer
Usage: e["Solver"]["Neural Network"]["Optimizer"] = string
Description: Indicates the optimizer algorithm to update the NN hyperparameters.
- Neural Network / Engine
Usage: e["Solver"]["Neural Network"]["Engine"] = string
Description: Specifies which neural network backend to use.
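The hidden-layer configuration is a list of layer objects. A sketch following the layer format used in Korali's feed-forward examples (the layer widths, activation function, and choice of backend are illustrative):

e["Solver"]["Neural Network"]["Engine"] = "OneDNN"
e["Solver"]["Neural Network"]["Optimizer"] = "Adam"

e["Solver"]["Neural Network"]["Hidden Layers"][0]["Type"] = "Layer/Linear"
e["Solver"]["Neural Network"]["Hidden Layers"][0]["Output Channels"] = 32

e["Solver"]["Neural Network"]["Hidden Layers"][1]["Type"] = "Layer/Activation"
e["Solver"]["Neural Network"]["Hidden Layers"][1]["Function"] = "Elementwise/Tanh"

e["Solver"]["Neural Network"]["Hidden Layers"][2]["Type"] = "Layer/Linear"
e["Solver"]["Neural Network"]["Hidden Layers"][2]["Output Channels"] = 32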
- Discount Factor
Usage: e["Solver"]["Discount Factor"] = float
Description: Represents the discount factor to weight future experiences.
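With discount factor \gamma and rewards r_t, the agent maximizes the expected discounted return

R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k},

so the default \gamma = 0.995 corresponds to an effective horizon of roughly 1/(1 - \gamma) = 200 steps.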
- Importance Weight Truncation Level
Usage: e["Solver"]["Importance Weight Truncation Level"] = float
Description: The maximum value at which importance weights are truncated (clipped) before being used in the policy update.
- Experience Replay / Serialize
Usage: e["Solver"]["Experience Replay"]["Serialize"] = True/False
Description: Indicates whether to serialize and store the experience replay after each generation. Disabling this reduces I/O overhead but disables the checkpoint/resume function.
- Experience Replay / Start Size
Usage: e["Solver"]["Experience Replay"]["Start Size"] = unsigned integer
Description: The minimum number of experiences gathered before learning starts.
- Experience Replay / Maximum Size
Usage: e["Solver"]["Experience Replay"]["Maximum Size"] = unsigned integer
Description: The size of the replay memory. If this number is exceeded, experiences are deleted.
- Experience Replay / Off Policy / Cutoff Scale
Usage: e["Solver"]["Experience Replay"]["Off Policy"]["Cutoff Scale"] = float
Description: Initial cut-off to classify experiences as on- or off-policy. (c_max in https://arxiv.org/abs/1807.05827)
- Experience Replay / Off Policy / Target
Usage: e["Solver"]["Experience Replay"]["Off Policy"]["Target"] = float
Description: Target fraction of off-policy experiences in the replay memory. (D in https://arxiv.org/abs/1807.05827)
- Experience Replay / Off Policy / Annealing Rate
Usage: e["Solver"]["Experience Replay"]["Off Policy"]["Annealing Rate"] = float
Description: Annealing rate for the off-policy cutoff scale and the learning rate. (A in https://arxiv.org/abs/1807.05827)
- Experience Replay / Off Policy / REFER Beta
Usage: e["Solver"]["Experience Replay"]["Off Policy"]["REFER Beta"] = float
Description: Initial value for the penalization coefficient for off-policiness. (beta in https://arxiv.org/abs/1807.05827)
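These replay settings together control the ReF-ER mechanism of the reference paper: experiences whose importance weight falls outside [1/c_max, c_max] count as off-policy, and the penalization coefficient beta is adapted to keep their fraction near the target D. A combined sketch (the start and maximum sizes are illustrative; the off-policy values restate the defaults listed at the end of this page):

e["Solver"]["Experience Replay"]["Start Size"] = 1024
e["Solver"]["Experience Replay"]["Maximum Size"] = 65536
e["Solver"]["Experience Replay"]["Off Policy"]["Cutoff Scale"] = 4.0
e["Solver"]["Experience Replay"]["Off Policy"]["Target"] = 0.1
e["Solver"]["Experience Replay"]["Off Policy"]["Annealing Rate"] = 0.0
e["Solver"]["Experience Replay"]["Off Policy"]["REFER Beta"] = 0.3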
- Experiences Between Policy Updates
Usage: e["Solver"]["Experiences Between Policy Updates"] = float
Description: The number of experiences to receive before training/updating (a real number; values below 1.0 yield more than one update per experience).
- State Rescaling / Enabled
Usage: e["Solver"]["State Rescaling"]["Enabled"] = True/False
Description: Determines whether to normalize the states, such that they have mean 0 and standard deviation 1 (done only once after the initial exploration phase).
- Reward / Rescaling / Enabled
Usage: e["Solver"]["Reward"]["Rescaling"]["Enabled"] = True/False
Description: Determines whether to normalize the rewards, such that they have mean 0 and standard deviation 1.
- Multi Agent Relationship
Usage: e["Solver"]["Multi Agent Relationship"] = string
Description: Specifies whether the agents learn individually, cooperatively, or competitively.
Options:
"Individual": Each agent learns solely on its own.
"Cooperation": Rewards from all agents are averaged.
"Competition": Each agent learns solely on its own, and experience sharing is disabled.
- Multi Agent Correlation
Usage: e["Solver"]["Multi Agent Correlation"] = True/False
Description: Specifies whether to take the dependencies between agents into account.
- Multi Agent Sampling
Usage: e["Solver"]["Multi Agent Sampling"] = string
Description: Specifies how the mini batch is sampled.
Options:
"Tuple": Sample an expId and use the same value for all agents.
"Baseline": Sample an expId and an agentId.
"Experience": Sample expIds separately for each agent.
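For a problem with multiple agents, the three settings above might be combined as follows (a sketch; which combination is appropriate depends on the environment):

e["Solver"]["Multi Agent Relationship"] = "Cooperation"  # average rewards across agents
e["Solver"]["Multi Agent Correlation"] = True            # account for inter-agent dependencies
e["Solver"]["Multi Agent Sampling"] = "Tuple"            # one expId shared by all agents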
Termination Criteria
These are the customizable criteria that indicate whether the solver should continue or finish execution. Korali stops when at least one of these conditions is met. Each criterion is expressed in C++, since it is compiled and evaluated directly in the engine.
- Max Episodes
Usage: e["Solver"]["Termination Criteria"]["Max Episodes"] = unsigned integer
Description: The solver will stop when the given number of episodes has been run.
Criteria:
(_mode == "Training") && (_maxEpisodes > 0) && (_currentEpisode >= _maxEpisodes)
- Max Experiences
Usage: e["Solver"]["Termination Criteria"]["Max Experiences"] = unsigned integer
Description: The solver will stop when the given number of experiences has been gathered.
Criteria:
(_mode == "Training") && (_maxExperiences > 0) && (_experienceCount >= _maxExperiences)
- Max Policy Updates
Usage: e["Solver"]["Termination Criteria"]["Max Policy Updates"] = unsigned integer
Description: The solver will stop when the given number of optimization steps has been performed.
Criteria:
(_mode == "Training") && (_maxPolicyUpdates > 0) && (_policyUpdateCount >= _maxPolicyUpdates)
- Max Model Evaluations
Usage: e["Solver"]["Termination Criteria"]["Max Model Evaluations"] = unsigned integer
Description: Specifies the maximum allowed evaluations of the computational model.
Criteria:
_maxModelEvaluations <= _modelEvaluationCount
- Max Generations
Usage: e["Solver"]["Termination Criteria"]["Max Generations"] = unsigned integer
Description: Determines how many solver generations to run before stopping execution. Execution can be resumed at a later moment.
Criteria:
_k->_currentGeneration > _maxGenerations
Default Configuration
The following configuration will be assigned by default. Any settings defined by the user will override these defaults.
{ "Concurrent Workers": 1, "Discount Factor": 0.995, "Episodes Per Generation": 1, "Experience Replay": { "Off Policy": { "Annealing Rate": 0.0, "Cutoff Scale": 4.0, "REFER Beta": 0.3, "Target": 0.1 }, "Serialize": true }, "Importance Weight Truncation Level": 1.0, "Initial Inverse Temperature": 1.0, "L2 Regularization": { "Enabled": false, "Importance": 0.0001 }, "Mini Batch": { "Size": 256 }, "Model Evaluation Count": 0, "Multi Agent Correlation": false, "Multi Agent Relationship": "Individual", "Multi Agent Sampling": "Tuple", "Reward": { "Rescaling": { "Enabled": false } }, "State Rescaling": { "Enabled": false }, "Termination Criteria": { "Max Episodes": 0, "Max Experiences": 0, "Max Generations": 10000000000, "Max Model Evaluations": 1000000000, "Max Policy Updates": 0 }, "Testing": { "Best Policies": { }, "Current Policies": { }, "Sample Ids": [] }, "Time Sequence Length": 1, "Training": { "Average Depth": 100, "Best Policies": { }, "Current Policies": { } }, "Uniform Generator": { "Maximum": 1.0, "Minimum": 0.0, "Name": "Agent / Uniform Generator", "Type": "Univariate/Uniform" }, "Variable Count": 0 }
Variable Defaults
The following configuration will be assigned to each of the experiment's variables by default. Any settings defined by the user will override these defaults.
{ }