Discrete Reinforcement Learning¶
Specialization of the Reinforcement Learning problem for discrete action domains.
Usage¶
e["Problem"]["Type"] = "ReinforcementLearning/Discrete"
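In korali's Python interface, an Experiment is indexed like a nested dictionary. A minimal sketch of selecting this problem type (a plain dict stands in for the Experiment object so the snippet runs without korali installed):

```python
# Sketch: selecting the discrete RL problem type.
# With korali installed this would be:
#   import korali
#   e = korali.Experiment()
e = {"Problem": {}}  # plain-dict stand-in for korali.Experiment()

e["Problem"]["Type"] = "ReinforcementLearning/Discrete"
```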
Variable-Specific Settings¶
These are settings required by this module that are added to each of the experiment’s variables when this module is selected.
- Type
Usage: e["Variables"][index]["Type"] = string
Description: Indicates if the variable belongs to the state or action vector.
Options:
"State": The variable describes a state.
"Action": The variable describes an action.
- Lower Bound
Usage: e["Variables"][index]["Lower Bound"] = float
Description: Lower bound for the variable's value.
- Upper Bound
Usage: e["Variables"][index]["Upper Bound"] = float
Description: Upper bound for the variable's value.
- Name
Usage: e["Variables"][index]["Name"] = string
Description: Defines the name of the variable.
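Taken together, the per-variable fields above can be set as in the sketch below. The environment, variable names, and bounds are illustrative (a cart-pole-like setup), and a plain list of dicts stands in for `e["Variables"]` on a korali Experiment:

```python
# Illustrative variable configuration for a cart-pole-like environment.
# Names and bounds are examples only, not part of the module's API.
e = {"Variables": [{} for _ in range(3)]}  # stand-in for korali.Experiment()

e["Variables"][0]["Name"] = "Cart Position"
e["Variables"][0]["Type"] = "State"
e["Variables"][0]["Lower Bound"] = -2.4
e["Variables"][0]["Upper Bound"] = 2.4

e["Variables"][1]["Name"] = "Pole Angle"
e["Variables"][1]["Type"] = "State"  # bounds left at their defaults

e["Variables"][2]["Name"] = "Force"
e["Variables"][2]["Type"] = "Action"
e["Variables"][2]["Lower Bound"] = -10.0
e["Variables"][2]["Upper Bound"] = 10.0
```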
Configuration¶
These are settings required by this module.
- Possible Actions
Usage: e["Problem"]["Possible Actions"] = List of Lists of float
Description: The set of all possible actions, each given as a vector of floats (one entry per action variable).
- Agents Per Environment
Usage: e["Problem"]["Agents Per Environment"] = unsigned integer
Description: Number of agents in a given environment. All agents share the same policy.
- Environment Function
Usage: e["Problem"]["Environment Function"] = Computational Model
Description: Function to initialize and run an episode in the environment.
- Actions Between Policy Updates
Usage: e["Problem"]["Actions Between Policy Updates"] = unsigned integer
Description: Number of actions to take before requesting a new policy.
- Testing Frequency
Usage: e["Problem"]["Testing Frequency"] = unsigned integer
Description: Number of generations after which the policy will be forcibly tested (even if it does not meet the threshold).
- Training Reward Threshold
Usage: e["Problem"]["Training Reward Threshold"] = float
Description: Minimum cumulative sum of rewards in an episode for a policy to be considered a candidate.
- Policy Testing Episodes
Usage: e["Problem"]["Policy Testing Episodes"] = unsigned integer
Description: Number of test episodes to run the policy for (without exploration noise); the average cumulative reward across these episodes is used to evaluate the termination criteria.
- Custom Settings
Usage: e["Problem"]["Custom Settings"] = knlohmann::json
Description: Any user-defined settings required by the environment.
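The settings above can be combined into a single problem configuration. The sketch below is illustrative: the environment body, the two-action set, and all numeric values are invented for the example, and a plain dict stands in for `korali.Experiment()` (with korali installed, the environment function would read `s["State"]` and assign `s["Reward"]` on the sample it receives):

```python
# Illustrative problem configuration; values are examples, not module defaults.
e = {"Problem": {}}  # plain-dict stand-in for korali.Experiment()

def env(s):
    # Hypothetical environment function: korali calls this once per episode.
    # The episode loop is sketched in comments only:
    #   s["State"] = initial_state()
    #   while not done: read s["Action"], step the simulation,
    #                   set s["Reward"], update s["State"]
    pass

e["Problem"]["Type"] = "ReinforcementLearning/Discrete"
e["Problem"]["Environment Function"] = env
e["Problem"]["Possible Actions"] = [[-10.0], [10.0]]  # two discrete actions
e["Problem"]["Agents Per Environment"] = 1
e["Problem"]["Actions Between Policy Updates"] = 1
e["Problem"]["Training Reward Threshold"] = 450.0
e["Problem"]["Policy Testing Episodes"] = 10
e["Problem"]["Testing Frequency"] = 10
e["Problem"]["Custom Settings"] = {"Gravity": 9.81}  # any user-defined keys
```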
Default Configuration¶
The following configuration will be assigned by default. Any settings defined by the user override these defaults.
{
    "Actions Between Policy Updates": 0,
    "Agents Per Environment": 1,
    "Custom Settings": { },
    "Policy Testing Episodes": 5,
    "Testing Frequency": 0
}
Variable Defaults¶
The following configuration will be assigned to each of the experiment's variables by default. Any settings defined by the user override these defaults.
{
    "Lower Bound": -Infinity,
    "Type": "State",
    "Upper Bound": Infinity
}
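The override semantics can be pictured as a shallow merge in which user-specified keys win over the defaults. A sketch for the variable defaults above (the user settings are a hypothetical example):

```python
import math

# Variable defaults as listed above.
defaults = {"Lower Bound": -math.inf, "Type": "State", "Upper Bound": math.inf}

# Hypothetical user settings for one variable; user-defined keys override
# the defaults, keys the user leaves out keep their default values.
user = {"Type": "Action", "Lower Bound": -10.0, "Upper Bound": 10.0}

merged = {**defaults, **user}
print(merged["Type"])  # -> Action
```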