Environment

Reinforcement learning environment for MLIR RL.

This module implements the RL environment that simulates MLIR code transformations. It manages state transitions, reward computation, and execution of transformation sequences. The environment tracks operations across benchmarks and evaluates the effectiveness of optimizations.
Env

The RL environment class.
Attributes:

| Name | Type | Description |
|---|---|---|
| bench_idx | int | Index of the selected benchmark. |
| benchmark_data | BenchmarkFeatures | Features of the selected benchmark. |
reset(benchs, bench_idx=None)
Reset the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| benchs | Benchmarks | The benchmarks dataset. | required |
| bench_idx | int \| None | The index of the benchmark to set the environment to. If None, a random benchmark is selected. Defaults to None. | None |
Returns:

| Type | Description |
|---|---|
| OperationState | The initial state of the environment. |
Source code in mlir_rl_artifact/env.py
step(state, action)
Take a step in the environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state | OperationState | The current state. | required |
| action | Action | The action to take. | required |
Returns:

| Type | Description |
|---|---|
| OperationState | The new state after applying the action. The state's terminal flag is set if the action failed, is terminal, or the truncation step limit is reached. |
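Together, reset and step support a standard rollout loop: reset until the returned state's terminal flag is set. The sketch below mimics only the documented interface with toy stand-ins; MockEnv, the step_count/history fields, the "tile" action name, and the three-step truncation limit are illustrative assumptions, not part of mlir_rl_artifact:

```python
from dataclasses import dataclass, field

# Minimal stand-in for OperationState; it mirrors only the documented
# surface (a terminal flag), not the real mlir_rl_artifact type.
@dataclass
class OperationState:
    step_count: int = 0
    terminal: bool = False
    history: list = field(default_factory=list)

class MockEnv:
    """Toy environment mimicking the documented reset/step interface."""

    TRUNCATE = 3  # hypothetical truncation step limit

    def reset(self, benchs, bench_idx=None):
        # Select a benchmark (ignored here) and return the initial state.
        return OperationState()

    def step(self, state, action):
        # Apply the action; set the terminal flag once the truncation
        # step limit is reached.
        new = OperationState(
            step_count=state.step_count + 1,
            history=state.history + [action],
        )
        new.terminal = new.step_count >= self.TRUNCATE
        return new

env = MockEnv()
state = env.reset(benchs=None)  # in the real env, a random benchmark is picked
while not state.terminal:
    state = env.step(state, action="tile")  # placeholder action name
```

After the loop, the final state carries the full transformation history for the episode.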
get_next_op_state(state)
Get the state that represents the next operation (None if the benchmark is done).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state | OperationState | The current state. | required |
Returns:

| Type | Description |
|---|---|
| OperationState \| None | The next state, or None if the benchmark is done. |
apply_and_run_sequence(seq)
Apply the sequence of actions to the state's code and run it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seq | list[list[Action]] | The sequence of actions to apply. | required |
Returns:

| Type | Description |
|---|---|
| list[float] | The rewards received. |
| float | The final speedup. |
| int \| None | The execution time. |
| bool | Whether it was a cache miss. |
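The method returns four values in the order documented above. The sketch below shows how a caller might unpack them, using a stand-in function with the same return shape; the function body and all numeric values are invented for illustration and do not come from env.py:

```python
def apply_and_run_sequence(seq):
    """Stand-in with the documented return shape:
    (rewards, final_speedup, exec_time, cache_miss)."""
    rewards = [0.1 for step_actions in seq for _ in step_actions]  # one per action
    final_speedup = 1.25   # invented value
    exec_time = 8000       # invented value; None when execution fails
    cache_miss = True      # True: the result was measured, not read from a cache
    return rewards, final_speedup, exec_time, cache_miss

# A sequence is a list of per-operation action lists.
seq = [["tile", "interchange"], ["vectorize"]]
rewards, speedup, exec_time, cache_miss = apply_and_run_sequence(seq)
```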
failed_seq(seq)
Generate results for a failed sequence, typically used for aborted states that never reached the last operation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seq | list[list[Action]] | The sequence of actions that failed. | required |
Returns:

| Type | Description |
|---|---|
| list[float] | The rewards received. |
| float | The final speedup. |
| int \| None | The execution time. |
| bool | Whether it was a cache miss. |
__init_op_state(operation_idx)
Create a new operation state.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| operation_idx | int | The operation index. | required |
Returns:

| Type | Description |
|---|---|
| OperationState | The new operation state. |
__current_op_index(state)
Get the index of the current operation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state | OperationState | The current state. | required |
Returns:

| Type | Description |
|---|---|
| int | The index of the current operation. |
__bench_is_done(state)
Check if the benchmark is done.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state | OperationState | The current state. | required |
Returns:

| Type | Description |
|---|---|
| bool | A flag indicating whether the benchmark is done. |
__action_reward(trans_succeeded, exec_succeeded=None, new_exec_time=None, old_exec_time=None)
Get the reward of the action based on the transformation and execution results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| trans_succeeded | bool | A flag indicating if the transformation was successful. | required |
| exec_succeeded | bool \| None | A flag indicating if the execution was successful (required if the transformation succeeded). | None |
| new_exec_time | int \| None | The execution time after the transformation (required if the execution succeeded). | None |
| old_exec_time | int \| None | The original execution time (required if the execution succeeded). | None |
Returns:

| Type | Description |
|---|---|
| float | The reward of the action. |
__speedup_reward(new, old)
Get the reward based on the speedup.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| new | int | The new execution time. | required |
| old | int | The old execution time. | required |
Returns:

| Type | Description |
|---|---|
| float | The calculated reward. |
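The exact formula is defined in env.py. A common shaping for a speedup-based reward, shown here purely as an assumed illustration, is the log of the old-to-new execution-time ratio, which is positive for a speedup, negative for a slowdown, and symmetric around zero:

```python
import math

def speedup_reward(new: int, old: int) -> float:
    # Assumed log-ratio shaping: positive when the new code is faster,
    # negative when it regresses, zero when unchanged. The real formula
    # lives in mlir_rl_artifact/env.py and may differ.
    return math.log(old / new)

faster = speedup_reward(new=500, old=1000)   # 2x speedup  -> positive reward
slower = speedup_reward(new=1000, old=500)   # 2x slowdown -> negative reward
```

The symmetry (a 2x speedup and a 2x slowdown cancel out) is the usual reason to prefer a log ratio over the raw ratio old / new.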
__update_state_infos(state, action)
Update state information after applying a transformation.
Updated fields are:

- transformation_history
- producer features (in case of fusion)
- operation features (to reflect the transformation)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| state | OperationState | The current state. | required |
| action | Action | The action taken. | required |
__apply_sequence(seq)
Apply the sequence of actions to the state's code.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seq | list[list[Action]] | The sequence of actions to apply. | required |
Returns:

| Type | Description |
|---|---|
| tuple[Module, list[float]] | The resulting code and the rewards received for each action in the sequence. |