# Model

Neural network models for MLIR RL policy and value estimation.

This module implements the deep RL components, including the policy model, the value model, and the LSTM-based producer-consumer embedding. The policy model outputs action distributions for the different transformation types, while the value model estimates state values for advantage computation.
## HiearchyModel()

Bases: `Module`

Hierarchical reinforcement learning model for MLIR code optimization.

Attributes:

| Name | Type | Description |
|---|---|---|
| `policy_model` | `PolicyModel` | The policy model. |
| `value_model` | `ValueModel` | The value model. |

Source code in `mlir_rl_artifact/model.py`
### forward(obs, actions_index)

Forward pass of the hierarchical model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `Tensor` | The input tensor. | *required* |
| `actions_index` | `Tensor` | The indices of the actions. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | The log probabilities of the actions. |
| `Tensor` | The values. |
| `Tensor` | The entropies. |

Source code in `mlir_rl_artifact/model.py`
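The forward pass above returns one log-probability, value, and entropy per observation. A minimal sketch of how per-head quantities can be aggregated into a joint hierarchical log-probability (this bookkeeping, including the `hierarchical_log_prob` helper name, is an assumption for illustration, not the artifact's exact code):

```python
import torch
from torch.distributions import Categorical


def hierarchical_log_prob(dists, actions_index):
    # Assumed convention: the joint log-probability of a hierarchical action
    # is the sum of the per-head log-probs, skipping heads that are None
    # (i.e. not applicable to the selected transformation).
    log_p = torch.zeros(actions_index.shape[0])
    entropy = torch.zeros(actions_index.shape[0])
    for i, dist in enumerate(dists):
        if dist is None:
            continue
        log_p = log_p + dist.log_prob(actions_index[:, i])
        entropy = entropy + dist.entropy()
    return log_p, entropy
```

Because log-probabilities add, the product of the per-head probabilities becomes a simple sum, and `None` heads contribute nothing to either term.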
### sample(obs, greedy=False, eps=None)

Sample an action from the model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `Tensor` | The input tensor. | *required* |
| `greedy` | `bool` | Whether to sample greedily. | `False` |
| `eps` | `float \| None` | Epsilon value for exploration. Defaults to `None`. | `None` |

Note: if `greedy` is `True`, `eps` must be `None`.

Returns:

| Type | Description |
|---|---|
| `Tensor` | The sampled action indices. |
| `Tensor` | The action log probabilities. |
| `Tensor` | The resulting entropy. |

Source code in `mlir_rl_artifact/model.py`
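The interplay of `greedy` and `eps` can be sketched for a single categorical head as follows (the `sample_action` helper is hypothetical; the real `sample` works across all hierarchical heads at once):

```python
import torch
from torch.distributions import Categorical


def sample_action(logits, greedy=False, eps=None):
    # Hypothetical single-head version of the documented sampling modes.
    assert not (greedy and eps is not None), "if greedy is True, eps must be None"
    dist = Categorical(logits=logits)
    if greedy:
        action = logits.argmax(dim=-1)  # deterministic: pick the mode
    elif eps is not None and torch.rand(()).item() < eps:
        # epsilon-greedy: uniform random action with probability eps
        action = torch.randint(logits.shape[-1], logits.shape[:-1])
    else:
        action = dist.sample()  # plain on-policy sample
    return action, dist.log_prob(action), dist.entropy()
```

Note that the log-probability is always evaluated under the policy distribution, even when the action came from greedy or epsilon exploration, which is what makes off-policy corrections in the loss possible later.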
## ValueModel()

Bases: `Module`

Value model for MLIR code optimization.

Attributes:

| Name | Type | Description |
|---|---|---|
| `lstm` | `LSTMEmbedding` | The LSTM-based producer-consumer embedding. |
| `network` | `Sequential` | The value network (backbone + value output). |

Source code in `mlir_rl_artifact/model.py`
### forward(obs)
### loss(new_values, values, returns)

Calculate the value loss.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `new_values` | `Tensor` | The current value tensor. | *required* |
| `values` | `Tensor` | The old value tensor (used for clipping). | *required* |
| `returns` | `Tensor` | The returns tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | The value loss. |

Source code in `mlir_rl_artifact/model.py`
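Since the signature passes the old values "for clipping", the loss is presumably in the PPO family. A sketch of one common clipped value loss under that assumption (the exact formulation used by the artifact may differ):

```python
import torch


def clipped_value_loss(new_values, values, returns, clip_range=0.2):
    # Keep the new value prediction within clip_range of the old one...
    clipped = values + (new_values - values).clamp(-clip_range, clip_range)
    # ...and take the pessimistic (larger) of the two squared errors.
    unclipped_loss = (new_values - returns).pow(2)
    clipped_loss = (clipped - returns).pow(2)
    return 0.5 * torch.max(unclipped_loss, clipped_loss).mean()
```

Taking the max of the clipped and unclipped errors prevents the value head from moving too far from its previous estimate in a single update, mirroring the trust-region intent of the clipped policy objective.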
## PolicyModel()

Bases: `Module`

Policy model for MLIR code optimization.

Attributes:

| Name | Type | Description |
|---|---|---|
| `lstm` | `LSTMEmbedding` | The LSTM-based producer-consumer embedding. |
| `backbone` | `Sequential` | The backbone of the policy model. |
| `heads` | `ModuleList` | The hierarchical outputs of the policy model (transformation selection + parameter selection). |
| `log_std` | `Parameter` | The log standard deviation parameter (used in the case of continuous interchange). |

Source code in `mlir_rl_artifact/model.py`
### forward(obs)

Forward pass of the policy model.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `Tensor` | The input tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `list[Distribution \| None]` | The distributions for each action. |

Source code in `mlir_rl_artifact/model.py`
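The backbone-plus-heads layout can be illustrated with a toy policy; layer sizes and head counts below are assumptions for the sketch, not the artifact's actual architecture:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class TinyPolicy(nn.Module):
    # Illustrative only: a shared backbone feeding one head per decision level.
    def __init__(self, in_size=16, n_transforms=4, n_params=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_size, 32), nn.ReLU())
        # head 0 selects the transformation, head 1 its parameters
        self.heads = nn.ModuleList(
            [nn.Linear(32, n_transforms), nn.Linear(32, n_params)]
        )

    def forward(self, obs):
        h = self.backbone(obs)
        # one Distribution per head; the documented return type
        # (list[Distribution | None]) suggests a head can also be None
        # when it does not apply to the chosen transformation
        return [Categorical(logits=head(h)) for head in self.heads]
```

Returning distributions rather than raw logits lets the caller sample, evaluate log-probabilities, and compute entropies uniformly across heads.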
### loss(actions_log_p, actions_bev_log_p, off_policy_rates, advantages, clip_range=0.2)

Calculate the policy loss.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `actions_log_p` | `Tensor` | The log probabilities of the new actions. | *required* |
| `actions_bev_log_p` | `Tensor` | The log probabilities of the actions under the behavior policy. | *required* |
| `off_policy_rates` | `Tensor` | The ratio between the old policy and the behavioral (mu) policy. | *required* |
| `advantages` | `Tensor` | The advantages of the actions. | *required* |
| `clip_range` | `float` | The clipping range for the policy loss. | `0.2` |

Returns:

| Type | Description |
|---|---|
| `Tensor` | The policy loss. |
| `Tensor` | The ratio clip fraction (for logging purposes). |

Source code in `mlir_rl_artifact/model.py`
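A sketch of a PPO-style clipped surrogate with a behavior-policy correction, matching the documented signature. How exactly `off_policy_rates` enters the objective here is an assumption; the artifact may combine the ratios differently:

```python
import torch


def policy_loss(actions_log_p, actions_bev_log_p, off_policy_rates,
                advantages, clip_range=0.2):
    # Importance ratio of the new policy vs. the behavior (mu) policy,
    # further scaled by the documented off-policy rate (assumed placement).
    ratio = torch.exp(actions_log_p - actions_bev_log_p) * off_policy_rates
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - clip_range, 1 + clip_range) * advantages
    # Pessimistic bound, as in the standard clipped surrogate objective.
    loss = -torch.min(unclipped, clipped).mean()
    # Fraction of samples where the ratio left the trust region (for logging).
    clip_fraction = ((ratio - 1.0).abs() > clip_range).float().mean()
    return loss, clip_fraction
```

When the new and behavior policies coincide and the rates are 1, the ratio is exactly 1, nothing is clipped, and the loss reduces to the negative mean advantage.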
## LSTMEmbedding()

Bases: `Module`

LSTM-based embedding layer for producer-consumer encoding.

Encodes the operation features of both the consumer and the producer into a dense embedding using LSTM layers.

Attributes:

| Name | Type | Description |
|---|---|---|
| `output_size` | `int` | The output size of the embedding. |
| `embedding` | `Sequential` | The embedding layer. |
| `lstm` | `LSTM` | The LSTM layer. |

Source code in `mlir_rl_artifact/model.py`
### forward(obs)

Forward pass of the LSTM embedding.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `obs` | `Tensor` | The input tensor. | *required* |

Returns:

| Type | Description |
|---|---|
| `Tensor` | The embedded tensor. |

Source code in `mlir_rl_artifact/model.py`
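A minimal sketch of the producer-consumer embedding described above; the feature and hidden sizes, and the choice of returning the final hidden state, are illustrative assumptions rather than the artifact's actual values:

```python
import torch
import torch.nn as nn


class TinyLSTMEmbedding(nn.Module):
    # Sketch: project per-operation features, then fold the short
    # producer/consumer sequence into one dense vector with an LSTM.
    def __init__(self, feature_size=8, output_size=16):
        super().__init__()
        self.output_size = output_size
        self.embedding = nn.Sequential(
            nn.Linear(feature_size, output_size), nn.ReLU()
        )
        self.lstm = nn.LSTM(output_size, output_size, batch_first=True)

    def forward(self, obs):
        # obs: (batch, 2, feature) -- producer and consumer operation features
        x = self.embedding(obs)
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]  # final hidden state: (batch, output_size)
```

Running the pair through an LSTM (rather than concatenating the two feature vectors) keeps the embedding order-aware, so producer and consumer roles are distinguished.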