Trajectory
Trajectory data structures and utilities for RL training.
This module provides classes for collecting and managing trajectory data during RL training, including trajectory storage, data loading, advantage computation, and experience replay. It implements the TrajectoryData dataset interface and trajectory collection utilities.
Attributes:

| Name | Type | Description |
|---|---|---|
| DYNAMIC_ATTRS | list[str] | List of dynamic attributes that are computed during trajectory processing. |
TopKAdvantageSampler(data_source, num_samples)
Sampler that yields indices of top-K advantage samples in random order.
Selects the top-K samples with the highest absolute advantage values for experience replay, focusing training on the most impactful samples.
Attributes:

| Name | Type | Description |
|---|---|---|
| data_source | TrajectoryData | The trajectory dataset. |
| num_samples | int | Maximum number of top samples to include. |
| top_k_indices | Tensor | Indices of the top-K samples. |
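A minimal sketch of how such a sampler can be implemented, assuming the dataset exposes an `advantages` tensor (as `TrajectoryData` does); this is illustrative, not the artifact's exact code:

```python
import torch
from torch.utils.data import Sampler

class TopKAdvantageSampler(Sampler):
    """Sketch: yields indices of the K timesteps with the largest |advantage|."""

    def __init__(self, data_source, num_samples):
        self.data_source = data_source
        self.num_samples = min(num_samples, len(data_source))
        # Rank timesteps by absolute advantage and keep the top-K indices.
        abs_adv = data_source.advantages.abs()
        self.top_k_indices = torch.topk(abs_adv, self.num_samples).indices

    def __iter__(self):
        # Re-shuffle the top-K indices at the start of each epoch.
        perm = torch.randperm(self.num_samples)
        return iter(self.top_k_indices[perm].tolist())

    def __len__(self):
        return self.num_samples
```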
__iter__()
Returns an iterator over shuffled indices of the top-k samples. This is called by the DataLoader at the start of each epoch.
Yields:

| Type | Description |
|---|---|
| int | The next shuffled index of a top-k sample. |
__len__()
The total number of samples to be drawn.
Returns:

| Type | Description |
|---|---|
| int | The total number of samples to be drawn. |
TrajectoryData(num_loops, actions_index, obs, next_obs, actions_bev_log_p, rewards, done)
Bases: Dataset
Dataset to store the trajectory data.
Attributes:

| Name | Type | Description |
|---|---|---|
| sizes | list[int] | List of sizes of all the included trajectories. |
| num_loops | Tensor | Number of loops in the trajectory. |
| actions_index | Tensor | Actions in the trajectory. |
| obs | Tensor | Observations in the trajectory. |
| next_obs | Tensor | Observations of next states in the trajectory. |
| actions_bev_log_p | Tensor | Action log probabilities under the behavioral policy in the trajectory. |
| rewards | Tensor | Rewards in the trajectory. |
| done | Tensor | Done flags in the trajectory. |
| values | Tensor | Values of actions in the trajectory. |
| next_values | Tensor | Values of actions in the trajectory, shifted one step into the future. |
| actions_old_log_p | Tensor | Action log probabilities under the old policy in the trajectory. |
| off_policy_rates | Tensor | Off-policy rates (rho) for the current policy. |
| returns | Tensor | Returns in the trajectory. |
| advantages | Tensor | Advantages in the trajectory. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_loops | Tensor | Number of loops in the trajectory. | required |
| actions_index | Tensor | Actions in the trajectory. | required |
| obs | Tensor | Observations in the trajectory. | required |
| next_obs | Tensor | Observations of next states in the trajectory. | required |
| actions_bev_log_p | Tensor | Action log probabilities under the behavioral policy in the trajectory. | required |
| rewards | Tensor | Rewards in the trajectory. | required |
| done | Tensor | Done flags in the trajectory. | required |
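For illustration, a trajectory can be built from raw tensors along these lines; the shapes below are assumptions, not the artifact's exact schema:

```python
import torch
from mlir_rl_artifact.trajectory import TrajectoryData

T, OBS_DIM = 8, 32  # 8 timesteps, 32-dim observations (assumed shapes)
traj = TrajectoryData(
    num_loops=torch.randint(1, 5, (T,)),
    actions_index=torch.randint(0, 10, (T,)),
    obs=torch.randn(T, OBS_DIM),
    next_obs=torch.randn(T, OBS_DIM),
    actions_bev_log_p=torch.randn(T),
    rewards=torch.randn(T),
    done=torch.tensor([False] * (T - 1) + [True]),
)
```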
__len__()
The total number of timesteps stored in the trajectory.
__getitem__(idx)
Get a single timestep from the trajectory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| idx | int | Index of the timestep to retrieve. | required |
Returns:

| Type | Description |
|---|---|
| tuple[Tensor, ...] | A tuple containing the timestep data. |
__add__(other)
Concatenate this trajectory with another.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | TrajectoryData | The other trajectory to concatenate with. | required |
Returns:

| Type | Description |
|---|---|
| TrajectoryData | The trajectory containing both. |
loader(batch_size, num_trajectories)
Create a DataLoader for the trajectory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| batch_size | int \| None | Batch size for the DataLoader (None for the full trajectory). | required |
| num_trajectories | int | Number of trajectories to use for training. | required |
Returns:

| Type | Description |
|---|---|
| DataLoader | The DataLoader for the trajectory. |
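Assumed usage, continuing the construction example above (batch size, trajectory count, and the unpacking order are illustrative):

```python
# Each batch is a tuple of tensors, one per stored attribute,
# in the order returned by __getitem__ (order assumed here).
for batch in traj.loader(batch_size=4, num_trajectories=1):
    num_loops, actions_index, obs, *rest = batch
    break
```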
copy()
Copy the trajectory.
Returns:

| Type | Description |
|---|---|
| TrajectoryData | The copied trajectory. |
update_attributes(model)
Update the dynamic attributes of the trajectory using the new model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | HiearchyModel | The model to use for updating the attributes. | required |
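A plausible sketch of what this update involves, assuming the model exposes value and log-probability helpers; the method names `model.value` and `model.log_prob` are hypothetical, not the artifact's API:

```python
import torch

with torch.no_grad():
    traj.values = model.value(traj.obs)            # hypothetical helper
    traj.next_values = model.value(traj.next_obs)  # hypothetical helper
    traj.actions_old_log_p = model.log_prob(traj.obs, traj.actions_index)
# Off-policy rates, returns, and advantages are then refreshed from
# these tensors by the private helpers documented below.
```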
__compute_rho()
Compute the off-policy rate (rho) for the current policy.
Returns:

| Type | Description |
|---|---|
| Tensor | The off-policy rate. |
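Under one common importance-sampling definition, rho is the ratio of the current policy's action probability to the behavioral policy's, recovered from the stored log-probabilities. A hedged sketch; which log-probability tensors the artifact actually uses here is an assumption:

```python
import torch

# rho_t = pi_current(a_t | s_t) / pi_behavior(a_t | s_t), via log-probs.
off_policy_rates = torch.exp(traj.actions_old_log_p - traj.actions_bev_log_p)
```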
__compute_returns(gamma=1.0)
Compute the returns.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gamma | float | Discount factor. | 1.0 |
Returns:

| Type | Description |
|---|---|
| Tensor | The returns. |
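The standard reverse-pass computation of discounted returns looks like the following sketch; the artifact's implementation may differ in details such as masking:

```python
import torch

def compute_returns(rewards, done, gamma=1.0):
    """Sketch: discounted return G_t = r_t + gamma * G_{t+1} * (1 - done_t)."""
    returns = torch.zeros_like(rewards)
    running = torch.tensor(0.0)
    for t in reversed(range(len(rewards))):
        # Reset the running return at episode boundaries.
        running = rewards[t] + gamma * running * (1.0 - done[t].float())
        returns[t] = running
    return returns
```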
__compute_gae(gamma=1.0, lambda_=0.95)
Compute advantages using Generalized Advantage Estimation (GAE).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| gamma | float | Discount factor. | 1.0 |
| lambda_ | float | GAE smoothing factor. | 0.95 |
Returns:

| Type | Description |
|---|---|
| Tensor | The advantages. |
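GAE follows the standard recursion delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t) and A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1} (Schulman et al., 2016). A sketch under those definitions, not necessarily the artifact's exact code:

```python
import torch

def compute_gae(rewards, values, next_values, done, gamma=1.0, lambda_=0.95):
    """Sketch: Generalized Advantage Estimation over a flat trajectory."""
    advantages = torch.zeros_like(rewards)
    gae = torch.tensor(0.0)
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - done[t].float()
        # One-step TD error, cut at episode boundaries.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lambda_ * not_done * gae
        advantages[t] = gae
    return advantages
```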
TrajectoryCollector()
Class that appends timestep data to a trajectory.
Attributes:

| Name | Type | Description |
|---|---|---|
| num_loops | list[int] | Number of loops at each timestep. |
| actions_index | list[Tensor] | Actions in the trajectory. |
| obs | list[Tensor] | Observations in the trajectory. |
| next_obs | list[Tensor] | Observations of next states in the trajectory. |
| actions_bev_log_p | list[float] | Action log probabilities under the behavioral policy in the trajectory. |
| rewards | list[float] | Rewards in the trajectory. |
| done | list[bool] | Done flags in the trajectory. |
__add__(other)
Add another trajectory collector to the current one.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | TrajectoryCollector | The other trajectory collector to add. | required |
Returns:

| Type | Description |
|---|---|
| TrajectoryCollector | The current trajectory collector (after addition). |
append(num_loops, action_index, obs, next_obs, action_bev_log_p, reward, done)
Append a single timestep to the trajectory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_loops | int | Number of loops in the timestep. | required |
| action_index | Tensor | Action index in the timestep. | required |
| obs | Tensor | Observation in the timestep. | required |
| next_obs | Tensor | Observation of the next state in the timestep. | required |
| action_bev_log_p | float | Action log probability under the behavioral policy in the timestep. | required |
| reward | float | Reward in the timestep. | required |
| done | bool | Done flag in the timestep. | required |
to_trajectory()
Convert the collected data to a TrajectoryData object.
Returns:

| Type | Description |
|---|---|
| TrajectoryData | The trajectory containing all collected data. |
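A hypothetical rollout loop showing the intended collection pattern; `env`, `policy`, and their methods are stand-ins, not part of this module:

```python
from mlir_rl_artifact.trajectory import TrajectoryCollector

collector = TrajectoryCollector()
obs = env.reset()                                   # hypothetical environment
done = False
while not done:
    action_index, log_p = policy.act(obs)           # hypothetical policy API
    next_obs, reward, done = env.step(action_index)
    collector.append(
        num_loops=env.num_loops,                    # hypothetical attribute
        action_index=action_index,
        obs=obs,
        next_obs=next_obs,
        action_bev_log_p=log_p,
        reward=reward,
        done=done,
    )
    obs = next_obs
traj = collector.to_trajectory()
collector.reset()  # clear the buffers for the next rollout
```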
reset()
Reset the trajectory collector.