Home
A deep reinforcement learning system for loop nest optimization in MLIR
Installation
Start by cloning the repo:
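The repository URL is not reproduced here, so the following is a sketch with a placeholder URL:

```shell
# Placeholder URL - substitute the actual repository location.
git clone <repository-url> MLIR-RL-artifact
cd MLIR-RL-artifact
```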
Before proceeding, follow the instructions in the Data section to download and extract the benchmark data files.
Method 1: Using Docker
Build and run the Docker container:
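The image name and run flags are not specified here; a typical build-and-run sequence, with an assumed tag `mlir-rl` and an assumed `/app/data` mount point, would be:

```shell
# Assumed image tag and container data path - adjust to the project's Dockerfile.
docker build -t mlir-rl .
docker run -it --rm -v "$(pwd)/data:/app/data" mlir-rl
```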
Method 2: Without Docker
Prerequisites
- Conda or Miniconda
Steps
- Install system dependencies and Conda packages:
Start by activating a Conda environment, and then install packages as follows:
conda install -y \
python=3.11 \
git=2.51.2 \
unzip=6.0 \
cmake=4.1.2 \
ninja=1.13.1 \
binutils=2.45 \
c-compiler=1.11.0 \
cxx-compiler=1.11.0 \
clang=21.1.5 \
clangxx=21.1.5 \
llvm-openmp=21.1.5 \
lld=21.1.5 \
poetry=2.2.1 \
-c conda-forge
- Clone and build LLVM/MLIR:
git clone --branch release/19.x --depth 1 https://github.com/llvm/llvm-project.git
cd <path/to/llvm-project>
pip install -r mlir/python/requirements.txt
cmake -S llvm -B build -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DLLVM_ENABLE_LLD=ON \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON
cmake --build build --target check-mlir -j
cmake --build build --target check-mlir-python -j
- Set environment variables:
export PATH="<path/to/llvm-project>/build/bin:$PATH"
export PYTHONPATH="<path/to/llvm-project>/build/tools/mlir/python_packages/mlir_core"
export LLVM_BUILD_PATH="<path/to/llvm-project>/build"
export MLIR_SHARED_LIBS="<path/to/llvm-project>/build/lib/libmlir_runner_utils.so,<path/to/llvm-project>/build/lib/libmlir_c_runner_utils.so,$CONDA_PREFIX/lib/libomp.so"
- Build custom tools:
cd <path/to/MLIR-RL-artifact>/tools
cmake -S ast_dumper -B ast_dumper/build -G Ninja \
-DMLIR_DIR=$LLVM_BUILD_PATH/lib/cmake/mlir \
-DLLVM_EXTERNAL_LIT=$LLVM_BUILD_PATH/bin/llvm-lit \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
cmake --build ast_dumper/build -j
cmake -S pre_vec -B pre_vec/build -G Ninja \
-DMLIR_DIR=$LLVM_BUILD_PATH/lib/cmake/mlir \
-DLLVM_EXTERNAL_LIT=$LLVM_BUILD_PATH/bin/llvm-lit \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
cmake --build pre_vec/build -j
- Create a `.env` file:
AST_DUMPER_BIN_PATH="<path/to/MLIR-RL-artifact>/tools/ast_dumper/build/bin/AstDumper"
PRE_VEC_BIN_PATH="<path/to/MLIR-RL-artifact>/tools/pre_vec/build/bin/PreVec"
MLIR_SHARED_LIBS="<path/to/llvm-project>/build/lib/libmlir_runner_utils.so,<path/to/llvm-project>/build/lib/libmlir_c_runner_utils.so,<path/to/conda-env>/lib/libomp.so"
OMP_NUM_THREADS=12
The path to the Conda environment can be found by running `echo $CONDA_PREFIX`.
- Install Python dependencies:
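Since Poetry is installed in the Conda step above, dependency installation is presumably done from the repository root; the path is a placeholder:

```shell
cd <path/to/MLIR-RL-artifact>
poetry install
```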
- Enable Execution of Scripts:
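The scripts in question are not listed here; assuming they are shell scripts at the repository root, this step would look like:

```shell
# Assumes the scripts live at the repository root; adjust the glob as needed.
chmod +x *.sh
```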
Data
Download the benchmarks archive from the provided file link and place it in the `data/` directory.
Extract the benchmark files:
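Since `unzip` is among the installed dependencies, extraction presumably looks like the following; the archive name is an assumption:

```shell
cd data
unzip benchmarks.zip   # archive name is an assumption
```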
The data/ directory contains execution time JSON files that specify baseline execution times (in nanoseconds) and determine which benchmarks to use:
- `execution_times_train.json` - Training benchmark set
- `execution_times_eval.json` - Evaluation benchmark set
- `execution_times_eval_full.json` - Full models evaluation benchmarks
- `execution_times_eval_nn.json` - Neural networks single operators evaluation benchmarks
- `execution_times_eval_lqcd.json` - Lattice QCD evaluation benchmarks
These files have the format:
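The exact schema is not reproduced here. Based on the description above (baseline execution times in nanoseconds, keyed by benchmark), a plausible shape — an assumption, not the project's documented format — is:

```json
{
  "benchmark_name_1": 1234567,
  "benchmark_name_2": 890123
}
```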
Configuration
Configuration files are located in config/ and control all aspects of training and evaluation. The system uses the config file specified by the CONFIG_FILE_PATH environment variable.
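Selecting a config before running is therefore a matter of exporting the variable; the file name below is an example:

```shell
# Point the system at a config file before training or evaluation.
export CONFIG_FILE_PATH=config/example.json
```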
Configuration Parameters
Model Architecture
- `max_num_stores_loads` (int): Maximum number of load/store operations in nested loops
- `max_num_loops` (int): Maximum number of nested loops
- `max_num_load_store_dim` (int): Maximum number of dimensions in load/store buffers
- `num_tile_sizes` (int): Number of tile sizes to consider
- `vect_size_limit` (int): Vectorization size limit to prevent excessive vectorization
Action Space
- `order` (list[list[str]]): Enforced sequence of actions. Each inner list specifies the allowed actions at that step.
  - Action symbols: `I` (Interchange), `T` (Tiling), `TP` (TiledParallelization), `TF` (TiledFusion), `V` (Vectorization), `NT` (NoTransformation)
  - `!`: Special symbol meaning "allow everything except these actions", e.g. `["!", "I", "NT"]` means "allow everything except Interchange and NoTransformation"
- `interchange_mode` (`"enumerate"` | `"pointers"` | `"continuous"`): Method for sampling interchange actions
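The `!` convention can be read as set complement over the action symbols. The following sketch (an assumption, not the project's actual code) shows how one entry of `order` expands into the set of permitted actions:

```python
# All action symbols defined by the config.
ACTIONS = {"I", "T", "TP", "TF", "V", "NT"}

def allowed_actions(step_spec):
    """Return the set of action symbols permitted by one entry of `order`."""
    if step_spec and step_spec[0] == "!":
        # "!" means: allow everything except the listed actions.
        return ACTIONS - set(step_spec[1:])
    return set(step_spec) & ACTIONS

# ["!", "I", "NT"] -> everything except Interchange and NoTransformation
print(sorted(allowed_actions(["!", "I", "NT"])))  # ['T', 'TF', 'TP', 'V']
print(sorted(allowed_actions(["V", "NT"])))       # ['NT', 'V']
```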
Exploration
- `exploration` (list): List of exploration strategies - `["entropy"]`, `["epsilon"]`, or both
- `init_epsilon` (float): Initial epsilon value for epsilon-greedy exploration (decays over training)
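The decay schedule itself is not specified here; a linear decay from `init_epsilon` to zero over `nb_iterations` is one common choice, sketched below as an assumption:

```python
def epsilon_at(iteration, init_epsilon=0.1, nb_iterations=20000):
    """Linearly decay epsilon from init_epsilon to 0 over training.

    Linear decay is an assumption; the project may use a different schedule.
    """
    frac = min(iteration / nb_iterations, 1.0)
    return init_epsilon * (1.0 - frac)

print(epsilon_at(0))       # 0.1
print(epsilon_at(10000))   # 0.05
print(epsilon_at(20000))   # 0.0
```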
Normalization
- `normalize_bounds` (`"none"` | `"max"` | `"log"`): How to normalize loop bounds in the input
- `normalize_adv` (`"none"` | `"standard"` | `"max-abs"`): Advantage normalization method for PPO
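To make the two bound-normalization modes concrete, here is a minimal sketch; the exact formulas (and the log base) are assumptions, not the project's implementation:

```python
import math

def normalize_bounds(bounds, mode):
    """Normalize a list of loop bounds according to the config mode."""
    if mode == "none":
        return list(bounds)
    if mode == "max":
        # Scale all bounds relative to the largest one.
        m = max(bounds)
        return [b / m for b in bounds]
    if mode == "log":
        # Compress large bounds; base 2 is an assumption.
        return [math.log2(b) for b in bounds]
    raise ValueError(f"unknown mode: {mode}")

print(normalize_bounds([32, 64, 128], "max"))  # [0.25, 0.5, 1.0]
print(normalize_bounds([32, 64, 128], "log"))  # [5.0, 6.0, 7.0]
```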
Experience Replay
- `reuse_experience` (`"none"` | `"random"` | `"topk"`): Experience replay strategy
- `replay_count` (int): Number of trajectories to keep in the replay buffer
Training Hyperparameters
- `bench_count` (int): Number of collected benchmarks per training iteration
- `nb_iterations` (int): Total number of training iterations
- `ppo_epochs` (int): Number of PPO update epochs per iteration
- `ppo_batch_size` (int): Batch size for PPO updates
- `value_epochs` (int): Number of value function update epochs (0 to update together with the policy)
- `value_batch_size` (int): Batch size for value function updates
- `value_coef` (float): Value loss coefficient in the combined loss
- `value_clip` (bool): Whether to clip the value function loss
- `entropy_coef` (float): Entropy bonus coefficient for exploration
- `lr` (float): Learning rate for the Adam optimizer
- `truncate` (int): Maximum number of transformation steps per operation
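The coefficients `value_coef` and `entropy_coef` combine the three loss terms in the standard PPO fashion; the sign conventions below are an assumption about how the project composes its loss, shown with plain scalars:

```python
def combined_loss(policy_loss, value_loss, entropy,
                  value_coef=0.5, entropy_coef=0.01):
    """Combine PPO loss terms (sign conventions are an assumption).

    Entropy is subtracted: higher policy entropy lowers the loss,
    which encourages exploration.
    """
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

# 0.2 + 0.5*0.4 - 0.01*1.5 = 0.385
print(combined_loss(0.2, 0.4, 1.5))
```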
Data Sources
- `benchmarks_folder_path` (str): Path to the directory containing `.mlir` benchmark files
- `json_file` (str): Path to the training execution times JSON file
- `eval_json_file` (str): Path to the evaluation execution times JSON file
Logging
- `results_dir` (str): Directory where results will be saved
- `tags` (list[str]): Optional tags for experiment tracking
- `debug` (bool): Enable debug mode
- `main_exec_data_file` (str): Path to a global execution cache file (optional)
Example Configuration
{
"max_num_stores_loads": 7,
"max_num_loops": 12,
"max_num_load_store_dim": 12,
"num_tile_sizes": 7,
"vect_size_limit": 512,
"order": [["I"], ["!", "I", "NT"], ["!", "I"], ["V", "NT"]],
"interchange_mode": "pointers",
"exploration": ["entropy"],
"init_epsilon": 0.0,
"normalize_bounds": "max",
"normalize_adv": "standard",
"reuse_experience": "none",
"benchmarks_folder_path": "data/code_files",
"bench_count": 64,
"replay_count": 0,
"nb_iterations": 20000,
"ppo_epochs": 4,
"ppo_batch_size": 64,
"value_epochs": 0,
"value_batch_size": 0,
"value_coef": 0.5,
"value_clip": false,
"entropy_coef": 0.01,
"lr": 0.001,
"truncate": 5,
"json_file": "data/execution_times_train.json",
"eval_json_file": "data/execution_times_eval.json",
"tags": [],
"debug": false,
"main_exec_data_file": "",
"results_dir": "results"
}
Usage
Training
Train the RL model:
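The training entry point is not reproduced here; a typical invocation, with a placeholder script name, would be:

```shell
# Placeholder entry point - substitute the project's actual training script.
CONFIG_FILE_PATH=config/example.json python <path/to/train_script>.py
```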
Results will be saved to results/run_<id>/:
- Model snapshots: `results/run_<id>/models/` - Model checkpoints saved every 5 iterations
- Logs: `results/run_<id>/log/` - Training metrics including speedups, losses, rewards, and entropy values
Evaluation
Evaluate saved models:
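As with training, the evaluation entry point is not shown here; a sketch with a placeholder script name:

```shell
# Placeholder entry point - substitute the project's actual evaluation script.
CONFIG_FILE_PATH=config/example.json python <path/to/eval_script>.py
```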
This command will evaluate all saved models in the models/ directory.
Results will be saved to results/run_<id>/:
- Evaluation logs: `results/run_<id>/log/` - Speedup metrics for each evaluated model
Paper Results
Evaluate the latest model in the models/ directory on the evaluation benchmarks from the paper:
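The exact command is not reproduced here; presumably it is a dedicated script under `paper/`, sketched with a placeholder name:

```shell
# Placeholder entry point - substitute the actual paper-results script.
CONFIG_FILE_PATH=config/example.json python <path/to/paper_eval_script>.py
```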
Results will be saved to paper/results/ as a JSON file in the format:
Figures similar to those in the paper are also saved to the `paper/figures/` directory.
Authors
- Mohammed Tirichine (km_tirichine@esi.dz)
- Nassim Ameur (kn_ameur@esi.dz)
- Iheb Nassim Aouadj (nassimiheb.aouadj@gmail.com)
- Nazim Bendib (jn_bendib@esi.dz)
- Bouchama Djad (bouchamadjad@gmail.com)
- Rafik Bouloudene (rafikobouloudene@gmail.com)
- Riyadh Baghdadi (baghdadi@nyu.edu)