Enzyme GitHub Copilot Instructions

Project Overview

Enzyme is a high-performance automatic differentiation (AD) plugin for LLVM and MLIR. It performs AD of statically analyzable LLVM IR and MLIR code through compiler transformations, achieving state-of-the-art performance by working on optimized code.

Key Concepts

  • Automatic Differentiation: Enzyme transforms functions to compute derivatives automatically
  • LLVM Plugin: Works as an LLVM transformation pass on intermediate representation (IR)
  • Activity Analysis: Determines which values are relevant for differentiation
  • Type Analysis: Tracks and propagates type information through the IR
  • Gradient Utilities: Core infrastructure for computing forward and reverse-mode derivatives
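
To make the first concept concrete, here is a minimal sketch of forward-mode AD using dual numbers in plain C++. It only illustrates what "computing derivatives automatically" means. Enzyme itself does not rely on operator overloading like this; it transforms LLVM IR or MLIR. The names Dual, dualSin, f, and derivativeOfF are hypothetical and do not come from the Enzyme code base.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical dual number: carries a value together with its derivative.
struct Dual {
  double Val; // f(x)
  double Der; // f'(x)
};

// Sum rule: (u + v)' = u' + v'
inline Dual operator+(Dual A, Dual B) {
  return {A.Val + B.Val, A.Der + B.Der};
}

// Product rule: (u * v)' = u' * v + u * v'
inline Dual operator*(Dual A, Dual B) {
  return {A.Val * B.Val, A.Der * B.Val + A.Val * B.Der};
}

// Chain rule for sine: (sin u)' = cos(u) * u'
inline Dual dualSin(Dual A) {
  return {std::sin(A.Val), std::cos(A.Val) * A.Der};
}

// Example function f(x) = x * x + sin(x), written over dual numbers.
inline Dual f(Dual X) { return X * X + dualSin(X); }

// Evaluates df/dx at X by seeding the input derivative with 1.
inline double derivativeOfF(double X) {
  assert(std::isfinite(X) && "input must be finite");
  Dual Seeded{X, 1.0};
  return f(Seeded).Der;
}
```

Calling derivativeOfF(x) should agree with the hand-derived 2x + cos(x).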

Repository Structure

enzyme/
├── Enzyme/          # Core AD transformation logic
│   ├── ActivityAnalysis.cpp/h      # Determines active/inactive values
│   ├── TypeAnalysis/               # Type tracking and propagation
│   ├── EnzymeLogic.cpp/h           # Main AD transformation logic
│   ├── GradientUtils.cpp/h         # Gradient computation utilities
│   ├── AdjointGenerator.h          # Reverse-mode AD code generation
│   ├── CacheUtility.cpp/h          # Caching mechanism for AD
│   ├── FunctionUtils.cpp/h         # Function manipulation utilities
│   ├── MLIR/                       # MLIR dialect and passes
│   └── Clang/                      # Clang plugin integration
├── BCLoad/          # Bitcode loader for runtime library support
├── test/            # Test suite using LLVM lit
│   ├── Enzyme/                     # Core Enzyme tests
│   ├── ActivityAnalysis/           # Activity analysis tests
│   ├── TypeAnalysis/               # Type analysis tests
│   ├── Integration/                # Integration tests (ReverseMode, ForwardMode, etc.)
│   └── MLIR/                       # MLIR-specific tests
├── tools/           # Additional tools
│   └── enzyme-tblgen/              # TableGen tool for derivative rules
└── cmake/           # CMake configuration files

Building the Project

Prerequisites

  • CMake 3.13+
  • LLVM development libraries (check .github/workflows/enzyme-ci.yml for supported versions)
  • Ninja or Make build system
  • Python 3 with lit (for testing)

Standard Build

cd enzyme
mkdir build && cd build
cmake -G Ninja .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_DIR=/path/to/llvm/lib/cmake/llvm \
  -DLLVM_EXTERNAL_LIT=$(which lit)
ninja

Build Types

  • Release: Optimized build (default)
  • Debug: Debug symbols, no optimization
  • RelWithDebInfo: Optimized with debug symbols

Build Options

  • ENZYME_ENABLE_PLUGINS: Enable Clang/LLD/Opt plugins (default: ON)
  • ENZYME_BC_LOADER: Enable bitcode loader (default: ON)
  • ENZYME_CLANG: Build enzyme clang plugin (default: ON)
  • ENZYME_MLIR: Build enzyme MLIR plugin (default: OFF)
  • ENZYME_STATIC_LIB: Build static library (default: OFF)

Testing

Enzyme uses LLVM's lit test framework. Tests are written as LLVM IR (.ll), C/C++ (.c, .cpp), MLIR (.mlir), or Fortran (.f90) files with RUN directives.

Running Tests

cd build
# The standard build above uses Ninja; substitute make if you configured with Makefiles
ninja check-enzyme           # Run all Enzyme tests
ninja check-typeanalysis     # Run Type Analysis tests
ninja check-activityanalysis # Run Activity Analysis tests

Test File Structure

Tests use lit directives:

; RUN: %opt < %s %loadEnzyme -enzyme -S | FileCheck %s

Tests should verify both correctness and performance properties where applicable.

Code Style and Formatting

C++ Style

Formatting Commands

# Format changed files in last commit
git clang-format HEAD~1

# Format specific file
clang-format -i path/to/file.cpp

Code Organization

  • Header guards: Use #ifndef HEADER_NAME_H format
  • Include order: Local headers, then LLVM headers, then system headers
  • Prefer forward declarations to reduce header dependencies
  • Use anonymous namespaces for file-local helpers
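
The conventions above can be sketched in a single listing that stands in for a header and its source file. This is an illustration only: ExampleUtils.h, ExampleNode, countNodes, and countFrom are hypothetical names, and the local and LLVM includes appear as comments so the listing compiles on its own.

```cpp
// ---- ExampleUtils.h (hypothetical header) ----
#ifndef ENZYME_EXAMPLE_UTILS_H
#define ENZYME_EXAMPLE_UTILS_H

// Forward declaration: this header only passes pointers around, so it does
// not need the full definition of ExampleNode.
struct ExampleNode;

// Returns the number of nodes in the chain starting at Head.
int countNodes(const ExampleNode *Head);

#endif // ENZYME_EXAMPLE_UTILS_H

// ---- ExampleUtils.cpp (hypothetical source file) ----
// Include order: local headers, then LLVM headers, then system headers.
// The first two are shown as comments so the sketch compiles on its own.
//   #include "ExampleUtils.h"
//   #include "llvm/ADT/SmallVector.h"
#include <cassert>

// In a real code base this definition would live in its own header.
struct ExampleNode {
  const ExampleNode *Next;
};

namespace {
// File-local helper with internal linkage.
int countFrom(const ExampleNode *Node, int SoFar) {
  assert(SoFar >= 0 && "count must start non-negative");
  while (Node) {
    ++SoFar;
    Node = Node->Next;
  }
  return SoFar;
}
} // namespace

int countNodes(const ExampleNode *Head) { return countFrom(Head, 0); }
```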

Development Guidelines

Adding New Features

  1. Understand the IR: Enzyme operates on LLVM IR and/or MLIR. Familiarize yourself with the IR being transformed.
  2. Activity Analysis First: Ensure values are properly marked as active/inactive
  3. Type Analysis: Use Type Analysis to understand data flow and pointer types
  4. Test-Driven Development: Write tests before implementing features
  5. Check Existing Derivatives: Look at CallDerivatives.cpp and BlasDerivatives.td for examples

Common Patterns

Pass Registration

New transformation passes should use the AnalysisInfoMixin pattern (even though they modify IR):

class MyPass final : public AnalysisInfoMixin<MyPass> {
  friend struct AnalysisInfoMixin<MyPass>;
private:
  static AnalysisKey Key;
public:
  using Result = PreservedAnalyses;
  Result run(Function &F, FunctionAnalysisManager &AM);
  static bool isRequired() { return true; }
};

Type Analysis

  • When a GEP (getelementptr) has a null pointer operand, don't propagate the pointer type to the result
  • Use TypeTree for tracking type information
  • Always validate type propagation results

Activity Analysis

  • Values are “active” if they depend on differentiated inputs
  • Use isConstantValue and isConstantInstruction carefully
  • Consider both data and control flow dependencies
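
As a rough illustration of the first bullet, the toy model below marks a value as active when it transitively depends on a differentiated input. It is a sketch under strong assumptions, not Enzyme's ActivityAnalysis: ToyValue and computeActivity are hypothetical, the "IR" is a flat list in topological order, and only data dependencies are modeled. The real analysis works on LLVM IR and is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: each value lists the indices of the values it is
// computed from. Operands must refer to earlier values (topological order).
struct ToyValue {
  std::vector<std::size_t> Operands;
  bool IsDifferentiatedInput;
};

// Returns a vector where entry I is true when value I transitively depends
// on a differentiated input; all other values are treated as inactive.
inline std::vector<bool> computeActivity(const std::vector<ToyValue> &Values) {
  std::vector<bool> Active(Values.size(), false);
  for (std::size_t I = 0; I < Values.size(); ++I) {
    bool IsActive = Values[I].IsDifferentiatedInput;
    for (std::size_t Op : Values[I].Operands) {
      assert(Op < I && "operands must be defined before their users");
      IsActive = IsActive || Active[Op];
    }
    Active[I] = IsActive;
  }
  return Active;
}
```

For example, with a differentiated input x and a constant c, the product x * c comes out active while c + c stays inactive.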

Memory Safety

  • Avoid unnecessary allocations in hot paths
  • Be careful with pointer ownership and lifetime
  • Use LLVM's memory management patterns (e.g., SmallVector, DenseMap)
  • Never assume pointer validity without checking

Error Handling

  • Prefer EmitFailure over compile-time crashes when possible for better error diagnostics:
    EmitFailure("RemarkName", Loc, CodeRegion, "Error message: ", value);
    
  • Use CustomErrorHandler to provide user-customizable error handling (especially useful for language bindings)
    • CustomErrorHandler is a global variable set by tools that use Enzyme as a library
    • If non-null, it should be checked and used in preference to other error handlers
    • Functions like EmitNoDerivativeError, EmitNoTypeError implement this pattern: they first check if CustomErrorHandler is set and call it, otherwise fall back to EmitFailure
  • Use LLVM's error reporting: llvm::errs(), dbgs(), assertions
  • Provide meaningful diagnostic messages
  • Use llvm_unreachable() for impossible cases
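
The handler-with-fallback pattern described above can be modeled in a few lines of standalone C++. This is a simplified sketch: ToyCustomErrorHandler, toyEmitFailure, and toyEmitNoDerivativeError are hypothetical stand-ins, and the real CustomErrorHandler and EmitFailure in Enzyme have different signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CustomErrorHandler: a global callback that a
// tool embedding the library may set. The real signature differs.
using ToyErrorHandler = void (*)(const std::string &Message);
ToyErrorHandler ToyCustomErrorHandler = nullptr;

// Hypothetical stand-in for the default diagnostic path. It records the
// message so the fallback behavior can be observed.
std::string LastDefaultDiagnostic;
void toyEmitFailure(const std::string &Message) {
  assert(!Message.empty() && "diagnostics should carry a message");
  LastDefaultDiagnostic = Message;
}

// Mirrors the described pattern: prefer the custom handler when it is set,
// otherwise fall back to the default diagnostic path.
void toyEmitNoDerivativeError(const std::string &Message) {
  if (ToyCustomErrorHandler) {
    ToyCustomErrorHandler(Message);
    return;
  }
  toyEmitFailure(Message);
}
```

Checking the global on every call, rather than caching it, lets an embedding tool install or replace its handler at any time.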

Pull Request Guidelines

Before Submitting

  1. Run clang-format on all changed code
  2. Add tests that cover your changes (required)
  3. Run relevant test suites to ensure no regressions
  4. Keep changes focused: One logical change per PR
  5. Update documentation if adding new features or changing behavior

PR Requirements

  • Include a small unit test demonstrating the change
  • Conform to LLVM Coding Standards
  • No unrelated changes
  • Be an isolated change (split independent changes into separate PRs)

Commit Messages

  • First line: Brief summary (50 chars or less)
  • Blank line
  • Detailed description if needed
  • Reference issues: “Fixes #123” or “Closes #456”

Debugging Tips

Using Compiler Explorer

The easiest way to explore and debug Enzyme: https://enzyme.mit.edu/explorer

Debug Builds

Build with CMAKE_BUILD_TYPE=Debug for better debugging:

  • Full debug symbols
  • No optimization (easier to follow execution)
  • Additional assertions enabled

Useful LLVM Flags

opt -load-pass-plugin=path/to/LLVMEnzyme-<version>.so \
    -passes=enzyme -enzyme-print -debug -debug-only=enzyme \
    input.ll -S -o output.ll

Print Debugging

llvm::errs() << "Debug message: " << value << "\n";
dbgs() << "Debug-only message\n";  // Only with -debug flag

Common Issues

Build Failures

  • Missing LLVM: Ensure LLVM_DIR points to the correct LLVM installation
  • Version support: Enzyme supports LLVM 15 through mainline. However, lit tests currently only pass on LLVM 15 and 16, which support both typed and opaque pointers. LLVM 17+ only supports opaque pointers. Help migrating lit tests to work with both typed and opaque pointers (depending on LLVM version) is appreciated.
  • lit not found: Install with pip install lit and set LLVM_EXTERNAL_LIT

Test Failures

  • Run tests locally before submitting PRs
  • Check if tests are flaky or environment-dependent
  • Ensure test expectations match the actual output format

Type Analysis Issues

  • Null pointer GEP: Don't propagate types from null pointers
  • Conflicting types: Review updateAnalysis calls carefully
  • Missing type information: Check if analysis is run before usage

Additional Resources

Language Integrations

Enzyme can be integrated with any language that compiles to LLVM IR:

  • C/C++: Via Clang plugin
  • Fortran: Via Flang support
  • Julia: Via Enzyme.jl package
  • Rust: Via rust-enzyme bindings

When adding language-specific features, ensure they work correctly with the core AD transformation.

Test Maintenance Guidelines

LLVM IR Tests (.ll)

  • Robustness: Enable tests to pass across multiple LLVM versions (15+).
    • Remove Attribute Groups: Strip attribute group references (e.g., #0) from CHECK lines for function calls and definitions. Attribute numbering is unstable.
    • Remove Trailing Braces: Do not include the opening { in function definition CHECK lines.
    • Capture Variables: Use FileCheck variable captures (e.g., %[[VAR:.+]]) instead of hardcoded SSA values. IMPORTANT: When the capture pattern matches only the name or numeric ID, as it usually does, keep the % prefix outside the brackets both at the capture (%[[VAR:.+]]) and at each later use (%[[VAR]]).
    • No Empty Checks: Do not use CHECK-NEXT: for empty lines. Resume matching with CHECK: for the next basic block or instruction sequence.
    • PHI Nodes: For forward references in PHI nodes (loop backedges), capture the variable in the usage (e.g., [ %[[VAR:[0-9]+]], ... ]) and verify it in the subsequent definition (%[[VAR]] = ...).
  • Integration Tests: Always verify changes to core derivatives (like BLAS) by running relevant integration tests (e.g., test/Integration/ReverseMode/blas.cpp).
  • Run Tests Locally: Always run ninja check-enzyme or llvm-lit on modified tests before submitting to ensure they pass.
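
Two of the robustness rules above (removing attribute groups and trailing braces) are mechanical enough to sketch as a small helper. normalizeCheckLine is a hypothetical function, not a tool in the Enzyme repository, and its regular expressions are deliberately simple. It is meant for CHECK lines that match calls and function definitions and would need care on other lines, such as attribute group definitions.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: rewrites one FileCheck line so that it is less
// sensitive to the LLVM version.
inline std::string normalizeCheckLine(const std::string &Line) {
  assert(Line.find('\n') == std::string::npos && "expects a single line");

  // Drop attribute group references such as " #0" or " #12".
  static const std::regex AttrGroup(R"(\s+#[0-9]+)");
  std::string Out = std::regex_replace(Line, AttrGroup, "");

  // Drop a trailing opening brace, and the whitespace around it, from
  // function definition lines.
  static const std::regex TrailingBrace(R"(\s*\{\s*$)");
  Out = std::regex_replace(Out, TrailingBrace, "");
  return Out;
}
```

Lines that contain neither an attribute group reference nor a trailing brace pass through unchanged, so the helper can be applied to every CHECK line of a function body.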