| # MLIR Python Bindings | 
 |  | 
 | Current status: Under development and not enabled by default | 
 |  | 
 |  | 
 | ## Building | 
 |  | 
 | ### Pre-requisites | 
 |  | 
 | * [`pybind11`](https://github.com/pybind/pybind11) must be installed and able to | 
 |   be located by CMake. | 
 | * A relatively recent Python 3 installation. | 
 |  | 
 | ### CMake variables | 
 |  | 
 | * **`MLIR_BINDINGS_PYTHON_ENABLED`**`:BOOL` | 
 |  | 
 |   Enables building the Python bindings. Defaults to `OFF`. | 
 |  | 
 | * **`MLIR_PYTHON_BINDINGS_VERSION_LOCKED`**`:BOOL` | 
 |  | 
 |   Links the native extension against the Python runtime library, which is | 
 |   optional on some platforms. While setting this to `OFF` can yield somewhat | 
 |   greater deployment flexibility, linking in this way allows the linker to | 
 |   report unresolved symbols at build time on all platforms, which makes for a | 
 |   smoother development workflow. Defaults to `ON`. | 
 |  | 
 | * **`PYTHON_EXECUTABLE`**`:STRING` | 
 |  | 
 |   Specifies the `python` executable used for the LLVM build, including for | 
 |   determining header/link flags for the Python bindings. On systems with | 
 |   multiple Python implementations, setting this explicitly to the preferred | 
 |   `python3` executable is strongly recommended. | 
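
As a rough illustration of how the variables above fit together, the following sketch assembles a CMake configure command from Python. The helper name `cmake_configure_args` and the source/build paths are hypothetical, and `LLVM_ENABLE_PROJECTS=mlir` is an assumption about a typical LLVM monorepo build rather than something this document specifies.

```python
import sys


def cmake_configure_args(source_dir, build_dir, python_executable=sys.executable):
    """Return a CMake configure command with the Python bindings enabled."""
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        # Assumption: MLIR is built as part of the LLVM monorepo.
        "-DLLVM_ENABLE_PROJECTS=mlir",
        # Variables described in the list above.
        "-DMLIR_BINDINGS_PYTHON_ENABLED=ON",
        f"-DPYTHON_EXECUTABLE={python_executable}",
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust them to the local checkout.
    print(" ".join(cmake_configure_args("llvm-project/llvm", "build")))
```

Defaulting to `sys.executable` pins the build to whichever interpreter runs the script, which is one way to follow the recommendation to set `PYTHON_EXECUTABLE` explicitly.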
 |  | 
 |  | 
 | ## Design | 
 |  | 
 | ### Use cases | 
 |  | 
 | There are likely two primary use cases for the MLIR Python bindings: | 
 |  | 
 | 1. Support users who expect that an installed version of LLVM/MLIR will yield | 
 |    the ability to `import mlir` and use the API in a pure way out of the box. | 
 |  | 
 | 2. Downstream integrations will likely want to include parts of the API in their | 
 |    private namespace or specially built libraries, probably mixing it with other | 
 |    Python native code. | 
 |  | 
 |  | 
 | ### Composable modules | 
 |  | 
 | In order to support use case #2, the Python bindings are organized into | 
 | composable modules that downstream integrators can include and re-export into | 
 | their own namespace if desired. This forces several design points: | 
 |  | 
 | * Separate the construction/populating of a `py::module` from the | 
 |   `PYBIND11_MODULE` global constructor. | 
 |  | 
 | * Introduce headers for C++-only wrapper classes, as other related C++ modules | 
 |   will need to interop with them. | 
 |  | 
 | * Separate any initialization routines that depend on optional components into | 
 |   their own module/dependency (currently, things like `registerAllDialects` | 
 |   fall into this category). | 
 |  | 
 | There are a lot of interrelated issues of shared library linkage, distribution | 
 | concerns, etc. that affect such things. Organizing the code into composable | 
 | modules (versus a monolithic `cpp` file) allows the flexibility to address many | 
 | of these as needed over time. Also, compilation time for all of the template | 
 | meta-programming in pybind scales with the number of things you define in a | 
 | translation unit. Breaking into multiple translation units can significantly aid | 
 | compile times for APIs with a large surface area. | 
 |  | 
 | ### Submodules | 
 |  | 
 | Generally, the C++ codebase namespaces most things into the `mlir` namespace. | 
 | However, in order to modularize and make the Python bindings easier to | 
 | understand, sub-packages are defined that map roughly to the directory structure | 
 | of functional units in MLIR. | 
 |  | 
 | Examples: | 
 |  | 
 | * `mlir.ir` | 
 | * `mlir.passes` (`pass` is a reserved word :( ) | 
 | * `mlir.dialect` | 
 | * `mlir.execution_engine` (aside from namespacing, it is important that | 
 |   "bulky"/optional parts like this are isolated) | 
 |  | 
 | In addition, initialization functions that imply optional dependencies should | 
 | be in underscored (notionally private) modules such as `_init` and linked | 
 | separately. This allows downstream integrators to completely customize what is | 
 | included "in the box" and covers things like dialect registration, | 
 | pass registration, etc. | 
 |  | 
 | ### Loader | 
 |  | 
 | LLVM/MLIR is a non-trivial Python-native project that is likely to co-exist with | 
 | other non-trivial native extensions. As such, the native extension (i.e. the | 
 | `.so`/`.pyd`/`.dylib`) is exported as a notionally private top-level symbol | 
 | (`_mlir`), while a small set of Python code is provided in `mlir/__init__.py` | 
 | and siblings which loads and re-exports it. This split provides a place to stage | 
 | code that needs to prepare the environment *before* the shared library is loaded | 
 | into the Python runtime, and also provides a place that one-time initialization | 
 | code can be invoked apart from module constructors. | 
 |  | 
 | To start with, the `mlir/__init__.py` loader shim can be very simple and scale | 
 | to future needs: | 
 |  | 
 | ```python | 
 | from _mlir import * | 
 | ``` | 
 |  | 
 | ### Limited use of globals | 
 |  | 
 | For normal operations, parent-child constructor relationships are realized with | 
 | constructor methods on a parent class as opposed to requiring | 
 | invocation/creation from a global symbol. | 
 |  | 
 | For example, consider two code fragments: | 
 |  | 
 | ```python | 
 | op = build_my_op() | 
 | region = mlir.Region(op) | 
 | ``` | 
 |  | 
 | vs | 
 |  | 
 | ```python | 
 | op = build_my_op() | 
 | region = op.new_region() | 
 | ``` | 
 |  | 
 | For tightly coupled data structures like `Operation`, the latter is generally | 
 | preferred because: | 
 |  | 
 | * It is syntactically harder to create something that is going to access | 
 |   illegal memory (less error handling in the bindings, less testing, etc.). | 
 |  | 
 | * It reduces the global-API surface area for creating related entities. This | 
 |   makes it more likely that, when constructing IR based on an `Operation` | 
 |   instance of unknown provenance, receiving code can just call methods on it to | 
 |   do what it wants versus needing to reach back into the global namespace and | 
 |   find the right `Region` class. | 
 |  | 
 | * It leaks fewer things that are in place for C++ convenience (i.e. default | 
 |   constructors to invalid instances). | 
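
The points above can be illustrated with a pure-Python toy model. The `Operation` and `Region` classes here are stand-ins written for this example, not the real bindings; they only show how a parent-side constructor method keeps the child attached from the moment it exists.

```python
class Region:
    """Toy region; only an Operation is expected to create one."""

    def __init__(self, parent):
        self.parent = parent


class Operation:
    """Toy operation that owns its regions."""

    def __init__(self, name):
        self.name = name
        self.regions = []

    def new_region(self):
        # The region is attached to its parent at creation time, so a
        # detached or invalid Region never reaches the caller.
        region = Region(self)
        self.regions.append(region)
        return region


op = Operation("my_op")
region = op.new_region()
print(region.parent is op)
```

Code that receives `op` from elsewhere needs nothing from the global namespace to extend it; the instance itself is the entry point.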
 |  | 
 | ### Use the C-API | 
 |  | 
 | The Python APIs should seek to layer on top of the C-API to the degree possible. | 
 | Especially for the core, dialect-independent parts, such a binding enables | 
 | packaging decisions that would be difficult or impossible if spanning a C++ ABI | 
 | boundary. In addition, factoring in this way side-steps some very difficult | 
 | issues that arise when combining RTTI-based modules (which pybind derived things | 
 | are) with non-RTTI polymorphic C++ code (the default compilation mode of LLVM). | 
 |  | 
 |  | 
 | ## Style | 
 |  | 
 | In general, for the core parts of MLIR, the Python bindings should be largely | 
 | isomorphic with the underlying C++ structures. However, concessions are made | 
 | either for practicality or to give the resulting library an appropriately | 
 | "Pythonic" flavor. | 
 |  | 
 | ### Properties vs get*() methods | 
 |  | 
 | Generally favor converting trivial methods like `getContext()`, `getName()`, | 
 | `isEntryBlock()`, etc. to read-only Python properties (e.g. `context`). It is | 
 | primarily a matter of calling `def_property_readonly` vs `def` in binding code, | 
 | and makes things feel much nicer on the Python side. | 
 |  | 
 | For example, prefer: | 
 |  | 
 | ```c++ | 
 | m.def_property_readonly("context", ...) | 
 | ``` | 
 |  | 
 | Over: | 
 |  | 
 | ```c++ | 
 | m.def("getContext", ...) | 
 | ``` | 
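
From the Python side, a `def_property_readonly` binding behaves much like a plain `@property` without a setter. The sketch below uses pure-Python stand-in classes (not the real bindings) to show what callers would see.

```python
class Context:
    """Stand-in for a context object."""


class Operation:
    """Stand-in whose accessors are exposed as read-only properties."""

    def __init__(self, context, name):
        self._context = context
        self._name = name

    @property
    def context(self):
        return self._context

    @property
    def name(self):
        return self._name


ctx = Context()
op = Operation(ctx, "module")
print(op.name)

try:
    op.name = "other"  # No setter is defined, so this raises.
except AttributeError:
    print("name is read-only")
```

Attribute access reads naturally, and accidental assignment fails loudly instead of silently shadowing the underlying value.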
 |  | 
 | ### `__repr__` methods | 
 |  | 
 | Things that have nice printed representations are really great :)  If there is a | 
 | reasonable printed form, it can be a significant productivity boost to wire that | 
 | to the `__repr__` method (and verify it with a [doctest](#sample-doctest)). | 
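
A minimal sketch of this pairing, using a toy `Operation` class rather than the real bindings, might look like the following. The printed form is an assumption chosen for illustration.

```python
import doctest


class Operation:
    """Toy operation with a printed form.

    >>> Operation("module")
    <operation 'module'>
    """

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # `!r` quotes the name the same way the interactive prompt would.
        return f"<operation {self.name!r}>"


if __name__ == "__main__":
    doctest.testmod()
```

Because the doctest compares against exactly what the interactive prompt prints, any change to the printed form shows up as a test failure.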
 |  | 
 | ### CamelCase vs snake_case | 
 |  | 
 | Name functions/methods/properties in `snake_case` and classes in `CamelCase`. As | 
 | a mechanical concession to Python style, this can go a long way to making the | 
 | API feel like it fits in with its peers in the Python landscape. | 
 |  | 
 | If in doubt, choose names that will flow properly with other | 
 | [PEP 8 style names](https://pep8.org/#descriptive-naming-styles). | 
 |  | 
 | ### Prefer pseudo-containers | 
 |  | 
 | Many core IR constructs provide methods directly on the instance to query count | 
 | and begin/end iterators. Prefer hoisting these to dedicated pseudo containers. | 
 |  | 
 | For example, a direct mapping of blocks within regions could be done this way: | 
 |  | 
 | ```python | 
 | region = ... | 
 |  | 
 | for block in region: | 
 |   pass | 
 | ``` | 
 |  | 
 | However, this way is preferred: | 
 |  | 
 | ```python | 
 | region = ... | 
 |  | 
 | for block in region.blocks: | 
 |   pass | 
 |  | 
 | print(len(region.blocks)) | 
 | print(region.blocks[0]) | 
 | print(region.blocks[-1]) | 
 | ``` | 
 |  | 
 | Instead of leaking STL-derived identifiers (`front`, `back`, etc), translate | 
 | them to appropriate `__dunder__` methods and iterator wrappers in the bindings. | 
 |  | 
 | Note that this can be taken too far, so use good judgment. For example, block | 
 | arguments may appear container-like but have defined methods for lookup and | 
 | mutation that would be hard to model properly without making semantics | 
 | complicated. If running into these, just mirror the C/C++ API. | 
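
One way such a pseudo-container could be shaped is sketched below in pure Python. `BlockList` and this `Region` are hypothetical stand-ins, and strings take the place of real blocks; an actual binding would wrap the native iterators instead of a list.

```python
class BlockList:
    """Read-only, sequence-like view over a region's blocks."""

    def __init__(self, blocks):
        self._blocks = blocks

    def __len__(self):
        return len(self._blocks)

    def __getitem__(self, index):
        # Delegating to the list gives negative indexing for free, which
        # covers what `front`/`back` provide on the C++ side.
        return self._blocks[index]

    def __iter__(self):
        return iter(self._blocks)


class Region:
    """Stand-in region exposing its blocks through a pseudo-container."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    @property
    def blocks(self):
        return BlockList(self._blocks)


region = Region(["entry", "body", "exit"])
print(len(region.blocks), region.blocks[0], region.blocks[-1])
```

The view defines no mutating methods, so mutation stays on whatever explicit methods the parent chooses to expose.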
 |  | 
 | ### Provide one-stop helpers for common things | 
 |  | 
 | One-stop helpers that aggregate over multiple low-level entities can be | 
 | incredibly helpful and are encouraged within reason. For example, making | 
 | `Context` have a `parse_asm` or equivalent that avoids needing to explicitly | 
 | construct a `SourceMgr` can be quite nice. One-stop helpers do not have to be | 
 | mutually exclusive with a more complete mapping of the backing constructs. | 
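
The relationship between a helper and the lower-level mapping can be sketched with toy classes. Everything here is hypothetical: `SourceManager`, `Module`, and `parse_source` are stand-ins invented for the example and do not parse anything.

```python
class SourceManager:
    """Stand-in for a low-level source buffer holder."""

    def __init__(self):
        self.buffers = []

    def add_buffer(self, text):
        self.buffers.append(text)
        return len(self.buffers) - 1


class Module:
    """Stand-in for a parsed module; it only stores the text."""

    def __init__(self, asm):
        self.asm = asm


class Context:
    def parse_source(self, source_manager, buffer_id):
        """Lower-level entry point: the caller manages the source manager."""
        return Module(source_manager.buffers[buffer_id])

    def parse_asm(self, asm):
        """One-stop helper: builds the source manager internally."""
        source_manager = SourceManager()
        return self.parse_source(source_manager, source_manager.add_buffer(asm))


ctx = Context()
module = ctx.parse_asm("module {}")
print(module.asm)
```

Both entry points coexist: the helper simply delegates to the more complete one, so callers that need control over buffers still have it.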
 |  | 
 | ## Testing | 
 |  | 
 | Tests should be added in the `test/Bindings/Python` directory and should | 
 | typically be `.py` files that have a lit run line. | 
 |  | 
 | While lit can run any Python module, prefer to lay tests out according to these | 
 | rules: | 
 |  | 
 | * For tests of the API surface area, prefer | 
 |   [`doctest`](https://docs.python.org/3/library/doctest.html). | 
 | * For generative tests (those that produce IR), define a Python module that | 
 |   constructs/prints the IR and pipe it through `FileCheck`. | 
 | * Parsing should be kept self-contained within the module under test by use of | 
 |   raw constants and an appropriate `parse_asm` call. | 
 | * Any file I/O code should be staged through a tempfile vs relying on file | 
 |   artifacts/paths outside of the test module. | 
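
For the last two rules, a test might keep its input as a raw constant and stage any file I/O through a temporary directory, roughly as sketched here. The helper name `round_trip_through_file` and the file name `test.mlir` are illustrative only.

```python
import os
import tempfile

# Raw constant kept inside the test module.
TEST_ASM = "module {\n}\n"


def round_trip_through_file(text):
    """Write text to a file in a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test.mlir")
        with open(path, "w") as f:
            f.write(text)
        with open(path) as f:
            return f.read()
    # The directory and its contents are removed when the block exits.


result = round_trip_through_file(TEST_ASM)
print(result == TEST_ASM)
```

Nothing outside the test module is read or left behind, so the test does not depend on the layout of the source tree.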
 |  | 
 | ### Sample Doctest | 
 |  | 
 | ```python | 
 | # RUN: %PYTHON %s | 
 |  | 
 | """ | 
 |   >>> m = load_test_module() | 
 | Test basics: | 
 |   >>> m.operation.name | 
 |   'module' | 
 |   >>> m.operation.is_registered | 
 |   True | 
 |   >>> ... etc ... | 
 |  | 
 | Verify that repr prints: | 
 |   >>> m.operation | 
 |   <operation 'module'> | 
 | """ | 
 |  | 
 | import mlir | 
 |  | 
 | TEST_MLIR_ASM = r""" | 
 | func @test_operation_correct_regions() { | 
 |   // ... | 
 | } | 
 | """ | 
 |  | 
 | # TODO: Move to a test utility class once any of this actually exists. | 
 | def load_test_module(): | 
 |   ctx = mlir.ir.Context() | 
 |   ctx.allow_unregistered_dialects = True | 
 |   module = ctx.parse_asm(TEST_MLIR_ASM) | 
 |   return module | 
 |  | 
 |  | 
 | if __name__ == "__main__": | 
 |   import doctest | 
 |   doctest.testmod() | 
 | ``` | 
 |  | 
 | ### Sample FileCheck test | 
 |  | 
 | ```python | 
 | # RUN: %PYTHON %s | mlir-opt -split-input-file | FileCheck %s | 
 |  | 
 | import mlir | 
 |  | 
 | # TODO: Move to a test utility class once any of this actually exists. | 
 | def print_module(f): | 
 |   m = f() | 
 |   print("// -----") | 
 |   print("// TEST_FUNCTION:", f.__name__) | 
 |   print(m.to_asm()) | 
 |   return f | 
 |  | 
 | # CHECK-LABEL: TEST_FUNCTION: create_my_op | 
 | @print_module | 
 | def create_my_op(): | 
 |   m = mlir.ir.Module() | 
 |   builder = m.new_op_builder() | 
 |   # CHECK: mydialect.my_operation ... | 
 |   builder.my_op() | 
 |   return m | 
 | ``` |