| # Chapter 3: High-level Language-Specific Analysis and Transformation |
| |
| [TOC] |
| |
| Creating a dialect that closely represents the semantics of an input language |
enables analyses, transformations, and optimizations in MLIR that require
| high-level language information and are generally performed on the language AST. |
| For example, `clang` has a fairly |
| [heavy mechanism](https://clang.llvm.org/doxygen/classclang_1_1TreeTransform.html) |
| for performing template instantiation in C++. |
| |
| We divide compiler transformations into two categories: local and global. In |
| this chapter, we focus on how to leverage the Toy Dialect and its high-level |
| semantics to perform local pattern-match transformations that would be difficult |
| in LLVM. For this, we use MLIR's |
| [Generic DAG Rewriter](../../GenericDAGRewriter.md). |
| |
There are two methods that can be used to implement pattern-match
transformations:

1.  Imperative, C++ pattern-match and rewrite.
2.  Declarative, rule-based pattern-match and rewrite using table-driven
    [Declarative Rewrite Rules](../../DeclarativeRewrites.md) (DRR). Note that
    the use of DRR requires that the operations be defined using ODS, as
    described in [Chapter 2](Ch-2.md).
| |
| ## Optimize Transpose using C++ style pattern-match and rewrite |
| |
| Let's start with a simple pattern and try to eliminate a sequence of two |
| transposes that cancel out: `transpose(transpose(X)) -> X`. Here is the |
| corresponding Toy example: |
| |
| ```toy |
| def transpose_transpose(x) { |
| return transpose(transpose(x)); |
| } |
| ``` |
| |
| Which corresponds to the following IR: |
| |
| ```mlir |
| func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> { |
| %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> |
| %1 = toy.transpose(%0 : tensor<*xf64>) to tensor<*xf64> |
| toy.return %1 : tensor<*xf64> |
| } |
| ``` |
| |
| This is a good example of a transformation that is trivial to match on the Toy |
IR but that would be quite hard for LLVM to figure out. For example, today Clang
| can't optimize away the temporary array, and the computation with the naive |
| transpose is expressed with these loops: |
| |
| ```c++ |
| #define N 100 |
| #define M 100 |
| |
| void sink(void *); |
| void double_transpose(int A[N][M]) { |
| int B[M][N]; |
| for(int i = 0; i < N; ++i) { |
| for(int j = 0; j < M; ++j) { |
| B[j][i] = A[i][j]; |
| } |
| } |
| for(int i = 0; i < N; ++i) { |
| for(int j = 0; j < M; ++j) { |
| A[i][j] = B[j][i]; |
| } |
| } |
| sink(A); |
| } |
| ``` |
| |
For a simple C++ approach to rewriting, matching a tree-like pattern in the IR
and replacing it with a different set of operations, we can plug into the MLIR
`Canonicalizer` pass by implementing a `RewritePattern`:
| |
| ```c++ |
| /// Fold transpose(transpose(x)) -> x |
| struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> { |
| /// We register this pattern to match every toy.transpose in the IR. |
| /// The "benefit" is used by the framework to order the patterns and process |
| /// them in order of profitability. |
| SimplifyRedundantTranspose(mlir::MLIRContext *context) |
| : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {} |
| |
  /// This method attempts to match a pattern and rewrite it. The rewriter
  /// argument is the orchestrator of the sequence of rewrites. The pattern is
  /// expected to interact with it to perform any changes to the IR.
| mlir::LogicalResult |
| matchAndRewrite(TransposeOp op, |
| mlir::PatternRewriter &rewriter) const override { |
| // Look through the input of the current transpose. |
| mlir::Value transposeInput = op.getOperand(); |
| TransposeOp transposeInputOp = transposeInput.getDefiningOp<TransposeOp>(); |
| |
| // Input defined by another transpose? If not, no match. |
| if (!transposeInputOp) |
      return mlir::failure();
| |
| // Otherwise, we have a redundant transpose. Use the rewriter. |
| rewriter.replaceOp(op, {transposeInputOp.getOperand()}); |
    return mlir::success();
| } |
| }; |
| ``` |
| |
The implementation of this rewriter is in `ToyCombine.cpp`. The
[canonicalization pass](../../Canonicalization.md) greedily and iteratively
applies the transformations that operations define. To ensure that the
canonicalization pass applies our new transform, we set
[hasCanonicalizer = 1](../../OpDefinitions.md#hascanonicalizer) in the
operation's ODS definition and register the pattern with the canonicalization
framework.
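
In ODS, enabling the hook is a single bit on the operation definition; here is a
minimal sketch of what this looks like for `TransposeOp` (all other fields
elided):

```tablegen
def TransposeOp : Toy_Op<"transpose"> {
  // ... summary, arguments, results, builders ...

  // Declare the getCanonicalizationPatterns() hook on the generated class.
  let hasCanonicalizer = 1;
}
```

We then implement the declared hook to register our pattern: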
| |
| ```c++ |
| // Register our patterns for rewrite by the Canonicalization framework. |
| void TransposeOp::getCanonicalizationPatterns( |
| OwningRewritePatternList &results, MLIRContext *context) { |
| results.insert<SimplifyRedundantTranspose>(context); |
| } |
| ``` |
| |
| We also need to update our main file, `toyc.cpp`, to add an optimization |
| pipeline. In MLIR, the optimizations are run through a `PassManager` in a |
| similar way to LLVM: |
| |
| ```c++ |
| mlir::PassManager pm(module.getContext()); |
| pm.addNestedPass<mlir::FuncOp>(mlir::createCanonicalizerPass()); |
| ``` |
| |
| Finally, we can run `toyc-ch3 test/Examples/Toy/Ch3/transpose_transpose.toy |
| -emit=mlir -opt` and observe our pattern in action: |
| |
| ```mlir |
| func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> { |
| %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64> |
| toy.return %arg0 : tensor<*xf64> |
| } |
| ``` |
| |
As expected, we now directly return the function argument, bypassing any
transpose operation. However, one of the transposes still hasn't been
eliminated. That is not ideal! What happened is that our pattern replaced the
last transpose with the function input and left behind the now-dead inner
transpose. The Canonicalizer knows to clean up dead operations; however, MLIR
conservatively assumes that operations may have side effects. We can fix this by
adding a new trait, `NoSideEffect`, to our `TransposeOp`:
| |
| ```tablegen |
| def TransposeOp : Toy_Op<"transpose", [NoSideEffect]> {...} |
| ``` |
| |
Let's retry, this time running `toyc-ch3
test/Examples/Toy/Ch3/transpose_transpose.toy -emit=mlir -opt`:
| |
| ```mlir |
| func @transpose_transpose(%arg0: tensor<*xf64>) -> tensor<*xf64> { |
| toy.return %arg0 : tensor<*xf64> |
| } |
| ``` |
| |
Perfect! No `transpose` operation is left; the code is optimal.
| |
| In the next section, we use DRR for pattern match optimizations associated with |
| the Reshape op. |
| |
| ## Optimize Reshapes using DRR |
| |
| Declarative, rule-based pattern-match and rewrite (DRR) is an operation |
| DAG-based declarative rewriter that provides a table-based syntax for |
| pattern-match and rewrite rules: |
| |
| ```tablegen |
| class Pattern< |
| dag sourcePattern, list<dag> resultPatterns, |
| list<dag> additionalConstraints = [], |
| dag benefitsAdded = (addBenefit 0)>; |
| ``` |
| |
A redundant reshape optimization similar to `SimplifyRedundantTranspose` can be
expressed more simply using DRR as follows (`Pat` is a shorthand for `Pattern`
with a single result pattern):
| |
| ```tablegen |
| // Reshape(Reshape(x)) = Reshape(x) |
| def ReshapeReshapeOptPattern : Pat<(ReshapeOp(ReshapeOp $arg)), |
| (ReshapeOp $arg)>; |
| ``` |
| |
| The automatically generated C++ code corresponding to each of the DRR patterns |
| can be found under `path/to/BUILD/tools/mlir/examples/toy/Ch3/ToyCombine.inc`. |
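
To get a feel for what this rule expands to, here is a hand-written C++ pattern
that is roughly equivalent to `ReshapeReshapeOptPattern`; the DRR-generated
class differs in its details but follows the same match-then-rewrite structure:

```c++
/// Sketch of what ReshapeReshapeOptPattern boils down to; illustrative only,
/// with a hypothetical name to distinguish it from the generated class.
struct SimplifyReshapeReshape : public mlir::OpRewritePattern<ReshapeOp> {
  using OpRewritePattern<ReshapeOp>::OpRewritePattern;

  mlir::LogicalResult
  matchAndRewrite(ReshapeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Match: the reshape's input must itself be produced by a reshape.
    ReshapeOp inputOp = op.getOperand().getDefiningOp<ReshapeOp>();
    if (!inputOp)
      return mlir::failure();

    // Rewrite: reshape directly from the inner reshape's operand, making the
    // inner reshape dead.
    rewriter.replaceOpWithNewOp<ReshapeOp>(op, op.getType(),
                                           inputOp.getOperand());
    return mlir::success();
  }
};
```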
| |
| DRR also provides a method for adding argument constraints when the |
| transformation is conditional on some properties of the arguments and results. |
| An example is a transformation that eliminates reshapes when they are redundant, |
| i.e. when the input and output shapes are identical. |
| |
| ```tablegen |
| def TypesAreIdentical : Constraint<CPred<"$0.getType() == $1.getType()">>; |
| def RedundantReshapeOptPattern : Pat< |
| (ReshapeOp:$res $arg), (replaceWithValue $arg), |
| [(TypesAreIdentical $res, $arg)]>; |
| ``` |
| |
Some optimizations may require additional transformations on instruction
arguments. This is achieved using `NativeCodeCall`, which allows for more
complex transformations either by calling into a C++ helper function or by
using inline C++. An example of such an optimization is
`FoldConstantReshapeOptPattern`, where we optimize the reshape of a constant
value by reshaping the constant in place and eliminating the reshape operation.
| |
| ```tablegen |
| def ReshapeConstant : NativeCodeCall<"$0.reshape(($1.getType()).cast<ShapedType>())">; |
| def FoldConstantReshapeOptPattern : Pat< |
| (ReshapeOp:$res (ConstantOp $arg)), |
| (ConstantOp (ReshapeConstant $arg, $res))>; |
| ``` |
| |
We demonstrate these reshape optimizations using the following
`trivial_reshape.toy` program:
| |
```toy
| def main() { |
| var a<2,1> = [1, 2]; |
| var b<2,1> = a; |
| var c<2,1> = b; |
| print(c); |
| } |
| ``` |

Before optimization, this corresponds to the following IR:

| ```mlir |
| module { |
| func @main() { |
| %0 = toy.constant dense<[1.000000e+00, 2.000000e+00]> : tensor<2xf64> |
| %1 = toy.reshape(%0 : tensor<2xf64>) to tensor<2x1xf64> |
| %2 = toy.reshape(%1 : tensor<2x1xf64>) to tensor<2x1xf64> |
| %3 = toy.reshape(%2 : tensor<2x1xf64>) to tensor<2x1xf64> |
| toy.print %3 : tensor<2x1xf64> |
| toy.return |
| } |
| } |
| ``` |
| |
We can now run `toyc-ch3 test/Examples/Toy/Ch3/trivial_reshape.toy -emit=mlir
-opt` and observe our patterns in action:
| |
| ```mlir |
| module { |
| func @main() { |
| %0 = toy.constant dense<[[1.000000e+00], [2.000000e+00]]> : tensor<2x1xf64> |
| toy.print %0 : tensor<2x1xf64> |
| toy.return |
| } |
| } |
| ``` |
| |
| As expected, no reshape operations remain after canonicalization. |
| |
| Further details on the declarative rewrite method can be found at |
| [Table-driven Declarative Rewrite Rule (DRR)](../../DeclarativeRewrites.md). |
| |
In this chapter, we saw how to use certain core transformations through
always-available hooks. In the [next chapter](Ch-4.md), we will see how to use
generic solutions that scale better through Interfaces.