 # Bufferization

 [TOC]

 ## Overview

 Bufferization in MLIR is the process of converting the `tensor` type to the
 `memref` type. MLIR provides a composable system that allows dialects to
 systematically bufferize a program. This system is a simple application
 of MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
 the code related to bufferization is a set of ordinary `ConversionPattern`s
 that dialect authors write for converting ops that operate on `tensor`s to ops
 that operate on `memref`s. A set of conventions and best practices is followed
 that allows these patterns to be run across multiple independent passes (rather
 than requiring a single huge atomic conversion pass), which makes the
 compilation pipelines scalable, robust, and easy to debug.

 This document is targeted at people looking to utilize MLIR's bufferization
 functionality, along with people who want to extend it to cover their own ops.

 <a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
 talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
 Infrastructure"
 ([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
 [recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
 That talk gives a high-level overview of the bufferization infrastructure and
 important conceptual details related to using the MLIR dialect conversion
 infrastructure.

 ## Bufferization's place in a compilation pipeline

 Bufferization itself does not free any of the buffers that have been allocated,
 nor does it do anything particularly intelligent with the placement of buffers
 w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
 of:

 1. Bufferization
 1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
    `promote-buffers-to-stack`, which do optimizations that are only exposed
    after bufferization.
 1. Finally, running the [buffer deallocation](BufferDeallocation.md) pass.

 After buffer deallocation has been completed, the program will be quite
 difficult to transform due to the presence of the deallocation ops. Thus, other
 optimizations such as linalg fusion on memrefs should be done before that stage.

 ## General structure of the bufferization process

 Bufferization consists of running multiple _partial_ bufferization passes,
 followed by one _finalizing_ bufferization pass.

 There is typically one partial bufferization pass per dialect (though other
 subdivisions are possible). For example, for a dialect `X` there will typically
 be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
 By running pass `X-bufferize` for each dialect `X` in the program, all the ops
 in the program are incrementally bufferized.

 Partial bufferization passes create programs where only some ops have been
 bufferized. These passes will create _materializations_ (also sometimes called
 "casts") that convert between the `tensor` and `memref` type, which allows
 bridging between ops that have been bufferized and ops that have not yet been
 bufferized.

 Finalizing bufferizations complete the bufferization process, and guarantee that
 there are no tensors remaining in the program. This involves eliminating the
 materializations. The pass `finalizing-bufferize` provides a minimal pass that
 only eliminates materializations and issues an error if any unbufferized ops
 exist in the program.
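
The interplay between partial passes, materializations, and the finalizing pass can be sketched with a small standalone toy model. This is only an illustration under simplifying assumptions: the program is a straight-line chain of ops, and `ToyOp`, `partialBufferize`, `countMaterializations`, and `finalizingBufferize` are hypothetical names invented here, not MLIR APIs.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an op in a straight-line program: each op consumes the
// result of the op before it. This is NOT an MLIR class.
struct ToyOp {
  std::string dialect; // hypothetical dialect name, e.g. "linalg"
  bool bufferized;     // true once the op operates on memrefs
};

// Models a partial bufferization pass `X-bufferize`: only the ops of
// dialect X are switched from tensors to memrefs.
void partialBufferize(std::vector<ToyOp> &ops, const std::string &dialect) {
  for (ToyOp &op : ops)
    if (op.dialect == dialect)
      op.bufferized = true;
}

// In this model, a materialization is needed wherever a bufferized op is
// adjacent to an op that has not been bufferized yet.
int countMaterializations(const std::vector<ToyOp> &ops) {
  int count = 0;
  for (std::size_t i = 1; i < ops.size(); ++i)
    if (ops[i - 1].bufferized != ops[i].bufferized)
      ++count;
  return count;
}

// Models the finalizing step: it succeeds only if no unbufferized op is left.
bool finalizingBufferize(const std::vector<ToyOp> &ops) {
  for (const ToyOp &op : ops)
    if (!op.bufferized)
      return false;
  return true;
}
```

For a chain of `linalg`, `scf`, `linalg`, `tensor` ops, running the toy `linalg` step alone leaves three bufferized/unbufferized boundaries. Once the `scf` and `tensor` steps have also run, no boundaries remain and the toy finalizing step succeeds.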

 However, it is possible for a finalizing bufferization to do more than just
 eliminate materializations. By adding patterns (just as a partial bufferization
 would), it is possible for a finalizing bufferization pass to simultaneously
 bufferize ops and eliminate materializations. This has a number of disadvantages
 discussed in the talk and should generally be avoided.

 ### Example

 As a concrete example, we will look at the bufferization pipeline from the
 `mlir-npcomp` reference backend
 ([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
 The code, slightly simplified and annotated, is reproduced here:

 ```c++
   // Partial bufferization passes.
   pm.addPass(createTensorConstantBufferizePass());
   pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
   pm.addNestedPass<FuncOp>(createSCFBufferizePass());
   pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
   pm.addNestedPass<FuncOp>(createStdBufferizePass());
   pm.addNestedPass<FuncOp>(createTensorBufferizePass());
   pm.addPass(createFuncBufferizePass());

   // Finalizing bufferization pass.
   pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
 ```

 Looking first at the partial bufferization passes, we see that there is a
 sequence of `FuncOp` passes (which run in parallel on functions). These function
 passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
 are module passes (and thus serialize the parallel compilation process). These
 two passes must be module passes because they make changes to the top-level
 module.

 The bulk of the bufferization work is done by the function passes. Most of these
 passes are provided as part of the upstream MLIR distribution and bufferize
 their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
 The `tcp-bufferize` pass is an exception -- it is a partial bufferization pass
 used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
 the other passes provided upstream.

 The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
 backend has arranged that all ops are bufferized by partial bufferizations, so
 that the upstream `finalizing-bufferize` pass can be used as the finalizing
 bufferization pass. This gives excellent diagnostics when something goes wrong
 with the bufferization process, for example when an op was not handled by any
 pattern.

 ## How to write a partial bufferization pass

 The contract of a partial bufferization pass is that a subset of ops (or kinds
 of ops, customizable by a `ConversionTarget`) get bufferized.

 A partial bufferization pass is just a pass that uses the
 [dialect conversion](DialectConversion.md) framework to apply
 `ConversionPattern`s with a `tensor` to `memref` type conversion.

 To describe how to write such a pass, we will walk through an example, the
 `tensor-bufferize` pass
 ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
 [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
 that bufferizes the `tensor` dialect.

 The bulk of the code in the pass will be a set of conversion patterns, with a
 simple example being
 [BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

 ```c++
 class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
 public:
   using OpConversionPattern::OpConversionPattern;
   LogicalResult
   matchAndRewrite(tensor::CastOp op, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const override {
     auto resultType = getTypeConverter()->convertType(op.getType());
     rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, operands[0]);
     return success();
   }
 };
 ```

 See [the talk](#the-talk) for more details on how to write these patterns.

 The
 [pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
 is very small, and follows the basic pattern of any dialect conversion pass.

 ```c++
 void mlir::populateTensorBufferizePatterns(
     MLIRContext *context, BufferizeTypeConverter &typeConverter,
     OwningRewritePatternList &patterns) {
   patterns.insert<BufferizeCastOp, BufferizeExtractOp>(typeConverter, context);
 }

 struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
   void runOnFunction() override {
     auto *context = &getContext();
     BufferizeTypeConverter typeConverter;
     OwningRewritePatternList patterns;
     ConversionTarget target(*context);

     populateTensorBufferizePatterns(context, typeConverter, patterns);
     target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
     target.addLegalDialect<StandardOpsDialect>();

     if (failed(
             applyPartialConversion(getFunction(), target, std::move(patterns))))
       signalPassFailure();
   }
 };
 ```

 The pass has all the hallmarks of a dialect conversion pass that does type
 conversions: a `TypeConverter`, an `OwningRewritePatternList`, a
 `ConversionTarget`, and a call to `applyPartialConversion`. Note that a function
 `populateTensorBufferizePatterns` is separated, so that power users can use the
 patterns independently, if necessary (such as to combine multiple sets of
 conversion patterns into a single conversion call, for performance).

 One convenient utility provided by the MLIR bufferization infrastructure is the
 `BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
 and materializations between `tensor` and `memref`.

 In this case, the `StandardOpsDialect` is marked as legal, so the `tensor_load`
 and `tensor_to_memref` ops, which are inserted automatically by the dialect
 conversion framework as materializations, are legal. There is a helper
 `populateBufferizeMaterializationLegality`
 ([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
 which helps with this in general.

 ### Other partial bufferization examples

 - `linalg-bufferize`
   ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
   [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

   - Bufferizes the `linalg` dialect.
   - This is an example of how to simultaneously bufferize all the ops that
     satisfy a certain OpInterface with a single pattern. Specifically,
     `BufferizeAnyLinalgOp`
     ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
     bufferizes any op that implements the `LinalgOp` interface.

 - `scf-bufferize`
   ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
   [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

   - Bufferizes ops from the `scf` dialect.
   - This is an example of how to bufferize ops that implement
     `RegionBranchOpInterface` (that is, they use regions to represent control
     flow).
   - The bulk of the work is done by
     `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
     ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
     which is well-commented and covers how to correctly convert ops that contain
     regions.

 - `func-bufferize`
   ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
   [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

   - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
   - This is an example of how to bufferize ops that have multi-block regions.
   - This is an example of a pass that is not split along dialect subdivisions.

 - `tensor-constant-bufferize`
   ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
   [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))

   - Bufferizes only `std.constant` ops of `tensor` type.
   - This is an example of setting up the legality so that only a subset of
     `std.constant` ops get bufferized.
   - This is an example of a pass that is not split along dialect subdivisions.
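
The idea of setting up legality so that only a subset of one kind of op gets bufferized (as in the `tensor-constant-bufferize` item above) can be sketched with a standalone toy model. Everything here is hypothetical and invented for illustration: `ToyValueOp`, `ToyConversionTarget`, and its methods are loosely modeled on the role a `ConversionTarget` plays, but they are not the MLIR classes and do not reproduce the upstream pass.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an op with a single result. NOT an MLIR class.
struct ToyValueOp {
  std::string name;    // e.g. "std.constant"
  bool producesTensor; // whether the result has `tensor` type
};

// Toy legality table: maps an op name to a predicate that decides,
// per op instance, whether that op is already legal.
class ToyConversionTarget {
public:
  using LegalityFn = std::function<bool(const ToyValueOp &)>;

  void addDynamicallyLegalOp(const std::string &name, LegalityFn isLegal) {
    rules_[name] = std::move(isLegal);
  }

  // An op must be converted only if a rule exists for it and the rule
  // rejects it. Ops without a rule are left alone, which is the behavior
  // this toy assumes for a partial conversion.
  bool mustConvert(const ToyValueOp &op) const {
    auto it = rules_.find(op.name);
    return it != rules_.end() && !it->second(op);
  }

private:
  std::map<std::string, LegalityFn> rules_;
};

// Builds a target under which only tensor-typed constants are illegal.
ToyConversionTarget makeTensorConstantTarget() {
  ToyConversionTarget target;
  target.addDynamicallyLegalOp(
      "std.constant",
      [](const ToyValueOp &op) { return !op.producesTensor; });
  return target;
}

// Counts the ops that a pass using `target` would have to bufferize.
int countOpsToBufferize(const ToyConversionTarget &target,
                        const std::vector<ToyValueOp> &ops) {
  int count = 0;
  for (const ToyValueOp &op : ops)
    if (target.mustConvert(op))
      ++count;
  return count;
}
```

With this toy target, a tensor-typed `std.constant` must be converted, while a scalar `std.constant` and any op of another kind are left for other passes.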

 ## How to write a finalizing bufferization pass

 The contract of a finalizing bufferization pass is that all tensors are gone
 from the program.

 The easiest way to write a finalizing bufferize pass is to not write one at all!
 MLIR provides a pass `finalizing-bufferize` which eliminates the `tensor_load` /
 `tensor_to_memref` materialization ops inserted by partial bufferization passes
 and emits an error if that is not sufficient to remove all tensors from the
 program.
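
As a rough illustration of what eliminating materializations means, the standalone toy below treats the casts applied to one memref value as a sequence and cancels each `tensor_load` that is immediately followed by a `tensor_to_memref`. The names `ToyCast`, `eliminateMaterializations`, and `finalizeSucceeds` are hypothetical; the real pass works on MLIR IR through the dialect conversion framework, not on a flat list like this.

```c++
#include <cassert>
#include <vector>

// Toy model of the two materialization ops. NOT MLIR classes.
enum class ToyCast {
  TensorLoad,    // memref -> tensor
  TensorToMemref // tensor -> memref
};

// Given the casts applied in order to a memref value, cancel every
// `tensor_load` that is immediately followed by a `tensor_to_memref`.
// Whatever remains could not be eliminated.
std::vector<ToyCast>
eliminateMaterializations(const std::vector<ToyCast> &chain) {
  std::vector<ToyCast> remaining;
  for (ToyCast cast : chain) {
    if (cast == ToyCast::TensorToMemref && !remaining.empty() &&
        remaining.back() == ToyCast::TensorLoad)
      remaining.pop_back(); // memref -> tensor -> memref is a no-op
    else
      remaining.push_back(cast);
  }
  return remaining;
}

// The toy finalizing step fails if any cast, and therefore a tensor value,
// is left over after elimination.
bool finalizeSucceeds(const std::vector<ToyCast> &chain) {
  return eliminateMaterializations(chain).empty();
}
```

A chain made only of matched `tensor_load` / `tensor_to_memref` pairs reduces to nothing, so the toy finalizing step succeeds. A trailing unmatched `tensor_load` survives, which corresponds to a tensor still being used by an unbufferized op.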

 This pass is sufficient when partial bufferization passes have bufferized all
 the ops in the program, leaving behind only the materializations. When possible,
 it is recommended to structure your pass pipeline this way, as this has the
 significant advantage that if an op does not get bufferized (due to a missing
 pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
 error, and the IR seen by `finalizing-bufferize` will contain only one
 unbufferized op.

 However, before the current bufferization infrastructure was put in place,
 bufferization could only be done as a single finalizing bufferization
 mega-pass that used the `populate*BufferizePatterns` functions from multiple
 dialects to simultaneously bufferize everything at once. Thus, one might see
 code in downstream projects structured this way. This structure is not
 recommended in new code. A helper,
 `populateEliminateBufferizeMaterializationsPatterns`
 ([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
 is available for such passes to provide patterns that eliminate `tensor_load`
 and `tensor_to_memref`.

 ## Changes since [the talk](#the-talk)

 - `func-bufferize` was changed to be a partial conversion pass, and there is a
   new `finalizing-bufferize` which serves as a general finalizing bufferization
   pass.