 ==========================================
 Design and Usage of the InAlloca Attribute
 ==========================================

 Introduction
 ============

 The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
 taking the address of an aggregate argument that is being passed by
 value through memory.  Primarily, this feature is required for
 compatibility with the Microsoft C++ ABI.  Under that ABI, class
 instances that are passed by value are constructed directly into
 argument stack memory.  Prior to the addition of inalloca, calls in LLVM
 were indivisible instructions.  There was no way to perform intermediate
 work, such as object construction, between the first stack adjustment
 and the final control transfer.  With inalloca, all arguments passed in
 memory are modelled as a single alloca, which can be stored to prior to
 the call.  Unfortunately, this complicated feature comes with a large
 set of restrictions designed to bound the lifetime of the argument
 memory around the call.

 For now, it is recommended that frontends and optimizers avoid producing
 this construct, primarily because it forces the use of a base pointer.
 This feature may grow in the future to allow general mid-level
 optimization, but for now, it should be regarded as less efficient than
 passing by value with a copy.

 Intended Usage
 ==============

 The example below is the intended LLVM IR lowering for some C++ code
 that passes two default-constructed ``Foo`` objects to ``g`` in the
 32-bit Microsoft C++ ABI.

 .. code-block:: c++

     // Foo is non-trivial.
     struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
     void g(Foo a, Foo b);
     void f() {
       g(Foo(), Foo());
     }

 .. code-block:: text

     %struct.Foo = type { i32, i32 }
     declare void @Foo_ctor(%struct.Foo* %this)
     declare void @Foo_dtor(%struct.Foo* %this)
     declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

     define void @f() {
     entry:
       %base = call i8* @llvm.stacksave()
       %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
       ; Field 1 of the packed argument struct holds b; it is constructed first.
       %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
       call void @Foo_ctor(%struct.Foo* %b)

       ; If a's ctor throws, we must destruct b.
       %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
       invoke void @Foo_ctor(%struct.Foo* %a)
           to label %invoke.cont unwind label %invoke.unwind

     invoke.cont:
       call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
       call void @llvm.stackrestore(i8* %base)
       ...

     invoke.unwind:
       call void @Foo_dtor(%struct.Foo* %b)
       call void @llvm.stackrestore(i8* %base)
       ...
     }

 To avoid stack leaks, the frontend saves the current stack pointer with
 a call to :ref:`llvm.stacksave <int_stacksave>`.  Then, it allocates the
 argument stack space with alloca and calls the default constructor for
 each argument, ``b`` first and then ``a``.  The constructor call for
 ``a`` could throw an exception after ``b`` has been constructed, so the
 frontend has to create a landing pad that destroys ``b`` before
 restoring the stack pointer.  If the constructor does not unwind, ``g``
 is called.  In the Microsoft C++ ABI, ``g`` will destroy its arguments,
 and then the stack is restored in ``f``.
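
 The source-level behavior that this lowering has to preserve can be
 observed with a small instrumented variant of the C++ example.  This is
 only a sketch: the ``Counters`` struct, the ``run_throwing_call`` helper,
 and the choice to make the second constructor call throw are assumptions
 added for illustration.  C++ does not specify the order in which the two
 arguments are evaluated, so the counters are chosen to be independent of
 that order.

 .. code-block:: c++

     #include <cassert>
     #include <stdexcept>

     // Hypothetical instrumentation, not part of the original example.
     struct Counters {
       int ctor_attempts = 0;  // default constructor calls that were started
       int destroyed = 0;      // destructors that ran
       int live = 0;           // fully constructed objects still alive
       bool g_called = false;  // whether control ever reached g
     };

     static Counters counters;

     // Foo is non-trivial. By assumption, its second construction throws.
     struct Foo {
       int a = 0, b = 0;
       Foo() {
         if (++counters.ctor_attempts == 2)
           throw std::runtime_error("second Foo() failed");
         ++counters.live;
       }
       Foo(const Foo &) { ++counters.live; }
       ~Foo() {
         --counters.live;
         ++counters.destroyed;
       }
     };

     void g(Foo, Foo) { counters.g_called = true; }

     // Evaluates g(Foo(), Foo()) and reports what happened.
     Counters run_throwing_call() {
       counters = Counters{};
       try {
         g(Foo(), Foo());
       } catch (const std::runtime_error &) {
         // By now the argument that was already constructed has been
         // destroyed; this is the job of the landing pad in the IR above.
       }
       assert(counters.live == 0);
       return counters;
     }

 Calling ``run_throwing_call()`` from a test driver should report two
 constructor attempts, one destructor call, and no call to ``g``, which
 corresponds to the ``invoke.unwind`` path in the IR.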

 Design Considerations
 =====================

 Lifetime
 --------

 The biggest design consideration for this feature is object lifetime.
 We cannot model the arguments as static allocas in the entry block,
 because all calls need to use the memory at the top of the stack to pass
 arguments.  We cannot vend pointers to that memory at function entry
 because after code generation they will alias.

 The rule against allocas between argument allocations and the call site
 avoids this problem, but it creates a cleanup problem.  Cleanup and
 lifetime are handled explicitly with stack save and restore calls.  In
 the future, we may want to introduce a new construct such as ``freea``
 or ``afree`` to make it clear that this stack adjusting cleanup is less
 powerful than a full stack save and restore.

 Nested Calls and Copy Elision
 -----------------------------

 We also want to be able to support copy elision into these argument
 slots.  This means we have to support multiple live argument
 allocations.

 Consider the evaluation of:

 .. code-block:: c++

     // Foo is non-trivial.
     struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
     Foo bar(Foo b);
     int main() {
       bar(bar(Foo()));
     }

 In this case, we want to be able to elide copies into ``bar``'s argument
 slots.  That means we need to have more than one set of argument frames
 active at the same time.  First, we need to allocate the frame for the
 outer call so we can pass it in as the hidden struct return pointer to
 the middle call.  Then we do the same for the middle call, allocating a
 frame and passing its address to ``Foo``'s default constructor.  By
 wrapping the evaluation of the inner ``bar`` with stack save and
 restore, we can have multiple overlapping active call frames.
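
 The allocation order described above can be traced with a toy model of
 a downward-growing stack.  This is a sketch under stated assumptions,
 not LLVM code: ``ToyStack``, ``NestedTrace``, ``simulate_nested_call``,
 the starting address, and the 8-byte frame size are all hypothetical,
 and the three member functions only mirror the roles of alloca,
 ``llvm.stacksave``, and ``llvm.stackrestore``.

 .. code-block:: c++

     #include <cassert>
     #include <cstddef>

     // Toy model of a downward-growing machine stack (hypothetical).
     class ToyStack {
     public:
       explicit ToyStack(std::size_t top) : sp_(top) {}

       // Like alloca: reserve `bytes` and return the new frame's address.
       std::size_t alloca_bytes(std::size_t bytes) {
         assert(bytes <= sp_ && "toy stack overflow");
         sp_ -= bytes;
         return sp_;
       }

       // Like llvm.stacksave: remember the current stack pointer.
       std::size_t save() const { return sp_; }

       // Like llvm.stackrestore: release everything allocated since `saved`.
       void restore(std::size_t saved) {
         assert(saved >= sp_ && "cannot restore to a deeper level");
         sp_ = saved;
       }

       std::size_t sp() const { return sp_; }

     private:
       std::size_t sp_;
     };

     struct NestedTrace {
       std::size_t outer_frame;      // argument frame of the outer bar
       std::size_t inner_frame;      // argument frame of the inner bar
       std::size_t sp_before_outer;  // stack pointer just before the outer call
       std::size_t sp_at_end;        // stack pointer after all cleanup
     };

     // Mirrors the allocation order for bar(bar(Foo())).
     NestedTrace simulate_nested_call() {
       const std::size_t kFrameBytes = 8;  // arbitrary placeholder size
       ToyStack stack(1024);               // arbitrary starting address
       NestedTrace t{};

       std::size_t outer_save = stack.save();
       t.outer_frame = stack.alloca_bytes(kFrameBytes);

       std::size_t inner_save = stack.save();
       t.inner_frame = stack.alloca_bytes(kFrameBytes);
       // ... construct Foo into inner_frame and call the inner bar, which
       // writes its result into outer_frame ...
       stack.restore(inner_save);  // releases only the inner frame

       t.sp_before_outer = stack.sp();  // the outer frame is on top again
       // ... call the outer bar ...
       stack.restore(outer_save);
       t.sp_at_end = stack.sp();
       return t;
     }

 In this model both frames are live while the inner call is evaluated,
 and restoring to the inner save point leaves the outer frame at the top
 of the stack, which is where the outer call expects its arguments.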

 Callee-cleanup Calling Conventions
 ----------------------------------

 Another wrinkle is the existence of callee-cleanup conventions.  On
 Windows, all methods and many other functions adjust the stack to clear
 the memory used to pass their arguments.  In some sense, this means that
 the allocas are automatically cleared by the call.  However, LLVM
 models this as a write of undef to all of the inalloca values passed to
 the call rather than as a stack adjustment.  Frontends should still
 restore the stack pointer to avoid a stack leak.

 Exceptions
 ----------

 There is also the possibility of an exception.  If argument evaluation
 or copy construction throws an exception, the landing pad must do
 cleanup, which includes adjusting the stack pointer to avoid a stack
 leak.  This means the cleanup of the stack memory cannot be tied to the
 call itself.  There needs to be a separate IR-level instruction that can
 perform independent cleanup of arguments.

 Efficiency
 ----------

 Eventually, it should be possible to generate efficient code for this
 construct.  In particular, using inalloca should not require a base
 pointer.  If the backend can prove that all points in the CFG only have
 one possible stack level, then it can address the stack directly from
 the stack pointer.  While this is not yet implemented, the plan is that
 the inalloca attribute should not change much, but the frontend IR
 generation recommendations may change.
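
 The "one possible stack level" condition can be illustrated with a small
 worklist check over a simplified CFG.  This is a hypothetical sketch, not
 an existing LLVM analysis: ``Block`` and ``has_unique_stack_levels`` are
 invented names, and each block is assumed to have a statically known net
 stack adjustment, which real stack save and restore calls do not
 guarantee.

 .. code-block:: c++

     #include <cassert>
     #include <cstddef>
     #include <queue>
     #include <vector>

     // Hypothetical CFG node: the net stack adjustment performed by the
     // block and the indices of its successor blocks.
     struct Block {
       int stack_delta;
       std::vector<std::size_t> successors;
     };

     // Returns true if every block reachable from block 0 is entered at
     // exactly one stack level, taking the entry block's level as 0.
     bool has_unique_stack_levels(const std::vector<Block> &cfg) {
       if (cfg.empty())
         return true;
       std::vector<bool> seen(cfg.size(), false);
       std::vector<int> entry_level(cfg.size(), 0);
       std::queue<std::size_t> worklist;
       seen[0] = true;
       worklist.push(0);
       while (!worklist.empty()) {
         std::size_t b = worklist.front();
         worklist.pop();
         int exit_level = entry_level[b] + cfg[b].stack_delta;
         for (std::size_t s : cfg[b].successors) {
           assert(s < cfg.size() && "successor index out of range");
           if (!seen[s]) {
             seen[s] = true;
             entry_level[s] = exit_level;
             worklist.push(s);
           } else if (entry_level[s] != exit_level) {
             return false;  // two paths reach block s at different levels
           }
         }
       }
       return true;
     }

 A diamond whose two arms leave the stack at the same level passes the
 check, while a diamond in which only one arm allocates fails it.  In the
 failing case the join block could not be addressed from the stack pointer
 alone and would need a base pointer.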