|  | .. _loop-terminology: | 
|  |  | 
|  | =========================================== | 
|  | LLVM Loop Terminology (and Canonical Forms) | 
|  | =========================================== | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | Loops are a core concept in any optimizer.  This page spells out some | 
|  | of the common terminology used within LLVM code to describe loop | 
|  | structures. | 
|  |  | 
|  | First, let's start with the basics. In LLVM, a Loop is a maximal set of basic | 
|  | blocks that form a strongly connected component (SCC) in the Control | 
|  | Flow Graph (CFG) where there exists a dedicated entry/header block that | 
|  | dominates all other blocks within the loop. Thus, without leaving the | 
|  | loop, one can reach every block in the loop from the header block and | 
|  | the header block from every block in the loop. | 
|  |  | 
|  | Note that there are some important implications of this definition: | 
|  |  | 
|  | * Not all SCCs are loops.  There exist SCCs that do not meet the | 
|  | dominance requirement and such are not considered loops. | 
|  |  | 
|  | * Loops can contain non-loop SCCs and non-loop SCCs may contain | 
|  | loops.  Loops may also contain sub-loops. | 
|  |  | 
|  | * A header block is uniquely associated with one loop.  There can be | 
|  | multiple SCC within that loop, but the strongly connected component | 
|  | (SCC) formed from their union must always be unique. | 
|  |  | 
|  | * Given the use of dominance in the definition, all loops are | 
|  | statically reachable from the entry of the function. | 
|  |  | 
|  | * Every loop must have a header block, and some set of predecessors | 
|  | outside the loop.  A loop is allowed to be statically infinite, so | 
|  | there need not be any exiting edges. | 
|  |  | 
|  | * Any two loops are either fully disjoint (no intersecting blocks), or | 
|  | one must be a sub-loop of the other. | 
|  |  | 
|  | * Loops in a function form a forest. One implication of this fact | 
|  | is that a loop either has no parent or a single parent. | 
|  |  | 
|  | A loop may have an arbitrary number of exits, both explicit (via | 
|  | control flow) and implicit (via throwing calls which transfer control | 
|  | out of the containing function).  There is no special requirement on | 
|  | the form or structure of exit blocks (the block outside the loop which | 
|  | is branched to).  They may have multiple predecessors, phis, etc... | 
|  |  | 
|  | Key Terminology | 
|  | =============== | 
|  |  | 
|  | **Header Block** - The basic block which dominates all other blocks | 
|  | contained within the loop.  As such, it is the first one executed if | 
|  | the loop executes at all.  Note that a block can be the header of | 
|  | two separate loops at the same time, but only if one is a sub-loop | 
|  | of the other. | 
|  |  | 
|  | **Exiting Block** - A basic block contained within a given loop which has | 
|  | at least one successor outside of the loop and one successor inside the | 
|  | loop.  (The latter is a consequence of the block being contained within | 
|  | an SCC which is part of the loop.)  That is, it has a successor which | 
|  | is an Exit Block. | 
|  |  | 
|  | **Exit Block** - A basic block outside of the associated loop which has a | 
|  | predecessor inside the loop.  That is, it has a predecessor which is | 
|  | an Exiting Block. | 
|  |  | 
|  | **Latch Block** - A basic block within the loop whose successors include | 
|  | the header block of the loop.  Thus, a latch is a source of backedge. | 
|  | A loop may have multiple latch blocks.  A latch block may be either | 
|  | conditional or unconditional. | 
|  |  | 
|  | **Backedge(s)** - The edge(s) in the CFG from latch blocks to the header | 
|  | block.  Note that there can be multiple such edges, and even multiple | 
|  | such edges leaving a single latch block. | 
|  |  | 
|  | **Loop Predecessor** -  The predecessor blocks of the loop header which | 
|  | are not contained by the loop itself.  These are the only blocks | 
|  | through which execution can enter the loop.  When used in the | 
|  | singular form implies that there is only one such unique block. | 
|  |  | 
|  | **Preheader Block** - A preheader is a (singular) loop predecessor which | 
|  | ends in an unconditional transfer of control to the loop header.  Note | 
|  | that not all loops have such blocks. | 
|  |  | 
|  | **Backedge Taken Count** - The number of times the backedge will execute | 
|  | before some interesting event happens.  Commonly used without | 
|  | qualification of the event as a shorthand for when some exiting block | 
|  | branches to some exit block. May be zero, or not statically computable. | 
|  |  | 
|  | **Iteration Count** - The number of times the header will execute before | 
|  | some interesting event happens.  Commonly used without qualification to | 
|  | refer to the iteration count at which the loop exits.  Will always be | 
|  | one greater than the backedge taken count.  *Warning*: Preceding | 
|  | statement is true in the *integer domain*; if you're dealing with fixed | 
|  | width integers (such as LLVM Values or SCEVs), you need to be cautious | 
|  | of overflow when converting one to the other. | 
|  |  | 
|  | It's important to note that the same basic block can play multiple | 
|  | roles in the same loop, or in different loops at once.  For example, a | 
|  | single block can be the header for two nested loops at once, while | 
|  | also being an exiting block for the inner one only, and an exit block | 
|  | for a sibling loop.  Example: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | while (..) { | 
|  | for (..) {} | 
|  | do { | 
|  | do { | 
|  | // <-- block of interest | 
|  | if (exit) break; | 
|  | } while (..); | 
|  | } while (..) | 
|  | } | 
|  |  | 
|  | LoopInfo | 
|  | ======== | 
|  |  | 
|  | LoopInfo is the core analysis for obtaining information about loops. | 
|  | There are few key implications of the definitions given above which | 
|  | are important for working successfully with this interface. | 
|  |  | 
|  | * LoopInfo does not contain information about non-loop cycles.  As a | 
|  | result, it is not suitable for any algorithm which requires complete | 
|  | cycle detection for correctness. | 
|  |  | 
|  | * LoopInfo provides an interface for enumerating all top level loops | 
|  | (e.g. those not contained in any other loop).  From there, you may | 
|  | walk the tree of sub-loops rooted in that top level loop. | 
|  |  | 
|  | * Loops which become statically unreachable during optimization *must* | 
|  | be removed from LoopInfo. If this can not be done for some reason, | 
|  | then the optimization is *required* to preserve the static | 
|  | reachability of the loop. | 
|  |  | 
|  |  | 
|  | .. _loop-terminology-loop-simplify: | 
|  |  | 
|  | Loop Simplify Form | 
|  | ================== | 
|  |  | 
|  | The Loop Simplify Form is a canonical form that makes | 
|  | several analyses and transformations simpler and more effective. | 
|  | It is ensured by the LoopSimplify | 
|  | (:ref:`-loop-simplify <passes-loop-simplify>`) pass and is automatically | 
|  | added by the pass managers when scheduling a LoopPass. | 
|  | This pass is implemented in | 
|  | `LoopSimplify.h <https://llvm.org/doxygen/LoopSimplify_8h_source.html>`_. | 
|  | When it is successful, the loop has: | 
|  |  | 
|  | * A preheader. | 
|  | * A single backedge (which implies that there is a single latch). | 
|  | * Dedicated exits. That is, no exit block for the loop | 
|  | has a predecessor that is outside the loop. This implies | 
|  | that all exit blocks are dominated by the loop header. | 
|  |  | 
|  | .. _loop-terminology-lcssa: | 
|  |  | 
|  | Loop Closed SSA (LCSSA) | 
|  | ======================= | 
|  |  | 
|  | A program is in Loop Closed SSA Form if it is in SSA form | 
|  | and all values that are defined in a loop are used only inside | 
|  | this loop. | 
|  | Programs written in LLVM IR are always in SSA form but not necessarily | 
|  | in LCSSA. To achieve the latter, single entry PHI nodes are inserted | 
|  | at the end of the loops for all values that are live | 
|  | across the loop boundary [#lcssa-construction]_. | 
|  | In particular, consider the following loop: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | c = ...; | 
|  | for (...) { | 
|  | if (c) | 
|  | X1 = ... | 
|  | else | 
|  | X2 = ... | 
|  | X3 = phi(X1, X2);  // X3 defined | 
|  | } | 
|  |  | 
|  | ... = X3 + 4;  // X3 used, i.e. live | 
|  | // outside the loop | 
|  |  | 
|  | In the inner loop, the X3 is defined inside the loop, but used | 
|  | outside of it. In Loop Closed SSA form, this would be represented as follows: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | c = ...; | 
|  | for (...) { | 
|  | if (c) | 
|  | X1 = ... | 
|  | else | 
|  | X2 = ... | 
|  | X3 = phi(X1, X2); | 
|  | } | 
|  | X4 = phi(X3); | 
|  |  | 
|  | ... = X4 + 4; | 
|  |  | 
|  | This is still valid LLVM; the extra phi nodes are purely redundant, | 
|  | but all LoopPass'es are required to preserve them. | 
|  | This form is ensured by the LCSSA (:ref:`-lcssa <passes-lcssa>`) | 
|  | pass and is added automatically by the LoopPassManager when | 
|  | scheduling a LoopPass. | 
|  | After the loop optimizations are done, these extra phi nodes | 
|  | will be deleted by :ref:`-instcombine <passes-instcombine>`. | 
|  |  | 
|  | The major benefit of this transformation is that it makes many other | 
|  | loop optimizations simpler. | 
|  |  | 
|  | First of all, a simple observation is that if one needs to see all | 
|  | the outside users, they can just iterate over all the (loop closing) | 
|  | PHI nodes in the exit blocks (the alternative would be to | 
|  | scan the def-use chain [#def-use-chain]_ of all instructions in the loop). | 
|  |  | 
|  | Then, consider for example | 
|  | :ref:`-loop-unswitch <passes-loop-unswitch>` ing the loop above. | 
|  | Because it is in LCSSA form, we know that any value defined inside of | 
|  | the loop will be used either only inside the loop or in a loop closing | 
|  | PHI node. In this case, the only loop closing PHI node is X4. | 
|  | This means that we can just copy the loop and change the X4 | 
|  | accordingly, like so: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | c = ...; | 
|  | if (c) { | 
|  | for (...) { | 
|  | if (true) | 
|  | X1 = ... | 
|  | else | 
|  | X2 = ... | 
|  | X3 = phi(X1, X2); | 
|  | } | 
|  | } else { | 
|  | for (...) { | 
|  | if (false) | 
|  | X1' = ... | 
|  | else | 
|  | X2' = ... | 
|  | X3' = phi(X1', X2'); | 
|  | } | 
|  | } | 
|  | X4 = phi(X3, X3') | 
|  |  | 
|  | Now, all uses of X4 will get the updated value (in general, | 
|  | if a loop is in LCSSA form, in any loop transformation, | 
|  | we only need to update the loop closing PHI nodes for the changes | 
|  | to take effect).  If we did not have Loop Closed SSA form, it means that X3 could | 
|  | possibly be used outside the loop. So, we would have to introduce the | 
|  | X4 (which is the new X3) and replace all uses of X3 with that. | 
|  | However, we should note that because LLVM keeps a def-use chain | 
|  | [#def-use-chain]_ for each Value, we wouldn't need | 
|  | to perform data-flow analysis to find and replace all the uses | 
|  | (there is even a utility function, replaceAllUsesWith(), | 
|  | that performs this transformation by iterating the def-use chain). | 
|  |  | 
|  | Another important advantage is that the behavior of all uses | 
|  | of an induction variable is the same.  Without this, you need to | 
|  | distinguish the case when the variable is used outside of | 
|  | the loop it is defined in, for example: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | for (i = 0; i < 100; i++) { | 
|  | for (j = 0; j < 100; j++) { | 
|  | k = i + j; | 
|  | use(k);    // use 1 | 
|  | } | 
|  | use(k);      // use 2 | 
|  | } | 
|  |  | 
|  | Looking from the outer loop with the normal SSA form, the first use of k | 
|  | is not well-behaved, while the second one is an induction variable with | 
|  | base 100 and step 1.  Although, in practice, and in the LLVM context, | 
|  | such cases can be handled effectively by SCEV. Scalar Evolution | 
|  | (:ref:`scalar-evolution <passes-scalar-evolution>`) or SCEV, is a | 
|  | (analysis) pass that analyzes and categorizes the evolution of scalar | 
|  | expressions in loops. | 
|  |  | 
|  | In general, it's easier to use SCEV in loops that are in LCSSA form. | 
|  | The evolution of a scalar (loop-variant) expression that | 
|  | SCEV can analyze is, by definition, relative to a loop. | 
|  | An expression is represented in LLVM by an | 
|  | `llvm::Instruction <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`. | 
|  | If the expression is inside two (or more) loops (which can only | 
|  | happen if the loops are nested, like in the example above) and you want | 
|  | to get an analysis of its evolution (from SCEV), | 
|  | you have to also specify relative to what Loop you want it. | 
|  | Specifically, you have to use | 
|  | `getSCEVAtScope() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a21d6ee82eed29080d911dbb548a8bb68>`_. | 
|  |  | 
|  | However, if all loops are in LCSSA form, each expression is actually | 
|  | represented by two different llvm::Instructions.  One inside the loop | 
|  | and one outside, which is the loop-closing PHI node and represents | 
|  | the value of the expression after the last iteration (effectively, | 
|  | we break each loop-variant expression into two expressions and so, every | 
|  | expression is at most in one loop).  You can now just use | 
|  | `getSCEV() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a30bd18ac905eacf3601bc6a553a9ff49>`_. | 
|  | and which of these two llvm::Instructions you pass to it disambiguates | 
|  | the context / scope / relative loop. | 
|  |  | 
|  | .. rubric:: Footnotes | 
|  |  | 
|  | .. [#lcssa-construction] To insert these loop-closing PHI nodes, one has to | 
|  | (re-)compute dominance frontiers (if the loop has multiple exits). | 
|  |  | 
|  | .. [#def-use-chain] A property of SSA is that there exists a def-use chain | 
|  | for each definition, which is a list of all the uses of this definition. | 
|  | LLVM implements this property by keeping a list of all the uses of a Value | 
|  | in an internal data structure. | 
|  |  | 
|  | "More Canonical" Loops | 
|  | ====================== | 
|  |  | 
|  | .. _loop-terminology-loop-rotate: | 
|  |  | 
|  | Rotated Loops | 
|  | ------------- | 
|  |  | 
|  | Loops are rotated by the LoopRotate (:ref:`loop-rotate <passes-loop-rotate>`) | 
|  | pass, which converts loops into do/while style loops and is | 
|  | implemented in | 
|  | `LoopRotation.h <https://llvm.org/doxygen/LoopRotation_8h_source.html>`_.  Example: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | void test(int n) { | 
|  | for (int i = 0; i < n; i += 1) | 
|  | // Loop body | 
|  | } | 
|  |  | 
|  | is transformed to: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | void test(int n) { | 
|  | int i = 0; | 
|  | do { | 
|  | // Loop body | 
|  | i += 1; | 
|  | } while (i < n); | 
|  | } | 
|  |  | 
|  | **Warning**: This transformation is valid only if the compiler | 
|  | can prove that the loop body will be executed at least once. Otherwise, | 
|  | it has to insert a guard which will test it at runtime. In the example | 
|  | above, that would be: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | void test(int n) { | 
|  | int i = 0; | 
|  | if (n > 0) { | 
|  | do { | 
|  | // Loop body | 
|  | i += 1; | 
|  | } while (i < n); | 
|  | } | 
|  | } | 
|  |  | 
|  | It's important to understand the effect of loop rotation | 
|  | at the LLVM IR level. We follow with the previous examples | 
|  | in LLVM IR while also providing a graphical representation | 
|  | of the control-flow graphs (CFG). You can get the same graphical | 
|  | results by utilizing the :ref:`view-cfg <passes-view-cfg>` pass. | 
|  |  | 
|  | The initial **for** loop could be translated to: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | define void @test(i32 %n) { | 
|  | entry: | 
|  | br label %for.header | 
|  |  | 
|  | for.header: | 
|  | %i = phi i32 [ 0, %entry ], [ %i.next, %latch ] | 
|  | %cond = icmp slt i32 %i, %n | 
|  | br i1 %cond, label %body, label %exit | 
|  |  | 
|  | body: | 
|  | ; Loop body | 
|  | br label %latch | 
|  |  | 
|  | latch: | 
|  | %i.next = add nsw i32 %i, 1 | 
|  | br label %for.header | 
|  |  | 
|  | exit: | 
|  | ret void | 
|  | } | 
|  |  | 
|  | .. image:: ./loop-terminology-initial-loop.png | 
|  | :width: 400 px | 
|  |  | 
|  | Before we explain how LoopRotate will actually | 
|  | transform this loop, here's how we could convert | 
|  | it (by hand) to a do-while style loop. | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | define void @test(i32 %n) { | 
|  | entry: | 
|  | br label %body | 
|  |  | 
|  | body: | 
|  | %i = phi i32 [ 0, %entry ], [ %i.next, %latch ] | 
|  | ; Loop body | 
|  | br label %latch | 
|  |  | 
|  | latch: | 
|  | %i.next = add nsw i32 %i, 1 | 
|  | %cond = icmp slt i32 %i.next, %n | 
|  | br i1 %cond, label %body, label %exit | 
|  |  | 
|  | exit: | 
|  | ret void | 
|  | } | 
|  |  | 
|  | .. image:: ./loop-terminology-rotated-loop.png | 
|  | :width: 400 px | 
|  |  | 
|  | Note two things: | 
|  |  | 
|  | * The condition check was moved to the "bottom" of the loop, i.e. | 
|  | the latch. This is something that LoopRotate does by copying the header | 
|  | of the loop to the latch. | 
|  | * The compiler in this case can't deduce that the loop will | 
|  | definitely execute at least once so the above transformation | 
|  | is not valid. As mentioned above, a guard has to be inserted, | 
|  | which is something that LoopRotate will do. | 
|  |  | 
|  | This is how LoopRotate transforms this loop: | 
|  |  | 
|  | .. code-block:: none | 
|  |  | 
|  | define void @test(i32 %n) { | 
|  | entry: | 
|  | %guard_cond = icmp slt i32 0, %n | 
|  | br i1 %guard_cond, label %loop.preheader, label %exit | 
|  |  | 
|  | loop.preheader: | 
|  | br label %body | 
|  |  | 
|  | body: | 
|  | %i2 = phi i32 [ 0, %loop.preheader ], [ %i.next, %latch ] | 
|  | br label %latch | 
|  |  | 
|  | latch: | 
|  | %i.next = add nsw i32 %i2, 1 | 
|  | %cond = icmp slt i32 %i.next, %n | 
|  | br i1 %cond, label %body, label %loop.exit | 
|  |  | 
|  | loop.exit: | 
|  | br label %exit | 
|  |  | 
|  | exit: | 
|  | ret void | 
|  | } | 
|  |  | 
|  | .. image:: ./loop-terminology-guarded-loop.png | 
|  | :width: 500 px | 
|  |  | 
|  | The result is a little bit more complicated than we may expect | 
|  | because LoopRotate ensures that the loop is in | 
|  | :ref:`Loop Simplify Form <loop-terminology-loop-simplify>` | 
|  | after rotation. | 
|  | In this case, it inserted the %loop.preheader basic block so | 
|  | that the loop has a preheader and it introduced the %loop.exit | 
|  | basic block so that the loop has dedicated exits | 
|  | (otherwise, %exit would be jumped from both %latch and %entry, | 
|  | but %entry is not contained in the loop). | 
|  | Note that a loop has to be in Loop Simplify Form beforehand | 
|  | too for LoopRotate to be applied successfully. | 
|  |  | 
|  | The main advantage of this form is that it allows hoisting | 
|  | invariant instructions, especially loads, into the preheader. | 
|  | That could be done in non-rotated loops as well but with | 
|  | some disadvantages.  Let's illustrate them with an example: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | for (int i = 0; i < n; ++i) { | 
|  | auto v = *p; | 
|  | use(v); | 
|  | } | 
|  |  | 
|  | We assume that loading from p is invariant and use(v) is some | 
|  | statement that uses v. | 
|  | If we wanted to execute the load only once we could move it | 
|  | "out" of the loop body, resulting in this: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | auto v = *p; | 
|  | for (int i = 0; i < n; ++i) { | 
|  | use(v); | 
|  | } | 
|  |  | 
|  | However, now, in the case that n <= 0, in the initial form, | 
|  | the loop body would never execute, and so, the load would | 
|  | never execute.  This is a problem mainly for semantic reasons. | 
|  | Consider the case in which n <= 0 and loading from p is invalid. | 
|  | In the initial program there would be no error.  However, with this | 
|  | transformation we would introduce one, effectively breaking | 
|  | the initial semantics. | 
|  |  | 
|  | To avoid both of these problems, we can insert a guard: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | if (n > 0) {  // loop guard | 
|  | auto v = *p; | 
|  | for (int i = 0; i < n; ++i) { | 
|  | use(v); | 
|  | } | 
|  | } | 
|  |  | 
|  | This is certainly better but it could be improved slightly. Notice | 
|  | that the check for whether n is bigger than 0 is executed twice (and | 
|  | n does not change in between).  Once when we check the guard condition | 
|  | and once in the first execution of the loop.  To avoid that, we could | 
|  | do an unconditional first execution and insert the loop condition | 
|  | in the end. This effectively means transforming the loop into a do-while loop: | 
|  |  | 
|  | .. code-block:: C | 
|  |  | 
|  | if (0 < n) { | 
|  | auto v = *p; | 
|  | do { | 
|  | use(v); | 
|  | ++i; | 
|  | } while (i < n); | 
|  | } | 
|  |  | 
|  | Note that LoopRotate does not generally do such | 
|  | hoisting.  Rather, it is an enabling transformation for other | 
|  | passes like Loop-Invariant Code Motion (:ref:`-licm <passes-licm>`). |