|  | <!--===- docs/Semantics.md | 
|  |  | 
|  | Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | 
|  | See https://llvm.org/LICENSE.txt for license information. | 
|  | SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | 
|  |  | 
|  | --> | 
|  |  | 
|  | # Semantic Analysis | 
|  |  | 
|  | ```{contents} | 
|  | --- | 
|  | local: | 
|  | --- | 
|  | ``` | 
|  |  | 
|  | The semantic analysis pass determines whether a syntactically correct Fortran | 
|  | program is legal by enforcing the constraints of the language. | 
|  |  | 
|  | The input is a parse tree with a `Program` node at the root, together with | 
|  | a "cooked" character stream: a contiguous stream of characters | 
|  | containing a normalized form of the Fortran source. | 
|  |  | 
|  | If the program is not legal, the results of the semantic pass will be a list of | 
|  | errors associated with the program. | 
|  |  | 
|  | If the program is legal, the semantic pass will produce a (possibly modified) | 
|  | parse tree for the semantically correct program with each name mapped to a symbol | 
|  | and each expression fully analyzed. | 
|  |  | 
|  | All user errors are detected either prior to or during semantic analysis. | 
|  | After it completes successfully, the program should compile with no error messages. | 
|  | There may still be warnings or informational messages. | 
|  |  | 
|  | ## Phases of Semantic Analysis | 
|  |  | 
|  | 1. [Validate labels](#validate-labels) - | 
|  | Check all constraints on labels and branches | 
|  | 2. [Rewrite DO loops](#rewrite-do-loops) - | 
|  | Convert all occurrences of `LabelDoStmt` to `DoConstruct`. | 
|  | 3. [Name resolution](#name-resolution) - | 
|  | Analyze names and declarations, build a tree of Scopes containing Symbols, | 
|  | and fill in the `Name::symbol` data member in the parse tree | 
|  | 4. [Rewrite parse tree](#rewrite-parse-tree) - | 
|  | Fix incorrect parses based on symbol information | 
|  | 5. [Expression analysis](#expression-analysis) - | 
|  | Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and | 
|  | `Variable::typedExpr` with analyzed expressions; fix incorrect parses | 
|  | based on the result of this analysis | 
|  | 6. [Statement semantics](#statement-semantics) - | 
|  | Perform remaining semantic checks on the execution parts of subprograms | 
|  | 7. [Write module files](#write-module-files) - | 
|  | If no errors have occurred, write out `.mod` files for modules and submodules | 
|  |  | 
|  | If phase 1 or phase 2 encounters an error in any of the program units, | 
|  | compilation terminates. Otherwise, phases 3-6 are all performed even if | 
|  | errors occur. | 
|  | Module files are written (phase 7) only if there are no errors. | 
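
The control flow described above can be sketched roughly as follows. This is a toy Python model, not flang's actual driver: the `Phase` alias, the `run_semantics` function, and the convention that each phase returns a list of error strings are all assumptions made for illustration. The sketch also assumes that an error in phase 1 stops compilation before phase 2 runs.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a phase is a (name, callable) pair, and the callable
# returns the list of error messages it produced (empty on success).
Phase = Tuple[str, Callable[[], List[str]]]


def run_semantics(phases: List[Phase]) -> Tuple[List[str], List[str]]:
    """Run the seven phases in order; return (names of phases run, errors)."""
    assert len(phases) == 7, "this sketch expects exactly the seven phases"
    ran: List[str] = []
    errors: List[str] = []

    # Phases 1-2: an error in either one terminates compilation.
    # (Assumption: a phase 1 error stops before phase 2 is attempted.)
    for name, run in phases[:2]:
        ran.append(name)
        errors.extend(run())
        if errors:
            return ran, errors

    # Phases 3-6: all are performed, even if some of them report errors.
    for name, run in phases[2:6]:
        ran.append(name)
        errors.extend(run())

    # Phase 7: module files are written only if no errors have occurred.
    if not errors:
        name, run = phases[6]
        ran.append(name)
        errors.extend(run())
    return ran, errors
```

Returning the list of phases that ran makes it easy to see which stages were skipped for a given failure.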
|  |  | 
|  | ### Validate labels | 
|  |  | 
|  | Perform semantic checks related to labels and branches: | 
|  | - check that any labels that are referenced are defined and in scope | 
|  | - check branches into loop bodies | 
|  | - check that labeled `DO` loops are properly nested | 
|  | - check labels in data transfer statements | 
|  |  | 
|  | ### Rewrite DO loops | 
|  |  | 
|  | This phase normalizes the parse tree by removing all unstructured `DO` loops | 
|  | and replacing them with `DO` constructs. | 
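
As a rough illustration of this normalization, the sketch below nests each label-`DO` statement and everything up to its terminal statement into one construct. It is a simplified Python model, not the flang implementation: `Stmt`, `DoConstruct`, and `rewrite_label_do` are hypothetical names (the class name only borrows from the parse-tree node for readability), and the input is assumed to have passed label validation already, so loops are properly nested.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Stmt:
    text: str
    label: Optional[int] = None      # statement label, if any
    do_target: Optional[int] = None  # set only for a label-DO statement


@dataclass
class DoConstruct:
    header: Stmt
    body: List[Union[Stmt, "DoConstruct"]] = field(default_factory=list)


def rewrite_label_do(stmts: List[Stmt]) -> List[Union[Stmt, DoConstruct]]:
    """Nest each label-DO with the statements up to its terminal label."""
    top: List[Union[Stmt, DoConstruct]] = []
    open_loops: List[DoConstruct] = []  # innermost open loop is last

    for stmt in stmts:
        target = open_loops[-1].body if open_loops else top
        if stmt.do_target is not None:
            loop = DoConstruct(header=stmt)
            target.append(loop)
            open_loops.append(loop)
            continue
        target.append(stmt)
        # A terminal statement closes every open loop that names its label,
        # since nested loops may share one terminal statement.
        while (open_loops and stmt.label is not None
               and open_loops[-1].header.do_target == stmt.label):
            open_loops.pop()

    if open_loops:
        raise ValueError("label-DO without a matching terminal statement")
    return top
```

In this model the terminal statement stays inside the innermost loop body; how the real rewrite places it is not covered here.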
|  |  | 
|  | ### Name resolution | 
|  |  | 
|  | The name resolution phase walks the parse tree and constructs the symbol table. | 
|  |  | 
|  | The symbol table consists of a tree of `Scope` objects rooted at the global scope. | 
|  | The global scope is owned by the `SemanticsContext` object. | 
|  | It contains a `Scope` for each program unit in the compilation. | 
|  |  | 
|  | Each `Scope` in the scope tree contains child scopes representing other scopes | 
|  | lexically nested in it. | 
|  | Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names | 
|  | declared in that scope. (All names in the symbol table are represented as | 
|  | `CharBlock` objects, i.e. as substrings of the cooked character stream.) | 
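
The shape of this data structure can be modeled in a few lines. The sketch below is a Python approximation for illustration only: the class names mirror `CharBlock` and `Scope`, but the fields, the methods `declare` and `find_local`, and the rule that two blocks are equal when their characters match are assumptions of this example, not flang's C++ interface.

```python
from typing import Dict, List, Optional


class CharBlock:
    """A name held as a slice of the cooked stream rather than a copy."""

    def __init__(self, cooked: str, start: int, size: int) -> None:
        self.cooked = cooked
        self.start = start
        self.size = size

    def text(self) -> str:
        return self.cooked[self.start:self.start + self.size]

    # Assumption of this sketch: two blocks denote the same name when their
    # characters match, wherever they sit in the stream.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, CharBlock) and self.text() == other.text()

    def __hash__(self) -> int:
        return hash(self.text())


class Scope:
    """One node of the scope tree, with a map from names to symbols."""

    def __init__(self, kind: str, parent: Optional["Scope"] = None) -> None:
        self.kind = kind
        self.parent = parent
        self.children: List["Scope"] = []
        self.symbols: Dict[CharBlock, str] = {}  # symbols are strings here
        if parent is not None:
            parent.children.append(self)

    def declare(self, name: CharBlock, symbol: str) -> None:
        self.symbols[name] = symbol

    def find_local(self, name: CharBlock) -> Optional[str]:
        return self.symbols.get(name)
```

Because a name is just an offset and a length, the declaration of `x` and a later use of `x` are different slices of the same stream that still map to one symbol.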
|  |  | 
|  | All `Symbol` objects are owned by the symbol table data structures. | 
|  | They should be accessed as `Symbol *` or `Symbol &` outside of the symbol | 
|  | table classes as they can't be created, copied, or moved. | 
|  | The `Symbol` class has functions and data common across all symbols, and a | 
|  | `details` field that contains more information specific to that type of symbol. | 
|  | Many symbols also have types, represented by `DeclTypeSpec`. | 
|  | Types are also owned by scopes. | 
|  |  | 
|  | Name resolution happens on the parse tree in this order: | 
|  | 1. Process the specification of a program unit: | 
|  |    1. Create a new scope for the unit | 
|  |    2. Create a symbol for each contained subprogram containing just the name | 
|  |    3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.) | 
|  |    4. Process the specification part of the unit | 
|  | 2. Apply the same process recursively to nested subprograms | 
|  | 3. Process the execution part of the program unit | 
|  | 4. Process the execution parts of nested subprograms recursively | 
|  |  | 
|  | After the completion of this phase, every `Name` corresponds to a `Symbol` | 
|  | unless an error occurred. | 
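
The ordering above amounts to two traversals: one over all specification parts, then one over all execution parts. The following Python sketch records that order for a toy program unit. `Unit`, `resolve_names`, and the log strings are hypothetical and exist only to make the sequence visible; no real resolution is performed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    name: str
    nested: List["Unit"] = field(default_factory=list)


def resolve_specifications(unit: Unit, log: List[str]) -> None:
    log.append(f"scope:{unit.name}")      # create a new scope for the unit
    for sub in unit.nested:               # name-only symbols for subprograms
        log.append(f"stub:{sub.name}")
    log.append(f"stmt:{unit.name}")       # opening statement of the unit
    log.append(f"spec:{unit.name}")       # specification part of the unit
    for sub in unit.nested:               # same process for nested units
        resolve_specifications(sub, log)


def resolve_executions(unit: Unit, log: List[str]) -> None:
    log.append(f"exec:{unit.name}")       # execution part of the unit
    for sub in unit.nested:               # then nested execution parts
        resolve_executions(sub, log)


def resolve_names(unit: Unit) -> List[str]:
    """Return the order in which the parts of `unit` would be visited."""
    log: List[str] = []
    resolve_specifications(unit, log)
    resolve_executions(unit, log)
    return log
```

One consequence visible in the log is that every specification part, including those of nested subprograms, is visited before any execution part.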
|  |  | 
|  | ### Rewrite parse tree | 
|  |  | 
|  | The parser cannot build a completely correct parse tree without symbol information. | 
|  | This phase corrects mis-parses based on symbols: | 
|  | - Array element assignments may be parsed as statement functions: `a(i) = ...` | 
|  | - Namelist group names without `NML=` may be parsed as format expressions | 
|  | - A file unit number expression may be parsed as a character variable | 
|  |  | 
|  | This phase also produces an internal error if it finds a `Name` that does not | 
|  | have its `symbol` data member filled in. This error is suppressed if other | 
|  | errors have occurred because in that case a `Name` corresponding to an erroneous | 
|  | symbol may not be resolved. | 
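
The three corrections listed above all follow one pattern: look at what the name resolved to and, if the initial parse is inconsistent with it, pick the other reading. A minimal Python sketch of that decision follows. The string categories and the `correct_parse` function are invented for this example and do not correspond to flang data structures.

```python
from typing import Dict


def correct_parse(parsed_as: str, name: str, kinds: Dict[str, str]) -> str:
    """Return the corrected reading of a construct, given symbol kinds.

    `kinds` maps a name to a toy symbol category such as "array",
    "namelist group", or "integer variable" (hypothetical labels).
    """
    kind = kinds.get(name)
    if parsed_as == "statement function" and kind == "array":
        # `a(i) = ...` where `a` is an array is an ordinary assignment.
        return "array element assignment"
    if parsed_as == "format expression" and kind == "namelist group":
        # A group name given without `NML=`.
        return "namelist group name"
    if parsed_as == "internal file variable" and kind == "integer variable":
        # The I/O unit was a unit number, not a character variable.
        return "file unit number"
    return parsed_as  # the original parse was already correct
```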
|  |  | 
|  | ### Expression analysis | 
|  |  | 
|  | Expressions that occur in the specification part are analyzed during name | 
|  | resolution, for example, initial values, array bounds, and type parameters. | 
|  | Any remaining expressions are analyzed in this phase. | 
|  |  | 
|  | For each `Variable` and top-level `Expr` (i.e. one that is not nested below | 
|  | another `Expr` in the parse tree) the analyzed form of the expression is saved | 
|  | in the `typedExpr` data member. After this phase has completed, the analyzed | 
|  | expression can be accessed using `semantics::GetExpr()`. | 
|  |  | 
|  | This phase also corrects mis-parses based on the result of expression analysis: | 
|  | - An expression like `a(b)` is parsed as a function reference but may need | 
|  | to be rewritten to an array element reference (if `a` is an object entity) | 
|  | or to a structure constructor (if `a` is a derived type) | 
|  | - An expression like `a(b:c)` is parsed as an array section but may need to be | 
|  | rewritten as a substring if `a` is an object with type CHARACTER | 
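
The two rewrites above can be expressed as a small decision function. This Python sketch is illustrative only: `SymbolInfo` and `reclassify_reference` are hypothetical, and the extra condition that a substring requires a scalar (`rank == 0`) is an assumption added by this example rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class SymbolInfo:
    category: str        # "function", "object", or "derived type"
    type_name: str = ""  # e.g. "CHARACTER" for objects
    rank: int = 0        # 0 for scalars


def reclassify_reference(parsed_as: str, sym: SymbolInfo) -> str:
    """Return the corrected node kind for `a(b)` or `a(b:c)`."""
    if parsed_as == "function reference":          # shape a(b)
        if sym.category == "object":
            return "array element"
        if sym.category == "derived type":
            return "structure constructor"
    elif parsed_as == "array section":             # shape a(b:c)
        # Assumption: only a scalar CHARACTER object yields a substring.
        if (sym.category == "object" and sym.type_name == "CHARACTER"
                and sym.rank == 0):
            return "substring"
    return parsed_as
```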
|  |  | 
|  | ### Statement semantics | 
|  |  | 
|  | Multiple independent checkers driven by the `SemanticsVisitor` framework | 
|  | perform the remaining semantic checks. | 
|  | By this phase, all names and expressions that can be successfully resolved | 
|  | have been, but names without symbols or expressions without an | 
|  | analyzed form may remain if errors occurred earlier. | 
|  |  | 
|  | ### Initialization processing | 
|  |  | 
|  | Fortran supports many means of specifying static initializers for variables, | 
|  | object pointers, and procedure pointers, as well as default initializers for | 
|  | derived type object components, pointers, and type parameters. | 
|  |  | 
|  | Non-pointer static initializers of variables and named constants are | 
|  | scanned, analyzed, folded, scalar-expanded, and validated as they are | 
|  | traversed during declaration processing in name resolution. | 
|  | So are the default initializers of non-pointer object components in | 
|  | non-parameterized derived types. | 
|  | Named constant arrays with implied shapes take their actual shape from | 
|  | the initialization expression. | 
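
To make "scalar-expanded" and "implied shape" concrete, here is a deliberately small Python sketch restricted to rank-1 arrays. The function `apply_initializer` and its conventions (`None` standing for an implied-shape `(*)` declaration, a Python list standing for an array-valued initializer) are assumptions of this example. In particular, the choice to reject a scalar initializer for an implied-shape array belongs to the sketch.

```python
from typing import List, Optional, Union

Value = Union[int, float]


def apply_initializer(declared_extent: Optional[int],
                      init: Union[Value, List[Value]]) -> List[Value]:
    """Return the element values of a rank-1 object after initialization.

    declared_extent: the declared size, or None for implied shape `(*)`.
    init: an already folded scalar or a list of element values.
    """
    if isinstance(init, list):
        if declared_extent is None:
            # Implied shape: the extent comes from the initializer itself.
            return list(init)
        if len(init) != declared_extent:
            raise ValueError("initializer does not conform to declared shape")
        return list(init)
    if declared_extent is None:
        # This sketch requires an array initializer for implied shape.
        raise ValueError("implied shape needs an array-valued initializer")
    # Scalar expansion: replicate the scalar over every element.
    return [init] * declared_extent
```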
|  |  | 
|  | Default initializers of non-pointer components and type parameters | 
|  | in distinct parameterized | 
|  | derived type instantiations are similarly processed as those instances | 
|  | are created, as their expressions may depend on the values of type | 
|  | parameters. | 
|  | Error messages produced during parameterized derived type instantiation | 
|  | are decorated with contextual attachments that point to the declarations | 
|  | or other type specifications that caused the instantiation. | 
|  |  | 
|  | Static initializations in `DATA` statements are collected, validated, | 
|  | and converted into static initialization in the symbol table, as if | 
|  | the initialized objects had used the newer style of static initialization | 
|  | in their entity declarations. | 
|  |  | 
|  | All statically initialized pointers, and default component initializers for | 
|  | pointers, are processed late in name resolution after all specification parts | 
|  | have been traversed. | 
|  | This allows for forward references even in the presence of `IMPLICIT NONE`. | 
|  | Object pointer initializers in parameterized derived type instantiations are | 
|  | also cloned and folded at this late stage. | 
|  | Validation of pointer initializers takes place later in declaration | 
|  | checking (below). | 
|  |  | 
|  | ### Declaration checking | 
|  |  | 
|  | Whenever possible, the enforcement of constraints and "shalls" pertaining to | 
|  | properties of symbols is deferred to a single read-only pass over the symbol table | 
|  | that takes place after all name resolution and typing is complete. | 
|  |  | 
|  | ### Write module files | 
|  |  | 
|  | Separate compilation information is written out on successful compilation | 
|  | of modules and submodules. These are used as input to name resolution | 
|  | in program units that `USE` the modules. | 
|  |  | 
|  | Module files are stripped-down Fortran source for the module. | 
|  | Parts that aren't needed to compile dependent program units (e.g. action statements) | 
|  | are omitted. | 
|  |  | 
|  | The module file for module `m` is named `m.mod` and the module file for | 
|  | submodule `s` of module `m` is named `m-s.mod`. |
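
The naming rule above is simple enough to state as a function. The helper below is a hypothetical Python illustration of that rule only; it says nothing about search paths or name normalization.

```python
from typing import Optional


def module_file_name(module: str, submodule: Optional[str] = None) -> str:
    """Return the module file name for a module or one of its submodules."""
    if submodule is None:
        return f"{module}.mod"
    return f"{module}-{submodule}.mod"
```

For example, `module_file_name("m")` gives `m.mod` and `module_file_name("m", "s")` gives `m-s.mod`.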