flang/docs/Semantics.md - rust-lang/llvm-project - Git at Google

 <!--===- docs/Semantics.md

    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
    See https://llvm.org/LICENSE.txt for license information.
    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 -->

 # Semantic Analysis

 ```{contents}
 ---
 local:
 ---
 ```

 The semantic analysis pass determines if a syntactically correct Fortran
 program is is legal by enforcing the constraints of the language.

 The input is a parse tree with a `Program` node at the root;
 and a "cooked" character stream, a contiguous stream of characters
 containing a normalized form of the Fortran source.

 The semantic analysis pass takes a parse tree for a syntactically
 correct Fortran program and determines whether it is legal by enforcing
 the constraints of the language.

 If the program is not legal, the results of the semantic pass will be a list of
 errors associated with the program.

 If the program is legal, the semantic pass will produce a (possibly modified)
 parse tree for the semantically correct program with each name mapped to a symbol
 and each expression fully analyzed.

 All user errors are detected either prior to or during semantic analysis.
 After it completes successfully the program should compile with no error messages.
 There may still be warnings or informational messages.

 ## Phases of Semantic Analysis

 1. [Validate labels](#validate-labels) -
    Check all constraints on labels and branches
 2. [Rewrite DO loops](#rewrite-do-loops) -
    Convert all occurrences of `LabelDoStmt` to `DoConstruct`.
 3. [Name resolution](#name-resolution) -
    Analyze names and declarations, build a tree of Scopes containing Symbols,
    and fill in the `Name::symbol` data member in the parse tree
 4. [Rewrite parse tree](#rewrite-parse-tree) -
    Fix incorrect parses based on symbol information
 5. [Expression analysis](#expression-analysis) -
    Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
    `Variable::typedExpr` with analyzed expressions; fix incorrect parses
    based on the result of this analysis
 6. [Statement semantics](#statement-semantics) -
    Perform remaining semantic checks on the execution parts of subprograms
 7. [Write module files](#write-module-files) -
    If no errors have occurred, write out `.mod` files for modules and submodules

 If phase 1 or phase 2 encounter an error on any of the program units,
 compilation terminates. Otherwise, phases 3-6 are all performed even if
 errors occur.
 Module files are written (phase 7) only if there are no errors.

 ### Validate labels

 Perform semantic checks related to labels and branches:
 - check that any labels that are referenced are defined and in scope
 - check branches into loop bodies
 - check that labeled `DO` loops are properly nested
 - check labels in data transfer statements

 ### Rewrite DO loops

 This phase normalizes the parse tree by removing all unstructured `DO` loops
 and replacing them with `DO` constructs.

 ### Name resolution

 The name resolution phase walks the parse tree and constructs the symbol table.

 The symbol table consists of a tree of `Scope` objects rooted at the global scope.
 The global scope is owned by the `SemanticsContext` object.
 It contains a `Scope` for each program unit in the compilation.

 Each `Scope` in the scope tree contains child scopes representing other scopes
 lexically nested in it.
 Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
 declared in that scope. (All names in the symbol table are represented as
 `CharBlock` objects, i.e. as substrings of the cooked character stream.)

 All `Symbol` objects are owned by the symbol table data structures.
 They should be accessed as `Symbol *` or `Symbol &` outside of the symbol
 table classes as they can't be created, copied, or moved.
 The `Symbol` class has functions and data common across all symbols, and a
 `details` field that contains more information specific to that type of symbol.
 Many symbols also have types, represented by `DeclTypeSpec`.
 Types are also owned by scopes.

 Name resolution happens on the parse tree in this order:
 1. Process the specification of a program unit:
    1. Create a new scope for the unit
    2. Create a symbol for each contained subprogram containing just the name
    3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
    4. Process the specification part of the unit
 2. Apply the same process recursively to nested subprograms
 3. Process the execution part of the program unit
 4. Process the execution parts of nested subprograms recursively

 After the completion of this phase, every `Name` corresponds to a `Symbol`
 unless an error occurred.

 ### Rewrite parse tree

 The parser cannot build a completely correct parse tree without symbol information.
 This phase corrects mis-parses based on symbols:
 - Array element assignments may be parsed as statement functions: `a(i) = ...`
 - Namelist group names without `NML=` may be parsed as format expressions
 - A file unit number expression may be parsed as a character variable

 This phase also produces an internal error if it finds a `Name` that does not
 have its `symbol` data member filled in. This error is suppressed if other
 errors have occurred because in that case a `Name` corresponding to an erroneous
 symbol may not be resolved.

 ### Expression analysis

 Expressions that occur in the specification part are analyzed during name
 resolution, for example, initial values, array bounds, type parameters.
 Any remaining expressions are analyzed in this phase.

 For each `Variable` and top-level `Expr` (i.e. one that is not nested below
 another `Expr` in the parse tree) the analyzed form of the expression is saved
 in the `typedExpr` data member. After this phase has completed, the analyzed
 expression can be accessed using `semantics::GetExpr()`.

 This phase also corrects mis-parses based on the result of expression analysis:
 - An expression like `a(b)` is parsed as a function reference but may need
   to be rewritten to an array element reference (if `a` is an object entity)
   or to a structure constructor (if `a` is a derive type)
 - An expression like `a(b:c)` is parsed as an array section but may need to be
   rewritten as a substring if `a` is an object with type CHARACTER

 ### Statement semantics

 Multiple independent checkers driven by the `SemanticsVisitor` framework
 perform the remaining semantic checks.
 By this phase, all names and expressions that can be successfully resolved
 have been. But there may be names without symbols or expressions without
 analyzed form if errors occurred earlier.

 ### Initialization processing

 Fortran supports many means of specifying static initializers for variables,
 object pointers, and procedure pointers, as well as default initializers for
 derived type object components, pointers, and type parameters.

 Non-pointer static initializers of variables and named constants are
 scanned, analyzed, folded, scalar-expanded, and validated as they are
 traversed during declaration processing in name resolution.
 So are the default initializers of non-pointer object components in
 non-parameterized derived types.
 Name constant arrays with implied shapes take their actual shape from
 the initialization expression.

 Default initializers of non-pointer components and type parameters
 in distinct parameterized
 derived type instantiations are similarly processed as those instances
 are created, as their expressions may depend on the values of type
 parameters.
 Error messages produced during parameterized derived type instantiation
 are decorated with contextual attachments that point to the declarations
 or other type specifications that caused the instantiation.

 Static initializations in `DATA` statements are collected, validated,
 and converted into static initialization in the symbol table, as if
 the initialized objects had used the newer style of static initialization
 in their entity declarations.

 All statically initialized pointers, and default component initializers for
 pointers, are processed late in name resolution after all specification parts
 have been traversed.
 This allows for forward references even in the presence of `IMPLICIT NONE`.
 Object pointer initializers in parameterized derived type instantiations are
 also cloned and folded at this late stage.
 Validation of pointer initializers takes place later in declaration
 checking (below).

 ### Declaration checking

 Whenever possible, the enforcement of constraints and "shalls" pertaining to
 properties of symbols is deferred to a single read-only pass over the symbol table
 that takes place after all name resolution and typing is complete.

 ### Write module files

 Separate compilation information is written out on successful compilation
 of modules and submodules. These are used as input to name resolution
 in program units that `USE` the modules.

 Module files are stripped down Fortran source for the module.
 Parts that aren't needed to compile dependent program units (e.g. action statements)
 are omitted.

 The module file for module `m` is named `m.mod` and the module file for
 submodule `s` of module `m` is named `m-s.mod`.
	<!--===- docs/Semantics.md

	Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	See https://llvm.org/LICENSE.txt for license information.
	SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

	-->

	# Semantic Analysis

	```{contents}
	---
	local:
	---
	```

	The semantic analysis pass determines if a syntactically correct Fortran
	program is is legal by enforcing the constraints of the language.

	The input is a parse tree with a `Program` node at the root;
	and a "cooked" character stream, a contiguous stream of characters
	containing a normalized form of the Fortran source.

	The semantic analysis pass takes a parse tree for a syntactically
	correct Fortran program and determines whether it is legal by enforcing
	the constraints of the language.

	If the program is not legal, the results of the semantic pass will be a list of
	errors associated with the program.

	If the program is legal, the semantic pass will produce a (possibly modified)
	parse tree for the semantically correct program with each name mapped to a symbol
	and each expression fully analyzed.

	All user errors are detected either prior to or during semantic analysis.
	After it completes successfully the program should compile with no error messages.
	There may still be warnings or informational messages.

	## Phases of Semantic Analysis

	1. [Validate labels](#validate-labels) -
	Check all constraints on labels and branches
	2. [Rewrite DO loops](#rewrite-do-loops) -
	Convert all occurrences of `LabelDoStmt` to `DoConstruct`.
	3. [Name resolution](#name-resolution) -
	Analyze names and declarations, build a tree of Scopes containing Symbols,
	and fill in the `Name::symbol` data member in the parse tree
	4. [Rewrite parse tree](#rewrite-parse-tree) -
	Fix incorrect parses based on symbol information
	5. [Expression analysis](#expression-analysis) -
	Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
	`Variable::typedExpr` with analyzed expressions; fix incorrect parses
	based on the result of this analysis
	6. [Statement semantics](#statement-semantics) -
	Perform remaining semantic checks on the execution parts of subprograms
	7. [Write module files](#write-module-files) -
	If no errors have occurred, write out `.mod` files for modules and submodules

	If phase 1 or phase 2 encounter an error on any of the program units,
	compilation terminates. Otherwise, phases 3-6 are all performed even if
	errors occur.
	Module files are written (phase 7) only if there are no errors.

	### Validate labels

	Perform semantic checks related to labels and branches:
	- check that any labels that are referenced are defined and in scope
	- check branches into loop bodies
	- check that labeled `DO` loops are properly nested
	- check labels in data transfer statements

	### Rewrite DO loops

	This phase normalizes the parse tree by removing all unstructured `DO` loops
	and replacing them with `DO` constructs.

	### Name resolution

	The name resolution phase walks the parse tree and constructs the symbol table.

	The symbol table consists of a tree of `Scope` objects rooted at the global scope.
	The global scope is owned by the `SemanticsContext` object.
	It contains a `Scope` for each program unit in the compilation.

	Each `Scope` in the scope tree contains child scopes representing other scopes
	lexically nested in it.
	Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
	declared in that scope. (All names in the symbol table are represented as
	`CharBlock` objects, i.e. as substrings of the cooked character stream.)

	All `Symbol` objects are owned by the symbol table data structures.
	They should be accessed as `Symbol *` or `Symbol &` outside of the symbol
	table classes as they can't be created, copied, or moved.
	The `Symbol` class has functions and data common across all symbols, and a
	`details` field that contains more information specific to that type of symbol.
	Many symbols also have types, represented by `DeclTypeSpec`.
	Types are also owned by scopes.

	Name resolution happens on the parse tree in this order:
	1. Process the specification of a program unit:
	1. Create a new scope for the unit
	2. Create a symbol for each contained subprogram containing just the name
	3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
	4. Process the specification part of the unit
	2. Apply the same process recursively to nested subprograms
	3. Process the execution part of the program unit
	4. Process the execution parts of nested subprograms recursively

	After the completion of this phase, every `Name` corresponds to a `Symbol`
	unless an error occurred.

	### Rewrite parse tree

	The parser cannot build a completely correct parse tree without symbol information.
	This phase corrects mis-parses based on symbols:
	- Array element assignments may be parsed as statement functions: `a(i) = ...`
	- Namelist group names without `NML=` may be parsed as format expressions
	- A file unit number expression may be parsed as a character variable

	This phase also produces an internal error if it finds a `Name` that does not
	have its `symbol` data member filled in. This error is suppressed if other
	errors have occurred because in that case a `Name` corresponding to an erroneous
	symbol may not be resolved.

	### Expression analysis

	Expressions that occur in the specification part are analyzed during name
	resolution, for example, initial values, array bounds, type parameters.
	Any remaining expressions are analyzed in this phase.

	For each `Variable` and top-level `Expr` (i.e. one that is not nested below
	another `Expr` in the parse tree) the analyzed form of the expression is saved
	in the `typedExpr` data member. After this phase has completed, the analyzed
	expression can be accessed using `semantics::GetExpr()`.

	This phase also corrects mis-parses based on the result of expression analysis:
	- An expression like `a(b)` is parsed as a function reference but may need
	to be rewritten to an array element reference (if `a` is an object entity)
	or to a structure constructor (if `a` is a derive type)
	- An expression like `a(b:c)` is parsed as an array section but may need to be
	rewritten as a substring if `a` is an object with type CHARACTER

	### Statement semantics

	Multiple independent checkers driven by the `SemanticsVisitor` framework
	perform the remaining semantic checks.
	By this phase, all names and expressions that can be successfully resolved
	have been. But there may be names without symbols or expressions without
	analyzed form if errors occurred earlier.

	### Initialization processing

	Fortran supports many means of specifying static initializers for variables,
	object pointers, and procedure pointers, as well as default initializers for
	derived type object components, pointers, and type parameters.

	Non-pointer static initializers of variables and named constants are
	scanned, analyzed, folded, scalar-expanded, and validated as they are
	traversed during declaration processing in name resolution.
	So are the default initializers of non-pointer object components in
	non-parameterized derived types.
	Name constant arrays with implied shapes take their actual shape from
	the initialization expression.

	Default initializers of non-pointer components and type parameters
	in distinct parameterized
	derived type instantiations are similarly processed as those instances
	are created, as their expressions may depend on the values of type
	parameters.
	Error messages produced during parameterized derived type instantiation
	are decorated with contextual attachments that point to the declarations
	or other type specifications that caused the instantiation.

	Static initializations in `DATA` statements are collected, validated,
	and converted into static initialization in the symbol table, as if
	the initialized objects had used the newer style of static initialization
	in their entity declarations.

	All statically initialized pointers, and default component initializers for
	pointers, are processed late in name resolution after all specification parts
	have been traversed.
	This allows for forward references even in the presence of `IMPLICIT NONE`.
	Object pointer initializers in parameterized derived type instantiations are
	also cloned and folded at this late stage.
	Validation of pointer initializers takes place later in declaration
	checking (below).

	### Declaration checking

	Whenever possible, the enforcement of constraints and "shalls" pertaining to
	properties of symbols is deferred to a single read-only pass over the symbol table
	that takes place after all name resolution and typing is complete.

	### Write module files

	Separate compilation information is written out on successful compilation
	of modules and submodules. These are used as input to name resolution
	in program units that `USE` the modules.

	Module files are stripped down Fortran source for the module.
	Parts that aren't needed to compile dependent program units (e.g. action statements)
	are omitted.

	The module file for module `m` is named `m.mod` and the module file for
	submodule `s` of module `m` is named `m-s.mod`.