Auto merge of #138591 - Kobzol:git-ci, r=Mark-Simulacrum

Refactor git change detection in bootstrap

While working on https://github.com/rust-lang/rust/pull/138395, I finally found the courage to delve into the insides of git path change detection in bootstrap, which is used (amongst other things) to detect if we should rebuild or download `[llvm|rustc|gcc]`. I found it a bit hard to understand, and given that this code was historically quite fragile, I thought that it would be better to rebuild it from scratch.

The previous approach had a bunch of limitations:
- It separated the computation of "are there local changes?" and "what upstream SHA should we use?" even though these two things are intertwined.
- It used hacks to work around what happens on CI.
- It had special cases for CI scattered throughout the codebase, rather than centralized in one place.
- It wasn't documented enough and didn't have tests for the git behavior.

The current approach should hopefully resolve all of that. I implemented a single entrypoint called `check_path_modifications` (naming bikeshed pending; half of the time I spent on this PR was thinking about names, as it's quite tricky here...) that explicitly receives a mode of operation (in CI or outside CI). Based on that mode, it figures out the upstream SHA that we should use for downloading artifacts, and also whether there are any local changes. Users of this function can then use this unified output to implement `download-ci-X` and other functionality. Notably, this change detection no longer uses `git merge-base`, which makes it easier to use and doesn't require setting up remotes.
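To make the "unified output" idea concrete, here is a minimal sketch of how a caller might consume such a result. The names `PathStatus`, `Action` and `decide` are hypothetical and chosen for illustration only; they are not taken from bootstrap, and the real `check_path_modifications` may expose a different shape.

```rust
/// Hypothetical unified result of path change detection (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum PathStatus {
    /// The watched paths match `upstream`, so CI artifacts for it can be reused.
    Unmodified { upstream: String },
    /// The watched paths differ from `upstream`.
    Modified { upstream: String },
    /// No upstream commit could be determined.
    NoUpstream,
}

/// What a `download-ci-X`-style caller might do next (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Action {
    Download { sha: String },
    BuildFromSource,
}

/// Both questions ("are there local changes?" and "which upstream SHA?")
/// are answered by one value, so the caller needs only one match.
fn decide(status: &PathStatus) -> Action {
    match status {
        PathStatus::Unmodified { upstream } => Action::Download { sha: upstream.clone() },
        PathStatus::Modified { .. } | PathStatus::NoUpstream => Action::BuildFromSource,
    }
}

fn main() {
    let status = PathStatus::Unmodified { upstream: "abc123".to_string() };
    println!("{:?}", decide(&status));
}
```

Whether a caller should build from source or report an error when no upstream is found is a policy choice; the sketch simply picks the former.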

I also added a bunch of integration tests that literally spawn a git repository on disk and then check that the function can deal with various situations (PR CI, auto/try CI, local builds).

After I built this inner layer, I used it for downloading GCC, LLVM and rustc. The latter two (and especially rustc) were using the `last_modified_commit` function before, but in all cases but one this function was actually only used to check if there are any local changes, which was IMO confusing. The LLVM handling would deserve a bit of refactoring, but that's a larger change that can be done as a follow-up.

I hope that the implementation is now clear and easy to understand, so that in combination with the tests we can have more confidence that it does what we want. I tried to include a lot of documentation in the code, so I won't be repeating the actual implementation details here. If there are any questions, I'll add the answers to the documentation too :)

The new approach explicitly supports three scenarios:
- Running on PR CI, where we have one upstream bors parent commit and one PR merge commit made by GitHub.
- Running on try/auto CI, where we have one upstream bors parent commit and one PR merge commit made by bors.
- Running locally, where we assume that we have at least one upstream bors parent commit in our git history.
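The three scenarios can be illustrated with a toy model that works on an in-memory commit list instead of a real git repository. This is a sketch under stated assumptions, not the bootstrap implementation: `Mode`, `Commit` and `find_upstream` are hypothetical names, the bors author email is assumed, and history is simplified to the first-parent chain starting at HEAD.

```rust
/// Assumed author email of bors merge commits (an assumption of this sketch).
const BORS_EMAIL: &str = "bors@rust-lang.org";

/// Hypothetical mode of operation, passed in explicitly by the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Ci,
    Local,
}

/// Simplified commit: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Commit {
    sha: &'static str,
    author_email: &'static str,
}

fn commit(sha: &'static str, author_email: &'static str) -> Commit {
    Commit { sha, author_email }
}

/// `history` is the first-parent chain starting at HEAD, newest first.
fn find_upstream(history: &[Commit], mode: Mode) -> Option<&Commit> {
    let candidates = match mode {
        // On CI, HEAD is itself a merge commit (made by GitHub or by bors),
        // so it is skipped and the search starts at its parent.
        Mode::Ci => history.get(1..)?,
        // Locally, any commit in the history may be the upstream one.
        Mode::Local => history,
    };
    candidates.iter().find(|c| c.author_email == BORS_EMAIL)
}

fn main() {
    // PR CI: merge commit made by GitHub on top of a bors commit.
    let pr_ci = [commit("gh-merge", "merge@example.com"), commit("up1", BORS_EMAIL)];
    // try/auto CI: merge commit made by bors on top of a bors commit.
    let auto_ci = [commit("bors-merge", BORS_EMAIL), commit("up1", BORS_EMAIL)];
    // Local build: some local commits on top of a bors commit.
    let local = [commit("wip", "dev@example.com"), commit("up1", BORS_EMAIL)];

    println!("{:?}", find_upstream(&pr_ci, Mode::Ci).map(|c| c.sha));
    println!("{:?}", find_upstream(&auto_ci, Mode::Ci).map(|c| c.sha));
    println!("{:?}", find_upstream(&local, Mode::Local).map(|c| c.sha));
}
```

The try/auto case shows why the mode has to be explicit in this model: without skipping HEAD, the bors-made merge commit would itself be mistaken for the upstream commit.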

I removed the handling of upstreams on CI, as I think that it shouldn't be needed and I considered it to be a hack. However, it's possible that there are other use-cases that I haven't considered, so I want to ask around if people have other situations than the three use-cases described above. If there are other such use-cases, I would like to include them in the new centralized implementation and add them to the git test suite, rather than going back to the old ways :)

In particular, the code before relied on `git merge-base`, but I don't see why we can't just look up the most recent bors commit and assume that it is a merge commit that is also upstream. I might be running into Chesterton's Fence here :)

CC `@pietroalbini` to make sure that this won't break downstream users of Rust's CI.

Best reviewed commit by commit.

Companion PRs:
- For testing beta: https://github.com/rust-lang/rust/pull/138597

r? `@onur-ozkan`

Fixes: https://github.com/rust-lang/rust/issues/101907

try-job: x86_64-gnu-aux
try-job: aarch64-gnu
try-job: dist-x86_64-apple
tree: 74f1289f85d5f7a4893d778c85e3cf13a6d4029d
  1. .github/
  2. ci/
  3. examples/
  4. josh-sync/
  5. src/
  6. .editorconfig
  7. .gitattributes
  8. .gitignore
  9. .mailmap
  10. book.toml
  11. CITATION.cff
  12. CNAME
  13. CODE_OF_CONDUCT.md
  14. LICENSE-APACHE
  15. LICENSE-MIT
  16. mermaid-init.js
  17. mermaid.min.js
  18. README.md
  19. rust-version
  20. rustfmt.toml
  21. triagebot.toml
README.md

This is a collaborative effort to build a guide that explains how rustc works. The aim of the guide is to help new contributors get oriented to rustc, as well as to help more experienced folks in figuring out some new part of the compiler that they haven't worked on before.

You can read the latest version of the guide here.

You may also find the rustdocs for the compiler itself useful. Note that these are not intended as a guide; it's recommended that you search for the docs you're looking for instead of reading them top to bottom.

For documentation on developing the standard library, see std-dev-guide.

Contributing to the guide

The guide is useful today, but it has a lot of work still to go.

If you'd like to help improve the guide, we'd love to have you! You can find plenty of issues on the issue tracker. Just post a comment on the issue you would like to work on to make sure that we don't accidentally duplicate work. If you think something is missing, please open an issue about it!

In general, if you don't know how the compiler works, that is not a problem! In that case, we will schedule a bit of time for you to talk with someone who does know the code, or who wants to pair with you and figure it out. Then you can work on writing up what you learned.

In general, when writing about a particular part of the compiler's code, we recommend that you link to the relevant parts of the rustc rustdocs.

Build Instructions

To build a local static HTML site, install mdbook with:

cargo install mdbook mdbook-linkcheck2 mdbook-toc mdbook-mermaid

and execute the following command in the root of the repository:

mdbook build --open

The build files are found in the book/html directory.

Link Validations

We use mdbook-linkcheck2 to validate URLs included in our documentation. Link checking is not run by default locally, though it is in CI. To enable it locally, set the environment variable ENABLE_LINKCHECK=1 like in the following example.

ENABLE_LINKCHECK=1 mdbook serve

Table of Contents

We use mdbook-toc to auto-generate TOCs for long sections. You can invoke the preprocessor by including the <!-- toc --> marker at the place where you want the TOC.

Synchronizing josh subtree with rustc

This repository is linked to rust-lang/rust as a josh subtree. You can use the following commands to synchronize the subtree in both directions.

You'll need to install josh-proxy locally via

cargo +stable install josh-proxy --git https://github.com/josh-project/josh --tag r24.10.04

Older versions of josh-proxy may not round trip commits losslessly so it is important to install this exact version.

Pull changes from rust-lang/rust into this repository

  1. Check out a new branch that will be used to create a PR into rust-lang/rustc-dev-guide
  2. Run the pull command
    cargo run --manifest-path josh-sync/Cargo.toml rustc-pull
    
  3. Push the branch to your fork and create a PR into rustc-dev-guide

Push changes from this repository into rust-lang/rust

  1. Run the push command to create a branch named <branch-name> in a rustc fork under the <gh-username> account
    cargo run --manifest-path josh-sync/Cargo.toml rustc-push <branch-name> <gh-username>
    
  2. Create a PR from <branch-name> into rust-lang/rust

Minimal git config

For ease of implementation, the josh-sync script simply calls out to the system git. This means that the git invocation may be influenced by global (or local) git configuration.

If your global git config sets fetch.prunetags = true, you may observe “Nothing to pull” even when you know that rustc-pull has something to pull. Other configuration options may cause similarly unexpected outcomes.

To minimize the likelihood of this happening, you may wish to keep a separate minimal git config that only has [user] entries from global git config, then repoint system git to use the minimal git config instead. E.g.

GIT_CONFIG_GLOBAL=/path/to/minimal/gitconfig GIT_CONFIG_SYSTEM='' cargo +stable run --manifest-path josh-sync/Cargo.toml -- rustc-pull