Auto merge of #144494 - scottmcm:min_bigint_helpers, r=Mark-Simulacrum

Partial-stabilize the basics from `bigint_helper_methods`

Direct link to p-FCP comment: https://github.com/rust-lang/rust/pull/144494#issuecomment-3133172161

After libs-api discussion, this is now the following methods:

- [`uN::carrying_add`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_add): uN + uN + bool -> (uN, bool)
- [`uN::borrowing_sub`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.borrowing_sub): uN + uN + bool -> (uN, bool)
- [`uN::carrying_mul`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_mul): uN * uN + uN -> (uN, uN)
- [`uN::carrying_mul_add`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_mul_add): uN * uN + uN + uN -> (uN, uN)

Specifically, these are the ones that are specifically about working with `uN` as a "digit" (or "limb") where the output, despite being larger than can fit in a single digit, wants to be phrased in terms of those *digits*, not in terms of a wider type.

(This leaves open the possibility of things like `widening_mul: u32 * u32 -> u64` for places where one wants to only think in terms of the *number*s, rather than as carries between multiple digits.  Though of course discussions about how best to phrase such a thing are best for the tracking issue, not for this PR.)

---

**Original PR description**:

A [conversation on IRLO](https://internals.rust-lang.org/t/methods-for-splitting-integers-into-their-halves/23210/7?u=scottmcm) the other day pushed me to write this up 🙂

This PR proposes a partial stabilization of `bigint_helper_methods` (rust-lang/rust#85532), focusing on a basic set that hopefully can be non-controversial.  Specifically:

- [`uN::carrying_add`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_add): uN + uN + bool -> (uN, bool)
- [`uN::widening_mul`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.widening_mul): uN * uN -> (uN, uN)
- [`uN::carrying_mul_add`](https://doc.rust-lang.org/nightly/std/primitive.u64.html#method.carrying_mul_add): uN * uN + uN + uN -> (uN, uN)

Why these?

- We should let people write Rust without needing to be backend experts to know what the magic incantation is to do this.  Even `carrying_add`, which doesn't seem that complicated, actually broke in 1.82 (see rust-lang/rust#133674) so we should just offer something fit-for-purpose rather than making people keep up with whatever the secret sauce is today.  We also get to do things that users cannot, like have the LLVM version emit operations on `i256` in the implementation of `u128::carrying_mul_add` (https://rust.godbolt.org/z/cjG7eKcxd).
- Unsigned only because the behaviour is much clearer than when signed is involved, as everything is just unsigned (vs questions like whether `iN * iN` should give `(uN, iN)`) and carries can only happen in one direction (vs questions about whether the carry from `-128_u8 + -128_u8` should be considered `-1`).
- `carrying_add` is the core [full adder](https://en.wikipedia.org/wiki/Adder_(electronics)#Full_adder) primitive for implementing addition.
- `carrying_mul_add` is the core primitive for [grade school](https://en.wikipedia.org/wiki/Multiplication_algorithm#Long_multiplication) multiplication (see the example in its docs for why both carries are needed).
- `widening_mul` even though it's not strictly needed (its implementation is just `carrying_mul_add(a, b, 0, 0)` right now) as the simplest way for users to get to [cranelift's `umulhi`](https://docs.rs/cranelift/latest/cranelift/prelude/trait.InstBuilder.html#method.umulhi), RISC-V's `MULHU`, Arm's `UMULL`, etc.  (For example, I added an ISLE pattern https://github.com/bytecodealliance/wasmtime/commit/d12e4237de867d58d8c3451d77ee7547e9492550#diff-2041f67049d5ac3d8f62ea91d3cb45cdb8608d5f5cdab988731ae2addf90ef01 so Cranelift can notice what's happening from the fallback, even if the intrinsics aren't overridden specifically.  And on x86 this is one of the simplest possible non-trivial functions <https://rust.godbolt.org/z/4oadWKTc1> because `MUL` puts the results in exactly the registers that the scalar pair result happens to want.)

(I did not const-stabilize them in this PR because [the fallbacks](https://github.com/rust-lang/rust/blob/master/library/core/src/intrinsics/fallback.rs) are using `#[const_trait]` plus there's two [new intrinsic](https://doc.rust-lang.org/nightly/std/intrinsics/fn.disjoint_bitor.html)s involved, so I didn't want to *also* open those cans of worms here.  Given that both intrinsics *have* fallbacks, and thus don't do anything that can't already be expressed in existing Rust, const-stabilizing these should be straight-forward once the underlying machinery is allowed on stable.  But that doesn't need to keep these from being usable at runtime in the mean time.)
diff --git a/.github/workflows/rustc-pull.yml b/.github/workflows/rustc-pull.yml
index 04d6469..5ff3118 100644
--- a/.github/workflows/rustc-pull.yml
+++ b/.github/workflows/rustc-pull.yml
@@ -3,8 +3,8 @@
 on:
   workflow_dispatch:
   schedule:
-    # Run at 04:00 UTC every Monday and Thursday
-    - cron: '0 4 * * 1,4'
+    # Run at 04:00 UTC every Monday
+    - cron: '0 4 * * 1'
 
 jobs:
   pull:
diff --git a/rust-version b/rust-version
index 6ec700b..f412399 100644
--- a/rust-version
+++ b/rust-version
@@ -1 +1 @@
-6bcdcc73bd11568fd85f5a38b58e1eda054ad1cd
+a1dbb443527bd126452875eb5d5860c1d001d761
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 025a078..a161273 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -103,6 +103,7 @@
 	- [The `rustdoc-json` test suite](./rustdoc-internals/rustdoc-json-test-suite.md)
 - [GPU offload internals](./offload/internals.md)
     - [Installation](./offload/installation.md)
+    - [Usage](./offload/usage.md)
 - [Autodiff internals](./autodiff/internals.md)
     - [Installation](./autodiff/installation.md)
     - [How to debug](./autodiff/debugging.md)
diff --git a/src/about-this-guide.md b/src/about-this-guide.md
index 057e4a4..f395772 100644
--- a/src/about-this-guide.md
+++ b/src/about-this-guide.md
@@ -74,7 +74,6 @@
   of the team procedures, active working groups, and the team calendar.
 - [std-dev-guide] -- a similar guide for developing the standard library.
 - [The t-compiler zulip][z]
-- `#contribute` and `#wg-rustup` on [Discord](https://discord.gg/rust-lang).
 - The [Rust Internals forum][rif], a place to ask questions and
   discuss Rust's internals
 - The [Rust reference][rr], even though it doesn't specifically talk about
diff --git a/src/autodiff/internals.md b/src/autodiff/internals.md
index c1b31a0..c8e304f 100644
--- a/src/autodiff/internals.md
+++ b/src/autodiff/internals.md
@@ -17,7 +17,7 @@
 
 The detailed documentation for the `std::autodiff` module is available at [std::autodiff](https://doc.rust-lang.org/std/autodiff/index.html).
 
-Differentiable programing is used in various fields like numerical computing, [solid mechanics][ratel], [computational chemistry][molpipx], [fluid dynamics][waterlily] or for Neural Network training via Backpropagation, [ODE solver][diffsol], [differentiable rendering][libigl], [quantum computing][catalyst], and climate simulations.
+Differentiable programming is used in various fields like numerical computing, [solid mechanics][ratel], [computational chemistry][molpipx], [fluid dynamics][waterlily] or for Neural Network training via Backpropagation, [ODE solver][diffsol], [differentiable rendering][libigl], [quantum computing][catalyst], and climate simulations.
 
 [ratel]: https://gitlab.com/micromorph/ratel
 [molpipx]: https://arxiv.org/abs/2411.17011v
diff --git a/src/building/bootstrapping/what-bootstrapping-does.md b/src/building/bootstrapping/what-bootstrapping-does.md
index da425d8..bfd75eb 100644
--- a/src/building/bootstrapping/what-bootstrapping-does.md
+++ b/src/building/bootstrapping/what-bootstrapping-does.md
@@ -23,7 +23,7 @@
 
 ### Overview
 
-- Stage 0: the pre-compiled compiler
+- Stage 0: the pre-compiled compiler and standard library
 - Stage 1: from current code, by an earlier compiler
 - Stage 2: the truly current compiler
 - Stage 3: the same-result test
@@ -192,7 +192,7 @@
   artifacts'). If you're working on the standard library, this is normally the
   test command you want.
 - `./x build --stage 0` means to build with the stage0 `rustc`.
-- `./x doc --stage 0` means to document using the stage0 `rustdoc`.
+- `./x doc --stage 1` means to document using the stage0 `rustdoc`.
 
 #### Examples of what *not* to do
 
@@ -211,7 +211,7 @@
 In short, _stage 0 uses the `stage0` compiler to create `stage0` artifacts which
 will later be uplifted to be the stage1 compiler_.
 
-In each stage, two major steps are performed:
+In each stage besides 0, two major steps are performed:
 
 1. `std` is compiled by the stage N compiler.
 2. That `std` is linked to programs built by the stage N compiler, including the
diff --git a/src/diagnostics/error-codes.md b/src/diagnostics/error-codes.md
index 1b6b87e..1693432 100644
--- a/src/diagnostics/error-codes.md
+++ b/src/diagnostics/error-codes.md
@@ -20,7 +20,7 @@
 the compiler. Rust prides itself on helpful error messages and long-form
 explanations are no exception. However, before error explanations are
 overhauled[^new-explanations] it is a bit open as to how exactly they should be
-written, as always: ask your reviewer or ask around on the Rust Discord or Zulip.
+written, as always: ask your reviewer or ask around on the Rust Zulip.
 
 [^new-explanations]: See the draft RFC [here][new-explanations-rfc].
 
diff --git a/src/getting-started.md b/src/getting-started.md
index 04d2e37..87e26d3 100644
--- a/src/getting-started.md
+++ b/src/getting-started.md
@@ -11,7 +11,6 @@
 chapter on how to build and run the compiler](./building/how-to-build-and-run.md).
 
 [internals]: https://internals.rust-lang.org
-[rust-discord]: http://discord.gg/rust-lang
 [rust-zulip]: https://rust-lang.zulipchat.com
 [coc]: https://www.rust-lang.org/policies/code-of-conduct
 [walkthrough]: ./walkthrough.md
@@ -20,8 +19,7 @@
 ## Asking Questions
 
 If you have questions, please make a post on the [Rust Zulip server][rust-zulip] or
-[internals.rust-lang.org][internals]. If you are contributing to Rustup, be aware they are not on
-Zulip - you can ask questions in `#wg-rustup` [on Discord][rust-discord].
+[internals.rust-lang.org][internals].
 See the [list of teams and working groups][governance] and [the Community page][community] on the
 official website for more resources.
 
@@ -30,19 +28,23 @@
 
 As a reminder, all contributors are expected to follow our [Code of Conduct][coc].
 
-The compiler team (or `t-compiler`) usually hangs out in Zulip [in this
-"stream"][z]; it will be easiest to get questions answered there.
+The compiler team (or `t-compiler`) usually hangs out in Zulip in
+[the #t-compiler channel][z-t-compiler];
+questions about how the compiler works can go in [#t-compiler/help][z-help].
 
-[z]: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler
+[z-t-compiler]: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler
+[z-help]: https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp
 
 **Please ask questions!** A lot of people report feeling that they are "wasting
-expert time", but nobody on `t-compiler` feels this way. Contributors are
+expert's time", but nobody on `t-compiler` feels this way. Contributors are
 important to us.
 
 Also, if you feel comfortable, prefer public topics, as this means others can
 see the questions and answers, and perhaps even integrate them back into this
 guide :)
 
+**Tip**: If you're not a native English speaker and feel unsure about writing, try using a translator to help. But avoid using LLM tools that generate long, complex words. In daily teamwork, **simple and clear words** are best for easy understanding. Even small typos or grammar mistakes can make you seem more human, and people connect better with humans.
+
 ### Experts
 
 Not all `t-compiler` members are experts on all parts of `rustc`; it's a
@@ -162,15 +164,12 @@
 - [Triaging issues][triage]: categorizing, replicating, and minimizing issues is very helpful to the Rust maintainers.
 - [Working groups][wg]: there are a bunch of working groups on a wide variety
   of rust-related things.
-- Answer questions in the _Get Help!_ channels on the [Rust Discord
-  server][rust-discord], on [users.rust-lang.org][users], or on
-  [StackOverflow][so].
+- Answer questions on [users.rust-lang.org][users], or on [Stack Overflow][so].
 - Participate in the [RFC process](https://github.com/rust-lang/rfcs).
 - Find a [requested community library][community-library], build it, and publish
   it to [Crates.io](http://crates.io). Easier said than done, but very, very
   valuable!
 
-[rust-discord]: https://discord.gg/rust-lang
 [users]: https://users.rust-lang.org/
 [so]: http://stackoverflow.com/questions/tagged/rust
 [community-library]: https://github.com/rust-lang/rfcs/labels/A-community-library
diff --git a/src/git.md b/src/git.md
index 447c6fd..8f0511a 100644
--- a/src/git.md
+++ b/src/git.md
@@ -338,13 +338,13 @@
 
 ### Keeping things up to date
 
-The above section on [Rebasing](#rebasing) is a specific
+The [above section](#rebasing) is a specific
 guide on rebasing work and dealing with merge conflicts.
 Here is some general advice about how to keep your local repo
 up-to-date with upstream changes:
 
 Using `git pull upstream master` while on your local master branch regularly
-will keep it up-to-date. You will also want to rebase your feature branches
+will keep it up-to-date. You will also want to keep your feature branches
 up-to-date as well. After pulling, you can checkout the feature branches
 and rebase them:
 
diff --git a/src/img/coverage-branch-counting-01.png b/src/img/coverage-branch-counting-01.png
index c445f35..7c6c845 100644
--- a/src/img/coverage-branch-counting-01.png
+++ b/src/img/coverage-branch-counting-01.png
Binary files differ
diff --git a/src/img/dataflow-graphviz-example.png b/src/img/dataflow-graphviz-example.png
index 718411a..7baa37e 100644
--- a/src/img/dataflow-graphviz-example.png
+++ b/src/img/dataflow-graphviz-example.png
Binary files differ
diff --git a/src/img/github-cli.png b/src/img/github-cli.png
index c3b0e77..88ba95f 100644
--- a/src/img/github-cli.png
+++ b/src/img/github-cli.png
Binary files differ
diff --git a/src/img/github-whitespace-changes.png b/src/img/github-whitespace-changes.png
index 9a19a10..e235a30 100644
--- a/src/img/github-whitespace-changes.png
+++ b/src/img/github-whitespace-changes.png
Binary files differ
diff --git a/src/img/llvm-cov-show-01.png b/src/img/llvm-cov-show-01.png
index 35f0459..ce4dec1 100644
--- a/src/img/llvm-cov-show-01.png
+++ b/src/img/llvm-cov-show-01.png
Binary files differ
diff --git a/src/img/other-peoples-commits.png b/src/img/other-peoples-commits.png
index e4fc2c7..0c949d8 100644
--- a/src/img/other-peoples-commits.png
+++ b/src/img/other-peoples-commits.png
Binary files differ
diff --git a/src/img/rustbot-submodules.png b/src/img/rustbot-submodules.png
index c2e6937..c099fdf 100644
--- a/src/img/rustbot-submodules.png
+++ b/src/img/rustbot-submodules.png
Binary files differ
diff --git a/src/img/submodule-conflicts.png b/src/img/submodule-conflicts.png
index e90a6bb..5d4caf0 100644
--- a/src/img/submodule-conflicts.png
+++ b/src/img/submodule-conflicts.png
Binary files differ
diff --git a/src/img/wpa-initial-memory.png b/src/img/wpa-initial-memory.png
index b602066..177d92c 100644
--- a/src/img/wpa-initial-memory.png
+++ b/src/img/wpa-initial-memory.png
Binary files differ
diff --git a/src/img/wpa-stack.png b/src/img/wpa-stack.png
index 29eb5a5..a4a7135 100644
--- a/src/img/wpa-stack.png
+++ b/src/img/wpa-stack.png
Binary files differ
diff --git a/src/macro-expansion.md b/src/macro-expansion.md
index 54d6d2b..96f12b7 100644
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -517,8 +517,9 @@
   are about to ask the MBE parser to parse. We will consume the raw stream of
   tokens and output a binding of metavariables to corresponding token trees.
   The parsing session can be used to report parser errors.
-- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
-  the token stream against. They're converted from token trees before matching.
+- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match the token stream
+  against. They're converted from the original token trees in the macro's definition before
+  matching.
 
 [`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
 
@@ -544,41 +545,26 @@
 The macro parser does pretty much exactly the same as a normal regex parser
 with one exception: in order to parse different types of metavariables, such as
 `ident`, `block`, `expr`, etc., the macro parser must call back to the normal
-Rust parser. Both the definition and invocation of macros are parsed using
-the parser in a process which is non-intuitively self-referential. 
+Rust parser.
 
-The code to parse macro _definitions_ is in
-[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
-pattern for matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In
-other words, a `macro_rules` definition should have in its body at least one
-occurrence of a token tree followed by `=>` followed by another token tree.
-When the compiler comes to a `macro_rules` definition, it uses this pattern to
-match the two token trees per the rules of the definition of the macro, _thereby
-utilizing the macro parser itself_. In our example definition, the
-metavariable `$lhs` would match the patterns of both arms: `(print
-$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
-bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
-println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
-needs to expand a macro invocation.
+The code to parse macro definitions is in [`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr].
+For more information about the macro parser's implementation, see the comments in
+[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
 
-When the compiler comes to a macro invocation, it parses that invocation using
-a NFA-based macro parser described above. However, the matcher variable
-used is the first token tree (`$lhs`) extracted from the arms of the macro
-_definition_. Using our example, we would try to match the token stream `print
-foo` from the invocation against the matchers `print $mvar:ident` and `print
-twice $mvar:ident` that we previously extracted from the definition. The
-algorithm is exactly the same, but when the macro parser comes to a place in the
-current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
-it calls back to the normal Rust parser to get the contents of that
-non-terminal. In this case, the Rust parser would look for an `ident` token,
-which it finds (`foo`) and returns to the macro parser. Then, the macro parser
-proceeds in parsing as normal. Also, note that exactly one of the matchers from
-the various arms should match the invocation; if there is more than one match,
-the parse is ambiguous, while if there are no matches at all, there is a syntax
+Using our example, we would try to match the token stream `print foo` from the invocation against
+the matchers `print $mvar:ident` and `print twice $mvar:ident` that we previously extracted from the
+rules in the macro definition. When the macro parser comes to a place in the current matcher where
+it needs to match a _non-terminal_ (e.g. `$mvar:ident`), it calls back to the normal Rust parser to
+get the contents of that non-terminal. In this case, the Rust parser would look for an `ident`
+token, which it finds (`foo`) and returns to the macro parser. Then, the macro parser continues
+parsing.
+
+Note that exactly one of the matchers from the various rules should match the invocation; if there is
+more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax
 error.
 
-For more information about the macro parser's implementation, see the comments
-in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
+Assuming exactly one rule matches, macro expansion will then *transcribe* the right-hand side of the
+rule, substituting the values of any matches it captured when matching against the left-hand side.
 
 ## Procedural Macros
 
diff --git a/src/offload/installation.md b/src/offload/installation.md
index b376e96..d1ebf33 100644
--- a/src/offload/installation.md
+++ b/src/offload/installation.md
@@ -1,6 +1,6 @@
 # Installation
 
-In the future, `std::offload` should become available in nightly builds for users. For now, everyone still needs to build rustc from source. 
+`std::offload` is partly available in nightly builds for users. For now, everyone however still needs to build rustc from source to use all features of it. 
 
 ## Build instructions
 
@@ -42,30 +42,3 @@
 ```
 ./x test --stage 1 tests/codegen-llvm/gpu_offload
 ```
-
-## Usage
-It is important to use a clang compiler build on the same llvm as rustc. Just calling clang without the full path will likely use your system clang, which probably will be incompatible.
-```
-/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/stage1/bin/rustc --edition=2024 --crate-type cdylib src/main.rs --emit=llvm-ir  -O -C lto=fat -Cpanic=abort -Zoffload=Enable
-/absolute/path/to/rust/build/x86_64-unknown-linux-gnu/llvm/bin/clang++ -fopenmp --offload-arch=native -g  -O3 main.ll -o main -save-temps
-LIBOMPTARGET_INFO=-1  ./main
-```
-The first step will generate a `main.ll` file, which has enough instructions to cause the offload runtime to move data to and from a gpu.
-The second step will use clang as the compilation driver to compile our IR file down to a working binary. Only a very small Rust subset will work out of the box here, unless
-you use features like build-std, which are not covered by this guide. Look at the codegen test to get a feeling for how to write a working example.
-In the last step you can run your binary, if all went well you will see a data transfer being reported:
-```
-omptarget device 0 info: Entering OpenMP data region with being_mapper at unknown:0:0 with 1 arguments:
-omptarget device 0 info: tofrom(unknown)[1024]
-omptarget device 0 info: Creating new map entry with HstPtrBase=0x00007fffffff9540, HstPtrBegin=0x00007fffffff9540, TgtAllocBegin=0x0000155547200000, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=1, HoldRefCount=0, Name=unknown
-omptarget device 0 info: Copying data from host to device, HstPtr=0x00007fffffff9540, TgtPtr=0x0000155547200000, Size=1024, Name=unknown
-omptarget device 0 info: OpenMP Host-Device pointer mappings after block at unknown:0:0:
-omptarget device 0 info: Host Ptr           Target Ptr         Size (B) DynRefCount HoldRefCount Declaration
-omptarget device 0 info: 0x00007fffffff9540 0x0000155547200000 1024     1           0            unknown at unknown:0:0
-// some other output
-omptarget device 0 info: Exiting OpenMP data region with end_mapper at unknown:0:0 with 1 arguments:
-omptarget device 0 info: tofrom(unknown)[1024]
-omptarget device 0 info: Mapping exists with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, DynRefCount=0 (decremented, delayed deletion), HoldRefCount=0
-omptarget device 0 info: Copying data from device to host, TgtPtr=0x0000155547200000, HstPtr=0x00007fffffff9540, Size=1024, Name=unknown
-omptarget device 0 info: Removing map entry with HstPtrBegin=0x00007fffffff9540, TgtPtrBegin=0x0000155547200000, Size=1024, Name=unknown
-```
diff --git a/src/offload/usage.md b/src/offload/usage.md
new file mode 100644
index 0000000..9f51998
--- /dev/null
+++ b/src/offload/usage.md
@@ -0,0 +1,112 @@
+# Usage
+
+This feature is work-in-progress, and not ready for usage. The instructions here are for contributors, or people interested in following the latest progress.
+We currently work on launching the following Rust kernel on the GPU. To follow along, copy it to a `src/lib.rs` file.
+
+```rust
+#![feature(abi_gpu_kernel)]
+#![no_std]
+
+#[cfg(target_os = "linux")]
+extern crate libc;
+#[cfg(target_os = "linux")]
+use libc::c_char;
+
+use core::mem;
+
+#[panic_handler]
+fn panic(_: &core::panic::PanicInfo) -> ! {
+    loop {}
+}
+
+#[cfg(target_os = "linux")]
+#[unsafe(no_mangle)]
+#[inline(never)]
+fn main() {
+    let array_c: *mut [f64; 256] =
+        unsafe { libc::calloc(256, (mem::size_of::<f64>()) as libc::size_t) as *mut [f64; 256] };
+    let output = c"The first element is zero %f\n";
+    let output2 = c"The first element is NOT zero %f\n";
+    let output3 = c"The second element is %f\n";
+    unsafe {
+        let val: *const c_char = if (*array_c)[0] < 0.1 {
+            output.as_ptr()
+        } else {
+            output2.as_ptr()
+        };
+        libc::printf(val, (*array_c)[0]);
+    }
+
+    unsafe {
+        kernel_1(array_c);
+    }
+    core::hint::black_box(&array_c);
+    unsafe {
+        let val: *const c_char = if (*array_c)[0] < 0.1 {
+            output.as_ptr()
+        } else {
+            output2.as_ptr()
+        };
+        libc::printf(val, (*array_c)[0]);
+        libc::printf(output3.as_ptr(), (*array_c)[1]);
+    }
+}
+
+#[cfg(target_os = "linux")]
+unsafe extern "C" {
+    pub fn kernel_1(array_b: *mut [f64; 256]);
+}
+```
+
+## Compile instructions
+It is important to use a clang compiler build on the same llvm as rustc. Just calling clang without the full path will likely use your system clang, which probably will be incompatible. So either substitute clang/lld invocations below with absolute path, or set your `PATH` accordingly.
+
+First we generate the host (cpu) code. The first build is just to compile libc, take note of the hashed path. Then we call rustc directly to build our host code, while providing the libc artifact to rustc.
+```
+cargo +offload build -r -v
+rustc +offload --edition 2024 src/lib.rs -g --crate-type cdylib -C opt-level=3 -C panic=abort -C lto=fat -L dependency=/absolute_path_to/target/release/deps --extern libc=/absolute_path_to/target/release/deps/liblibc-<HASH>.rlib --emit=llvm-bc,llvm-ir  -Zoffload=Enable -Zunstable-options
+```
+
+Now we generate the device code. Replace the target-cpu with the right code for your gpu.
+```
+RUSTFLAGS="-Ctarget-cpu=gfx90a --emit=llvm-bc,llvm-ir" cargo +offload build -Zunstable-options -r -v --target amdgcn-amd-amdhsa -Zbuild-std=core
+```
+
+Now find the <libname>.ll under target/amdgcn-amd-amdhsa folder and copy it to a device.ll file (or adjust the file names below).
+If you work on an NVIDIA or Intel gpu, please adjust the names acordingly and open an issue to share your results (either if you succeed or fail).
+First we compile our .ll files (good for manual inspections) to .bc files and clean up leftover artifacts. The cleanup is important, otherwise caching might interfere on following runs.
+```
+opt lib.ll -o lib.bc
+opt device.ll -o device.bc
+rm *.o
+rm bare.amdgcn.gfx90a.img*
+```
+
+```
+clang-offload-packager" "-o" "host.out" "--image=file=device.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
+
+clang-21" "-cc1" "-triple" "x86_64-unknown-linux-gnu" "-S" "-save-temps=cwd" "-disable-free" "-clear-ast-before-backend" "-main-file-name" "lib.rs" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-funwind-tables=2" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-resource-dir" "/<ABSOLUTE_PATH_TO>/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21" "-ferror-limit" "19" "-fopenmp" "-fopenmp-offload-mandatory" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf" "-fembed-offload-object=host.out" "-fopenmp-targets=amdgcn-amd-amdhsa" "-faddrsig" "-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o" "host.s" "-x" "ir" "lib.bc"
+
+clang-21" "-cc1as" "-triple" "x86_64-unknown-linux-gnu" "-filetype" "obj" "-main-file-name" "lib.rs" "-target-cpu" "x86-64" "-mrelocation-model" "pic" "-o" "host.o" "host.s"
+
+clang-linker-wrapper" "--should-extract=gfx90a" "--device-compiler=amdgcn-amd-amdhsa=-g" "--device-compiler=amdgcn-amd-amdhsa=-save-temps=cwd" "--device-linker=amdgcn-amd-amdhsa=-lompdevice" "--host-triple=x86_64-unknown-linux-gnu" "--save-temps" "--linker-path=/ABSOlUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/lld/bin/ld.lld" "--hash-style=gnu" "--eh-frame-hdr" "-m" "elf_x86_64" "-pie" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "bare" "/lib/../lib64/Scrt1.o" "/lib/../lib64/crti.o" "/ABSOLUTE_PATH_TO/crtbeginS.o" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/bin/../lib/x86_64-unknown-linux-gnu" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21/lib/x86_64-unknown-linux-gnu" "-L/lib/../lib64" "-L/usr/lib64" "-L/lib" "-L/usr/lib" "host.o" "-lstdc++" "-lm" "-lomp" "-lomptarget" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib" "-lgcc_s" "-lgcc" "-lpthread" "-lc" "-lgcc_s" "-lgcc" "/ABSOLUTE_PATH_TO/crtendS.o" "/lib/../lib64/crtn.o"
+```
+
+Especially for the last command I recommend to not fix the paths, but rather just re-generate them by copying a bare-mode openmp example and compiling it with your clang. By adding `-###` to your clang invocation, you can see the invidual steps.
+```
+myclang++ -fuse-ld=lld -O3 -fopenmp  -fopenmp-offload-mandatory --offload-arch=gfx90a omp_bare.cpp -o main -###
+```
+
+In the final step, you can now run your binary
+
+```
+./main
+The first element is zero 0.000000
+The first element is NOT zero 21.000000
+The second element is  0.000000
+```
+
+To receive more information about the memory transfer, you can enable info printing with
+```
+LIBOMPTARGET_INFO=-1  ./main
+```
diff --git a/src/queries/example-0.png b/src/queries/example-0.png
index 14b46c4..dd67d5f 100644
--- a/src/queries/example-0.png
+++ b/src/queries/example-0.png
Binary files differ
diff --git a/src/solve/candidate-preference.md b/src/solve/candidate-preference.md
index 8960529..8b28f56 100644
--- a/src/solve/candidate-preference.md
+++ b/src/solve/candidate-preference.md
@@ -95,7 +95,7 @@
 ```
 
 This preference causes a lot of issues. See [#24066]. Most of the
-issues are caused by prefering where-bounds over impls even if the where-bound guides type inference:
+issues are caused by preferring where-bounds over impls even if the where-bound guides type inference:
 ```rust
 trait Trait<T> {
     fn call_me(&self, x: T) {}
diff --git a/src/tests/ci.md b/src/tests/ci.md
index 750e4fa..a8cc959 100644
--- a/src/tests/ci.md
+++ b/src/tests/ci.md
@@ -84,16 +84,15 @@
 > Thus, it is a good idea to run `./x doc xxx` locally for any doc comment
 > changes to help catch these early.
 
-PR jobs are defined in the `pr` section of [`jobs.yml`]. They run under the
-`rust-lang/rust` repository, and their results can be observed directly on the
-PR, in the "CI checks" section at the bottom of the PR page.
+PR jobs are defined in the `pr` section of [`jobs.yml`]. Their results can be observed
+directly on the PR, in the "CI checks" section at the bottom of the PR page.
 
 ### Auto builds
 
 Before a commit can be merged into the `master` branch, it needs to pass our
 complete test suite. We call this an `auto` build. This build runs tens of CI
 jobs that exercise various tests across operating systems and targets. The full
-test suite is quite slow; it can take two hours or more until all the `auto` CI
+test suite is quite slow; it can take several hours until all the `auto` CI
 jobs finish.
 
 Most platforms only run the build steps, some run a restricted set of tests,
@@ -136,14 +135,21 @@
 [`jobs.yml`] will be executed. We call this mode a "fast try build". Such a try build
 will not execute any tests, and it will allow compilation warnings. It is useful when you want to
 get an optimized toolchain as fast as possible, for a crater run or performance benchmarks,
-even if it might not be working fully correctly.
+even if it might not be working fully correctly. If you want to do a full build for the default try job,
+specify its job name in a job pattern (explained below).
 
-If you want to run a custom CI job in a try build and make sure that it passes all tests and does
-not produce any compilation warnings, you can select CI jobs to be executed by adding lines
-containing `try-job: <job pattern>` to the PR description. All such specified jobs will be executed
-in the try build once the `@bors try` command is used on the PR.
+If you want to run custom CI job(s) in a try build and make sure that they pass all tests and do
+not produce any compilation warnings, you can select CI jobs to be executed by specifying a *job pattern*,
+which can be used in one of two ways:
+- You can add a set of `try-job: <job pattern>` directives to the PR description (described below) and then
+  simply run `@bors try`. CI will read these directives and run the jobs that you have specified. This is
+  useful if you want to rerun the same set of try jobs multiple times, after incrementally modifying a PR.
+- You can specify the job pattern using the `jobs` parameter of the try command: `@bors try jobs=<job pattern>`.
+  This is useful for one-off try builds with specific jobs. Note that the `jobs` parameter has a higher priority
+  than the PR description directives.
+  - There can also be multiple patterns specified, e.g. `@bors try jobs=job1,job2,job3`.
 
-Each pattern can either be an exact name of a job or a glob pattern that matches multiple jobs,
+Each job pattern can either be an exact name of a job or a glob pattern that matches multiple jobs,
 for example `*msvc*` or `*-alt`. You can start at most 20 jobs in a single try build. When using
 glob patterns, you might want to wrap them in backticks (`` ` ``) to avoid GitHub rendering
 the pattern as Markdown.
@@ -182,26 +188,19 @@
 > However, it can be less flexible because you cannot adjust the set of tests
 > that are exercised this way.
 
-Try jobs are defined in the `try` section of [`jobs.yml`]. They are executed on
-the `try` branch under the `rust-lang/rust` repository and
+Try builds are executed on the `try` branch under the `rust-lang/rust` repository and
 their results can be seen [here](https://github.com/rust-lang/rust/actions),
 although usually you will be notified of the result by a comment made by bors on
 the corresponding PR.
 
-Note that if you start the default try job using `@bors try`, it will skip building several `dist` components and running post-optimization tests, to make the build duration shorter. If you want to execute the full build as it would happen before a merge, add an explicit `try-job` pattern with the name of the default try job (currently `dist-x86_64-linux`).
+Multiple try builds can execute concurrently across different PRs, but there can be at most
+a single try build running on a single PR at any given time.
 
-Multiple try builds can execute concurrently across different PRs.
-
-<div class="warning">
-
-Bors identifies try jobs by commit hash. This means that if you have two PRs
-containing the same (latest) commits, running `@bors try` will result in the
-*same* try job and it really confuses `bors`. Please refrain from doing so.
-
-</div>
+Note that try builds are handled using the new [bors][new-bors] implementation.
 
 [rustc-perf]: https://github.com/rust-lang/rustc-perf
 [crater]: https://github.com/rust-lang/crater
+[new-bors]: https://github.com/rust-lang/bors
 
 ### Modifying CI jobs
 
diff --git a/src/tests/directives.md b/src/tests/directives.md
index f4ba9a0..fbbeb7e 100644
--- a/src/tests/directives.md
+++ b/src/tests/directives.md
@@ -111,6 +111,7 @@
 | `forbid-output`                   | A pattern which must not appear in stderr/`cfail` output                                                                 | `ui`, `incremental`                          | Regex pattern                                                                           |
 | `run-flags`                       | Flags passed to the test executable                                                                                      | `ui`                                         | Arbitrary flags                                                                         |
 | `known-bug`                       | No error annotation needed due to known bug                                                                              | `ui`, `crashes`, `incremental`               | Issue number `#123456`                                                                  |
+| `compare-output-by-lines`         | Compare the output by lines, rather than as a single string                                                              | All                                          | N/A                                                                                     |
 
 [^check_stdout]: presently <!-- date-check: Oct 2024 --> this has a weird quirk
     where the test binary's stdout and stderr gets concatenated and then
diff --git a/src/tests/ecosystem-test-jobs/fuchsia.md b/src/tests/ecosystem-test-jobs/fuchsia.md
index b19d94d..75cf782 100644
--- a/src/tests/ecosystem-test-jobs/fuchsia.md
+++ b/src/tests/ecosystem-test-jobs/fuchsia.md
@@ -18,12 +18,8 @@
 is merged.
 
 If you are worried that a pull request might break the Fuchsia builder and want
-to test it out before submitting it to the bors queue, simply add this line to
-your PR description:
-
-> try-job: x86_64-fuchsia
-
-Then when you `@bors try` it will pick the job that builds Fuchsia.
+to test it out before submitting it to the bors queue, simply ask bors to run
+the try job that builds the Fuchsia integration: `@bors try jobs=x86_64-fuchsia`.
 
 ## Building Fuchsia locally
 
diff --git a/src/tests/ecosystem-test-jobs/rust-for-linux.md b/src/tests/ecosystem-test-jobs/rust-for-linux.md
index d549ec6..a6a7374 100644
--- a/src/tests/ecosystem-test-jobs/rust-for-linux.md
+++ b/src/tests/ecosystem-test-jobs/rust-for-linux.md
@@ -40,12 +40,8 @@
 this workflow notifies us if a given compiler change would break it.
 
 If you are worried that a pull request might break the Rust for Linux builder
-and want to test it out before submitting it to the bors queue, simply add this
-line to your PR description:
-
-> try-job: x86_64-rust-for-linux
-
-Then when you `@bors try` it will pick the job that builds the Rust for Linux
-integration.
+and want to test it out before submitting it to the bors queue, simply ask
+bors to run the try job that builds the Rust for Linux integration:
+`@bors try jobs=x86_64-rust-for-linux`.
 
 [rfl-ping]: ../../notification-groups/rust-for-linux.md
diff --git a/src/tests/ui.md b/src/tests/ui.md
index 25dd581..d3a2c40 100644
--- a/src/tests/ui.md
+++ b/src/tests/ui.md
@@ -95,6 +95,7 @@
   [Normalization](#normalization)).
 - `dont-check-compiler-stderr` — Ignores stderr from the compiler.
 - `dont-check-compiler-stdout` — Ignores stdout from the compiler.
+- `compare-output-by-lines` — Some tests have non-deterministic orders of output, so we need to compare by lines.
 
 UI tests run with `-Zdeduplicate-diagnostics=no` flag which disables rustc's
 built-in diagnostic deduplication mechanism. This means you may see some