| # Usage |
| |
| This feature is a work in progress and not ready for general use. |
| The instructions here are for contributors and for people interested in following the latest progress. |
| We are currently working on launching the following Rust kernel on the GPU. |
| To follow along, copy it to a `src/lib.rs` file. |
| |
| ```rust |
| #![allow(internal_features)] |
| #![feature(gpu_offload)] |
| #![cfg_attr(target_os = "linux", feature(core_intrinsics))] |
| #![cfg_attr(target_arch = "amdgpu", feature(stdarch_amdgpu, abi_gpu_kernel))] |
| #![cfg_attr(target_arch = "nvptx64", feature(stdarch_nvptx, abi_gpu_kernel))] |
| #![no_std] |
| |
| #[cfg(target_os = "linux")] |
| extern crate libc; |
| |
| use core::offload::offload_kernel; |
| |
| #[panic_handler] |
| fn panic(_: &core::panic::PanicInfo) -> ! { |
|     loop {} |
| } |
| |
| #[cfg(target_arch = "amdgpu")] |
| use core::arch::amdgpu::{workgroup_id_x as block_idx_x, workitem_id_x as thread_idx_x}; |
| #[cfg(target_arch = "nvptx64")] |
| use core::arch::nvptx::{ |
|     _block_dim_x as block_dim_x, _block_idx_x as block_idx_x, _thread_idx_x as thread_idx_x, |
| }; |
| |
| #[offload_kernel] |
| fn kernel(x: *mut [f64; 256]) { |
|     unsafe { |
|         let n = (*x).len(); |
|         let i = (thread_idx_x() + block_idx_x() * block_dim_x()) as usize; |
|         if i < n { |
|             (*x)[i] = i as f64; |
|         } |
|     } |
| } |
| |
| #[cfg(target_os = "linux")] |
| #[unsafe(no_mangle)] |
| fn main() { |
|     let mut x = [0.0f64; 256]; |
|     core::intrinsics::offload::<_, _, ()>(kernel, [256, 1, 1], [1, 1, 1], (&mut x as *mut [f64; 256],)); |
|     for i in 0..x.len() { |
|         assert_eq!(x[i], i as f64); |
|     } |
|     unsafe { libc::printf(c"all checks passed".as_ptr()); } |
| } |
| ``` |
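
The index arithmetic in `kernel` can be checked on the CPU without any GPU toolchain. The following is a minimal sketch and not part of the offload API: `global_index` and `emulate_launch` are hypothetical helpers, the indices are modeled as `u32`, and it assumes that the two dimension arrays passed to the intrinsic give the number of blocks and the number of threads per block, in that order.

```rust
// CPU-only emulation of the index arithmetic used in `kernel`.
// Needs no GPU and no nightly features; all helper names are hypothetical.
const N: usize = 256;

/// Mirrors `thread_idx_x() + block_idx_x() * block_dim_x()` from the kernel.
fn global_index(thread_idx_x: u32, block_idx_x: u32, block_dim_x: u32) -> usize {
    (thread_idx_x + block_idx_x * block_dim_x) as usize
}

/// Runs the kernel body once per (block, thread) pair, sequentially.
/// Assumption: `grid_dim_x` and `block_dim_x` correspond to the first
/// components of the two dimension arrays given to the offload intrinsic.
fn emulate_launch(x: &mut [f64; N], grid_dim_x: u32, block_dim_x: u32) {
    for block in 0..grid_dim_x {
        for thread in 0..block_dim_x {
            let i = global_index(thread, block, block_dim_x);
            // Same bounds guard as in the kernel.
            if i < x.len() {
                x[i] = i as f64;
            }
        }
    }
}

fn main() {
    // Same shape as the launch in the example: 256 blocks, 1 thread each.
    let mut x = [0.0f64; N];
    emulate_launch(&mut x, 256, 1);
    assert!(x.iter().enumerate().all(|(i, &v)| v == i as f64));

    // An over-provisioned launch (9 * 32 = 288 threads) gives the same
    // result, because the guard skips the 32 out-of-range indices.
    let mut y = [0.0f64; N];
    emulate_launch(&mut y, 9, 32);
    assert_eq!(x, y);

    println!("all checks passed");
}
```

With 256 blocks of one thread each, every element is written exactly once. The `i < n` guard only matters when a launch provides more threads than elements, as in the second call.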
| |
| ## Compile instructions |
| It is important to use a clang compiler built on the same LLVM as rustc. |
| Calling `clang` without the full path will likely pick up your system clang, which is probably incompatible. |
| So either substitute the clang/lld invocations below with absolute paths, or set your `PATH` accordingly. |
| |
| First we generate the device (GPU) code. |
| |
| <div class="warning"> |
| |
| Replace the `target-cpu` (gfx90a) with the right code for your GPU. These are often referred to as "LLVM target names"[^list]. |
| |
| </div> |
| |
| ``` |
| RUSTFLAGS="-Ctarget-cpu=gfx90a --emit=llvm-bc,llvm-ir -Zoffload=Device -Csave-temps -Zunstable-options" cargo +offload build -Zunstable-options -r -v --target amdgcn-amd-amdhsa -Zbuild-std=core |
| ``` |
| For now, you might afterwards need to copy `target/release/deps/<lib_name>.bc` to `lib.bc` before the next step. |
| |
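| Now we generate the host (CPU) code. The path after `-Zoffload=Host=` below is specific to the author's machine; replace it with the absolute path to the device artifact from your own build. |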
| ``` |
| RUSTFLAGS="--emit=llvm-bc,llvm-ir -Csave-temps -Zoffload=Host=/p/lustre1/drehwald1/prog/offload/r/target/amdgcn-amd-amdhsa/release/deps/device.bin -Zunstable-options" cargo +offload build -r |
| ``` |
| This call also does a lot of work and generates multiple intermediate files for LLVM offload. |
| Most offload steps are integrated into rustc by now, but one manual binary invocation still remains: |
| |
| ``` |
| "clang-linker-wrapper" "--should-extract=gfx90a" "--device-compiler=amdgcn-amd-amdhsa=-g" "--device-compiler=amdgcn-amd-amdhsa=-save-temps=cwd" "--device-linker=amdgcn-amd-amdhsa=-lompdevice" "--host-triple=x86_64-unknown-linux-gnu" "--save-temps" "--linker-path=/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/lld/bin/ld.lld" "--hash-style=gnu" "--eh-frame-hdr" "-m" "elf_x86_64" "-pie" "-dynamic-linker" "/lib64/ld-linux-x86-64.so.2" "-o" "main" "/lib/../lib64/Scrt1.o" "/lib/../lib64/crti.o" "/ABSOLUTE_PATH_TO/crtbeginS.o" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/bin/../lib/x86_64-unknown-linux-gnu" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib/clang/21/lib/x86_64-unknown-linux-gnu" "-L/lib/../lib64" "-L/usr/lib64" "-L/lib" "-L/usr/lib" "target/<GPU_DIR>/release/host.o" "-lstdc++" "-lm" "-lomp" "-lomptarget" "-L/ABSOLUTE_PATH_TO/rust/build/x86_64-unknown-linux-gnu/llvm/lib" "-lgcc_s" "-lgcc" "-lpthread" "-lc" "-lgcc_s" "-lgcc" "/ABSOLUTE_PATH_TO/crtendS.o" "/lib/../lib64/crtn.o" |
| ``` |
| |
| You can try to find the paths to those files on your system. |
| However, we recommend not fixing the paths by hand, but rather regenerating them by copying a bare-mode OpenMP example and compiling it with your clang. |
| By adding `-###` to your clang invocation, you can see the individual steps. |
| It will show multiple steps; look for the `clang-linker-wrapper` invocation. |
| Make sure to still pass the path to the `host.o` file, and not the temporary file you got when compiling your C++ example with the following call. |
| ``` |
| myclang++ -fuse-ld=lld -O3 -fopenmp -fopenmp-offload-mandatory --offload-arch=gfx90a omp_bare.cpp -o main -### |
| ``` |
| |
| In the final step, you can now run your binary: |
| |
| ``` |
| ./main |
| all checks passed |
| ``` |
| |
| To get more information about the memory transfers, you can enable info printing with: |
| ``` |
| LIBOMPTARGET_INFO=-1 ./main |
| ``` |
| |
| [^list]: https://rocm.docs.amd.com/en/latest/reference/gpu-arch-specs.html or https://developer.nvidia.com/cuda/gpus. Alternatively, check `rustc --print target-cpus`. |