two small changes

cwittens · web-flow · commit 689de388347c · 2026-04-06T10:26:10.000-04:00
diff --git a/18.337 2026 hw4.md b/18.337 2026 hw4.md
@@ -82,7 +82,7 @@ C .= A .+ B          # 3. benchmark matadd for different n
 C = A + B            # 4. compare with above — what is the difference in speed and why?
 ```
 
-To benchmark correctly, use `@belapsed CUDA.@sync mul!(C, A, B)` (and equivalently
+To benchmark correctly, load `BenchmarkTools` and use `@belapsed CUDA.@sync mul!(C, A, B)` (and equivalently
 for the other operations). Submit your code and a table of the absolute execution time
 for matmul and matadd, with and without allocations, as a function of matrix size.
 
@@ -116,7 +116,7 @@ element-doubling operation. We assign one thread per element. Fill in the blanks
 
 ```julia
 using KernelAbstractions, CUDA
-backend = KernelAbstractions.get_backend(CUDA.zeros(1))
+backend = CUDABackend()
 elty = Float32
 const NUMTHREADSINBLOCK = 64  # threads per CUDA block