Due Thursday March 5, but start early. Your quiz will come from part of this homework.
This is a getting-started-with-HPC assignment. Do questions A and B.
A. Do Operation Counts tell the whole story in the real world? Count the operations for both multiplying and adding two n x n matrices. Express the leading term in the answer as C * n^e. What is C and what is e?
Use `@belapsed` from BenchmarkTools.jl to measure the performance of both as a function of the input matrix size. Compare the following four operations for small and large matrices, with n in powers of 2 from 2 to at least $2^{16}$ (and preferably much larger, until you run out of patience or memory). Compare theory with practice. YMMV.
```julia
using LinearAlgebra   # for mul!
# define n =
T = Float32           # specify the data type (the default is Float64)
A = randn(T, n, n)
B = randn(T, n, n)
C = zeros(T, n, n)    # (or C = similar(A), which also allocates an n×n matrix)
mul!(C, A, B)         # 1. benchmark matmul for different n
C = A * B             # 2. compare with the line above; what is the difference in speed and why?
C .= A .+ B           # 3. benchmark matadd for different n
C = A + B             # 4. compare with the line above; what is the difference in speed and why?
```

Remember in BenchmarkTools to use `$` (Julia macro interpolation) for any data inputs. Please submit your code and a table of computation times with n in the rows and the four operations in the columns. Label your units (seconds or milliseconds). Tell us what computer you are on.
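As a sketch, the benchmark loop might look like the following (`@belapsed` returns seconds; the loop bound here is truncated for illustration and should be extended toward $2^{16}$ and beyond):

```julia
using BenchmarkTools, LinearAlgebra   # @belapsed from BenchmarkTools.jl; mul! from LinearAlgebra

T = Float32
results = NamedTuple[]
for n in (2, 4, 8, 16)                # extend in powers of 2 as patience allows
    A, B = randn(T, n, n), randn(T, n, n)
    C = zeros(T, n, n)
    t1 = @belapsed mul!($C, $A, $B)   # 1. in-place multiply
    t2 = @belapsed $A * $B            # 2. allocating multiply
    t3 = @belapsed $C .= $A .+ $B     # 3. in-place add
    t4 = @belapsed $A + $B            # 4. allocating add
    push!(results, (; n, t1, t2, t3, t4))
    println("n = $n  mul!: $t1  *: $t2  .+: $t3  +: $t4  (seconds)")
end
```

Note the `$` interpolation on every data input, which keeps benchmark setup out of the timed region.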
B. Log onto engaging or your own HPC machine and do something and tell us what you did. No rules.
In this part of the homework you will work through the core machinery underlying differentiable simulation: solving an ODE, computing parameter sensitivities analytically, building a simple forward-mode AD system from scratch, and using all of these pieces together to fit a model to data via gradient descent.
You will need the following packages:
using OrdinaryDiffEq
using Plots

The Lotka-Volterra (predator-prey) equations are:

$$
\begin{aligned}
\frac{du_1}{dt} &= \alpha u_1 - \beta u_1 u_2 \\
\frac{du_2}{dt} &= -\gamma u_2 + \delta u_1 u_2
\end{aligned}
$$

where $u_1$ is the prey population, $u_2$ is the predator population, and $p = (\alpha, \beta, \gamma, \delta)$ are the model parameters.
Tasks:
- Implement the Lotka-Volterra ODE as a function compatible with the SciML/DiffEq interface, i.e. with signature `f(du, u, p, t)`.
- Solve the system on the time span $t \in [0, 10]$ with initial condition $u_0 = [1.0, 1.0]$ and parameters $p = [1.5, 1.0, 3.0, 1.0]$ using the `Tsit5()` solver from OrdinaryDiffEq.jl.
- Plot both $u_1(t)$ and $u_2(t)$ on the same axes and describe qualitatively what you observe about the dynamics.
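A minimal sketch of the in-place form, assuming the parameter ordering $p = (\alpha, \beta, \gamma, \delta)$ (the ordering is an assumption; any consistent choice works):

```julia
# Lotka-Volterra right-hand side in the SciML in-place form f(du, u, p, t),
# assuming p = (α, β, γ, δ).
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]      # prey
    du[2] = -γ * u[2] + δ * u[1] * u[2]     # predator
    return nothing
end
```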
Hint: An `ODEProblem` is constructed as:

```julia
prob = ODEProblem(f, u0, tspan, p)
sol = solve(prob, Tsit5())
```

Rather than using an automatic tool, here you will derive and implement the forward sensitivity equations by hand. The forward sensitivity equations track how the solution depends on the parameters.
Given the ODE

$$\frac{du}{dt} = f(u, p, t),$$

each sensitivity $s_i = \partial u / \partial p_i$ satisfies

$$\frac{ds_i}{dt} = \frac{\partial f}{\partial u} s_i + \frac{\partial f}{\partial p_i},$$

where $\partial f / \partial u$ is the Jacobian of $f$ with respect to the state and $\partial f / \partial p_i$ is the partial derivative of $f$ with respect to the $i$-th parameter.
Tasks:
- Derive by hand all four Jacobian blocks needed for the Lotka-Volterra system:
  - $\partial f / \partial u$ (a $2 \times 2$ matrix)
  - $\partial f / \partial p_i$ for each of the four parameters (each a vector of length 2)
- Implement an augmented ODE `f_aug(dz, z, p, t)` where the state $z$ concatenates $u$ and the four sensitivity vectors $s_1, s_2, s_3, s_4$ (so $z \in \mathbb{R}^{10}$). The initial condition for each $s_i(0)$ is zero (assuming $u_0$ does not depend on $p$).
- Solve the augmented system with the same solver and time span as Part 1.
- Extract and plot the sensitivity of $u_1$ and $u_2$ with respect to each parameter over time. Comment on which parameters the solution is most sensitive to and when.
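One possible sketch of the augmented system, useful as a check against your own hand derivation (the parameter ordering $p = (\alpha, \beta, \gamma, \delta)$ and the standard predator-prey right-hand side are assumptions):

```julia
# Augmented state z = [u; s1; s2; s3; s4] with s_i = ∂u/∂p_i, so z ∈ R^10.
# Each sensitivity obeys ds_i/dt = (∂f/∂u) s_i + ∂f/∂p_i.
function f_aug!(dz, z, p, t)
    α, β, γ, δ = p
    u1, u2 = z[1], z[2]
    # primal Lotka-Volterra dynamics
    dz[1] = α * u1 - β * u1 * u2
    dz[2] = -γ * u2 + δ * u1 * u2
    # Jacobian ∂f/∂u entries
    J11 = α - β * u2;  J12 = -β * u1
    J21 = δ * u2;      J22 = -γ + δ * u1
    # ∂f/∂p_i, one column per parameter (α, β, γ, δ)
    fp = ((u1, 0.0), (-u1 * u2, 0.0), (0.0, -u2), (0.0, u1 * u2))
    for i in 1:4
        s1, s2 = z[2i + 1], z[2i + 2]
        dz[2i + 1] = J11 * s1 + J12 * s2 + fp[i][1]
        dz[2i + 2] = J21 * s1 + J22 * s2 + fp[i][2]
    end
    return nothing
end
```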
In the lecture notes on Forward-Mode AD via High-Dimensional Algebras you saw that forward-mode AD can be implemented by defining a new number type that carries both a primal value and a derivative (the "dual" part), and then overloading arithmetic so that the chain rule propagates automatically.
Tasks:
- Define a `Dual` number struct that holds a value `v` and a derivative `d`:

  ```julia
  struct Dual{T} <: Number
      v::T
      d::T
  end
  ```

- Overload the following operations for `Dual` numbers, following the rules derived from the algebra of dual numbers (i.e. $\epsilon^2 = 0$):
  - `+`, `-`, `*`, `/`
  - `^` (integer powers are sufficient, but feel free to implement the general case)
  - `exp`, `log`, `sin`, `cos`
  - Comparison operators and `isless` (needed for the ODE solver internals)
  - Promotion and conversion rules so that `Dual` interacts correctly with `Float64` and other numeric types
- Write a generic function `derivative(f, x)` that uses your `Dual` type to compute $f'(x)$ by evaluating `f(Dual(x, one(x)))` and extracting the dual part.
- Verify your implementation on $f(x) = x^3 \sin(x)$ and $f(x) = e^{x^2}$ and check against analytically known derivatives.
- Now demonstrate that forward-mode AD with your `Dual` type reproduces the sensitivity results from Part 2. Specifically: for a fixed time point $t^* = 10$, show that propagating `Dual` numbers through the ODE solve (by passing `Dual`-valued parameters into your Part 1 ODE function and solver) recovers $\partial u(t^*) / \partial p_i$ to within floating point error, matching the augmented ODE result from Part 2.
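As a sketch of the first two tasks, here is the `Dual` struct with a handful of the required overloads and the `derivative` helper (only a subset of the rules listed above is shown):

```julia
# Dual number: primal value v plus derivative (dual) part d, with ε^2 = 0.
struct Dual{T} <: Number
    v::T
    d::T
end

Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v * b.d + a.d * b.v)   # product rule
Base.sin(a::Dual) = Dual(sin(a.v), cos(a.v) * a.d)                   # chain rule
Base.exp(a::Dual) = Dual(exp(a.v), exp(a.v) * a.d)
Base.:^(a::Dual, n::Integer) = Dual(a.v^n, n * a.v^(n - 1) * a.d)    # integer power rule

# Seed the dual part with 1 and read off f'(x).
derivative(f, x) = f(Dual(x, one(x))).d

# Check against d/dx [x^3 sin x] = 3x^2 sin x + x^3 cos x:
derivative(x -> x^3 * sin(x), 2.0)
</test>
```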
Note: Getting your `Dual` type to work through the ODE solver requires that all arithmetic inside the solver be generic. The `Tsit5` implementation in OrdinaryDiffEq.jl is written generically and will work if your number type satisfies the necessary interface. Beyond the operations listed above, you will need to overload several additional functions:
- `Base.muladd`: the ODE solver uses the `@muladd` macro, which rewrites `a*b + c` into `muladd(a, b, c)` throughout the Runge-Kutta step computation. You will need `muladd` methods for all combinations of `Dual` and `Number` arguments (7 signatures total). Implementing it as `a * b + c` is sufficient.
- `Base.sqrt`, `Base.abs2`, `Base.inv`, `Base.log10`: used in error-norm computation, step size control, and initial time step estimation. Implement these using the standard chain rule.
- `Base.:^(::Dual, ::Dual)` and `Base.:^(::Dual, ::Float64)`: the initial step size computation evaluates `10^(-(2 + log10(...))/order)`, which promotes integers to `Dual` via your promotion rules and requires a general power rule. Use $\frac{d}{dx}\left[f^g\right] = f^g \left( g' \ln f + \frac{g f'}{f} \right)$.
- `Base.zero`, `Base.one`, `Base.abs`: construct a `Dual` with zero derivative for `zero`/`one`; for `abs`, use `sign` to flip the derivative.
- `Base.isnan`, `Base.isinf`, `Base.isfinite`, `Base.iszero`: predicates for safety checks; these should operate on the primal value only and return `Bool`.
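A sketch of a few of these extra overloads (the struct and basic arithmetic are repeated here so the snippet stands alone; redefining them identically is harmless in Julia):

```julia
struct Dual{T} <: Number
    v::T
    d::T
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v * b.d + a.d * b.v)

Base.muladd(a::Dual, b::Dual, c::Dual) = a * b + c    # 6 more mixed signatures needed
Base.sqrt(a::Dual)  = Dual(sqrt(a.v), a.d / (2 * sqrt(a.v)))
Base.abs2(a::Dual)  = Dual(abs2(a.v), 2 * a.v * a.d)
Base.inv(a::Dual)   = Dual(inv(a.v), -a.d / a.v^2)
Base.log10(a::Dual) = Dual(log10(a.v), a.d / (a.v * log(10)))
Base.zero(a::Dual)  = Dual(zero(a.v), zero(a.d))
Base.one(a::Dual)   = Dual(one(a.v), zero(a.d))
Base.abs(a::Dual)   = Dual(abs(a.v), sign(a.v) * a.d)
Base.isnan(a::Dual) = isnan(a.v)   # predicate on the primal only
```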
The following overloads are solver-internal plumbing (type conversions, step size bookkeeping, etc.) that are not mathematically interesting but are required for the solver to run. All of these follow from the idea that a real number is a dual number with 0 dual part, and that control flow should operate on the dual-valued function exactly the same as the real-valued function (i.e. the primal is preserved for non-differentiable operations).
```julia
Dual{T}(x::Dual{T}) where {T} = x
Dual{T}(x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
Dual{T}(x::Number) where {T} = Dual(convert(T, x), zero(T))
Base.convert(::Type{Dual{T}}, x::Dual{T}) where {T} = x
Base.convert(::Type{Dual{T}}, x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
Base.convert(::Type{Dual{T}}, x::Number) where {T} = Dual(convert(T, x), zero(T))
Base.promote_rule(::Type{Dual{T}}, ::Type{S}) where {T, S<:Number} = Dual{promote_type(T, S)}
Base.convert(::Type{T}, x::Dual) where {T<:AbstractFloat} = convert(T, x.v)
(::Type{T})(x::Dual) where {T<:AbstractFloat} = T(x.v)
Base.oneunit(::Type{Dual{T}}) where {T} = Dual(oneunit(T), zero(T))
Base.oneunit(a::Dual) = oneunit(typeof(a))
Base.eps(::Type{Dual{T}}) where {T} = eps(T)
Base.eps(a::Dual) = eps(a.v)
Base.floatmin(::Type{Dual{T}}) where {T} = floatmin(T)
Base.float(::Type{Dual{T}}) where {T} = Dual{float(T)}
Base.float(x::Dual{T}) where {T} = Dual(float(x.v), float(x.d))
Base.real(a::Dual) = a
Base.nextfloat(a::Dual) = Dual(nextfloat(a.v), a.d)
Base.isless(a::Dual, b::Real) = isless(a.v, b)
Base.isless(a::Real, b::Dual) = isless(a, b.v)
Base.rtoldefault(::Type{Dual{T}}) where {T} = Base.rtoldefault(T)
```

Use the solution from Part 1 as your "data". That is, solve the Lotka-Volterra system with the true parameters $p^* = [1.5, 1.0, 3.0, 1.0]$, and save the resulting trajectory at a set of sample times as your reference.
Tasks:
- Define a mean-squared-error loss function:

  $$L(p) = \frac{1}{N} \sum_{i=1}^{N} \left\| u(t_i; p) - u^{\text{data}}(t_i) \right\|^2,$$

  where $u(t_i; p)$ is the model solution at time $t_i$ under parameters $p$, $u^{\text{data}}(t_i)$ is the reference data, and $N$ is the number of sample times.
- Compute the gradient $\nabla_p L$ using the forward sensitivity equations from Part 2. For each observation time $t_i$ and each parameter $p_j$, use the augmented ODE solution to obtain $\partial u(t_i; p) / \partial p_j$, then apply the chain rule:

  $$\frac{\partial L}{\partial p_j} = \frac{2}{N} \sum_{i=1}^{N} \left( u(t_i; p) - u^{\text{data}}(t_i) \right)^{\top} \frac{\partial u(t_i; p)}{\partial p_j}.$$
- Starting from the perturbed initial guess $p_0 = [1.2, 0.8, 2.8, 0.8]$, implement a gradient descent loop with a fixed learning rate (a learning rate of $\eta = 0.0002$ works; you may need to experiment). Run for at least 1000 iterations and print the loss periodically (e.g. every 50 iterations).
- Plot the loss curve over iterations. Plot the final estimated trajectory alongside the reference trajectory and confirm they visually agree. Report the final estimated parameters and verify they are close to $p^*$.
- Since the data is exact (no noise), the minimum of $L$ is zero and is achieved at $p = p^*$. Does your gradient descent reach a loss near machine epsilon? If not, explain why not and what you would need to do differently to get closer.
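The descent loop itself can be sketched as below; `loss_and_grad` is a hypothetical placeholder for your own function that returns $(L(p), \nabla_p L)$ via the augmented ODE solve:

```julia
# Plain gradient descent with a fixed learning rate η.
# `loss_and_grad` (hypothetical) must return the tuple (loss, gradient) at p.
function fit!(p, loss_and_grad; η = 2e-4, iters = 1000)
    losses = Float64[]
    for iter in 1:iters
        L, g = loss_and_grad(p)
        p .-= η .* g                 # descent step
        push!(losses, L)
        iter % 50 == 0 && println("iter $iter: loss = $L")
    end
    return p, losses
end
```

Usage would look like `p_fit, losses = fit!([1.2, 0.8, 2.8, 0.8], loss_and_grad)`, after which `losses` can be plotted directly.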
Note: You are not required to use a sophisticated optimizer. Plain gradient descent is expected here. The point is to see how the sensitivity equations provide exact (up to ODE solver tolerance) gradients that enable optimization, and that when the data is generated by the model itself, the true parameters are recoverable in principle.
Submit a single Julia script, Pluto notebook, or Jupyter notebook (.ipynb) containing all four parts. Each part should include:
- Your code
- Plots where requested
- Written answers to any qualitative questions (as comments or markdown cells)
Make sure your notebook runs end-to-end without errors before submitting.