Due Thursday March 5, but start early. Your quiz will come from part of this homework.
This is a getting-started-with-HPC assignment. Do questions A and B.
A. Do Operation Counts tell the whole story in the real world? Count the operations for both multiplying and adding two n x n matrices. Express the leading term in the answer as C * n^e. What is C and what is e?
Use `@belapsed` from BenchmarkTools.jl to measure the performance of both as a function of the input matrix size. Compare the following four operations for small and large matrices, with n in powers of 2 from 2 to at least $2^{16}$ (and preferably much larger, until you run out of patience or memory). Compare theory with practice. YMMV.
```julia
using LinearAlgebra   # for mul!
# define n =
T = Float32           # specify the data type (the default is Float64)
A = randn(T, n, n)
B = randn(T, n, n)
C = zeros(T, n, n)    # (or C = similar(A), which also allocates an n×n matrix)
mul!(C, A, B)         # 1. benchmark matmul for different n
C = A * B             # 2. compare with the line above; what is the difference in speed and why?
C .= A .+ B           # 3. benchmark matadd for different n
C = A + B             # 4. compare with the line above; what is the difference in speed and why?
```

Remember in BenchmarkTools to use `$` (Julia macro interpolation) for any data inputs. Please submit your code and a table of computation times with n in the rows and the four operations in the columns. Label your units (seconds or milliseconds). Tell us what computer you are on.
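As a sketch, the benchmark loop might look like the following (`@belapsed` returns seconds; the loop bound here is truncated for illustration and should be extended toward $2^{16}$ and beyond):

```julia
using BenchmarkTools, LinearAlgebra   # @belapsed from BenchmarkTools.jl; mul! from LinearAlgebra

T = Float32
results = NamedTuple[]
for n in (2, 4, 8, 16)                # extend in powers of 2 as patience allows
    A, B = randn(T, n, n), randn(T, n, n)
    C = zeros(T, n, n)
    t1 = @belapsed mul!($C, $A, $B)   # 1. in-place multiply
    t2 = @belapsed $A * $B            # 2. allocating multiply
    t3 = @belapsed $C .= $A .+ $B     # 3. in-place add
    t4 = @belapsed $A + $B            # 4. allocating add
    push!(results, (; n, t1, t2, t3, t4))
    println("n = $n  mul!: $t1  *: $t2  .+: $t3  +: $t4  (seconds)")
end
```

Note the `$` interpolation on every data input, which keeps benchmark setup out of the timed region.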
B. Log onto engaging or your own HPC machine and do something and tell us what you did. No rules.
In this part of the homework you will work through the core machinery underlying differentiable simulation: solving an ODE, computing parameter sensitivities analytically, building a simple forward-mode AD system from scratch, and using all of these pieces together to fit a model to data via gradient descent.
You will need the following packages:
using OrdinaryDiffEq
using Plots

The Lotka-Volterra (predator-prey) equations are:

$$
\begin{aligned}
\frac{du_1}{dt} &= \alpha u_1 - \beta u_1 u_2 \\
\frac{du_2}{dt} &= -\gamma u_2 + \delta u_1 u_2
\end{aligned}
$$

where $u_1$ is the prey population, $u_2$ is the predator population, and $p = (\alpha, \beta, \gamma, \delta)$ are the model parameters.
Tasks:
- Implement the Lotka-Volterra ODE as a function compatible with the SciML/DiffEq interface, i.e. with signature `f(du, u, p, t)`.
- Solve the system on the time span $t \in [0, 10]$ with initial condition $u_0 = [1.0, 1.0]$ and parameters $p = [1.5, 1.0, 3.0, 1.0]$ using the `Tsit5()` solver from OrdinaryDiffEq.jl.
- Plot both $u_1(t)$ and $u_2(t)$ on the same axes and describe qualitatively what you observe about the dynamics.
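A minimal sketch of the in-place form, assuming the parameter ordering $p = (\alpha, \beta, \gamma, \delta)$ (the ordering is an assumption; any consistent choice works):

```julia
# Lotka-Volterra right-hand side in the SciML in-place form f(du, u, p, t),
# assuming p = (α, β, γ, δ).
function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]      # prey
    du[2] = -γ * u[2] + δ * u[1] * u[2]     # predator
    return nothing
end
```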
Hint: An `ODEProblem` is constructed as:

```julia
prob = ODEProblem(f, u0, tspan, p)
sol = solve(prob, Tsit5())
```

Rather than using an automatic tool, here you will derive and implement the forward sensitivity equations by hand. The forward sensitivity equations track how the solution depends on the parameters.
Given the ODE

$$\frac{du}{dt} = f(u, p, t),$$

each sensitivity $s_i = \partial u / \partial p_i$ satisfies

$$\frac{ds_i}{dt} = \frac{\partial f}{\partial u} s_i + \frac{\partial f}{\partial p_i},$$

where $\partial f / \partial u$ is the Jacobian of $f$ with respect to the state and $\partial f / \partial p_i$ is the partial derivative of $f$ with respect to the $i$-th parameter.
Tasks:
- Derive by hand all four Jacobian blocks needed for the Lotka-Volterra system:
  - $\partial f / \partial u$ (a $2 \times 2$ matrix)
  - $\partial f / \partial p_i$ for each of the four parameters (each a vector of length 2)
- Implement an augmented ODE `f_aug(dz, z, p, t)` where the state $z$ concatenates $u$ and the four sensitivity vectors $s_1, s_2, s_3, s_4$ (so $z \in \mathbb{R}^{10}$). The initial condition for each $s_i(0)$ is zero (assuming $u_0$ does not depend on $p$).
- Solve the augmented system with the same solver and time span as Part 1.
- Extract and plot the sensitivity of $u_1$ and $u_2$ with respect to each parameter over time. Comment on which parameters the solution is most sensitive to and when.
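One possible sketch of the augmented system, useful as a check against your own hand derivation (the parameter ordering $p = (\alpha, \beta, \gamma, \delta)$ and the standard predator-prey right-hand side are assumptions):

```julia
# Augmented state z = [u; s1; s2; s3; s4] with s_i = ∂u/∂p_i, so z ∈ R^10.
# Each sensitivity obeys ds_i/dt = (∂f/∂u) s_i + ∂f/∂p_i.
function f_aug!(dz, z, p, t)
    α, β, γ, δ = p
    u1, u2 = z[1], z[2]
    # primal Lotka-Volterra dynamics
    dz[1] = α * u1 - β * u1 * u2
    dz[2] = -γ * u2 + δ * u1 * u2
    # Jacobian ∂f/∂u entries
    J11 = α - β * u2;  J12 = -β * u1
    J21 = δ * u2;      J22 = -γ + δ * u1
    # ∂f/∂p_i, one column per parameter (α, β, γ, δ)
    fp = ((u1, 0.0), (-u1 * u2, 0.0), (0.0, -u2), (0.0, u1 * u2))
    for i in 1:4
        s1, s2 = z[2i + 1], z[2i + 2]
        dz[2i + 1] = J11 * s1 + J12 * s2 + fp[i][1]
        dz[2i + 2] = J21 * s1 + J22 * s2 + fp[i][2]
    end
    return nothing
end
```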
In the lecture notes on Forward-Mode AD via High-Dimensional Algebras you saw that forward-mode AD can be implemented by defining a new number type that carries both a primal value and a derivative (the "dual" part), and then overloading arithmetic so that the chain rule propagates automatically.
Tasks:
- Define a `Dual` number struct that holds a value `v` and a derivative `d`:

  ```julia
  struct Dual{T} <: Number
      v::T
      d::T
  end
  ```

- Overload the following operations for `Dual` numbers, following the rules derived from the algebra of dual numbers (i.e. $\epsilon^2 = 0$):
  - `+`, `-`, `*`, `/`
  - `^` (integer powers are sufficient, but feel free to implement the general case)
  - `exp`, `log`, `sin`, `cos`
  - Comparison operators and `isless` (needed for the ODE solver internals)
  - Promotion and conversion rules so that `Dual` interacts correctly with `Float64` and other numeric types
- Write a generic function `derivative(f, x)` that uses your `Dual` type to compute $f'(x)$ by evaluating `f(Dual(x, one(x)))` and extracting the dual part.
- Verify your implementation on $f(x) = x^3 \sin(x)$ and $f(x) = e^{x^2}$ and check against analytically known derivatives.
- Now demonstrate that forward-mode AD with your `Dual` type reproduces the sensitivity results from Part 2. Specifically: for a fixed time point $t^* = 10$, show that propagating `Dual` numbers through the ODE solve (by passing `Dual`-valued parameters into your Part 1 ODE function and solver) recovers $\partial u(t^*) / \partial p_i$ to within floating point error, matching the augmented ODE result from Part 2.
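As a sketch of the first two tasks, here is the `Dual` struct with a handful of the required overloads and the `derivative` helper (only a subset of the rules listed above is shown):

```julia
# Dual number: primal value v plus derivative (dual) part d, with ε^2 = 0.
struct Dual{T} <: Number
    v::T
    d::T
end

Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v * b.d + a.d * b.v)   # product rule
Base.sin(a::Dual) = Dual(sin(a.v), cos(a.v) * a.d)                   # chain rule
Base.exp(a::Dual) = Dual(exp(a.v), exp(a.v) * a.d)
Base.:^(a::Dual, n::Integer) = Dual(a.v^n, n * a.v^(n - 1) * a.d)    # integer power rule

# Seed the dual part with 1 and read off f'(x).
derivative(f, x) = f(Dual(x, one(x))).d

# Check against d/dx [x^3 sin x] = 3x^2 sin x + x^3 cos x:
derivative(x -> x^3 * sin(x), 2.0)
</test>
```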
Note: Getting your `Dual` type to work through the ODE solver requires that all arithmetic inside the solver be generic. The `Tsit5` implementation in OrdinaryDiffEq.jl is written generically and will work if your number type satisfies the necessary interface. Beyond the operations listed above, you will need to overload several additional functions:
- `Base.muladd`: the ODE solver uses the `@muladd` macro, which rewrites `a*b + c` into `muladd(a, b, c)` throughout the Runge-Kutta step computation. You will need `muladd` methods for all combinations of `Dual` and `Number` arguments (7 signatures total). Implementing it as `a * b + c` is sufficient.
- `Base.sqrt`, `Base.abs2`, `Base.inv`, `Base.log10`: used in error-norm computation, step size control, and initial time step estimation. Implement these using the standard chain rule.
- `Base.:^(::Dual, ::Dual)` and `Base.:^(::Dual, ::Float64)`: the initial step size computation evaluates `10^(-(2 + log10(...))/order)`, which promotes integers to `Dual` via your promotion rules and requires a general power rule. Use $\frac{d}{dx}\left[f^g\right] = f^g \left( g' \ln f + \frac{g f'}{f} \right)$.
- `Base.zero`, `Base.one`, `Base.abs`: construct a `Dual` with zero derivative for `zero`/`one`; for `abs`, use `sign` to flip the derivative.
- `Base.isnan`, `Base.isinf`, `Base.isfinite`, `Base.iszero`: predicates for safety checks; these should operate on the primal value only and return `Bool`.
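A sketch of a few of these extra overloads (the struct and basic arithmetic are repeated here so the snippet stands alone; redefining them identically is harmless in Julia):

```julia
struct Dual{T} <: Number
    v::T
    d::T
end
Base.:+(a::Dual, b::Dual) = Dual(a.v + b.v, a.d + b.d)
Base.:*(a::Dual, b::Dual) = Dual(a.v * b.v, a.v * b.d + a.d * b.v)

Base.muladd(a::Dual, b::Dual, c::Dual) = a * b + c    # 6 more mixed signatures needed
Base.sqrt(a::Dual)  = Dual(sqrt(a.v), a.d / (2 * sqrt(a.v)))
Base.abs2(a::Dual)  = Dual(abs2(a.v), 2 * a.v * a.d)
Base.inv(a::Dual)   = Dual(inv(a.v), -a.d / a.v^2)
Base.log10(a::Dual) = Dual(log10(a.v), a.d / (a.v * log(10)))
Base.zero(a::Dual)  = Dual(zero(a.v), zero(a.d))
Base.one(a::Dual)   = Dual(one(a.v), zero(a.d))
Base.abs(a::Dual)   = Dual(abs(a.v), sign(a.v) * a.d)
Base.isnan(a::Dual) = isnan(a.v)   # predicate on the primal only
```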
The following overloads are solver-internal plumbing (type conversions, step size bookkeeping, etc.) that are not mathematically interesting but are required for the solver to run. All of these follow from the idea that a real number is a dual number with 0 dual part, and that control flow should operate on the dual-valued function exactly the same as the real-valued function (i.e. the primal is preserved for non-differentiable operations).
```julia
Dual{T}(x::Dual{T}) where {T} = x
Dual{T}(x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
Dual{T}(x::Number) where {T} = Dual(convert(T, x), zero(T))
Base.convert(::Type{Dual{T}}, x::Dual{T}) where {T} = x
Base.convert(::Type{Dual{T}}, x::Dual) where {T} = Dual(convert(T, x.v), convert(T, x.d))
Base.convert(::Type{Dual{T}}, x::Number) where {T} = Dual(convert(T, x), zero(T))
Base.promote_rule(::Type{Dual{T}}, ::Type{S}) where {T, S<:Number} = Dual{promote_type(T, S)}
Base.convert(::Type{T}, x::Dual) where {T<:AbstractFloat} = convert(T, x.v)
(::Type{T})(x::Dual) where {T<:AbstractFloat} = T(x.v)
Base.oneunit(::Type{Dual{T}}) where {T} = Dual(oneunit(T), zero(T))
Base.oneunit(a::Dual) = oneunit(typeof(a))
Base.eps(::Type{Dual{T}}) where {T} = eps(T)
Base.eps(a::Dual) = eps(a.v)
Base.floatmin(::Type{Dual{T}}) where {T} = floatmin(T)
Base.float(::Type{Dual{T}}) where {T} = Dual{float(T)}
Base.float(x::Dual{T}) where {T} = Dual(float(x.v), float(x.d))
Base.real(a::Dual) = a
Base.nextfloat(a::Dual) = Dual(nextfloat(a.v), a.d)
Base.isless(a::Dual, b::Real) = isless(a.v, b)
Base.isless(a::Real, b::Dual) = isless(a, b.v)
Base.rtoldefault(::Type{Dual{T}}) where {T} = Base.rtoldefault(T)
```

Use the solution from Part 1 as your "data". That is, solve the Lotka-Volterra system with the true parameters $p^* = [1.5, 1.0, 3.0, 1.0]$, and save the resulting trajectory at a set of sample times as your reference.
Tasks:
- Define a mean-squared-error loss function:

  $$L(p) = \frac{1}{N} \sum_{i=1}^{N} \left\| u(t_i; p) - u^{\text{data}}(t_i) \right\|^2,$$

  where $u(t_i; p)$ is the model solution at time $t_i$ under parameters $p$, $u^{\text{data}}(t_i)$ is the reference data, and $N$ is the number of sample times.
- Compute the gradient $\nabla_p L$ using the forward sensitivity equations from Part 2. For each observation time $t_i$ and each parameter $p_j$, use the augmented ODE solution to obtain $\partial u(t_i; p) / \partial p_j$, then apply the chain rule:

  $$\frac{\partial L}{\partial p_j} = \frac{2}{N} \sum_{i=1}^{N} \left( u(t_i; p) - u^{\text{data}}(t_i) \right)^{\top} \frac{\partial u(t_i; p)}{\partial p_j}.$$
- Starting from the perturbed initial guess $p_0 = [1.2, 0.8, 2.8, 0.8]$, implement a gradient descent loop with a fixed learning rate (a learning rate of $\eta = 0.0002$ works; you may need to experiment). Run for at least 1000 iterations and print the loss periodically (e.g. every 50 iterations).
- Plot the loss curve over iterations. Plot the final estimated trajectory alongside the reference trajectory and confirm they visually agree. Report the final estimated parameters and verify they are close to $p^*$.
- Since the data is exact (no noise), the minimum of $L$ is zero and is achieved at $p = p^*$. Does your gradient descent reach a loss near machine epsilon? If not, explain why not and what you would need to do differently to get closer.
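The descent loop itself can be sketched as below; `loss_and_grad` is a hypothetical placeholder for your own function that returns $(L(p), \nabla_p L)$ via the augmented ODE solve:

```julia
# Plain gradient descent with a fixed learning rate η.
# `loss_and_grad` (hypothetical) must return the tuple (loss, gradient) at p.
function fit!(p, loss_and_grad; η = 2e-4, iters = 1000)
    losses = Float64[]
    for iter in 1:iters
        L, g = loss_and_grad(p)
        p .-= η .* g                 # descent step
        push!(losses, L)
        iter % 50 == 0 && println("iter $iter: loss = $L")
    end
    return p, losses
end
```

Usage would look like `p_fit, losses = fit!([1.2, 0.8, 2.8, 0.8], loss_and_grad)`, after which `losses` can be plotted directly.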
Note: You are not required to use a sophisticated optimizer. Plain gradient descent is expected here. The point is to see how the sensitivity equations provide exact (up to ODE solver tolerance) gradients that enable optimization, and that when the data is generated by the model itself, the true parameters are recoverable in principle.
Submit a single Julia script, Pluto notebook, or Jupyter notebook (.ipynb) containing all four parts. Each part should include:
- Your code
- Plots where requested
- Written answers to any qualitative questions (as comments or markdown cells)
Make sure your notebook runs end-to-end without errors before submitting.