Tidy testing SDK design — drop A/B/C exploration framing

GarrettBeatty · GarrettBeatty · commit d54ec2dd9819 · 2026-06-09T12:07:31.000-04:00
The "Approach A vs B vs C" framing only made sense in the brainstorming
chat where the alternatives had been spelled out earlier. The committed
spec should present the chosen design directly and explain why the
service-client interface is right on its own merits.
diff --git a/Libraries/src/Amazon.Lambda.DurableExecution/docs/design/testing-sdk-design.md b/Libraries/src/Amazon.Lambda.DurableExecution/docs/design/testing-sdk-design.md
@@ -54,9 +54,9 @@ The new testing package depends on:
 - `Amazon.Lambda.TestUtilities` (project reference) — `TestLambdaContext`, `TestLambdaLogger` for the runner's `ILambdaContext` substitute.
 - `Amazon.Lambda.Serialization.SystemTextJson` (package reference) — `DefaultLambdaJsonSerializer` is the fallback when `TestRunnerOptions.Serializer` is null.
 
-### Interception strategy: `IDurableServiceClient` seam (Approach B)
+### Interception strategy: `IDurableServiceClient` seam
 
-The runtime SDK already isolates outbound durable RPCs behind a single class — `LambdaDurableServiceClient`, currently `internal sealed`. Both reference SDKs (Python, JavaScript) chose to inject a service-client interface for testing rather than fake the broader Lambda client; .NET follows the same convergent design.
+The runtime SDK already isolates outbound durable RPCs behind a single class — `LambdaDurableServiceClient`, currently `internal sealed`. We promote that class to implement an `internal IDurableServiceClient` interface and inject a fake implementation from the testing package. The orchestration loop in `DurableFunction.WrapAsync` runs unmodified; only the two outbound RPCs (`CheckpointAsync`, `GetExecutionStateAsync`) are swapped. This keeps the testing-package surface tiny (two methods to fake) and exercises the **real** runtime engine — replay logic, checkpoint batching, termination handling, serializer dispatch — on every test.
 
 Three changes to the runtime package — all `internal`, no public-API impact:
 
@@ -135,13 +135,9 @@ InMemoryDurableServiceClient : IDurableServiceClient
 
 Because the seam is the service client, the orchestration loop drives the **real** runtime engine — every replay-consistency check, every operation-id allocation, every batch-flush boundary that ships in production code is exercised by every test.
 
-### Why not a fake `IAmazonLambda` (Approach A)
+### Why an interface and not a broader fake
 
-`IAmazonLambda` exposes ~88 members; faking it requires ~80 stubs throwing `NotImplementedException` (or subclassing `AmazonLambdaClient` and overriding the 5 durable RPCs plus `InvokeAsync`). Both reference SDKs rejected this surface and converged on a service-client interface instead. The decoupling from AWSSDK request/response shapes pays off when AWSSDK adds a new durable RPC (the interface is a contract we own; the SDK shape is not).
-
-### Why not a standalone orchestrator (Approach C)
-
-Java reimplements the orchestration loop in its testing package. The cost: ~2,500 lines of test-runner code that has to track every behavioral change in the runtime. .NET avoids this by injecting at the service-client boundary and reusing the production engine.
+`IDurableServiceClient` exposes only the two methods the runtime needs to talk to the durable execution service. A test fake implements those two methods; everything else stays in the production engine. This is the same shape both reference SDKs (Python's `DurableServiceClient`, JavaScript's `CheckpointApiClient`) settled on. The decoupling from AWSSDK request/response shapes pays off when AWSSDK adds a new durable RPC: the interface is a contract we own, and the runtime keeps mapping AWSSDK shapes to our own `Operation` / `OperationUpdate` types in one place (`LambdaDurableServiceClient`), unchanged.
 
 ---
 
@@ -484,7 +480,7 @@ Name matching is exact-string, with ARN parsing to extract `:function:NAME[:qual
 
 ### What is *not* reimplemented
 
-`ExecutionState`, `TerminationManager`, `CheckpointBatcher`, `OperationIdGenerator`, the `*Operation` classes, `LambdaSerializerHelper.GetRequired`, every replay-consistency check — all from the runtime package, exercised as-is. That is the value of Approach B.
+`ExecutionState`, `TerminationManager`, `CheckpointBatcher`, `OperationIdGenerator`, the `*Operation` classes, `LambdaSerializerHelper.GetRequired`, every replay-consistency check — all from the runtime package, exercised as-is. That is the value of injecting at the service-client boundary instead of reimplementing the orchestrator.
 
 ---
 
@@ -980,7 +976,7 @@ Coverage:
 - `InvokeAsync` to a registered plain (non-durable) sibling completes.
 - Replay-consistency violations surface `NonDeterministicExecutionException` exactly as production does.
 
-This is the most important layer — it proves Approach B works end-to-end.
+This is the most important layer — it proves the `IDurableServiceClient` injection covers the full runtime surface end-to-end.
 
 ### Layer 3 — snapshot tests of generated handler shape
 
@@ -1088,4 +1084,4 @@ Internal types: `InMemoryDurableServiceClient`, `InMemoryOperationStore`, `Check
 
 ### Estimate
 
-Per the parent design doc: **~1.5 weeks** for full Local + Cloud + RegisterFunction + step inspection. This design doesn't change that estimate — Approach B's reuse of the production engine keeps the testing-package code small (~800–1200 lines, comparable to Python's ~3000 because Python reimplements more checkpoint-validation logic).
+Per the parent design doc: **~1.5 weeks** for full Local + Cloud + RegisterFunction + step inspection. This design doesn't change that estimate — reusing the production engine via the `IDurableServiceClient` seam keeps the testing-package code small (~800–1200 lines).
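
For readers of this diff, the seam described in the "Interception strategy" hunk can be pictured with a minimal sketch. Only the names `IDurableServiceClient`, `InMemoryDurableServiceClient`, `CheckpointAsync`, `GetExecutionStateAsync`, and `OperationUpdate` come from the design doc. Every signature, parameter, and the record shape below is a hypothetical stand-in chosen for illustration, not the actual runtime API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Drive the fake the way an orchestration loop might: checkpoint twice, then read back.
IDurableServiceClient client = new InMemoryDurableServiceClient();
await client.CheckpointAsync("exec-1", new[] { new OperationUpdate("op-1", "SUCCEEDED") });
await client.CheckpointAsync("exec-1", new[] { new OperationUpdate("op-2", "STARTED") });
IReadOnlyList<OperationUpdate> state = await client.GetExecutionStateAsync("exec-1");
Console.WriteLine(state.Count); // prints 2

// Hypothetical stand-in for the runtime's OperationUpdate type (real shape not shown in this diff).
internal sealed record OperationUpdate(string OperationId, string Status);

// Assumed shape of the two-method seam; the real parameter and return types may differ.
internal interface IDurableServiceClient
{
    Task CheckpointAsync(string executionId, IReadOnlyList<OperationUpdate> updates);
    Task<IReadOnlyList<OperationUpdate>> GetExecutionStateAsync(string executionId);
}

// Test-side fake: records checkpoints in memory instead of calling the durable execution service.
internal sealed class InMemoryDurableServiceClient : IDurableServiceClient
{
    private readonly Dictionary<string, List<OperationUpdate>> _store = new();

    public Task CheckpointAsync(string executionId, IReadOnlyList<OperationUpdate> updates)
    {
        if (!_store.TryGetValue(executionId, out var ops))
        {
            ops = new List<OperationUpdate>();
            _store[executionId] = ops;
        }
        ops.AddRange(updates);
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<OperationUpdate>> GetExecutionStateAsync(string executionId)
    {
        // Return a copy so callers cannot mutate the store through the result.
        IReadOnlyList<OperationUpdate> snapshot =
            _store.TryGetValue(executionId, out var ops) ? ops.ToArray() : Array.Empty<OperationUpdate>();
        return Task.FromResult(snapshot);
    }
}
```

The point of the sketch is the size of the fake: two methods and one dictionary, with replay, batching, and termination handling left to the production engine that calls through the interface.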