Skip to content

bugfix: Workflow overhaul, new feature bugfix#1818

Merged
WhitWaldo merged 20 commits into
masterfrom
workflow-propagation-fix
May 18, 2026
Merged

bugfix: Workflow overhaul, new feature bugfix#1818
WhitWaldo merged 20 commits into
masterfrom
workflow-propagation-fix

Conversation

@WhitWaldo

Copy link
Copy Markdown
Contributor

Description

Fix for the workflow propagation integration tests not working as expected.

Issue reference

We strive to have all PR being opened based on an issue, where the problem or feature have been discussed prior to implementation.

Please reference the issue this PR will close: #[issue number]

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • Extended the documentation

…ected

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
@codecov

codecov Bot commented May 16, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.34%. Comparing base (62fe30d) to head (9fd34ca).

❗ There is a different number of reports uploaded between BASE (62fe30d) and HEAD (9fd34ca). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (62fe30d) HEAD (9fd34ca)
net10.0 13 11
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1818      +/-   ##
==========================================
- Coverage   64.91%   58.34%   -6.57%     
==========================================
  Files         300      266      -34     
  Lines        8832     7838     -994     
  Branches     1040      913     -127     
==========================================
- Hits         5733     4573    -1160     
- Misses       2846     3083     +237     
+ Partials      253      182      -71     
Flag Coverage Δ
net10.0 58.33% <ø> (-6.57%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

WhitWaldo added 2 commits May 16, 2026 14:01
Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
…imeout

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
@WhitWaldo

Copy link
Copy Markdown
Contributor Author

@copilot Something about the workflow history propagation integration test are failing to run and timing out causing the tests to be cancelled after 30 minutes. These should take only a few seconds apiece to run each of the 5 tests:

Dapr.IntegrationTest.Workflow.HistoryPropagationWorkflowTests.ShouldCompleteSuccessfully_WithNoPropagationScope
Dapr.IntegrationTest.Workflow.HistoryPropagationWorkflowTests.ShouldCompleteSuccessfully_WithOwnHistoryPropagationScope
Dapr.IntegrationTest.Workflow.HistoryPropagationWorkflowTests.ShouldCompleteSuccessfully_WithLineagePropagationScope
Dapr.IntegrationTest.Workflow.HistoryPropagationWorkflowTests.GetPropagatedHistory_ReturnsNull_WhenScheduledWithNoneScope
Dapr.IntegrationTest.Workflow.HistoryPropagationWorkflowTests.WithHistoryPropagation_FluentBuilder_WorksCorrectly

Can you please take a stab at determining what about the implementation in Dapr.Workflow is not working correctly such that the tests are not passing? Don't read into the fact that the code was merged in a previous PR as proof-positive that the underlying code works. This appears to have been erroneously merged into mater. You can read more about the intended implementation details here

Copilot AI commented May 17, 2026

Copy link
Copy Markdown
Contributor

@copilot Something about the workflow history propagation integration test are failing to run and timing out causing the tests to be cancelled after 30 minutes. These should take only a few seconds ap...

The tests were hanging due to two chained bugs in the PropagatedHistoryReceiver child workflow:

Bug 1 (fixed in edcc53e): The child workflow completed synchronously (Task.FromResult(result) with no await). When a Dapr sub-orchestration completes in the same turn it's first started, the sidecar never delivers the SubOrchestrationInstanceCompleted event to the parent — so the parent waits forever. The fix was await context.CreateTimer(TimeSpan.Zero) to force an async round-trip before returning.

Bug 2 (fixed in 44f9f19): CreateTimer(TimeSpan.Zero) produced a CreateTimerAction.fireAt equal to (or up to 99ns before) orchestrationStartTime. This precision loss happens because google.protobuf.Timestamp has nanosecond precision but DateTime only has 100ns (tick) precision — converting Timestamp→DateTime→Timestamp truncates sub-tick nanoseconds. Dapr 1.18.0-rc.3 validates fireAt >= orchestrationStartTime and silently rejects the timer when this is violated, so the child workflow is stuck waiting for a timer that never fires, and the parent never receives the completion event.

The fix in 44f9f19 adds fireAt = _currentUtcDateTime.AddTicks(1) when fireAt <= _currentUtcDateTime. This guarantees the emitted timestamp is always strictly after OrchestratorStarted.Timestamp, satisfying the runtime's validation.

I also added three unit tests in 0688edb to document and prevent regression of this behavior.

WhitWaldo and others added 2 commits May 17, 2026 00:56
Dapr 1.18+ validates that CreateTimerAction.fireAt is strictly greater
than the current sidecar time when CompleteOrchestratorTask is called.
The previous fix added only 1 tick (100 ns) over the orchestration start
timestamp, which arrives already past-due by the time the sidecar processes
the response.  GrpcProtocolHandler silently swallows the rejection error,
causing the child workflow to enter an infinite replay loop and the
integration tests to time out after 30 minutes.

Replace the orchestration-timestamp floor with DateTime.UtcNow.AddSeconds(1).
This ensures the fire time remains strictly in the future regardless of
SDK-to-sidecar network latency or minor clock skew, while still providing
"fire as soon as possible" semantics for zero-delay timers.

Fixes all five HistoryPropagationWorkflowTests failing on Dapr 1.18 RC.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…orkflow-propagation-fix

# Conflicts:
#	src/Dapr.Workflow/Worker/Internal/WorkflowOrchestrationContext.cs
WhitWaldo added 2 commits May 17, 2026 22:15
Signed-off-by: Whit Waldo <whit.waldo@innovian.net>
…ed to new "Workflow" types

Signed-off-by: Whit Waldo <whit.waldo@innovian.net>

Copilot AI commented May 18, 2026

Copy link
Copy Markdown
Contributor

@copilot Excellent. Let's proceed to re-implement the new 1.18 functionality. Please start by uncommenting out (or adding as necessary), the timer origin functionality was originally implemented at ht...

Re-enabled the timer-origin unit tests in test/Dapr.Workflow.Test/Worker/Internal/TimerOriginTests.cs (b5e45c2) — the file was fully commented out in 4ea4525. All 13 tests compile and pass on the post-proto-overhaul SDK without any source modifications. The implementation (WorkflowOrchestrationContext.CreateTimer overload, TimerOriginHelpers, and the TimerOrigin{CreateTimer,ExternalEvent,ActivityRetry,ChildWorkflowRetry} origin fields) and the integration tests (TimerOriginIntegrationTests.cs, 4 tests gated to Dapr ≥ 1.18) were never removed and are already active. Total Dapr.Workflow.Test: 355 passed, 0 failed on net9.0.

@WhitWaldo WhitWaldo merged commit b6fe556 into master May 18, 2026
442 of 445 checks passed
@WhitWaldo WhitWaldo deleted the workflow-propagation-fix branch May 18, 2026 13:25
@nelson-parente nelson-parente added this to the v1.18 milestone May 20, 2026
Copilot AI pushed a commit that referenced this pull request May 22, 2026
Cassie (durabletask-go author) flagged the .NET surface for cross-SDK
divergence post-merge of dotnet-sdk#1802 / #1818. This rewrites the
public history-propagation API to match the go-sdk shape — same one the
python-sdk just adopted (python-sdk#1047). Issue dotnet-sdk#1801 was
closed before her review; this PR delivers what the issue originally
described.

Three concrete gaps closed:

1. Activity-level opt-in (was missing entirely)
   - PropagationScope moved from ChildWorkflowTaskOptions to base
     WorkflowTaskOptions; ChildWorkflowTaskOptions inherits it.
   - WithHistoryPropagation() extension method added on the base record.
   - scheduleTaskAction.HistoryPropagationScope is now wired in
     WorkflowOrchestrationContext.CallActivityInternalAsync so activities
     can opt into propagation, matching CallChildWorkflowInternalAsync.
   - Without this, the Go SDK's reference example (SettlePayment activity
     using PropagateOwnHistory) literally cannot be ported to .NET.

2. Read API rewritten as high-level resolvers (was lossy FilterBy* + a
   PropagatedHistoryEvent record that dropped input/output/failure
   payloads)
   - PropagatedHistory.FilterByAppId/InstanceId/WorkflowName removed.
   - PropagatedHistory now exposes GetWorkflows(), GetWorkflowsByName(),
     GetLastWorkflowByName(), GetAppIds(), GetWorkflowsByAppId(),
     GetWorkflowsByInstanceId().
   - New WorkflowResult class with InstanceId/AppId/Name plus
     GetActivitiesByName(), GetLastActivityByName(),
     GetChildWorkflowsByName(), GetLastChildWorkflowByName() — mirrors
     durabletask-go's GetLastWorkflowByName / GetLastActivityByName /
     GetLastChildWorkflowByName renames from durabletask-go#105.
   - New ActivityResult record carries Name, Started, Completed, Failed,
     Input, Output, FailureDetails — matching the Go/Python equivalents
     so chain-of-custody patterns line up.
   - New ChildWorkflowResult record with the equivalent shape.

3. Event payload preserved internally (was discarded by ConvertChunk)
   - ConvertChunk in WorkflowOrchestrationContext now parses raw events,
     walks them to resolve TaskScheduled <-> TaskCompleted/Failed and
     ChildWorkflowInstanceCreated <-> ChildWorkflowInstanceCompleted/
     Failed by scheduleId, and produces fully-populated ActivityResult /
     ChildWorkflowResult instances. SDK retries reuse TaskExecutionId so
     matching is on scheduleId (matching Go/Python semantics).
   - Public API does not leak the proto HistoryEvent type — resolution
     happens at construction time inside Dapr.Workflow.

Additional surface additions:

- PropagationNotFoundException for missing-name lookups (mirrors
  Python's PropagationNotFoundError / Go's error returns).
- Static WorkflowHistory.PropagateLineage() / PropagateOwnHistory()
  factory helpers for go-sdk call-site parity.

Removed (clean break — 1.18 unreleased): PropagatedHistoryEntry,
PropagatedHistoryEvent, HistoryEventKind, FilterByAppId,
FilterByInstanceId, FilterByWorkflowName.

Tests:

- WorkflowHistoryPropagationTests.cs rewritten end-to-end to cover the
  new resolvers, query helpers, factory helpers, activity-level scope
  wiring, and child-workflow-level scope wiring.
- HistoryPropagationWorkflowTests.cs (integration) updated to use
  GetWorkflows().Count in place of Entries.Count.

Refs: #1801, dapr/durabletask-go#105, dapr/go-sdk#823,
dapr/python-sdk#1047

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

Co-authored-by: nelson-parente <20144601+nelson-parente@users.noreply.github.com>
WhitWaldo added a commit that referenced this pull request May 27, 2026
* feat(workflow): align history propagation API with go-sdk

Cassie (durabletask-go author) flagged the .NET surface for cross-SDK
divergence post-merge of dotnet-sdk#1802 / #1818. This rewrites the
public history-propagation API to match the go-sdk shape — same one the
python-sdk just adopted (python-sdk#1047). Issue dotnet-sdk#1801 was
closed before her review; this PR delivers what the issue originally
described.

Three concrete gaps closed:

1. Activity-level opt-in (was missing entirely)
   - PropagationScope moved from ChildWorkflowTaskOptions to base
     WorkflowTaskOptions; ChildWorkflowTaskOptions inherits it.
   - WithHistoryPropagation() extension method added on the base record.
   - scheduleTaskAction.HistoryPropagationScope is now wired in
     WorkflowOrchestrationContext.CallActivityInternalAsync so activities
     can opt into propagation, matching CallChildWorkflowInternalAsync.
   - Without this, the Go SDK's reference example (SettlePayment activity
     using PropagateOwnHistory) literally cannot be ported to .NET.

2. Read API rewritten as high-level resolvers (was lossy FilterBy* + a
   PropagatedHistoryEvent record that dropped input/output/failure
   payloads)
   - PropagatedHistory.FilterByAppId/InstanceId/WorkflowName removed.
   - PropagatedHistory now exposes GetWorkflows(), GetWorkflowsByName(),
     GetLastWorkflowByName(), GetAppIds(), GetWorkflowsByAppId(),
     GetWorkflowsByInstanceId().
   - New WorkflowResult class with InstanceId/AppId/Name plus
     GetActivitiesByName(), GetLastActivityByName(),
     GetChildWorkflowsByName(), GetLastChildWorkflowByName() — mirrors
     durabletask-go's GetLastWorkflowByName / GetLastActivityByName /
     GetLastChildWorkflowByName renames from durabletask-go#105.
   - New ActivityResult record carries Name, Started, Completed, Failed,
     Input, Output, FailureDetails — matching the Go/Python equivalents
     so chain-of-custody patterns line up.
   - New ChildWorkflowResult record with the equivalent shape.

3. Event payload preserved internally (was discarded by ConvertChunk)
   - ConvertChunk in WorkflowOrchestrationContext now parses raw events,
     walks them to resolve TaskScheduled <-> TaskCompleted/Failed and
     ChildWorkflowInstanceCreated <-> ChildWorkflowInstanceCompleted/
     Failed by scheduleId, and produces fully-populated ActivityResult /
     ChildWorkflowResult instances. SDK retries reuse TaskExecutionId so
     matching is on scheduleId (matching Go/Python semantics).
   - Public API does not leak the proto HistoryEvent type — resolution
     happens at construction time inside Dapr.Workflow.

Additional surface additions:

- PropagationNotFoundException for missing-name lookups (mirrors
  Python's PropagationNotFoundError / Go's error returns).
- Static WorkflowHistory.PropagateLineage() / PropagateOwnHistory()
  factory helpers for go-sdk call-site parity.

Removed (clean break — 1.18 unreleased): PropagatedHistoryEntry,
PropagatedHistoryEvent, HistoryEventKind, FilterByAppId,
FilterByInstanceId, FilterByWorkflowName.

Tests:

- WorkflowHistoryPropagationTests.cs rewritten end-to-end to cover the
  new resolvers, query helpers, factory helpers, activity-level scope
  wiring, and child-workflow-level scope wiring.
- HistoryPropagationWorkflowTests.cs (integration) updated to use
  GetWorkflows().Count in place of Entries.Count.

Refs: #1801, dapr/durabletask-go#105, dapr/go-sdk#823,
dapr/python-sdk#1047

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* fix(workflow): address code-review feedback on history-propagation alignment

- Document the `new`-hiding contract on ChildWorkflowTaskOptions
  .WithHistoryPropagation and add a regression test that asserts the
  returned type is ChildWorkflowTaskOptions (not the base record), so
  InstanceId survives the with-expression.
- Add the standard `()`, `(string)`, and `(string, Exception)` constructors
  on PropagationNotFoundException so callers can wrap inner exceptions.
- Alias StringValue alongside the existing Timestamp alias in
  WorkflowOrchestrationContext so the propagation helper signature stays
  consistent with the rest of the file.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* test(workflow): clarify chunk-order test variable names

Renames the test fixtures in GetPropagatedHistory_PreservesChunkOrder so the
variable order matches the documented oldest-first chunk ordering (index 0 is
the oldest ancestor, the last chunk is the immediate parent). No behavior change.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* fix(workflow): pass StringValue-wrapped fields directly as strings

protoc unwraps google.protobuf.StringValue to a plain string in the
generated C# (only the wire codec uses the wrapper). The
StringValueOrNull(StringValue?) helper added in this branch expected
the wrapper type, breaking the build with CS1503 at the three call
sites in ResolveActivity / ResolveChildWorkflow. Drop the helper and
pass the generated string fields straight through — they are already
nullable at runtime and ActivityResult/ChildWorkflowResult accept
string? for Input/Output.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* test(workflow): assign string directly to wrapper-typed fields

Same StringValue mismatch as the production fix — protoc-generated
properties for google.protobuf.StringValue fields are plain string,
not the wrapper. Drop the new StringValue { Value = ... } wrappers
in the test helpers.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* refactor(workflow): rename propagation types and adopt TryGet pattern

Addresses Whit's review on #1825:

- Rename ActivityResult -> PropagatedHistoryActivityResult
- Rename ChildWorkflowResult -> PropagatedHistoryChildWorkflowResult
- Rename WorkflowResult -> PropagatedHistoryEntry (primary constructor)
- Drop WorkflowHistory static class; callers pass HistoryPropagationScope directly
- Switch GetLast*ByName to TryGet*ByName + drop PropagationNotFoundException
- Drop chunk/chain terminology from public XML docs
- Surface malformed propagated event bytes via InvalidProtocolBufferException
  instead of silently skipping
- Update unit tests to new names and TryGet asserts

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* test(workflow): rename propagation test cases to match renamed types

Test names previously embedded the old type names (ActivityResult,
ChildWorkflowResult); rename to the more general Activity_/ChildWorkflow_
prefix to avoid confusion with the renamed public types.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* fix(workflow): match propagated-history names and app IDs case-insensitively

Workflow / activity names register through WorkflowsFactory with
StringComparer.OrdinalIgnoreCase, and app IDs are matched case-insensitively
on the invocation path. Align the propagated history lookups (GetAppIds,
GetWorkflowsByName, TryGetLastWorkflowByName, GetWorkflowsByAppId,
GetActivitiesByName, TryGetLastActivityByName, GetChildWorkflowsByName,
TryGetLastChildWorkflowByName) with that contract so callers don't get
surprising misses or duplicate logical IDs that only differ by casing.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* perf(workflow): pre-index completion events in ConvertChunk

ConvertChunk previously rescanned the full event list inside ResolveActivity
and ResolveChildWorkflow, making conversion O(n²) in the number of history
events. Pre-index TaskCompleted / TaskFailed by TaskScheduledId (and the
ChildWorkflowInstance counterparts) up front so each scheduled item resolves
in O(1).

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* refactor(workflow): rename PropagatedHistory backing field to _entries

The private field and ctor parameter on PropagatedHistory are now named
after the value type they hold (PropagatedHistoryEntry) rather than the
'workflows' role those entries play today. Public API surface is
unchanged.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* refactor(workflow): use generic propagated-history method names; add Status enum

Addresses Whit's 2026-05-24 review.

Rename the PropagatedHistory query family to scope-neutral names so the public
surface need not change if propagated history ever carries non-workflow entries:
  GetWorkflows()             -> GetEntries()
  GetWorkflowsByName()       -> FilterByWorkflowName()
  GetWorkflowsByAppId()      -> FilterByAppId()
  GetWorkflowsByInstanceId() -> FilterByInstanceId()

Add PropagatedHistoryTaskStatus (Pending/Completed/Failed) and a computed
Status property on PropagatedHistoryActivityResult and
PropagatedHistoryChildWorkflowResult so callers can switch on a single value.
The Started/Completed/Failed flags are retained for go-sdk/python-sdk parity;
Status is a projection of them, with Failed taking precedence over Completed.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

* test(workflow): fix case-insensitive AppId filter expectation

FilterByAppId matches case-insensitively, so two entries whose app IDs
differ only in casing ("AppA" / "appa") both match a query for "APPA".
The de-duped GetAppIds list collapses to one, but the filter returns both;
assert two matches instead of one.

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>

---------

Signed-off-by: Nelson Parente <nelson_parente@live.com.pt>
Co-authored-by: Whit Waldo <whit.waldo@innovian.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants