TRUNK-6429: Addressing post commit review#6190
Conversation
d9bacac to
02aef6b
Compare
There was a problem hiding this comment.
Merge — no blocking issues. Clean cleanup/enhancement of the unreleased 2.9.0 outbox feature. Two non-blocking items, both inline:
OutboxEvent.getUuid()adds a redundant@Column(name = "uuid")— already mapped by theBaseOpenmrsObject@MappedSuperclassand ignored under field access. Recommend removing.- The persisted payload format gained a
@classtype id —questionto confirm no SNAPSHOT data depends on the old format and thatId.CLASSwas deliberate.
Minor (take or leave): @since was normalized to 2.9.0 across the outbox package, but the six sibling event classes (SaveServiceEvent, VoidServiceEvent, UnvoidServiceEvent, RetireServiceEvent, UnretireServiceEvent, PurgeServiceEvent) still read 2.9.x; the new public OutboxEventService methods lack @since (and getFailingEvents has no Javadoc) now that the class is documented as public API.
Needs action before merge: none.
| @Column(name = "uuid", nullable = false) | ||
| @Override | ||
| public String getUuid() { | ||
| return super.getUuid(); | ||
| } |
There was a problem hiding this comment.
suggestion (non-blocking): This getUuid() override looks redundant. uuid is already mapped by BaseOpenmrsObject, a @MappedSuperclass whose uuid field carries @Column(name = "uuid", unique = true, nullable = false, length = 38, updatable = false). Since @Id is on the id field, this entity uses FIELD access, so Hibernate ignores a @Column placed on a getter — the green build (no duplicate-mapping error) and the passing where uuid = :uuid query both confirm uuid is mapped and queryable without this override.
So it adds nothing today, and it's slightly risky: if the entity ever switched to property access, this definition would override the superclass's and silently drop unique = true, length = 38, and updatable = false. I'd remove the five lines unless they're working around something I'm not seeing.
There was a problem hiding this comment.
I'll remove getUuid() override.
| * @since 2.9.x | ||
| * @since 2.9.0 | ||
| */ | ||
| @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS, include = JsonTypeInfo.As.PROPERTY, property = "@class", defaultImpl = OutboxableEvent.class) |
There was a problem hiding this comment.
question: This changes the persisted payload format — events now serialize with a @class type id ({"@class":"…","uuid":"…"} instead of {"uuid":"…"}, per the updated OutboxEventIT). On replay, readValue expects @class; a row written before this change has no @class, so it falls back to defaultImpl = OutboxableEvent.class — an interface Jackson can't instantiate, so deserialization fails hard and the event exhausts its retries into the new FAILED state.
For unreleased 2.9.0 with no persisted data this is moot — is that the case, or could SNAPSHOT deployments have existing outbox_event rows needing a migration? Also, was Id.CLASS (vs Id.NAME + registered subtypes) deliberate? Low risk since payloads are system-generated, but it bakes FQCNs into stored data and enables class-name-driven deserialization, so worth a conscious call.
There was a problem hiding this comment.
Using class is desired as it simplifies subclassing without a need to define subtypes. The events are public API similar to domain model thus should not change. I'll remove the defaultImpl.
02aef6b to
3282cec
Compare
There was a problem hiding this comment.
Merge. No blockers — the three findings below are a contract/doc question, a missing test, and a design question, all non-blocking. The one worth settling before modules build on this newly-public API is finding 1 (resetOutboxEventRetryCount doesn't re-poll a FAILED event). Scope: unreleased 2.9.0 outbox feature, judged as production API.
resetOutboxEventRetryCount(uuid)leaves statusFAILED. It zeroes the counter but the poller only selectsPENDING/PROCESSING, so for a dead-lettered event no actual retry happens — despite the "to continue retrying" Javadoc. Recovery currently also requiresretryFailedOutboxEvent. (inline)- The
errorCount >= retryLimit->FAILEDtransition has no test — the headline DLQ behavior. The IT's failing listener throws 4x (< the default 16) then succeeds (->COMPLETED);OutboxEventServiceTestonly pre-setsStatus.FAILED. (inline) resetStuckEvent()bypassesretryLimit— it bumpserrorCounton every stuck revert but never dead-letters, so a listener that consistently times out (rather than throws) is retried forever. Question on intent. (inline)
Needs action before merge: none.
| * @param uuid | ||
| */ | ||
| @Transactional | ||
| public void resetOutboxEventRetryCount(String uuid) { |
There was a problem hiding this comment.
question: resetOutboxEventRetryCount zeroes errorCount/errorMessage but leaves status untouched. For an event that's already FAILED, the counter is reset but the status stays FAILED, so the poller never re-selects it — getProcessingAndPendingEvents() only matches PENDING/PROCESSING — meaning no actual retry happens, even though the Javadoc says "to continue retrying." Recovering a dead-lettered event currently also requires retryFailedOutboxEvent(uuid) to flip it back to PENDING, and the cross-reference between the two methods hints that pairing is the intent.
Is that the intended contract? If so, I'd reword this Javadoc to say it only clears the counter and the event must be moved to PENDING (via retryFailedOutboxEvent) to actually retry. Cleaner for callers would be one-call recovery — e.g. have retryFailedOutboxEvent also reset the count, leaving this method for the rarer "clear the counter without changing status" case. Recommended: pick one and document it, since modules are the intended consumers and have only the Javadoc to go on.
| outboxExceptionEvent.setPendingRetry(true); | ||
| } else { | ||
| // Stop retries to process the next item | ||
| item.setStatus(OutboxEvent.Status.FAILED); |
There was a problem hiding this comment.
suggestion (non-blocking): This errorCount >= retryLimit -> FAILED transition is the headline behavior of the PR, but nothing tests it. The IT's onPatientCreatedFailing listener throws only 4 times (< the default limit of 16) and then succeeds, so the event reaches COMPLETED and this else branch is never taken; OutboxEventServiceTest exercises retry/reset/getFailingEvents only on events it pre-sets to Status.FAILED. So neither the boundary (< retryLimit vs >=) nor the pendingRetry=false/retryCount carried on the published OutboxExceptionEvent is covered.
Worth a test that sets outboxevent.retry.limit low (e.g. 1-2) with an always-failing listener and asserts the event lands in FAILED and the emitted OutboxExceptionEvent has pendingRetry=false and the expected retryCount.
There was a problem hiding this comment.
retryLimit is tested in OutboxEventServiceTest. I don't want to extend already long running OutboxEventIT to reach the retryLimit, which is not easy to adjust in tests.
| @@ -67,7 +71,7 @@ public OutboxPollingTaskHandler(OutboxEventRegistry registry, ObjectMapper objec | |||
| @Override | |||
| public void execute(OutboxPollingTaskData taskData, TaskContext taskContext) throws Exception { | |||
| outboxEventService.resetStuckEvent(); | |||
There was a problem hiding this comment.
question: resetStuckEvent() reverts any event stuck in PROCESSING back to PENDING and bumps errorCount (coalesce(errorCount,0)+1), but it never consults retryLimit. So a listener that consistently times out (hangs rather than throws) keeps getting reset to PENDING forever — errorCount climbs without bound and the event is never dead-lettered, so under the strict-ordering guarantee it can block the queue indefinitely. Only listeners that throw (the catch below) ever reach FAILED.
Is dead-lettering intentionally limited to throwing listeners, treating timeouts as always-retryable? If a perpetually-stuck event should also be capped, the retryLimit check needs to apply on this stuck-reset path too.
3282cec to
3af45fa
Compare
|
Addressed 1. and 3. |
|
|
@rkorytkowski this is not a blocker. But just wanted to get your thoughts on it: #6195 |
|
@rkorytkowski what do you also think about this? #6197 |
|
And finally, @rkorytkowski what do you think about this? #6199 |



Description of what I changed
Addresses review comments on #6084
Adds a basic dead letter queue for OutboxEvents failing a configurable number of times (default: 16) with API methods to force retry.
Issue I worked on
see https://issues.openmrs.org/browse/TRUNK-6429
Checklist: I completed these to help reviewers :)
My IDE is configured to follow the code style of this project.
No? Unsure? -> configure your IDE, format the code and add the changes with
git add . && git commit --amendI have added tests to cover my changes. (If you refactored
existing code that was well tested you do not have to add tests)
No? -> write tests and add them to this commit
git add . && git commit --amendI ran
mvn clean packageright before creating this pull request andadded all formatting changes to my commit.
No? -> execute above command
All new and existing tests passed.
No? -> figure out why and add the fix to your commit. It is your responsibility to make sure your code works.
My pull request is based on the latest changes of the master branch.
No? Unsure? -> execute command
git pull --rebase upstream master