Skip to content

fix: Fix deathlock during statically registering services#4657

Merged
balhar-jakub merged 11 commits into
v3.x.xfrom
reboot/eureka-registry-deathlock-fix
Jun 2, 2026
Merged

fix: Fix deathlock during statically registering services#4657
balhar-jakub merged 11 commits into
v3.x.xfrom
reboot/eureka-registry-deathlock-fix

Conversation

@pavel-jares-bcm

Copy link
Copy Markdown
Contributor

Description

When service is statically registered (since #4051) there is one additional lock in the class ApimlInstanceRegistry (see the synchronization removal). In case of multiple threads it could lead to service freezing on start-up during reading the static definition file.

Lock order of registerStatically:

  • org.zowe.apiml.discovery.ApimlInstanceRegistry#registerStatically : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.lock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : leave synchronized block, but it is still locked
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.unlock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : unlock synchronized block

Lock order of register:

  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.lock();
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : synchronized (lock)
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : unlock synchronized block
  • com.netflix.eureka.registry.AbstractInstanceRegistry#register : read.unlock();

Deathlock:
thead1 (register): read.lock()
thead2 (registerStatically): synchronized (lock)
thread1 (register): is waiting for synchronized block to be available (to be done by thread2)
thread2 (registerStatically): is waiting for read.unlock() (to be issued by thread1)

Note: locks are used also by other methods but it should be the issue.

The source is the REST call from any south-bound service during the initialization of static services. The risk is higher when the static definition file or amount of service is bigger.

The additional lock (see the synchronized (lock) in registerStatically) was added because the original method register shouldn't change the property expectedNumberOfClientsSendingRenews. Using the same lock should prevent exclusivity to update the field. It is not correct solution because of orders of locks.

The new solution store information about the change (correction - the oposite operation) in a ThreadLocal, so the locks are not necessary now.

The similar issue was in cancel method as well. But it is not big deal as this method is used very rarelly.

Other changes in the PR becomes from analysis that using reflection is not needed at this version of Eureka (there were very probably changes in the overriden code). We can call the whole method now and it is not necessary to reimplement it via calling private methods via reflection.

Linked to # (issue)
Part of the # (epic)

Type of change

Please delete options that are not relevant.

  • fix: Bug fix (non-breaking change which fixes an issue)
  • feat: New feature (non-breaking change which adds functionality)
  • docs: Change in a documentation
  • refactor: Refactor the code
  • chore: Chore, repository cleanup, updates the dependencies.
  • BREAKING CHANGE or !: Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines of this project
  • PR title conforms to commit message guideline ## Commit Message Structure Guideline
  • I have commented my code, particularly in hard-to-understand areas. In JS I did provide JSDoc
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • The java tests in the area I was working on leverage @nested annotations
  • Any dependent changes have been merged and published in downstream modules

For more details about how should the code look like read the Contributing guideline

Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
@pull-request-size pull-request-size Bot added size/XL and removed size/L labels Jun 1, 2026
@pavel-jares-bcm pavel-jares-bcm marked this pull request as ready for review June 1, 2026 11:48
@pull-request-size pull-request-size Bot added size/L and removed size/XL labels Jun 1, 2026
@taban03

taban03 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Not directly related to your PR, but are we sure that this calculation of adding a fixed +2 in getNumOfRenewsInLastMinis consistent in case of multiple static services registered? Also, could this affect the Eureka metrics (i.e. actuator) that may show more renewal than actually occurred?

@pavel-jares-bcm

Copy link
Copy Markdown
Contributor Author

Not directly related to your PR, but are we sure that this calculation of adding a fixed +2 in getNumOfRenewsInLastMinis consistent in case of multiple static services registered? Also, could this affect the Eureka metrics (i.e. actuator) that may show more renewal than actually occurred?

I am not sure, but I would say it make sense only for Modulith and the number is incorrect, there should be 3 for apiml (CS, Catalog, GW) and maybe also amount of statically registered services, but I don't consider it as part of this PR. Maybe we can create a issue for verification.

Signed-off-by: Pavel Jareš <Pavel.Jares@broadcom.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026

Copy link
Copy Markdown

@balhar-jakub balhar-jakub merged commit 05164bb into v3.x.x Jun 2, 2026
45 checks passed
@balhar-jakub balhar-jakub deleted the reboot/eureka-registry-deathlock-fix branch June 2, 2026 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Development

Successfully merging this pull request may close these issues.

3 participants