OPENNLP-1844 - Make opennlp-dl components thread-safe #1084
Pull request overview
This PR targets thread-safety for the ONNX Runtime–backed DL components (NameFinderDL, DocumentCategorizerDL, SentenceVectorsDL) so a single initialized instance (and its OrtSession) can be safely shared across concurrent callers.
Changes:
- Centralizes DL component initialization in AbstractDL via a constructor that assigns shared inference state once (final fields) and adjusts subclasses to delegate to it.
- Prevents external mutation races by defensively copying label/category maps and stops closing the process-wide OrtEnvironment from instance close().
- Adds/updates tests, including a new concurrent NameFinderDL evaluation test.
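The first two changes can be illustrated with a minimal sketch. It is not the code from this PR: `SharedEnvironment`, `BaseComponent`, and `LabelingComponent` are hypothetical stand-ins for OrtEnvironment, AbstractDL, and a subclass such as NameFinderDL, and the sketch assumes the label map is a plain `Map<Integer, String>`.

```java
import java.util.Map;

// Hypothetical stand-in for a process-wide resource such as OrtEnvironment.
final class SharedEnvironment {
  private static final SharedEnvironment INSTANCE = new SharedEnvironment();
  private volatile boolean closed = false;

  private SharedEnvironment() {}

  static SharedEnvironment get() { return INSTANCE; }

  boolean isClosed() { return closed; }

  void close() { closed = true; }
}

// Hypothetical base class: all shared state is assigned exactly once in the
// constructor and held in final fields, so a properly constructed instance
// can be handed to other threads without extra synchronization.
abstract class BaseComponent implements AutoCloseable {
  protected final SharedEnvironment env;
  protected final Map<String, Integer> vocab;

  protected BaseComponent(Map<String, Integer> vocab) {
    this.env = SharedEnvironment.get();
    // Immutable copy: later changes to the caller's map are not visible here.
    this.vocab = Map.copyOf(vocab);
  }

  @Override
  public void close() {
    // Release per-instance resources only (a session, in the real component).
    // The shared environment is deliberately left open for other instances.
  }
}

// Hypothetical subclass that delegates initialization to the base class.
final class LabelingComponent extends BaseComponent {
  private final Map<Integer, String> ids2Labels;

  LabelingComponent(Map<String, Integer> vocab, Map<Integer, String> ids2Labels) {
    super(vocab);
    this.ids2Labels = Map.copyOf(ids2Labels);
  }

  String labelFor(int id) {
    return ids2Labels.getOrDefault(id, "O");
  }
}
```

With this shape, a caller that mutates its own label map after construction cannot affect an instance already in use, and closing one instance does not invalidate the environment for the others.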
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| opennlp-eval-tests/src/test/java/opennlp/dl/namefinder/NameFinderDLEval.java | Adds a multi-threaded concurrency test for NameFinderDL.find() on a shared instance. |
| opennlp-core/opennlp-ml/opennlp-dl/src/test/java/opennlp/dl/LoadVocabTest.java | Updates vocab-loading tests for the new static AbstractDL.loadVocab(...) usage. |
| opennlp-core/opennlp-ml/opennlp-dl/src/test/java/opennlp/dl/CreateTokenizerTest.java | Updates tokenizer factory tests to call AbstractDL.createTokenizer(...) directly. |
| opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/vectors/SentenceVectorsDL.java | Delegates initialization to AbstractDL and declares the component @ThreadSafe. |
| opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/namefinder/NameFinderDL.java | Delegates initialization to AbstractDL, defensively copies labels, adds @ThreadSafe docs/annotation. |
| opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/doccat/DocumentCategorizerDL.java | Delegates initialization to AbstractDL, defensively copies categories, adds @ThreadSafe docs/annotation. |
| opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/AbstractDL.java | Introduces constructor-based initialization with final fields; makes vocab/tokenizer helpers static; changes close semantics. |
Thanks for the PR! A few things I'd like to resolve before merge: 1.
So this PR silently changes 2. Comments drifted with the above. doccat now says 3. API-surface changes — and is Minor: in Happy to look again once 1 & 2 are reconciled. |
These are all great points. It's never easy to make things thread-safe :) So thank you for your patience. I just updated the PR to address items 1 & 2 and the API/close notes. 1. Chunking consistency (
@rzo1 the documentation step keeps failing. Is it a flaky step? I can look into it more, but re-triggering made it pass, so I suspect it depends on something external. It's ready, though. Let me know what you think of the updates and if you need anything else. Once the first two are merged, the third one should work.
Make NameFinderDL, DocumentCategorizerDL and SentenceVectorsDL safe to share across threads, so a single instance (and its loaded ONNX session) can serve concurrent requests instead of being duplicated or externally synchronized.
Inference already held no per-call instance state and OrtSession.run is concurrency-safe; the gaps were safe publication and a shared-resource leak:
Add a concurrent NameFinderDL eval test that runs find() from many threads on one shared instance and asserts every result matches the single-threaded case.
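A concurrency check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the eval test added in this PR: `UppercaseFinder` is a hypothetical stateless stand-in for NameFinderDL, and `ConcurrentFindCheck` is a hypothetical helper that compares every concurrent result against a single-threaded baseline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

// Hypothetical stand-in for a shared component such as NameFinderDL.
// It keeps no per-call instance state, so one instance can serve many threads.
final class UppercaseFinder {
  // Returns the indexes of tokens that start with an uppercase letter.
  int[] find(String[] tokens) {
    return IntStream.range(0, tokens.length)
        .filter(i -> !tokens[i].isEmpty() && Character.isUpperCase(tokens[i].charAt(0)))
        .toArray();
  }
}

final class ConcurrentFindCheck {
  // Runs find() from several threads on one shared instance and reports
  // whether every result equals the single-threaded baseline.
  static boolean allMatchBaseline(UppercaseFinder finder, String[] tokens,
                                  int threads, int callsPerThread) throws Exception {
    int[] expected = finder.find(tokens); // single-threaded baseline
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        tasks.add(() -> {
          for (int i = 0; i < callsPerThread; i++) {
            if (!Arrays.equals(expected, finder.find(tokens))) {
              return false;
            }
          }
          return true;
        });
      }
      boolean ok = true;
      // invokeAll blocks until every task has completed.
      for (Future<Boolean> result : pool.invokeAll(tasks)) {
        ok &= result.get();
      }
      return ok;
    } finally {
      pool.shutdown();
    }
  }
}
```

A passing run of such a check does not prove thread-safety, but a mismatch against the baseline is a strong signal of shared mutable state.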