Skip to content

Fix: Prevent duplicate AMRFinderPlus database paths#7949

Open
jakobnissen wants to merge 3 commits intogalaxyproject:mainfrom
jakobnissen:amr
Open

Fix: Prevent duplicate AMRFinderPlus database paths#7949
jakobnissen wants to merge 3 commits intogalaxyproject:mainfrom
jakobnissen:amr

Conversation

@jakobnissen
Copy link
Copy Markdown
Contributor

@jakobnissen jakobnissen commented Apr 30, 2026

This PR contains four fixes to AMRFinderPlus and its database manager, fixing three separate bugs. They are made as three separate commits which can be reviewed independently.

The background is that we experienced AMRFinderPlus job failures. The executed command-line contained the option --database '/users/data/Services/galaxy/server/tool-data/amrfinderplus-db/amrfinderplus_V3.12_2024-05-02.2,/users/data/Services/galaxy/server/tool-data/amrfinderplus-db/amrfinderplus_V3.12_2024-05-02.2', which is a bug (note it's the same path, joined with ,)

The underlying issue is that the data manager (DM) would add new rows to the output .loc file every time it was invoked - including if new versions of the data manager was installed. This would result in identical rows in the .loc file.
Then, AMRFinderPlus itself would query that database with <filter type="static_value" value="3.12" column="db_version"/>, which could erroneously result in multiple values being selected.

We fix this here in four steps:

  1. Make AMRFinderPlus resistant to multiple identical rows by adding a unique_value filter when querying the database (and test it).
  2. Making the AMRFinderPlus DM reentrant, by making it check for pre-existing entries in the database and not add new entries if the entry it wanted to add already existed. This approach was copied from the Nextclade data manager. I include both this fix and the above because users might run an older version of the tool with a new version of the database manager or vice versa; so a fix on both ends will be necessary for some.
  3. (implemented with 2 for ease) Add missing db_version column to the DM output, which did not conform to the schema. This is a separate bug.
  4. The current tests for the data manager tests the data version 4, which requires AMRFinderPlus v4. However, this has not been released as a tool yet, so these tests fail. Disable v4 in the data manager's tests. Note that this is a third bug in the test suite not related to these other changes. I include it here, because otherwise CI fails.

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)*

Include db_version in data-manager output rows and make reruns skip databases
whose stable data-table value is already registered. Bump the data-manager
Galaxy version suffix once for the combined data-manager changes.
Filter duplicate database names in the wrapper data-table options and add a
regression fixture/assertion for comma-joined database paths. Bump only the
Galaxy build suffix for the wrapper.
The data manager exposes AMRFinderPlus database 4.0, but amrfindeplus itself
is still tied to the v3.
Because 4.0 was the first select option, tests that did not explicitly set
a database version started exercising the unsupported v4 path
and the duplicate-database test compared a known 3.12 value against
a selected 4.0 database.

Explicitly select the 3.12 test databases in the existing tests and remove
the premature v4 data-manager test case.
This keeps v4 available from the data manager, while avoiding attempting
to test an unready end-to-end path.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant