Implement database trigger constraint for slug uniqueness#3826
Implement database trigger constraint for slug uniqueness#3826hannaseithe wants to merge 3 commits intodevelopfrom
Conversation
Summary of options for SolutionOption 1: Database trigger with JOIN
Option 2: Denormalize region and Database trigger
Option 3: Denormalize region & Exclusion constraint (implemented in the PoC)
Option 4: Denormalize region & Create is_current field & Unique Constraint (with partial index) only for is_current version
Pseudo-Option: Materialized View (instead of column denormalization) & Database Trigger:This is (imo) not an option, because we would not guarantee absolute syncronicity of the Materialized View Table |
|
In Django 5.1 we get The generated field on pageTranslation model: and the exclusionConstraint: |
I believe that GeneratedField does not support this use case. In the expression for the generated field, you can only reference fields of the same model, so Apart from that, I think the trigger solution from Option 1 might be the cleanest, because we don't risk other data inconsistencies just to solve another data inconstency :) Also, I would assume, that the join does not have a huge performance impact since it will probably use the index on cms_page.id. For bulk operations, this might be a different story, but it's probably still feasible (?). Edit: On the other hand, I am not sure whether Option 1 would be safe with regard to concurrency (as opposed to the ExclusionConstraint). We might need to acquire table locks (e.g. |
|
Note to self: temporary branch: clean-fix/slug-uniqueness-constraint |
Analysis: Best Solution for Slug Uniqueness ConstraintConfirming: GeneratedField Won't Workmichael-markl is correct. Django's
So Comparison of Viable Options
My Recommendation: Option 3 (current PoC) is the better choiceWhy:
Suggested Improvements to Current PoC
Re: Blocked by #3917The PR mentions it's blocked by #3917 for speed testing. That makes sense - the migration touching all translations could be slow on production data. Consider:
Note: This analysis was done with AI assistance (Claude Code). |
cf55dc0 to
a116c1a
Compare
a116c1a to
2f3fb9b
Compare
dkehne
left a comment
There was a problem hiding this comment.
Looks good to my pr-review skill so i will spend an approval here ;-) The advisory lock approach correctly solves the race condition raised earlier, and the schema stays clean (no denormalization).
Claude wants to do two minor things you can most probably ignore.
- renaming the trigger names from set_region_on_insert to something like enforce_slug_uniqueness (with the inline comment corrected to reflect that the trigger fires on both INSERT and UPDATE)
- and tackle a 99,99% theoretical INSERT-conflict test that could be added for each content type e.g., two consecutive objects.create() calls with the same slug, language, and region, wrapped in pytest.raises(IntegrityError). may never happen in real life...
jarlhengstmengel
left a comment
There was a problem hiding this comment.
Thank you very much for the PR! This was fun, reviewing actual SQL Code :D Code generally looks good to me. I have some nits about the tests. I tested also in the shell manually and that threw the expected error. I tested with a database dump. Result of the first run of the migration: INFO integreat_cms.cms.utils.slug_utils - Finished >make_all_slugs_unique< after: 874.360s with 939 updated slugs. I didn't trigger the migration a second time with the command. Since the database is migrated on the first startup after importing the dump, it should never run into that situation a second time. So as long as we are aware that it takes long time to run when we trigger the command manually, this shouldn't bother us anymore. (I was worried that it could become difficult to test things with a datadump after this is merged, but that doesn't make sense in hindsight 😅 ).
With the dump I also tested if there is a noticeable difference in performance with bulk actions. I couldn't detect a difference. With a lot of objects selected it takes long either way.
| @pytest.mark.order("last") | ||
| @pytest.mark.django_db(transaction=True) | ||
| def test_db_trigger_prevents_duplicate_slug_on_event_translations() -> None: | ||
| region = Region.objects.create(name="trigger-test-region") |
There was a problem hiding this comment.
| region = Region.objects.create(name="trigger-test-region") | |
| region = Region.objects.create(slug="trigger-test-region") |
The tests work, but when I tested with some similar code in the shell, creating a region with 'name', created problems with the database. I'm not entirely sure what happened but I my impression was that the creating a region only with name has the potential to overwrite another region (which feels like that it shouldn't be possible but that's not part of this PR) When creating with slug it should be on the save side I think since slug is unique.
| def test_db_trigger_prevents_duplicate_slug_on_page_translations( | ||
| create_page: Callable[..., Page], | ||
| ) -> None: | ||
| region = Region.objects.create(name="trigger-test-region") |
There was a problem hiding this comment.
| region = Region.objects.create(name="trigger-test-region") | |
| region = Region.objects.create(slug="trigger-test-region") |
See comment above
| @pytest.mark.order("last") | ||
| @pytest.mark.django_db(transaction=True) | ||
| def test_db_trigger_prevents_duplicate_slug_on_poi_translations() -> None: | ||
| region = Region.objects.create(name="trigger-test-region") |
There was a problem hiding this comment.
| region = Region.objects.create(name="trigger-test-region") | |
| region = Region.objects.create(slug="trigger-test-region") |
See comment above
I am not entirely sure I got this point, but I added a test each that tests on .bulk_create() . Create() itself will never run into a IntegrityError since it passes through save() |
That is strange, because when I ran the command on the Testsystem I got a time duration of 45s with about 600 updated slugs. I think I will have a look into this again, just so we know what exactly to expect during deployment. |
Short description
This is part 2 of this PR #3784. I use a database trigger before
INSERTandUPDATEon pagetranslations, eventtranslations and poitranslations to assure that the slug does not already and if it does I raise a database exception.In part 1 I have implemented application level safeguards, but as mentioned there, they are not really sufficient to guarantee a slugs uniqueness.
Proposed changes
Why is a simple DB constraint not implementable as
UniqueConstraint?A unique constraint basically creates an Index (can be partial when used with condition). But we need the condition "where page_id differs from other page_id". Meaning we need to be comparing two rows, which is not doable neither in Django nor SQL.
But there are ways to ensure uniqueness of slugs on the database level especially with PostgreSQL. The different tools we need are
A database trigger allows us to (before-)hook into every
INSERTorUPDATEon for example the cms_pagetranslation table and then call a function that checks for the uniqueness of the slug against the other pagetranslation rows and throws an exception if its not unique.Alternative solutions
I have doubts though that we wont see noticable performance decrease on production level, especially with bulk actions. If that is the case, another option is to denormalize the region field onto the translations.
Denormalization would look like that
Create a field region on pagetranslation (or rather on AbstractContentTranslation) like this
region = models.ForeignKey("cms.Region", null=True, editable=False, on_delete=models.CASCADE)Both rare cases, but we need to be safe
Once we would have region denormalized on pagetranslation, we then can either chose to implement the trigger from above without the
JOINor use theExclusionConstraintHow to test
Inside the shell:
All these three should throw a database error
Pull Request Review Guidelines
Side effects
make_slugs_uniqueruns less than 1 minutes with current production data the first time and then less than 10 minutes the second time when all slugs are unique already. See the follow up sectio here: Management Command for async script to make all slugs unique #4082Faithfulness to issue description and design
There are no intended deviations from the issue and design.
Resolved issues
Fixes: #3060
Relates to: #3917
2026-03-31: Relevant thread about the integration of migration or running it as a management command: https://chat.tuerantuer.org/digitalfabrik/pl/ry4szsbk8frqbcgt7aw1zhe8br
The run of the (non dry-run) command took 45.461s for 675 updated slugs, this should be doable as a migration
Pull Request Review Guidelines