
Implement database trigger constraint for slug uniqueness #3826

Open
hannaseithe wants to merge 3 commits into develop from
fix/slug-uniqueness-exclusion-constraint

Conversation

@hannaseithe
Contributor

@hannaseithe hannaseithe commented Aug 6, 2025

Short description

This is part 2 of PR #3784. I use a database trigger that runs before every INSERT and UPDATE on pagetranslations, eventtranslations and poitranslations to ensure that the slug is not already in use by another object in the same region and language; if it is, I raise a database exception.

In part 1 I implemented application-level safeguards, but, as mentioned there, they are not sufficient to guarantee a slug's uniqueness.

Proposed changes

Why can't this be implemented as a simple UniqueConstraint?

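A unique constraint essentially creates a unique index (which can be partial when a condition is given). But we need the condition "where page_id differs from the other row's page_id", because different versions of the same page's translation may share a slug. In other words, the constraint has to compare two rows with each other, which neither Django's UniqueConstraint nor a plain SQL unique index can express.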
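To make the problem concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL, with a heavily simplified, hypothetical schema (region is left out for brevity). It assumes that every version of a page translation is stored as its own row. A naive unique index over slug and language then correctly rejects the same slug on another page, but it also rejects a harmless new version of the same page:

```python
import sqlite3

# Hypothetical, simplified schema: one row per *version* of a page translation.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL,
        language_id INTEGER NOT NULL,
        version INTEGER NOT NULL,
        slug TEXT NOT NULL,
        UNIQUE (slug, language_id)  -- the naive constraint
    )
    """
)


def try_insert(page_id: int, version: int, slug: str) -> bool:
    """Insert a translation version; return False if the UNIQUE index rejects it."""
    try:
        conn.execute(
            "INSERT INTO pagetranslation (page_id, language_id, version, slug) "
            "VALUES (?, 1, ?, ?)",
            (page_id, version, slug),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(page_id=1, version=1, slug="welcome")        # accepted
other_page = try_insert(page_id=2, version=1, slug="welcome")   # rejected: correct
new_version = try_insert(page_id=1, version=2, slug="welcome")  # rejected: wrong, should be allowed
print(first, other_page, new_version)  # True False False
```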

However, there are ways to ensure slug uniqueness at the database level, especially with PostgreSQL. The tools we need are:

  • Database Triggers
  • Denormalization of a column (not so much a tool but a pattern)
  • ExclusionConstraint

A database trigger lets us hook into every INSERT or UPDATE on, for example, the cms_pagetranslation table before it is executed, and call a function that checks the slug against the other pagetranslation rows and raises an exception if it is not unique.
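The trigger logic (Option 1 in the summary further down) can be sketched with Python's built-in sqlite3 module. This only illustrates the check under simplified, hypothetical table and trigger names: the PR itself targets PostgreSQL, where the check lives in a trigger function attached via CREATE TRIGGER, and SQLite's RAISE(ABORT, ...) stands in for raising an exception there.

```python
import sqlite3

# SQLite illustration of "trigger + JOIN"; all names are hypothetical.
CONFLICT_CHECK = """
    SELECT RAISE(ABORT, 'slug already in use in this region and language')
    WHERE EXISTS (
        SELECT 1
        FROM pagetranslation AS t
        JOIN page AS p ON p.id = t.page_id      -- resolve the region of each row
        WHERE t.slug = NEW.slug
          AND t.language_id = NEW.language_id
          AND t.page_id <> NEW.page_id          -- other versions of the same page are fine
          AND p.region_id = (SELECT region_id FROM page WHERE id = NEW.page_id)
          {extra}
    );
"""
INSERT_CHECK = CONFLICT_CHECK.format(extra="")
# On UPDATE, ignore the row being updated (matters when its page_id changes).
UPDATE_CHECK = CONFLICT_CHECK.format(extra="AND t.id <> OLD.id")

SCHEMA = f"""
CREATE TABLE page (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
CREATE TABLE pagetranslation (
    id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES page(id),
    language_id INTEGER NOT NULL,
    slug TEXT NOT NULL
);
CREATE TRIGGER slug_unique_before_insert
BEFORE INSERT ON pagetranslation
BEGIN {INSERT_CHECK} END;

CREATE TRIGGER slug_unique_before_update
BEFORE UPDATE OF slug, language_id, page_id ON pagetranslation
BEGIN {UPDATE_CHECK} END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Pages 1 and 2 are in region 10, page 3 in region 20.
conn.executemany("INSERT INTO page (id, region_id) VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])


def run(sql: str, params: tuple = ()) -> bool:
    """Execute a statement; return False if the trigger aborted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT = "INSERT INTO pagetranslation (page_id, language_id, slug) VALUES (?, 1, ?)"
results = {
    "page 1": run(INSERT, (1, "welcome")),
    "page 1, new version": run(INSERT, (1, "welcome")),
    "page 3, other region": run(INSERT, (3, "welcome")),
    "page 2, same region": run(INSERT, (2, "welcome")),
    "page 2, other slug": run(INSERT, (2, "about")),
}
results["rename page 2 to welcome"] = run(
    "UPDATE pagetranslation SET slug = 'welcome' WHERE page_id = 2"
)
print(results)
```

Only the two statements that would give page 2 the slug already used by page 1 in region 10 are rejected; new versions of page 1 and the same slug in another region pass.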

Alternative solutions

I do suspect, though, that we might see a noticeable performance decrease in production, especially with bulk actions. If that is the case, another option is to denormalize the region field onto the translations.

Denormalization would look like this:
Create a field region on pagetranslation (or rather on AbstractContentTranslation):

    region = models.ForeignKey("cms.Region", null=True, editable=False, on_delete=models.CASCADE)

  • make it non-editable so it won't show up in forms
  • and create two triggers that keep it in sync with page.region:
  1. when a page ever changes its region
  2. when a pagetranslation ever changes its page
    Both are rare cases, but we need to be safe.

Once region is denormalized on pagetranslation, we can either implement the trigger from above without the JOIN or use the ExclusionConstraint.
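The sync triggers could look roughly like this, again sketched in SQLite with hypothetical names. In PostgreSQL a BEFORE trigger would simply assign NEW.region_id; SQLite cannot modify NEW, so these AFTER triggers issue an UPDATE instead. An insert trigger is included as well, since a newly created translation also needs its region filled in.

```python
import sqlite3

# SQLite sketch of the denormalization triggers; all names are illustrative.
SCHEMA = """
CREATE TABLE page (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
CREATE TABLE pagetranslation (
    id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL REFERENCES page(id),
    region_id INTEGER,  -- denormalized copy of page.region_id, never edited directly
    slug TEXT NOT NULL
);

-- New translation: copy the region from its page.
CREATE TRIGGER set_region_on_insert
AFTER INSERT ON pagetranslation
BEGIN
    UPDATE pagetranslation
    SET region_id = (SELECT region_id FROM page WHERE id = NEW.page_id)
    WHERE id = NEW.id;
END;

-- Case 2: a translation is moved to another page.
CREATE TRIGGER set_region_on_page_change
AFTER UPDATE OF page_id ON pagetranslation
BEGIN
    UPDATE pagetranslation
    SET region_id = (SELECT region_id FROM page WHERE id = NEW.page_id)
    WHERE id = NEW.id;
END;

-- Case 1: a page is moved to another region.
CREATE TRIGGER update_translations_on_region_change
AFTER UPDATE OF region_id ON page
BEGIN
    UPDATE pagetranslation SET region_id = NEW.region_id WHERE page_id = NEW.id;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO page (id, region_id) VALUES (?, ?)", [(1, 10), (2, 20)])
tid = conn.execute("INSERT INTO pagetranslation (page_id, slug) VALUES (1, 'welcome')").lastrowid


def region_of(translation_id: int) -> int:
    """Read the denormalized region of a translation."""
    return conn.execute(
        "SELECT region_id FROM pagetranslation WHERE id = ?", (translation_id,)
    ).fetchone()[0]


after_insert = region_of(tid)         # 10, copied from page 1
conn.execute("UPDATE page SET region_id = 30 WHERE id = 1")
after_region_change = region_of(tid)  # 30, page 1 moved to region 30
conn.execute("UPDATE pagetranslation SET page_id = 2 WHERE id = ?", (tid,))
after_page_change = region_of(tid)    # 20, translation moved to page 2
```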

How to test

Inside the shell:

    PageTranslation.objects.filter(id=xyz).update(slug="slug-already-in-use-by-other-page")  # *
    EventTranslation.objects.filter(id=xyz).update(slug="slug-already-in-use-by-other-event")  # *
    POITranslation.objects.filter(id=xyz).update(slug="slug-already-in-use-by-other-poi")  # *

*Each slug must already be in use by another page/event/POI in the same region and language.

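All three should raise a database error. Because QuerySet.update() bypasses save(), the application-level safeguards from part 1 do not run, so this exercises the database trigger directly.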


Pull Request Review Guidelines

Side effects

  • This might decrease performance significantly. Bulk actions / management commands that create or modify pages, events and POIs should be tested.
  • The migration make_slugs_unique runs in less than 1 minute with current production data the first time, and then in less than 10 minutes the second time, when all slugs are already unique. See the follow-up section here: Management Command for async script to make all slugs unique #4082

Faithfulness to issue description and design

There are no intended deviations from the issue and design.

Resolved issues

Fixes: #3060
Relates to: #3917

2026-03-31: Relevant thread about whether to integrate the migration or to run it as a management command: https://chat.tuerantuer.org/digitalfabrik/pl/ry4szsbk8frqbcgt7aw1zhe8br

The (non-dry-run) command took 45.461 s for 675 updated slugs, so this should be feasible as a migration.



@hannaseithe
Contributor Author

hannaseithe commented Aug 11, 2025

Summary of options for Solution

Option 1: Database trigger with JOIN

  • on every *-translation insert/update
  • check all rows (with a JOIN for each row) to make sure there is no other *-translation with the same slug, the same language, the same region and a different page; otherwise raise a database exception

Option 2: Denormalize region and Database trigger

  • denormalize region from * into *-translation table (use two triggers to keep them in sync)
  • create database trigger like in Option 1 without the JOIN

Option 3: Denormalize region & Exclusion constraint (implemented in the PoC)

  • denormalize like in Option 2
  • create an Exclusion Constraint

Option 4: Denormalize region & Create is_current field & Unique Constraint (with partial index) only for is_current version

  • denormalize as in Option 2
  • create is_current field on *-translation
  • create a unique constraint with WHERE is_current = TRUE, i.e. condition=Q(is_current=True)
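Option 4 can be sketched in SQLite, which also supports partial unique indexes. The schema is hypothetical and assumes that the denormalized region_id exists and that each page has exactly one current version per language. Under that assumption, two current rows with the same slug, language and region necessarily belong to different pages, so no row-to-row page comparison is needed.

```python
import sqlite3

# SQLite sketch of Option 4; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL,
        language_id INTEGER NOT NULL,
        region_id INTEGER NOT NULL,
        slug TEXT NOT NULL,
        is_current INTEGER NOT NULL
    );
    -- Only current versions are covered by the unique index.
    CREATE UNIQUE INDEX unique_current_slug
        ON pagetranslation (slug, language_id, region_id)
        WHERE is_current = 1;
    """
)


def try_insert(page_id: int, slug: str, is_current: bool, region_id: int = 10) -> bool:
    """Insert a version; return False if the partial unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO pagetranslation (page_id, language_id, region_id, slug, is_current) "
            "VALUES (?, 1, ?, ?, ?)",
            (page_id, region_id, slug, int(is_current)),
        )
        return True
    except sqlite3.IntegrityError:
        return False


old_version = try_insert(1, "welcome", is_current=False)     # accepted
current_version = try_insert(1, "welcome", is_current=True)  # accepted
clash = try_insert(2, "welcome", is_current=True)            # rejected: another page
historic_clash = try_insert(2, "welcome", is_current=False)  # accepted: not indexed
```

Two consequences follow from this design: when a new version is saved, the previous version must be marked non-current before the new current row is inserted (otherwise the index rejects the page's own new version), and non-current versions are not checked at all.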

Pseudo-Option: Materialized View (instead of column denormalization) & Database Trigger:

This is (imo) not an option, because we could not guarantee that the materialized view is always in sync with the underlying tables.

@hannaseithe
Contributor Author

hannaseithe commented Sep 8, 2025

In Django 5.1 we get GeneratedField, which should be a better substitute for the denormalization solution. We would not need a trigger at all and could use the GeneratedField in combination with an ExclusionConstraint (the database itself keeps a generated column up to date). Since there is a PR for upgrading to Django 5.2 (#3837), I will set a BLOCKED label and defer the implementation until we have upgraded to Django 5.2.

The generated field on the PageTranslation model:

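    # GeneratedField takes output_field and db_persist (there is no stored= argument)
    page_region_id = models.GeneratedField(expression=F("page__region"), output_field=models.IntegerField(), db_persist=True)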

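and the ExclusionConstraint (inside PageTranslation.Meta.constraints):

    from django.contrib.postgres.constraints import ExclusionConstraint
    from django.contrib.postgres.fields import RangeOperators
    from django.db.models import F

    # "=" and "<>" on plain columns in a GiST index require the btree_gist extension
    ExclusionConstraint(
        name="exclude_same_slug_lang_region_diff_page",
        expressions=[
            (F("slug"), RangeOperators.EQUAL),
            (F("language"), RangeOperators.EQUAL),
            (F("page_region_id"), RangeOperators.EQUAL),
            (F("page"), RangeOperators.NOT_EQUAL),
        ],
        index_type="gist",
    ),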
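In plain Python, the rule this ExclusionConstraint expresses can be sketched as follows. This is a hypothetical model of the semantics, not how PostgreSQL evaluates it: two rows conflict only if every listed operator returns true, i.e. they agree on slug, language and region and differ in page. PostgreSQL checks each new row against the existing ones via the GiST index instead of comparing all pairs.

```python
from __future__ import annotations

from itertools import combinations
from typing import NamedTuple, Optional, Tuple


class Row(NamedTuple):
    page_id: int
    language_id: int
    region_id: int
    slug: str


def conflicts(a: Row, b: Row) -> bool:
    """True if a and b satisfy *every* operator of the constraint."""
    return (
        a.slug == b.slug                     # (F("slug"), RangeOperators.EQUAL)
        and a.language_id == b.language_id   # (F("language"), RangeOperators.EQUAL)
        and a.region_id == b.region_id       # (F("page_region_id"), RangeOperators.EQUAL)
        and a.page_id != b.page_id           # (F("page"), RangeOperators.NOT_EQUAL)
    )


def find_violation(rows: list[Row]) -> Optional[Tuple[Row, Row]]:
    """Return the first conflicting pair, or None if the rows satisfy the constraint."""
    for a, b in combinations(rows, 2):
        if conflicts(a, b):
            return a, b
    return None


versions = [Row(1, 1, 10, "welcome"), Row(1, 1, 10, "welcome")]  # two versions of page 1
print(find_violation(versions))                               # None
print(find_violation(versions + [Row(2, 1, 10, "welcome")]))  # a (page 1, page 2) pair
```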

@michael-markl
Member

michael-markl commented Nov 3, 2025

In Django 5.1 we get GeneratedField which should be a better substitute than the denormalization solution

I believe that GeneratedField does not support this use case. In the expression for the generated field, you can only reference fields of the same model, so page__region would not be allowed, right?

Apart from that, I think the trigger solution from Option 1 might be the cleanest, because we don't risk other data inconsistencies just to solve another data inconsistency :) Also, I would assume that the join does not have a huge performance impact, since it will probably use the index on cms_page.id. For bulk operations, this might be a different story, but it's probably still feasible (?).

Edit: On the other hand, I am not sure whether Option 1 would be safe with regard to concurrency (as opposed to the ExclusionConstraint). We might need to acquire table locks (e.g. LOCK TABLE cms_page, cms_pagetranslations IN EXCLUSIVE MODE;) inside the trigger if that is possible (which I believe it is).
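The race condition can be illustrated with a small Python analogy, with threads standing in for database transactions (this is not PostgreSQL code). Without a lock, both workers pass the "is the slug free?" check before either one inserts; with a lock around check and insert, the second worker sees the first one's row. In PostgreSQL the lock would have to be held until the end of the transaction (as LOCK TABLE does, or a transaction-level advisory lock) so that the second transaction's check only runs after the first has committed.

```python
import threading
from typing import List


def check_then_insert(use_lock: bool) -> List[str]:
    """Let two workers try to claim the slug "welcome"; return the stored rows."""
    rows: List[str] = []
    lock = threading.Lock()
    both_checked = threading.Barrier(2)

    def worker() -> None:
        if use_lock:
            # Check and insert form one critical section, like a trigger that
            # first acquires a lock held until its transaction ends.
            with lock:
                if "welcome" not in rows:
                    rows.append("welcome")
        else:
            # Unprotected: both workers check before either one inserts,
            # mimicking two transactions that cannot see each other's rows yet.
            slug_is_free = "welcome" not in rows
            both_checked.wait()
            if slug_is_free:
                rows.append("welcome")

    workers = [threading.Thread(target=worker) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return rows


print(len(check_then_insert(use_lock=False)))  # 2 -> duplicate slug
print(len(check_then_insert(use_lock=True)))   # 1
```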

@hannaseithe hannaseithe removed the blocked Blocked by external dependency label Jan 14, 2026
@hannaseithe hannaseithe added the blocked Blocked by external dependency label Jan 19, 2026
@hannaseithe
Contributor Author

Note to self: temporary branch: clean-fix/slug-uniqueness-constraint

@dkehne
Collaborator

dkehne commented Jan 26, 2026

Analysis: Best Solution for Slug Uniqueness Constraint

Confirming: GeneratedField Won't Work

michael-markl is correct. Django's GeneratedField only supports expressions referencing fields within the same model:

"The expressions should be deterministic and only reference fields within the model (in the same database table)."

So F("page__region") won't work. This is a database-level limitation, not just a Django one: PostgreSQL generated columns can only reference columns of the same row.


Comparison of Viable Options

| Aspect | Option 1: Trigger + JOIN | Option 3: Denormalize + ExclusionConstraint (current PoC) |
|---|---|---|
| Data integrity | ✅ No redundancy | ⚠️ Redundant region column (kept in sync via triggers) |
| Constraint enforcement | Via trigger logic | ✅ Native PostgreSQL constraint |
| Concurrency safety | ⚠️ Needs careful handling (locks or SERIALIZABLE) | ✅ ExclusionConstraint is ACID-safe |
| Performance | JOIN on indexed FK should be fast | Slightly faster (no JOIN needed) |
| Complexity | Simpler schema, complex trigger | More complex schema, simpler constraint |
| PostgreSQL dependency | Trigger syntax | ExclusionConstraint + btree_gist |
| Error messages | Custom exception text | PostgreSQL constraint violation |

My Recommendation: Option 3 (current PoC) is the better choice

Why:

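  1. Concurrency is the killer argument: an ExclusionConstraint is enforced by PostgreSQL itself, which also checks new rows against rows written by concurrent, not yet committed transactions (a conflicting insert waits for the other transaction to finish and fails if it commits). A trigger-based check (Option 1) has a race-condition window: two concurrent transactions could both pass the check and then both commit, resulting in duplicates. Fixing this requires: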

    • SERIALIZABLE isolation level (performance impact), or
    • Explicit table locks in the trigger (complexity + contention)
  2. The denormalization risk is well-managed: Your PoC already has the three triggers needed to keep region in sync:

    • set_region_on_insert (BEFORE INSERT on PageTranslation)
    • set_region_on_page_change (BEFORE UPDATE on PageTranslation when page_id changes)
    • update_translations_on_region_change (AFTER UPDATE on Page when region_id changes)
  3. Error handling is cleaner: PostgreSQL will raise a clear constraint violation that Django can catch and handle, vs. custom exception handling from a trigger.


Suggested Improvements to Current PoC

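  1. Add an index for the constraint (probably unnecessary: PostgreSQL automatically creates the index that backs an exclusion constraint):

    from django.contrib.postgres.indexes import GistIndex

    indexes = [
        GistIndex(fields=["slug", "language", "region", "page"], name="slug_uniqueness_gist_idx")
    ]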
  2. Consider a safety net that verifies that region always matches page.region, in case the triggers ever fail:
    A PostgreSQL CHECK constraint cannot reference other tables, so this would have to be a periodic verification, e.g. a management command.

  3. Apply to EventTranslation and POITranslation too: The PoC currently only does PageTranslation. The same pattern should be applied to the other translation models.

  4. Migration performance: The make_slugs_unique migration iterates all translations with .save() on each. For large databases, consider using bulk_update() or raw SQL for better performance.
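For the verification in point 2, a hypothetical check (for example run from a management command) could boil down to a single query that lists translations whose copied region differs from their page's region. The sketch uses SQLite and made-up names; SQLite's NULL-safe IS NOT corresponds to IS DISTINCT FROM in PostgreSQL.

```python
from __future__ import annotations

import sqlite3

# Hypothetical verification query for the denormalized region column.
MISMATCH_QUERY = """
    SELECT t.id, t.region_id, p.region_id
    FROM pagetranslation AS t
    JOIN page AS p ON p.id = t.page_id
    WHERE t.region_id IS NOT p.region_id  -- NULL-safe: a missing region also counts
    ORDER BY t.id
"""


def find_region_mismatches(conn: sqlite3.Connection) -> list[tuple]:
    """Return (translation id, stored region, actual page region) for every mismatch."""
    return conn.execute(MISMATCH_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE page (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
    CREATE TABLE pagetranslation (
        id INTEGER PRIMARY KEY,
        page_id INTEGER NOT NULL,
        region_id INTEGER
    );
    INSERT INTO page VALUES (1, 10), (2, 20);
    INSERT INTO pagetranslation VALUES (1, 1, 10), (2, 2, 99), (3, 2, NULL);
    """
)
print(find_region_mismatches(conn))  # [(2, 99, 20), (3, None, 20)]
```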


Re: Blocked by #3917

The PR mentions it's blocked by #3917 for speed testing. That makes sense - the migration touching all translations could be slow on production data. Consider:

  • Running on a production dump to measure actual time
  • Adding a progress indicator or batching for the migration
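A hypothetical helper for batching with progress output might look like this. It only shows the generic batching and logging; in a Django migration, handle_batch could for instance collect the changed translations and save them with bulk_update(), which is an assumption and not what the current migration does.

```python
import logging
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
logger = logging.getLogger("slug_migration")  # hypothetical logger name


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (like itertools.batched in 3.12+, but lists)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def process_in_batches(
    items: List[T], size: int, handle_batch: Callable[[List[T]], None]
) -> int:
    """Pass each batch to handle_batch, log progress, and return the number of batches."""
    done = 0
    batches = 0
    for batch in batched(items, size):
        handle_batch(batch)
        done += len(batch)
        batches += 1
        logger.info("processed %d/%d translations", done, len(items))
    return batches


seen: List[List[int]] = []
print(process_in_batches(list(range(7)), 3, seen.append))  # 3
print(seen)  # [[0, 1, 2], [3, 4, 5], [6]]
```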

Note: This analysis was done with AI assistance (Claude Code).

@osmers osmers added this to the Next milestone Mar 17, 2026
@jarlhengstmengel jarlhengstmengel removed this from the Next milestone Mar 17, 2026
@hannaseithe hannaseithe removed the blocked Blocked by external dependency label Mar 31, 2026
@hannaseithe hannaseithe force-pushed the fix/slug-uniqueness-exclusion-constraint branch from cf55dc0 to a116c1a April 20, 2026 08:57
@hannaseithe hannaseithe changed the title PoC: Add an Exclusion Constraint for pagetranslation's slug uniqueness Implement database trigger constraint for slug uniqueness Apr 20, 2026
@hannaseithe hannaseithe force-pushed the fix/slug-uniqueness-exclusion-constraint branch from a116c1a to 2f3fb9b April 20, 2026 11:58
@hannaseithe hannaseithe marked this pull request as ready for review April 20, 2026 11:59
@jarlhengstmengel jarlhengstmengel self-requested a review April 22, 2026 11:46
@jarlhengstmengel jarlhengstmengel self-assigned this Apr 22, 2026
Collaborator

@dkehne dkehne left a comment


Looks good to my pr-review skill, so I will give an approval here ;-) The advisory lock approach correctly solves the race condition raised earlier, and the schema stays clean (no denormalization).

Claude suggests two minor things that you can most probably ignore:

  • renaming the triggers from set_region_on_insert to something like enforce_slug_uniqueness (and correcting the inline comment to reflect that the trigger fires on both INSERT and UPDATE)
  • and adding a (99.99% theoretical) INSERT-conflict test for each content type, e.g. two consecutive objects.create() calls with the same slug, language and region, wrapped in pytest.raises(IntegrityError). This may never happen in real life...

Contributor

@jarlhengstmengel jarlhengstmengel left a comment


Thank you very much for the PR! This was fun, reviewing actual SQL code :D The code generally looks good to me; I have some nits about the tests. I also tested manually in the shell, and that threw the expected error. I tested with a database dump. Result of the first run of the migration: INFO integreat_cms.cms.utils.slug_utils - Finished >make_all_slugs_unique< after: 874.360s with 939 updated slugs. I didn't trigger the migration a second time with the command: since the database is migrated on the first startup after importing the dump, it should never run into that situation a second time. So as long as we are aware that it takes a long time when we trigger the command manually, this shouldn't bother us anymore. (I was worried that it could become difficult to test things with a data dump after this is merged, but that doesn't make sense in hindsight 😅.)
With the dump I also tested whether there is a noticeable performance difference with bulk actions. I couldn't detect one; with a lot of objects selected it takes long either way.

    @pytest.mark.order("last")
    @pytest.mark.django_db(transaction=True)
    def test_db_trigger_prevents_duplicate_slug_on_event_translations() -> None:
        region = Region.objects.create(name="trigger-test-region")
Contributor


Suggested change
    - region = Region.objects.create(name="trigger-test-region")
    + region = Region.objects.create(slug="trigger-test-region")

The tests work, but when I tested similar code in the shell, creating a region with 'name' caused problems with the database. I'm not entirely sure what happened, but my impression was that creating a region with only a name has the potential to overwrite another region (which feels like it shouldn't be possible, but that's not part of this PR). Creating it with a slug should be on the safe side, I think, since the slug is unique.

    def test_db_trigger_prevents_duplicate_slug_on_page_translations(
        create_page: Callable[..., Page],
    ) -> None:
        region = Region.objects.create(name="trigger-test-region")
Contributor


Suggested change
    - region = Region.objects.create(name="trigger-test-region")
    + region = Region.objects.create(slug="trigger-test-region")

See comment above

    @pytest.mark.order("last")
    @pytest.mark.django_db(transaction=True)
    def test_db_trigger_prevents_duplicate_slug_on_poi_translations() -> None:
        region = Region.objects.create(name="trigger-test-region")
Contributor


Suggested change
    - region = Region.objects.create(name="trigger-test-region")
    + region = Region.objects.create(slug="trigger-test-region")

See comment above

@hannaseithe
Contributor Author

hannaseithe commented Apr 27, 2026

  • and adding a (99.99% theoretical) INSERT-conflict test for each content type, e.g. two consecutive objects.create() calls with the same slug, language and region, wrapped in pytest.raises(IntegrityError). This may never happen in real life...

I am not entirely sure I understood this point, but I added one test per content type that uses .bulk_create(). create() itself will never run into an IntegrityError, since it passes through save().

@hannaseithe
Contributor Author

Result of the first run of the migration: INFO integreat_cms.cms.utils.slug_utils - Finished >make_all_slugs_unique< after: 874.360s with 939 updated slugs.

That is strange, because when I ran the command on the test system it took 45 s with about 600 updated slugs. I will look into this again, so that we know exactly what to expect during deployment.



Development

Successfully merging this pull request may close these issues.

Multiple events have the same path

5 participants