Bug report: Set Difference and Set Intersection preserve duplicates from first sample #2241

@williballenthin

Describe the bug
The Set Difference and Set Intersection operations preserve duplicate items from the first sample, violating mathematical set semantics and producing unexpected results.

src/core/operations/SetDifference.mjs, runSetDifference()
src/core/operations/SetIntersection.mjs, runIntersect()

Both operations perform a simple .filter() on array a:

// SetDifference
runSetDifference(a, b) {
    return a
        .filter((item) => {
            return b.indexOf(item) === -1;
        })
        .join(this.itemDelimiter);
}

// SetIntersection
runIntersect(a, b) {
    return a
        .filter((item) => {
            return b.indexOf(item) > -1;
        })
        .join(this.itemDelimiter);
}

This preserves all occurrences of items that pass the filter. If a = ["x", "x", "y"] and b = ["y"], Set Difference returns "x,x". If a = ["y", "y", "z"] and b = ["y"], Set Intersection returns "y,y". Set operations should return each element at most once.

To Reproduce
Configure both operations with sample delimiter \n\n and item delimiter , (a comma), then run:

  • Set Difference: input red,red,blue\n\nblue — expected red, actual red,red
  • Set Intersection: input red,red,blue\n\nred,blue — expected red,blue, actual red,red,blue
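The two cases above can also be checked outside the application. The following is a minimal sketch, not the project's code: `parseSamples`, `currentDifference`, and `currentIntersection` are hypothetical standalone names, and the delimiters are passed as parameters rather than read from the operation's arguments. The filter bodies copy the logic quoted earlier in this report.

```javascript
// Hypothetical helper: split the raw input into two item arrays using the
// sample delimiter ("\n\n") and item delimiter (",") from the steps above.
function parseSamples(input, sampleDelim = "\n\n", itemDelim = ",") {
    const [a = "", b = ""] = input.split(sampleDelim);
    return [a.split(itemDelim), b.split(itemDelim)];
}

// Standalone copy of the current Set Difference filter logic.
function currentDifference(a, b, itemDelim = ",") {
    return a.filter((item) => b.indexOf(item) === -1).join(itemDelim);
}

// Standalone copy of the current Set Intersection filter logic.
function currentIntersection(a, b, itemDelim = ",") {
    return a.filter((item) => b.indexOf(item) > -1).join(itemDelim);
}

const [a1, b1] = parseSamples("red,red,blue\n\nblue");
const diffResult = currentDifference(a1, b1);

const [a2, b2] = parseSamples("red,red,blue\n\nred,blue");
const intersectResult = currentIntersection(a2, b2);

console.log(diffResult);      // red,red
console.log(intersectResult); // red,red,blue
```

Both outputs keep the repeated `red` from the first sample, matching the "actual" values listed above.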

Expected behaviour
This is as much a design question as a bug. Users familiar with set theory expect A ∩ B and A - B to produce proper sets, but the current behavior treats these as "filter array A by membership in array B", preserving order and multiplicity. Set Union in the same module already deduplicates its output, which suggests the original intent was mathematical set semantics.

Additional context
Suggested fix:

// SetDifference
runSetDifference(a, b) {
    const excluded = new Set(b);
    const seen = new Set();

    return a
        .filter((item) => {
            if (excluded.has(item) || seen.has(item)) {
                return false;
            }
            seen.add(item);
            return true;
        })
        .join(this.itemDelimiter);
}

// SetIntersection
runIntersect(a, b) {
    const included = new Set(b);
    const seen = new Set();

    return a
        .filter((item) => {
            if (!included.has(item) || seen.has(item)) {
                return false;
            }
            seen.add(item);
            return true;
        })
        .join(this.itemDelimiter);
}

This change would alter behavior for users relying on duplicate-preserving output. If backward compatibility matters, consider adding a "Deduplicate results" boolean argument (default: true), or renaming these to "List Difference" / "List Intersection" and creating separate true set operations.
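As a rough illustration of the first option, the sketch below adds an opt-out flag to standalone versions of both functions. The names `difference` and `intersection`, the `dedupe` option, and the `itemDelim` parameter are assumptions made for this example; how such a flag would be wired into the operations' argument lists is left open.

```javascript
// Sketch only: standalone functions with a hypothetical `dedupe` option.
// dedupe = true  -> set semantics (each item at most once, first-seen order)
// dedupe = false -> current behaviour (duplicates from `a` are preserved)
function difference(a, b, { dedupe = true, itemDelim = "," } = {}) {
    const excluded = new Set(b);
    const seen = new Set();
    return a
        .filter((item) => {
            if (excluded.has(item)) return false;
            if (dedupe) {
                if (seen.has(item)) return false;
                seen.add(item);
            }
            return true;
        })
        .join(itemDelim);
}

function intersection(a, b, { dedupe = true, itemDelim = "," } = {}) {
    const included = new Set(b);
    const seen = new Set();
    return a
        .filter((item) => {
            if (!included.has(item)) return false;
            if (dedupe) {
                if (seen.has(item)) return false;
                seen.add(item);
            }
            return true;
        })
        .join(itemDelim);
}

console.log(difference(["red", "red", "blue"], ["blue"]));                    // red
console.log(difference(["red", "red", "blue"], ["blue"], { dedupe: false })); // red,red
console.log(intersection(["red", "red", "blue"], ["red", "blue"]));           // red,blue
```

With the flag defaulting to true, the reproduction cases return the expected values, while passing `dedupe: false` keeps the existing duplicate-preserving output for anyone who depends on it.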
