[branch-52] Cherry-pick apache/datafusion#21104 #95

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into DataDog:branch-52 from
dd-david-levin:david.levin/cherry-pick/apache-pr-21104-20260325
Mar 26, 2026

@dd-david-levin

cherry-picks apache#21104

…mpatibility (apache#21104)

## Problem

`string_to_array` was returning incorrect results for empty string input
— both when the delimiter is non-empty and when the delimiter is itself
an empty string. This diverges from PostgreSQL behavior.

| Query | DataFusion (before) | PostgreSQL (expected) |
|---|---|---|
| `string_to_array('', ',')` | `['']` | `{}` |
| `string_to_array('', '')` | `['']` | `{}` |
| `string_to_array('', ',', 'x')` | `['']` | `{}` |
| `string_to_array('', '', 'x')` | `['']` | `{}` |

Results from datafusion-cli
<img width="1435" height="104" alt="Screenshot 2026-03-23 at 9 14 08 AM"
src="https://github.com/user-attachments/assets/2eaae366-7f8a-4220-87d2-f0b311c26712"
/>

**Root cause:** Rust's `str::split()` on an empty string always yields
one empty-string element, so `"".split(",")` produces `[""]`.
Additionally, the empty-delimiter branch unconditionally appended the
(empty) string value. This is subtle because Arrow's text display format
appears not to quote strings, so `[""]` renders as `[]` —
indistinguishable from an actual empty array. Using `cardinality()`
reveals the current length is 1, not 0.
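
The `str::split()` behavior described above can be reproduced in isolation. The following is a minimal sketch, not the DataFusion code: `split_naive` is a hypothetical helper that only mirrors the unguarded split for a non-empty delimiter.

```rust
/// Hypothetical helper (not part of DataFusion): splits `input` on a
/// non-empty `delimiter` with no special handling of empty input.
fn split_naive(input: &str, delimiter: &str) -> Vec<String> {
    // `str::split` with a non-empty pattern always yields at least one
    // item, so an empty `input` produces a single empty string.
    input.split(delimiter).map(str::to_string).collect()
}
```

Calling `split_naive("", ",")` returns a vector of length 1 holding one empty string, which is the source of the cardinality of 1 noted above.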

**PostgreSQL reference:**
[db-fiddle](https://www.db-fiddle.com/f/oCF8EPaZFkDNKSg28rVVWy/3)

## Fix

In `datafusion/functions-nested/src/string.rs`:

- **Non-empty delimiter** `(Some(string), Some(delimiter))`: added `if
!string.is_empty()` guard to skip splitting when input is empty.
- **Empty delimiter** `(Some(string), Some(""))`: added `if
!string.is_empty()` guard so the string value is only appended when
non-empty.

Both the plain variant and the `null_value` variant are fixed (4 checks
total).
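
The two guards can be sketched as follows. This is an illustration under simplifying assumptions, not the code in `string.rs`: `string_to_array_sketch` is a hypothetical name, it returns a plain `Vec<String>` instead of writing to Arrow builders, and it omits both NULL inputs and the `null_value` variant.

```rust
/// Hypothetical sketch of the guarded logic; the real implementation
/// works on Arrow arrays and also handles NULLs and `null_value`.
fn string_to_array_sketch(string: &str, delimiter: &str) -> Vec<String> {
    let mut out = Vec::new();
    if delimiter.is_empty() {
        // Empty delimiter: keep the whole string as one element,
        // but only when there is something to keep.
        if !string.is_empty() {
            out.push(string.to_string());
        }
    } else if !string.is_empty() {
        // Non-empty delimiter: skip the split entirely for empty input,
        // since `"".split(..)` would yield one empty element.
        out.extend(string.split(delimiter).map(str::to_string));
    }
    out
}
```

With both guards in place, empty input yields an empty vector for either kind of delimiter, while non-empty input behaves as before.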

## Tests

Added sqllogictest cases in
`datafusion/sqllogictest/test_files/array.slt` using `cardinality()` to
unambiguously verify the arrays are truly empty (not just displaying as
empty):

```sql
SELECT cardinality(string_to_array('', ','))    -- 0
SELECT cardinality(string_to_array('', ''))     -- 0
SELECT cardinality(string_to_array('', ',', 'x'))  -- 0
SELECT cardinality(string_to_array('', '', 'x'))   -- 0
```

Each test covers one of the four `is_empty` guard checks. All four fail
without the fix (returning 1 instead of 0).

(cherry picked from commit cdaecf0)
@dd-david-levin force-pushed the david.levin/cherry-pick/apache-pr-21104-20260325 branch from dabfa4d to 9742ac7 on March 26, 2026 15:46
@gh-worker-dd-mergequeue-cf854d[bot] merged commit dbb2ab0 into DataDog:branch-52 on Mar 26, 2026
30 checks passed
@gabotechs changed the title from "Cherry-pick apache/datafusion#21104" to "[branch-52] Cherry-pick apache/datafusion#21104" on Apr 16, 2026