[branch-52] Cherry-pick apache/datafusion#21104 #95
Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into `DataDog:branch-52` on Mar 26, 2026
LiaCastaneda approved these changes on Mar 26, 2026.
…mpatibility (apache#21104)

## Problem

`string_to_array` was returning incorrect results for empty string input, both when the delimiter is non-empty and when the delimiter is itself an empty string. This diverges from PostgreSQL behavior.

| Query | DataFusion (before) | PostgreSQL (expected) |
|---|---|---|
| `string_to_array('', ',')` | `['']` | `{}` |
| `string_to_array('', '')` | `['']` | `{}` |
| `string_to_array('', ',', 'x')` | `['']` | `{}` |
| `string_to_array('', '', 'x')` | `['']` | `{}` |

Results from datafusion-cli:

![Screenshot 2026-03-23 at 9 14 08 AM](https://github.com/user-attachments/assets/2eaae366-7f8a-4220-87d2-f0b311c26712)

**Root cause:** Rust's `str::split()` on an empty string always yields one empty-string element, so `"".split(",")` produces `[""]`. Additionally, the empty-delimiter branch unconditionally appended the (empty) string value.

This is subtle because Arrow's text display format appears not to quote strings, so `[""]` renders as `[]`, indistinguishable from an actual empty array. Using `cardinality()` shows that the array length is actually 1, not 0.

**PostgreSQL reference:** [db-fiddle](https://www.db-fiddle.com/f/oCF8EPaZFkDNKSg28rVVWy/3)

## Fix

In `datafusion/functions-nested/src/string.rs`:

- **Non-empty delimiter** `(Some(string), Some(delimiter))`: added an `if !string.is_empty()` guard to skip splitting when the input is empty.
- **Empty delimiter** `(Some(string), Some(""))`: added an `if !string.is_empty()` guard so the string value is only appended when it is non-empty.

Both the plain variant and the `null_value` variant are fixed (4 checks total).
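As a rough illustration of the root cause and the two guards, here is a minimal sketch on plain `&str` values rather than Arrow arrays. The function `string_to_array_sketch` is hypothetical: it is not the code in `string.rs`, and it folds the plain and `null_value` variants into one function by taking `null_value` as an `Option`, with array elements modelled as `Option<String>` so that a matching piece becomes NULL.

```rust
/// Hypothetical, simplified model of the split logic described above.
/// Works on `&str` instead of Arrow arrays; `None` elements model SQL NULL.
fn string_to_array_sketch(
    string: &str,
    delimiter: &str,
    null_value: Option<&str>,
) -> Vec<Option<String>> {
    // Turn a piece into NULL when it equals `null_value`.
    let to_elem = |piece: &str| -> Option<String> {
        if null_value == Some(piece) {
            None
        } else {
            Some(piece.to_string())
        }
    };

    let mut out = Vec::new();
    if delimiter.is_empty() {
        // Empty delimiter: the whole string becomes a single element,
        // but only if the string itself is non-empty (first guard).
        if !string.is_empty() {
            out.push(to_elem(string));
        }
    } else if !string.is_empty() {
        // Non-empty delimiter: without this guard, `"".split(",")`
        // would yield one empty piece and the result would be `[""]`.
        for piece in string.split(delimiter) {
            out.push(to_elem(piece));
        }
    }
    out
}

fn main() {
    // The root cause: splitting an empty string yields one empty piece.
    assert_eq!("".split(',').collect::<Vec<_>>(), vec![""]);

    // With the guards, all four cases from the table give empty arrays.
    assert!(string_to_array_sketch("", ",", None).is_empty());
    assert!(string_to_array_sketch("", "", None).is_empty());
    assert!(string_to_array_sketch("", ",", Some("x")).is_empty());
    assert!(string_to_array_sketch("", "", Some("x")).is_empty());

    // Non-empty input is unaffected; "x" is mapped to NULL.
    assert_eq!(
        string_to_array_sketch("a,x,b", ",", Some("x")),
        vec![Some("a".to_string()), None, Some("b".to_string())]
    );
}
```

The guard is placed before the split rather than filtering out empty pieces afterwards, because empty pieces are legitimate output for non-empty input (for example `"a,,b"` should still produce three elements).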
## Tests

Added sqllogictest cases in `datafusion/sqllogictest/test_files/array.slt` using `cardinality()` to unambiguously verify that the arrays are truly empty (not just displaying as empty):

```sql
SELECT cardinality(string_to_array('', ','));      -- 0
SELECT cardinality(string_to_array('', ''));       -- 0
SELECT cardinality(string_to_array('', ',', 'x')); -- 0
SELECT cardinality(string_to_array('', '', 'x'));  -- 0
```

Each test covers one of the four `is_empty` guard checks. All four fail without the fix (returning 1 instead of 0).

(cherry picked from commit cdaecf0)
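The reason the tests check `cardinality()` instead of the printed array can be sketched in a few lines. `display_unquoted` below is a hypothetical formatter that prints list elements without quotes, standing in for the unquoted display behavior described in the root-cause section; it is not Arrow's actual formatter.

```rust
/// Hypothetical formatter that joins elements without quoting them.
/// A stand-in for unquoted list display, NOT Arrow's real formatter.
fn display_unquoted(elems: &[&str]) -> String {
    format!("[{}]", elems.join(", "))
}

fn main() {
    let buggy: Vec<&str> = vec![""]; // shape of the old result
    let fixed: Vec<&str> = vec![]; // shape PostgreSQL returns

    // Both render identically...
    assert_eq!(display_unquoted(&buggy), "[]");
    assert_eq!(display_unquoted(&fixed), "[]");

    // ...but their lengths (what `cardinality()` reports) differ.
    assert_eq!(buggy.len(), 1);
    assert_eq!(fixed.len(), 0);
}
```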
Updated from `dabfa4d` to `9742ac7`.
Merged commit `dbb2ab0` into `DataDog:branch-52`.
30 checks passed.
cherry-picks apache#21104