Spotted in https://github.com/narwhals-dev/narwhals/actions/runs/24605500399/job/71950954205?pr=3552
Repro:
def test_inner_join_nan(constructor: Constructor) -> None:
    data = {
        "a": [0, 0, 0],
        "b": [0, 0, 0],
        "c": [0.0, 0.0, float("nan")],
    }
    join_cols = ["a", "c"]
    frame = from_native_lazy(constructor(data))
    result = (
        frame.join(frame, on=join_cols, how="inner")
        .sort("c", nulls_last=True)
        .collect()
    )
    zero_cols = ("a", "b", "b_right")
    for col in zero_cols:
        assert (result.get_column(col) == 0).all()
    assert result.get_column("c").is_nan().sum() == 1
    # NOTE: polars results in the following:
    """
    expected = {
        "a": [0, 0, 0, 0, 0],
        "b": [0, 0, 0, 0, 0],
        "c": [0.0, 0.0, 0.0, 0.0, float("nan")],
        "b_right": [0, 0, 0, 0, 0],
    }
    How can we sort the data to use:
    assert_equal_data(result, expected)
    """
This fails for all pandas-like constructors, dask, and duckdb:
$ pytest tests/frame/join_test.py -k inner_join_nan --all-cpu-constructors
FAILED tests/frame/join_test.py::test_inner_join_nan[pandas] - AssertionError: assert np.int64(0) == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[pandas[nullable]] - AssertionError: assert np.int64(0) == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[modin[pyarrow]] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[pandas[pyarrow]] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[duckdb] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[dask] - AssertionError: assert np.int64(0) == 1
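To make the row counts concrete, here is a minimal pure-Python sketch of the same self-join under two key-comparison rules. It is only a model, not what any backend actually does: `keys_equal`, `inner_join`, `count_nan`, and the `nan_equal` flag are hypothetical names made up for illustration. The assumption is that the failing backends behave as if a NaN key never matches another NaN key (for example because NaN is treated as a missing value, or compared with plain IEEE 754 equality), while polars matches NaN with NaN.

```python
import math


def keys_equal(left, right, nan_equal):
    """Compare two scalar join keys, optionally treating NaN as equal to NaN."""
    if nan_equal and isinstance(left, float) and isinstance(right, float):
        if math.isnan(left) and math.isnan(right):
            return True
    # Plain equality: float("nan") == float("nan") is False.
    return left == right


def inner_join(rows, on, nan_equal):
    """Naive nested-loop inner self-join (hypothetical helper, illustration only)."""
    matched = []
    for left in rows:
        for right in rows:
            if all(keys_equal(left[k], right[k], nan_equal) for k in on):
                matched.append({**left, "b_right": right["b"]})
    return matched


def count_nan(result):
    """Count result rows whose "c" value is NaN."""
    return sum(math.isnan(row["c"]) for row in result)


# Same data as the repro, stored row-wise.
rows = [
    {"a": 0, "b": 0, "c": 0.0},
    {"a": 0, "b": 0, "c": 0.0},
    {"a": 0, "b": 0, "c": float("nan")},
]

nan_matches = inner_join(rows, ["a", "c"], nan_equal=True)
nan_dropped = inner_join(rows, ["a", "c"], nan_equal=False)

print(len(nan_matches), count_nan(nan_matches))  # 5 1
print(len(nan_dropped), count_nan(nan_dropped))  # 4 0
```

With `nan_equal=True` the sketch yields the five rows shown in the polars `expected` dict (four `0.0` matches plus one NaN match). With `nan_equal=False` the NaN row disappears, which is consistent with the `assert 0 == 1` failures above.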
I think for:
(float('nan') as value in join for duckdb #3555)