fix: Handle numpy.ndarray from Arrow/Athena in Array(String) materialisation #6388

Open
Abhishek8108 wants to merge 1 commit into feast-dev:master from Abhishek8108:fix/athena-array-string-numpy-ndarray-proto-conversion

@Abhishek8108
Contributor

Summary

Closes #6325

Materialising feature views with Array(String) columns via the Athena offline store fails with TypeError or ValueError. Arrow/Athena deserialises these columns as numpy.ndarray with object dtype instead of plain Python lists, and two code paths in type_map.py do not handle this.

Root causes

1. _validate_collection_item_types: TypeError on None elements inside ndarrays

Nullable Arrow columns embed None elements inside the ndarray. The validator iterates over the ndarray directly and checks type(item) in valid_types. For STRING_LIST, valid_types = [np.str_, str], and type(None) is not in that set, so it raises TypeError: bad argument type for built-in operation.
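The type check described above can be reproduced in isolation. The snippet below is only a minimal sketch: `valid_types` mirrors the list quoted in this description, while `column_value` and `offending` are illustrative names, not identifiers from type_map.py.

```python
import numpy as np

# Allowed element types for STRING_LIST, as quoted in the description.
valid_types = [np.str_, str]

# Stand-in for one row of a nullable Array(String) column as it might
# arrive from Arrow: an object-dtype ndarray with an embedded None.
column_value = np.array(["a", None, "c"], dtype=object)

# Iterating the ndarray directly yields the None element, whose type
# is NoneType and therefore fails a strict membership check.
offending = [item for item in column_value if type(item) not in valid_types]

print(offending)
```

Only the None element is flagged here; the string elements pass the check because an object-dtype array holds plain Python `str` values.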

2. _convert_list_values_to_proto generic path — TypeError from protobuf

The generic list conversion calls StringList(val=value) where value is a numpy.ndarray. Protobuf rejects non-list inputs with TypeError: bad argument type for built-in operation.

Fix

  • In _validate_collection_item_types: coerce ndarray to a plain Python list via .tolist() before element-level checks, and skip None elements (nullable elements are valid in Arrow columns).
  • In the generic conversion path of _convert_list_values_to_proto: apply the same .tolist() coercion so protobuf always receives a plain Python list. None elements are stripped (protobuf fixed-type lists cannot hold nulls).
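The coercion described in the two bullets above could look roughly like the following. This is a hedged sketch rather than the code in this PR: `coerce_to_plain_list` and its `strip_none` parameter are hypothetical names introduced for illustration, and the actual edits in type_map.py may be structured differently.

```python
import numpy as np


def coerce_to_plain_list(value, strip_none=True):
    """Hypothetical helper: turn an ndarray (or list) into a plain list.

    ndarray.tolist() converts NumPy scalars to native Python objects,
    so the result is safe to hand to code that expects a real list.
    """
    if isinstance(value, np.ndarray):
        value = value.tolist()
    if strip_none:
        # Drop null elements, since a fixed-type list cannot hold them.
        return [item for item in value if item is not None]
    return list(value)


plain = coerce_to_plain_list(np.array(["a", "b", "c"]))
with_nulls = coerce_to_plain_list(np.array(["a", None, "c"], dtype=object))
empty = coerce_to_plain_list(np.array([]))

print(plain, with_nulls, empty)
```

Checking `isinstance(value, np.ndarray)` before calling `.tolist()` leaves ordinary Python lists on their existing path, so only the Arrow-derived case changes behaviour in this sketch.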

The fix covers all three failure modes from #6325:

  • np.array(['a', 'b', 'c']) → ['a', 'b', 'c']
  • np.array(['a', None, 'c']) → ['a', 'c'] (None stripped) ✓
  • np.array([]) → []

Changes

  • sdk/python/feast/type_map.py — two targeted edits, no API surface change
  • sdk/python/tests/unit/test_type_map.py — four new unit tests in TestAthenaArrayStringConversion

Test plan

  • TestAthenaArrayStringConversion::test_string_list_from_ndarray — plain ndarray converts correctly
  • TestAthenaArrayStringConversion::test_string_list_from_ndarray_with_none_elements — None elements stripped, no TypeError
  • TestAthenaArrayStringConversion::test_string_list_from_empty_ndarray — empty ndarray yields empty list, no ValueError
  • TestAthenaArrayStringConversion::test_string_list_mixed_null_and_ndarray_rows — mix of null rows and ndarray rows
  • ruff format --check and ruff check — clean on both modified files
  • Existing test_type_map.py tests unaffected

…isation

Athena (and other Arrow-backed offline stores) deserialises Array(String)
feature columns as numpy.ndarray with object dtype rather than plain Python
lists.  Two code paths in type_map.py did not handle this:

1. _validate_collection_item_types iterated over the ndarray directly.
   Nullable Arrow columns can embed None elements, and type(None) is not in
   the valid_types set for STRING_LIST ([np.str_, str]), causing TypeError.

2. The generic list conversion path in _convert_list_values_to_proto passed
   the raw ndarray to StringList(val=...).  Protobuf rejects non-list inputs
   with TypeError: bad argument type for built-in operation.

Fix:
- Coerce ndarray to a plain Python list via .tolist() before type validation,
  and skip None elements (nullable elements cannot be held in protobuf
  fixed-type lists and are stripped).
- In the generic conversion path, apply the same coercion so protobuf always
  receives a plain list.

Adds four unit tests covering: plain ndarray, ndarray with None elements,
empty ndarray, and mixed None/ndarray rows.

Fixes feast-dev#6325

Signed-off-by: Abhishek8108 <87538407+Abhishek8108@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Bug: TypeError / ValueError when materializing Array(String) feature views with Athena offline store