fix: Handle numpy.ndarray from Arrow/Athena in Array(String) materialisation #6388
Open
Abhishek8108 wants to merge 1 commit into feast-dev:master
…isation

Athena (and other Arrow-backed offline stores) deserialises Array(String) feature columns as numpy.ndarray with object dtype rather than plain Python lists. Two code paths in type_map.py did not handle this:

1. _validate_collection_item_types iterated over the ndarray directly. Nullable Arrow columns can embed None elements, and type(None) is not in the valid_types set for STRING_LIST ([np.str_, str]), causing TypeError.
2. The generic list conversion path in _convert_list_values_to_proto passed the raw ndarray to StringList(val=...). Protobuf rejects non-list inputs with TypeError: bad argument type for built-in operation.

Fix:
- Coerce ndarray to a plain Python list via .tolist() before type validation, and skip None elements (nullable elements cannot be held in protobuf fixed-type lists and are stripped).
- In the generic conversion path, apply the same coercion so protobuf always receives a plain list.

Adds four unit tests covering: plain ndarray, ndarray with None elements, empty ndarray, and mixed None/ndarray rows.

Fixes feast-dev#6325

Signed-off-by: Abhishek8108 <87538407+Abhishek8108@users.noreply.github.com>
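The first failure mode described in the commit message can be illustrated without Feast or protobuf. The snippet below is only a sketch: the `row` value is a hand-built stand-in for what an Arrow-backed store might return, and `valid_types` copies the STRING_LIST types quoted above rather than importing them from type_map.py.

```python
import numpy as np

# Hand-built stand-in (assumption) for a nullable Array(String) value
# as deserialised by an Arrow-backed offline store: an object-dtype ndarray.
row = np.array(["a", None, "c"], dtype=object)

# The STRING_LIST element types quoted in the commit message.
valid_types = [np.str_, str]

# Iterating the ndarray directly yields the embedded None as-is,
# so an exact type check against valid_types rejects it.
offending = [item for item in row if type(item) not in valid_types]

print(type(row).__name__)  # ndarray, not list
print(offending)           # [None]
```

The same value is also not a `list` instance, which is relevant to the second failure mode, where the raw ndarray is handed to a protobuf repeated field.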
Summary
Closes #6325
Materialising feature views with Array(String) columns via the Athena offline store fails with TypeError or ValueError. Arrow/Athena deserialises these columns as numpy.ndarray with object dtype instead of plain Python lists, and two code paths in type_map.py do not handle this.

Root causes
1. _validate_collection_item_types: TypeError on None elements inside ndarrays
Nullable Arrow columns embed None elements inside the ndarray. The validator iterates over the ndarray directly and checks type(item) in valid_types. For STRING_LIST, valid_types = [np.str_, str], and type(None) is not in that set, so it raises TypeError: bad argument type for built-in operation.

2. _convert_list_values_to_proto generic path: TypeError from protobuf
The generic list conversion calls StringList(val=value) where value is a numpy.ndarray. Protobuf rejects non-list inputs with TypeError: bad argument type for built-in operation.

Fix
- _validate_collection_item_types: coerce ndarray to a plain Python list via .tolist() before element-level checks, and skip None elements (nullable elements are valid in Arrow columns).
- _convert_list_values_to_proto: apply the same .tolist() coercion so protobuf always receives a plain Python list. None elements are stripped (protobuf fixed-type lists cannot hold nulls).

The fix covers all three failure modes from #6325:
- np.array(['a', 'b', 'c']) → ['a', 'b', 'c'] ✓
- np.array(['a', None, 'c']) → ['a', 'c'] (None stripped) ✓
- np.array([]) → [] ✓

Changes
- sdk/python/feast/type_map.py: two targeted edits, no API surface change
- sdk/python/tests/unit/test_type_map.py: four new unit tests in TestAthenaArrayStringConversion

Test plan
- TestAthenaArrayStringConversion::test_string_list_from_ndarray: plain ndarray converts correctly
- TestAthenaArrayStringConversion::test_string_list_from_ndarray_with_none_elements: None elements stripped, no TypeError
- TestAthenaArrayStringConversion::test_string_list_from_empty_ndarray: empty ndarray yields empty list, no ValueError
- TestAthenaArrayStringConversion::test_string_list_mixed_null_and_ndarray_rows: mix of null rows and ndarray rows
- ruff format --check and ruff check: clean on both modified files
- test_type_map.py tests unaffected
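
For reviewers who want to check the coercion behaviour in isolation, the following is a minimal sketch of the approach described under Fix. It is not the code in this PR: `coerce_to_plain_list` is a hypothetical helper name, and the assertions mirror the three cases listed above plus the mixed null/ndarray rows case from the test plan.

```python
import numpy as np


def coerce_to_plain_list(value):
    """Hypothetical helper (not the Feast implementation).

    Converts an ndarray to a plain Python list via .tolist() and
    strips None elements, as described in the Fix section.
    """
    if isinstance(value, np.ndarray):
        value = value.tolist()
    return [item for item in value if item is not None]


# The three cases listed in the description.
plain = coerce_to_plain_list(np.array(["a", "b", "c"]))
with_none = coerce_to_plain_list(np.array(["a", None, "c"], dtype=object))
empty = coerce_to_plain_list(np.array([]))

# Mixed rows: a null row stays None, an ndarray row is coerced.
rows = [None, np.array(["x", None, "y"], dtype=object)]
mixed = [None if row is None else coerce_to_plain_list(row) for row in rows]

print(plain)      # ['a', 'b', 'c']
print(with_none)  # ['a', 'c']
print(empty)      # []
print(mixed)      # [None, ['x', 'y']]
```

Stripping None changes the list length, so positional alignment with the source array is lost. That matches the trade-off stated in the description, since protobuf fixed-type lists cannot hold nulls.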