Commit 728aa2e

feat: Support non-string map key types (#6382) (#6383)
1 parent bedc0ef commit 728aa2e

41 files changed

Lines changed: 5776 additions & 4538 deletions

docs/getting-started/concepts/feast-types.md

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ Feast supports the following categories of data types:
 - **UUID types**: `Uuid` and `TimeUuid` for universally unique identifiers. Stored as strings at the proto level but deserialized to `uuid.UUID` objects in Python.
 - **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`, `Array(Uuid)`.
 - **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. Set types are not inferred by any backend and must be explicitly declared. They are best suited for online serving use cases.
-- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
+- **Map types**: dictionary-like structures. `Map` has string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`. `ScalarMap` has non-string scalar keys (int, float, bool, UUID, Decimal, bytes, datetime) — Feast infers `ScalarMap` automatically when the first key is not a string. `ScalarMap` must be explicitly declared in schema and is not inferred by any backend.
 - **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
 - **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

@@ -41,8 +41,8 @@ Map, JSON, and Struct types are supported across all major Feast backends:
 | Spark | `struct<...>` | `Struct` |
 | Spark | `array<struct<...>>` | `Array(Struct(...))` |
 | MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
-| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
-| Redis | Proto bytes | `Map`, `Json`, `Struct` |
+| DynamoDB | Proto bytes | `Map`, `Json`, `Struct`, `ScalarMap` |
+| Redis | Proto bytes | `Map`, `Json`, `Struct`, `ScalarMap` |
 | Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |

 **Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.

docs/reference/type-system.md

Lines changed: 49 additions & 5 deletions
@@ -116,8 +116,13 @@ Map types allow storing dictionary-like data structures:
 |------------|-------------|-------------|
 | `Map` | `Dict[str, Any]` | Dictionary with string keys and values of any supported Feast type (including nested maps) |
 | `Array(Map)` | `List[Dict[str, Any]]` | List of dictionaries |
+| `ScalarMap` | `Dict[Any, Any]` | Dictionary with non-string scalar keys (int, float, bool, UUID, Decimal, bytes, datetime) and values of any supported Feast type |

-**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.
+**Note:** `Map` keys must always be strings. `ScalarMap` supports non-string scalar keys — Feast infers `ScalarMap` automatically when the first key of a dict is not a string. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.
+
+{% hint style="warning" %}
+`ScalarMap` is **not** inferred from any backend schema. You must declare it explicitly in your feature view schema. It is best suited for online serving use cases where the online store serializes proto bytes directly (e.g., Redis, DynamoDB, SQLite).
+{% endhint %}

 **Backend support for Map:**

@@ -129,7 +134,7 @@ Map types allow storing dictionary-like data structures:
 | Spark | `map<string,string>` | `map<>` → `Map`, `array<map<>>` → `Array(Map)` |
 | Athena | `map` | Inferred as `Map` |
 | MSSQL | `nvarchar(max)` | Serialized as string |
-| DynamoDB / Redis | Proto bytes | Full proto Map support |
+| DynamoDB / Redis | Proto bytes | Full proto Map and ScalarMap support |

 ### JSON Type

@@ -197,7 +202,7 @@ from datetime import timedelta
 from feast import Entity, FeatureView, Field, FileSource
 from feast.types import (
     Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
-    Uuid, TimeUuid, Decimal, Array, Set, Map, Json, Struct
+    Uuid, TimeUuid, Decimal, Array, Set, Map, ScalarMap, Json, Struct
 )

 # Define a data source
@@ -257,6 +262,7 @@ user_features = FeatureView(
         Field(name="user_preferences", dtype=Map),
         Field(name="metadata", dtype=Map),
         Field(name="activity_log", dtype=Array(Map)),
+        Field(name="event_counts", dtype=ScalarMap),  # non-string keys, e.g. {1001: 5, 1002: 12}

         # Nested collection types
         Field(name="weekly_scores", dtype=Array(Array(Float64))),
@@ -383,7 +389,7 @@ Field(name="grouped_tags", dtype=Array(Set(Array(String))))
 Maps can store complex nested data structures:

 ```python
-# Simple map
+# Simple map (string keys)
 user_preferences = {
     "theme": "dark",
     "language": "en",
@@ -411,6 +417,44 @@ activity_log = [
 ]
 ```

+### ScalarMap Type Usage Examples
+
+`ScalarMap` supports non-string keys. Feast infers it automatically when the first dict key is not a string:
+
+```python
+import uuid
+import decimal
+
+# Integer keys — e.g., category ID → item count
+event_counts = {1001: 5, 1002: 12, 1003: 0}
+
+# UUID keys — e.g., session ID → score
+import uuid
+session_scores = {
+    uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8"): 0.95,
+    uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e"): 0.87,
+}
+
+# Decimal keys — e.g., price bucket → product name
+price_tier = {
+    decimal.Decimal("9.99"): "budget",
+    decimal.Decimal("49.99"): "standard",
+    decimal.Decimal("99.99"): "premium",
+}
+
+# Type inference: Feast automatically picks SCALAR_MAP when the key is non-string
+from feast.type_map import python_type_to_feast_value_type
+from feast.value_type import ValueType
+
+python_type_to_feast_value_type({1: "a"})  # → ValueType.SCALAR_MAP
+python_type_to_feast_value_type({"a": 1})  # → ValueType.MAP
+python_type_to_feast_value_type({})  # → ValueType.MAP (empty dict defaults to MAP)
+```
+
+{% hint style="warning" %}
+`ScalarMap` must be **explicitly declared** in your feature view schema — it is never inferred from backend type schemas. It is best suited for online serving via stores that use proto byte serialization (e.g., Redis, DynamoDB, SQLite). Materialization paths that use PyArrow (e.g., BigQuery, Snowflake, Redshift, Spark) do not have native `ScalarMap` support.
+{% endhint %}
+
 ### JSON Type Usage Examples

 Feast's `Json` type stores values as JSON strings at the proto level. You can pass either a
@@ -461,7 +505,7 @@ Each of these columns must be associated with a Feast type, which requires conve
 * `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.

 {% hint style="info" %}
-**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `Decimal`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
+**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `Decimal`, `ScalarMap`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
 {% endhint %}

 ### Materialization
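
The type-system.md changes above describe value-level inference as a first-key rule: a dict whose first key is not a string is treated as `SCALAR_MAP`, and an empty dict defaults to `MAP`. The following is a minimal standalone sketch of that rule as documented, not Feast's actual implementation. The function name `infer_map_kind` and its string return values are hypothetical and exist only for illustration.

```python
from typing import Any, Dict


def infer_map_kind(value: Dict[Any, Any]) -> str:
    """Hypothetical helper: apply the documented first-key rule to a dict."""
    if not value:
        # The docs state that an empty dict defaults to MAP.
        return "MAP"
    # Only the first key is inspected, as the docs describe.
    first_key = next(iter(value))
    return "MAP" if isinstance(first_key, str) else "SCALAR_MAP"


print(infer_map_kind({1: "a"}))
print(infer_map_kind({"a": 1}))
print(infer_map_kind({}))
```

If only the first key is inspected, as the wording suggests, a dict with mixed key types would be classified by whichever key happens to come first, which is one reason to keep keys homogeneous.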

protos/feast/types/Value.proto

Lines changed: 29 additions & 0 deletions
@@ -68,6 +68,7 @@ message ValueType {
         DECIMAL = 44;
         DECIMAL_LIST = 45;
         DECIMAL_SET = 46;
+        SCALAR_MAP = 47;
     }
 }

@@ -118,6 +119,7 @@ message Value {
         string decimal_val = 44;
         StringList decimal_list_val = 45;
         StringSet decimal_set_val = 46;
+        ScalarMap scalar_map_val = 47;
     }
 }

@@ -194,3 +196,30 @@ message MapList {
 message RepeatedValue {
     repeated Value val = 1;
 }
+
+// Map key for maps with non-string keys.
+// Excludes string (handled by Map) and all collection types (not valid as keys).
+message MapKey {
+    oneof key {
+        int32 int32_key = 1;
+        int64 int64_key = 2;
+        float float_key = 3;
+        double double_key = 4;
+        bool bool_key = 5;
+        int64 unix_timestamp_key = 6;
+        bytes bytes_key = 7;
+        string uuid_key = 8;
+        string time_uuid_key = 9;
+        string decimal_key = 10;
+    }
+}
+
+message ScalarMapEntry {
+    MapKey key = 1;
+    Value value = 2;
+}
+
+// Map with non-string keys. For string-keyed maps use Map.
+message ScalarMap {
+    repeated ScalarMapEntry val = 1;
+}
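
The `MapKey` message above stores each key in exactly one `oneof` field, and `ScalarMap` is a repeated list of key/value entries rather than a proto `map<>`. The sketch below illustrates how a Python key could be routed to one of those field names. The conversion code itself is not shown in the visible part of this commit, so the function names and the choice of `int64_key` and `double_key` for Python `int` and `float` are assumptions made only for illustration.

```python
import datetime
import decimal
import uuid
from typing import Any, Dict, List, Tuple


def key_field_for(key: Any) -> str:
    """Hypothetical helper: pick a MapKey oneof field name for a Python key."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(key, bool):
        return "bool_key"
    if isinstance(key, int):
        return "int64_key"  # assumption: Python ints use the 64-bit field
    if isinstance(key, float):
        return "double_key"  # assumption: Python floats use the double field
    if isinstance(key, bytes):
        return "bytes_key"
    if isinstance(key, uuid.UUID):
        return "uuid_key"
    if isinstance(key, decimal.Decimal):
        return "decimal_key"
    if isinstance(key, datetime.datetime):
        return "unix_timestamp_key"
    # Strings belong to Map; collections are not valid keys.
    raise TypeError(f"unsupported ScalarMap key type: {type(key).__name__}")


def to_entries(mapping: Dict[Any, Any]) -> List[Tuple[str, Any, Any]]:
    """Flatten a dict into (field_name, key, value) entries, preserving order."""
    return [(key_field_for(k), k, v) for k, v in mapping.items()]


print(to_entries({1001: 5, 1002: 12}))
```

Representing the map as a repeated entry list keeps the key type open to a `oneof`, which a native proto `map<>` field would not allow for float or bytes keys.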

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ dependencies = [
     "uvicorn-worker",
     "gunicorn; platform_system != 'Windows'",
     "dask[dataframe]>=2024.2.1",
-    "prometheus_client",
+    "prometheus_client>=0.20.0,<0.25.0",
     "psutil",
     "bigtree>=0.19.2",
     "pyjwt",
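
The new `prometheus_client>=0.20.0,<0.25.0` specifier admits a half-open range of releases. As a rough illustration of which versions fall inside it, the sketch below compares plain numeric release versions as integer tuples. It is a simplification: real installers follow PEP 440, which also covers pre-releases and other forms this helper ignores. The names `parse_release` and `in_pinned_range` are hypothetical.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string, padding to three components."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_pinned_range(version: str) -> bool:
    """True if version satisfies >=0.20.0,<0.25.0 (simple releases only)."""
    return parse_release("0.20.0") <= parse_release(version) < parse_release("0.25.0")


for candidate in ("0.19.0", "0.20.0", "0.24.1", "0.25.0"):
    print(candidate, in_pinned_range(candidate))
```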

sdk/python/feast/protos/feast/core/Aggregation_pb2.pyi

Lines changed: 35 additions & 30 deletions
@@ -2,44 +2,49 @@
 @generated by mypy-protobuf. Do not edit manually!
 isort:skip_file
 """
-import builtins
-import google.protobuf.descriptor
-import google.protobuf.duration_pb2
-import google.protobuf.message
+
+from google.protobuf import descriptor as _descriptor
+from google.protobuf import duration_pb2 as _duration_pb2
+from google.protobuf import message as _message
+import builtins as _builtins
 import sys
+import typing as _typing

-if sys.version_info >= (3, 8):
-    import typing as typing_extensions
+if sys.version_info >= (3, 10):
+    from typing import TypeAlias as _TypeAlias
 else:
-    import typing_extensions
+    from typing_extensions import TypeAlias as _TypeAlias

-DESCRIPTOR: google.protobuf.descriptor.FileDescriptor
+DESCRIPTOR: _descriptor.FileDescriptor

-class Aggregation(google.protobuf.message.Message):
-    DESCRIPTOR: google.protobuf.descriptor.Descriptor
+@_typing.final
+class Aggregation(_message.Message):
+    DESCRIPTOR: _descriptor.Descriptor

-    COLUMN_FIELD_NUMBER: builtins.int
-    FUNCTION_FIELD_NUMBER: builtins.int
-    TIME_WINDOW_FIELD_NUMBER: builtins.int
-    SLIDE_INTERVAL_FIELD_NUMBER: builtins.int
-    NAME_FIELD_NUMBER: builtins.int
-    column: builtins.str
-    function: builtins.str
-    @property
-    def time_window(self) -> google.protobuf.duration_pb2.Duration: ...
-    @property
-    def slide_interval(self) -> google.protobuf.duration_pb2.Duration: ...
-    name: builtins.str
+    COLUMN_FIELD_NUMBER: _builtins.int
+    FUNCTION_FIELD_NUMBER: _builtins.int
+    TIME_WINDOW_FIELD_NUMBER: _builtins.int
+    SLIDE_INTERVAL_FIELD_NUMBER: _builtins.int
+    NAME_FIELD_NUMBER: _builtins.int
+    column: _builtins.str
+    function: _builtins.str
+    name: _builtins.str
+    @_builtins.property
+    def time_window(self) -> _duration_pb2.Duration: ...
+    @_builtins.property
+    def slide_interval(self) -> _duration_pb2.Duration: ...
     def __init__(
         self,
         *,
-        column: builtins.str = ...,
-        function: builtins.str = ...,
-        time_window: google.protobuf.duration_pb2.Duration | None = ...,
-        slide_interval: google.protobuf.duration_pb2.Duration | None = ...,
-        name: builtins.str = ...,
+        column: _builtins.str = ...,
+        function: _builtins.str = ...,
+        time_window: _duration_pb2.Duration | None = ...,
+        slide_interval: _duration_pb2.Duration | None = ...,
+        name: _builtins.str = ...,
     ) -> None: ...
-    def HasField(self, field_name: typing_extensions.Literal["slide_interval", b"slide_interval", "time_window", b"time_window"]) -> builtins.bool: ...
-    def ClearField(self, field_name: typing_extensions.Literal["column", b"column", "function", b"function", "name", b"name", "slide_interval", b"slide_interval", "time_window", b"time_window"]) -> None: ...
+    _HasFieldArgType: _TypeAlias = _typing.Literal["slide_interval", b"slide_interval", "time_window", b"time_window"]  # noqa: Y015
+    def HasField(self, field_name: _HasFieldArgType) -> _builtins.bool: ...
+    _ClearFieldArgType: _TypeAlias = _typing.Literal["column", b"column", "function", b"function", "name", b"name", "slide_interval", b"slide_interval", "time_window", b"time_window"]  # noqa: Y015
+    def ClearField(self, field_name: _ClearFieldArgType) -> None: ...

-global___Aggregation = Aggregation
+Global___Aggregation: _TypeAlias = Aggregation  # noqa: Y015
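
The regenerated stub above swaps inline `Literal[...]` annotations for named aliases declared with `TypeAlias`, imported from `typing` on Python 3.10+ and from `typing_extensions` otherwise. The snippet below is a small runnable sketch of that pattern outside a stub file. The names `HasFieldArg` and `has_field` are hypothetical and are not part of the generated code.

```python
import sys
import typing

if sys.version_info >= (3, 10):
    from typing import TypeAlias
else:
    # Older interpreters need the typing_extensions backport installed.
    from typing_extensions import TypeAlias

# A named alias keeps the method signature short and reusable.
HasFieldArg: TypeAlias = typing.Literal["slide_interval", "time_window"]


def has_field(set_fields: typing.Set[str], field_name: HasFieldArg) -> bool:
    """Illustrative stand-in for a HasField-style presence check."""
    return field_name in set_fields


print(has_field({"time_window"}, "time_window"))
```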