Merged
Changes from 2 commits
6 changes: 3 additions & 3 deletions docs/getting-started/concepts/feast-types.md
@@ -12,7 +12,7 @@ Feast supports the following categories of data types:
- **UUID types**: `Uuid` and `TimeUuid` for universally unique identifiers. Stored as strings at the proto level but deserialized to `uuid.UUID` objects in Python.
- **Array types**: ordered lists of any primitive type, e.g. `Array(Int64)`, `Array(String)`, `Array(Uuid)`.
- **Set types**: unordered collections of unique values for any primitive type, e.g. `Set(String)`, `Set(Int64)`. Set types are not inferred by any backend and must be explicitly declared. They are best suited for online serving use cases.
-- **Map types**: dictionary-like structures with string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`.
+- **Map types**: dictionary-like structures. `Map` has string keys and values that can be any supported Feast type (including nested maps), e.g. `Map`, `Array(Map)`. `ScalarMap` has non-string scalar keys (int, float, bool, UUID, Decimal, bytes, datetime). When inferring a type from a Python `dict` value, Feast chooses `ScalarMap` if the first key is not a string. No backend schema is ever inferred as `ScalarMap`, so it must be declared explicitly in the feature view schema.
- **JSON type**: opaque JSON data stored as a string at the proto level but semantically distinct from `String` — backends use native JSON types (`jsonb`, `VARIANT`, etc.), e.g. `Json`, `Array(Json)`.
- **Struct type**: schema-aware structured type with named, typed fields. Unlike `Map` (which is schema-free), a `Struct` declares its field names and their types, enabling schema validation, e.g. `Struct({"name": String, "age": Int32})`.

@@ -41,8 +41,8 @@ Map, JSON, and Struct types are supported across all major Feast backends:
| Spark | `struct<...>` | `Struct` |
| Spark | `array<struct<...>>` | `Array(Struct(...))` |
| MSSQL | `nvarchar(max)` | `Map`, `Json`, `Struct` |
-| DynamoDB | Proto bytes | `Map`, `Json`, `Struct` |
-| Redis | Proto bytes | `Map`, `Json`, `Struct` |
+| DynamoDB | Proto bytes | `Map`, `Json`, `Struct`, `ScalarMap` |
+| Redis | Proto bytes | `Map`, `Json`, `Struct`, `ScalarMap` |
| Milvus | `VARCHAR` (serialized) | `Map`, `Json`, `Struct` |

**Note**: When the backend native type is ambiguous (e.g., `jsonb` could be `Map`, `Json`, or `Struct`), the **schema-declared Feast type takes precedence**. The backend-to-Feast type mappings above are only used for schema inference when no explicit type is provided.
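The inference rule in the **Map types** bullet can be sketched in a few lines. Under that rule, a Python `dict` with a non-string first key becomes `ScalarMap`, while backend schemas never produce `ScalarMap`. `infer_map_kind` is a hypothetical helper written for illustration only. It is not Feast's implementation in `feast.type_map`, and it returns plain strings rather than `ValueType` members. Treating an empty dict as `Map` follows the empty-dict note in `docs/reference/type-system.md` and is an assumption here, not a tested behavior.

```python
from typing import Any, Dict


def infer_map_kind(value: Dict[Any, Any]) -> str:
    """Hypothetical sketch: classify a Python dict as "MAP" or "SCALAR_MAP".

    Mirrors the documented rule only; Feast's real logic lives in feast.type_map.
    """
    if not value:
        # Assumption taken from the docs: an empty dict defaults to MAP.
        return "MAP"
    # Only the first key (in insertion order) decides, as the docs describe.
    first_key = next(iter(value))
    return "MAP" if isinstance(first_key, str) else "SCALAR_MAP"


print(infer_map_kind({"theme": "dark"}))     # MAP
print(infer_map_kind({1001: 5, 1002: 12}))   # SCALAR_MAP
print(infer_map_kind({}))                    # MAP
```

Because only the first key is inspected, a dict with mixed key types is classified by whichever key was inserted first. Declaring the type explicitly in the schema avoids depending on that.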
54 changes: 49 additions & 5 deletions docs/reference/type-system.md
@@ -116,8 +116,13 @@ Map types allow storing dictionary-like data structures:
|------------|-------------|-------------|
| `Map` | `Dict[str, Any]` | Dictionary with string keys and values of any supported Feast type (including nested maps) |
| `Array(Map)` | `List[Dict[str, Any]]` | List of dictionaries |
| `ScalarMap` | `Dict[Any, Any]` | Dictionary with non-string scalar keys (int, float, bool, UUID, Decimal, bytes, datetime) and values of any supported Feast type |

-**Note:** Map keys must always be strings. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.
+**Note:** `Map` keys must always be strings. `ScalarMap` supports non-string scalar keys. When inferring a type from a Python `dict` value, Feast picks `ScalarMap` if the first key is not a string. Map values can be any supported Feast type, including primitives, arrays, or nested maps at the proto level. However, the PyArrow representation is `map<string, string>`, which means backends that rely on PyArrow schemas (e.g., during materialization) treat Map as string-to-string.

{% hint style="warning" %}
`ScalarMap` is **not** inferred from any backend schema. You must declare it explicitly in your feature view schema. It is best suited for online serving use cases where the online store serializes proto bytes directly (e.g., Redis, DynamoDB, SQLite).
{% endhint %}

**Backend support for Map:**

@@ -129,7 +134,7 @@ Map types allow storing dictionary-like data structures:
| Spark | `map<string,string>` | `map<>` → `Map`, `array<map<>>` → `Array(Map)` |
| Athena | `map` | Inferred as `Map` |
| MSSQL | `nvarchar(max)` | Serialized as string |
-| DynamoDB / Redis | Proto bytes | Full proto Map support |
+| DynamoDB / Redis | Proto bytes | Full proto Map and ScalarMap support |

### JSON Type

@@ -197,7 +202,7 @@ from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import (
    Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp,
-    Uuid, TimeUuid, Decimal, Array, Set, Map, Json, Struct
+    Uuid, TimeUuid, Decimal, Array, Set, Map, ScalarMap, Json, Struct
)

# Define a data source
@@ -257,6 +262,7 @@ user_features = FeatureView(
Field(name="user_preferences", dtype=Map),
Field(name="metadata", dtype=Map),
Field(name="activity_log", dtype=Array(Map)),
Field(name="event_counts", dtype=ScalarMap), # non-string keys, e.g. {1001: 5, 1002: 12}

# Nested collection types
Field(name="weekly_scores", dtype=Array(Array(Float64))),
@@ -383,7 +389,7 @@ Field(name="grouped_tags", dtype=Array(Set(Array(String))))
Maps can store complex nested data structures:

```python
-# Simple map
+# Simple map (string keys)
user_preferences = {
    "theme": "dark",
    "language": "en",
@@ -411,6 +417,44 @@ activity_log = [
]
```

### ScalarMap Type Usage Examples

`ScalarMap` supports non-string keys. When Feast infers a type from a Python `dict` value, it picks `ScalarMap` if the first key is not a string:

```python
import decimal
import uuid

# Integer keys, e.g. category ID → item count
event_counts = {1001: 5, 1002: 12, 1003: 0}

# UUID keys, e.g. session ID → score
session_scores = {
    uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8"): 0.95,
    uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e"): 0.87,
}

# Decimal keys, e.g. price bucket → tier label
price_tier = {
    decimal.Decimal("9.99"): "budget",
    decimal.Decimal("49.99"): "standard",
    decimal.Decimal("99.99"): "premium",
}

# Type inference from Python values: Feast picks SCALAR_MAP when the first key is non-string.
# The first argument is the field name; the value to inspect is passed as `value`.
from feast.type_map import python_type_to_feast_value_type
from feast.value_type import ValueType

python_type_to_feast_value_type("event_counts", value={1: "a"})  # → ValueType.SCALAR_MAP
python_type_to_feast_value_type("prefs", value={"a": 1})         # → ValueType.MAP
python_type_to_feast_value_type("empty", value={})               # → ValueType.MAP (empty dict defaults to MAP)
```

{% hint style="warning" %}
`ScalarMap` must be **explicitly declared** in your feature view schema — it is never inferred from backend type schemas. It is best suited for online serving via stores that use proto byte serialization (e.g., Redis, DynamoDB, SQLite). Materialization paths that use PyArrow (e.g., BigQuery, Snowflake, Redshift, Spark) do not have native `ScalarMap` support.
{% endhint %}

### JSON Type Usage Examples

Feast's `Json` type stores values as JSON strings at the proto level. You can pass either a
@@ -461,7 +505,7 @@ Each of these columns must be associated with a Feast type, which requires conve
* `source_datatype_to_feast_value_type` calls the appropriate method in `type_map.py`. For example, if a `SnowflakeSource` is being examined, `snowflake_python_type_to_feast_value_type` from `type_map.py` will be called.

{% hint style="info" %}
-**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `Decimal`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
+**Types that cannot be inferred:** `Set`, `Json`, `Struct`, `Decimal`, `ScalarMap`, `PdfBytes`, and `ImageBytes` types are never inferred from backend schemas. If you use these types, you must declare them explicitly in your feature view schema.
{% endhint %}

### Materialization
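The type-system notes state that the PyArrow representation of a map is `map<string, string>`, which is why PyArrow-based materialization has no native `ScalarMap` support. A minimal sketch of the consequence follows. It uses a hypothetical `to_string_map` helper that mimics coercing a dict into string-to-string shape; it is not Feast code. Once keys are stringified, an integer key and the matching string key can no longer be told apart.

```python
from decimal import Decimal
from typing import Any, Dict


def to_string_map(value: Dict[Any, Any]) -> Dict[str, str]:
    """Hypothetical: coerce a dict into the shape of a map<string, string> column."""
    return {str(k): str(v) for k, v in value.items()}


scalar_keyed = {1001: 5, 1002: 12}
string_keyed = {"1001": 5, "1002": 12}

# Both dicts collapse to the same string-to-string map, so the original
# key type (int vs str) cannot be recovered afterwards.
print(to_string_map(scalar_keyed))                                  # {'1001': '5', '1002': '12'}
print(to_string_map(scalar_keyed) == to_string_map(string_keyed))  # True

# Decimal keys survive only as their text form.
print(to_string_map({Decimal("9.99"): "budget"}))                   # {'9.99': 'budget'}
```

Proto-byte online stores avoid this loss because `MapKey` keeps the key's type in its oneof field.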
29 changes: 29 additions & 0 deletions protos/feast/types/Value.proto
@@ -68,6 +68,7 @@ message ValueType {
    DECIMAL = 44;
    DECIMAL_LIST = 45;
    DECIMAL_SET = 46;
    SCALAR_MAP = 47;
  }
}

@@ -118,6 +119,7 @@ message Value {
    string decimal_val = 44;
    StringList decimal_list_val = 45;
    StringSet decimal_set_val = 46;
    ScalarMap scalar_map_val = 47;
  }
}

@@ -194,3 +196,30 @@ message MapList {
message RepeatedValue {
  repeated Value val = 1;
}

// Map key for maps with non-string keys.
// Excludes string (handled by Map) and all collection types (not valid as keys).
message MapKey {
  oneof key {
    int32 int32_key = 1;
    int64 int64_key = 2;
    float float_key = 3;
    double double_key = 4;
    bool bool_key = 5;
    int64 unix_timestamp_key = 6;
    bytes bytes_key = 7;
    string uuid_key = 8;
    string time_uuid_key = 9;
    string decimal_key = 10;
  }
}

message ScalarMapEntry {
  MapKey key = 1;
  Value value = 2;
}

// Map with non-string keys. For string-keyed maps use Map.
message ScalarMap {
  repeated ScalarMapEntry val = 1;
}
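The `MapKey` oneof has one scalar field per key type. A minimal sketch of how a Python key could be routed to those fields is shown below. It is written as a hypothetical `map_key_field` helper, not Feast's actual converter, and it returns the oneof field name together with the encoded value. Three choices are assumptions made for the sketch: every Python `int` goes to `int64_key`, every `float` goes to `double_key`, and `unix_timestamp_key` holds epoch seconds. The sketch never emits `int32_key`, `float_key`, or `time_uuid_key`.

```python
import datetime
import decimal
import uuid
from typing import Any, Tuple


def map_key_field(key: Any) -> Tuple[str, Any]:
    """Hypothetical: pick a MapKey oneof field and encoded value for a Python key."""
    if isinstance(key, str):
        # String keys are handled by Map, not ScalarMap.
        raise TypeError("string keys belong in Map, not ScalarMap")
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(key, bool):
        return "bool_key", key
    if isinstance(key, int):
        # Assumption: all Python ints use the 64-bit field.
        return "int64_key", key
    if isinstance(key, float):
        # Python floats are IEEE 754 doubles.
        return "double_key", key
    if isinstance(key, bytes):
        return "bytes_key", key
    if isinstance(key, uuid.UUID):
        # The proto carries UUIDs as strings.
        return "uuid_key", str(key)
    if isinstance(key, decimal.Decimal):
        # The proto carries decimals as strings, which preserves precision.
        return "decimal_key", str(key)
    if isinstance(key, datetime.datetime):
        # Assumption: epoch seconds; pass timezone-aware datetimes to avoid
        # local-time interpretation in .timestamp().
        return "unix_timestamp_key", int(key.timestamp())
    raise TypeError(f"unsupported ScalarMap key type: {type(key).__name__}")


print(map_key_field(1001))                      # ('int64_key', 1001)
print(map_key_field(decimal.Decimal("9.99")))   # ('decimal_key', '9.99')
```

A full encoder would wrap each result in a `MapKey`, pair it with an encoded `Value` in a `ScalarMapEntry`, and append the entries to `ScalarMap.val`.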
65 changes: 35 additions & 30 deletions sdk/python/feast/protos/feast/core/Aggregation_pb2.pyi
@@ -2,44 +2,49 @@
@generated by mypy-protobuf. Do not edit manually!
isort:skip_file
"""
-import builtins
-import google.protobuf.descriptor
-import google.protobuf.duration_pb2
-import google.protobuf.message
+
+from google.protobuf import descriptor as _descriptor
+from google.protobuf import duration_pb2 as _duration_pb2
+from google.protobuf import message as _message
+import builtins as _builtins
import sys
+import typing as _typing

-if sys.version_info >= (3, 8):
-    import typing as typing_extensions
+if sys.version_info >= (3, 10):
+    from typing import TypeAlias as _TypeAlias
else:
-    import typing_extensions
+    from typing_extensions import TypeAlias as _TypeAlias

-DESCRIPTOR: google.protobuf.descriptor.FileDescriptor
+DESCRIPTOR: _descriptor.FileDescriptor

-class Aggregation(google.protobuf.message.Message):
-    DESCRIPTOR: google.protobuf.descriptor.Descriptor
+@_typing.final
+class Aggregation(_message.Message):
+    DESCRIPTOR: _descriptor.Descriptor

-    COLUMN_FIELD_NUMBER: builtins.int
-    FUNCTION_FIELD_NUMBER: builtins.int
-    TIME_WINDOW_FIELD_NUMBER: builtins.int
-    SLIDE_INTERVAL_FIELD_NUMBER: builtins.int
-    NAME_FIELD_NUMBER: builtins.int
-    column: builtins.str
-    function: builtins.str
-    @property
-    def time_window(self) -> google.protobuf.duration_pb2.Duration: ...
-    @property
-    def slide_interval(self) -> google.protobuf.duration_pb2.Duration: ...
-    name: builtins.str
+    COLUMN_FIELD_NUMBER: _builtins.int
+    FUNCTION_FIELD_NUMBER: _builtins.int
+    TIME_WINDOW_FIELD_NUMBER: _builtins.int
+    SLIDE_INTERVAL_FIELD_NUMBER: _builtins.int
+    NAME_FIELD_NUMBER: _builtins.int
+    column: _builtins.str
+    function: _builtins.str
+    name: _builtins.str
+    @_builtins.property
+    def time_window(self) -> _duration_pb2.Duration: ...
+    @_builtins.property
+    def slide_interval(self) -> _duration_pb2.Duration: ...
    def __init__(
        self,
        *,
-        column: builtins.str = ...,
-        function: builtins.str = ...,
-        time_window: google.protobuf.duration_pb2.Duration | None = ...,
-        slide_interval: google.protobuf.duration_pb2.Duration | None = ...,
-        name: builtins.str = ...,
+        column: _builtins.str = ...,
+        function: _builtins.str = ...,
+        time_window: _duration_pb2.Duration | None = ...,
+        slide_interval: _duration_pb2.Duration | None = ...,
+        name: _builtins.str = ...,
    ) -> None: ...
-    def HasField(self, field_name: typing_extensions.Literal["slide_interval", b"slide_interval", "time_window", b"time_window"]) -> builtins.bool: ...
-    def ClearField(self, field_name: typing_extensions.Literal["column", b"column", "function", b"function", "name", b"name", "slide_interval", b"slide_interval", "time_window", b"time_window"]) -> None: ...
+    _HasFieldArgType: _TypeAlias = _typing.Literal["slide_interval", b"slide_interval", "time_window", b"time_window"] # noqa: Y015
+    def HasField(self, field_name: _HasFieldArgType) -> _builtins.bool: ...
+    _ClearFieldArgType: _TypeAlias = _typing.Literal["column", b"column", "function", b"function", "name", b"name", "slide_interval", b"slide_interval", "time_window", b"time_window"] # noqa: Y015
+    def ClearField(self, field_name: _ClearFieldArgType) -> None: ...

-global___Aggregation = Aggregation
+Global___Aggregation: _TypeAlias = Aggregation # noqa: Y015