Conversation
Also implemented container_size to pre-alloc containers
Previously they were holding a Vec<u8> which required an extra alloc
Also fix bug with JSON deserialization where blobs weren't base64 decoded
Discovered quite a few bugs when applying to real SDK models. Will fix separately
|
A new generated diff is ready to view.
A new doc preview is ready to view. |
ysaito1001
left a comment
Great work! Leaving intermediate comments. Circling back for another round of review.
    headers: Vec<(String, String)>,
    query_params: Vec<(String, String)>,
    labels: Vec<(String, String)>,
Before release, do we need to revisit the use of String here as opposed to Cow or &'a str?
Gave this a shot today. The ShapeSerializer trait methods take schema: &Schema with an anonymous lifetime that can't be tied to 'a, making constructing Cow<'a, str> impractical without significant trait changes. The allocation cost for these short strings is fairly small, so I don't know that it is worth it. I'll sit on it over the weekend and try to come up with a better way to do it.
Yeah, no worries. If profiling later shows String needs to be revisited, we could try a short string optimization (like this one), assuming these strings are relatively short, e.g. us-east-1, 1234567890, and application/json.
    /// Writes a null value (for sparse collections).
    fn write_null(&mut self, schema: &Schema) -> Result<(), SerdeError>;

    // --- Collection helper methods ---
Is there not a better way to do this?
This is an open-ended pattern. Today it's string_list, blob_list, integer_list, long_list, string_string_map, before long it's timestamp_list, blob_blob_map, string_integer_map, etc. The trait surface grows with each common collection pattern. Wonder if this could be split into extension trait methods at least to keep it off core serializer/deserializer.
This was kind of a side quest. It wasn't necessary, I had these all implemented in terms of the simpler operations. But these types kept popping up and having these helper methods cut down on code size substantially (20-30% if I remember correctly?). I'll try moving them into extension traits to keep it cleaner.
Played around a bit with moving these to an extension trait and it doesn't really work out.
The primary issue I hit is that these methods are called inside serialize_members(&self, serializer: &mut dyn ShapeSerializer), so every generated struct's serialization goes through a trait object. Extension trait methods aren't available on dyn Trait, and there is no support for dyn TraitA + TraitB for non-auto traits.
The implementations need to be overridable on the core trait. The serializer helpers use the default impls today (JsonSerializer doesn't override them), but the corresponding deserializer helpers are overridden by JsonDeserializer to avoid per-element vtable dispatch, calling self.read_string() directly instead of going through &mut dyn ShapeDeserializer. This is measurable: the deserialization helpers are called in tight loops over every element in a list or map. Keeping them on the core trait preserves the ability for codec implementations to specialize.
Although this is technically open-ended, since these collections aren't specified in the SEP, we can treat it as a closed set. The 5 helpers (string_list, blob_list, integer_list, long_list, string_string_map) cover the collection patterns that appear most frequently in AWS models. They were chosen by analyzing which patterns generate the most boilerplate (e.g., DynamoDB saw a 43% reduction in generated deserialize body lines, 19,235 → 10,953). New collection patterns (e.g., timestamp_list, string_integer_map) would use the generic write_list/read_list with closures. I'll add a doc comment making this explicit.
Alternatives I looked at:
- Extension trait with blanket impl: can't call through &mut dyn ShapeSerializer
- Supertrait (trait ShapeSerializer: CollectionHelpers): still requires every implementor to provide the methods, no different from having them on the core trait
- Remove helpers entirely: gives up the 43% reduction in generated deserialize body lines for DDB (and likely similar savings for other collection-heavy models), and loses the deserializer override performance optimization
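To make the "overridable on the core trait" point concrete, here is a minimal sketch. The trait below is a simplified, hypothetical stand-in, not the PR's actual ShapeDeserializer signature, and TokenReader and BulkReader are invented for illustration. It only shows that a helper with a default impl on the core trait can be called through a trait object and still be specialized by an implementor.

```rust
/// Hypothetical, simplified stand-in for the real deserializer trait.
trait ShapeDeserializer {
    /// Required primitive: returns the next string, or None when exhausted.
    fn read_string(&mut self) -> Option<String>;

    /// Collection helper with a default impl built on the primitive.
    /// Implementors may override it with a specialized version.
    fn read_string_list(&mut self) -> Vec<String> {
        let mut out = Vec::new();
        while let Some(s) = self.read_string() {
            out.push(s);
        }
        out
    }
}

/// Relies on the default helper.
struct TokenReader {
    tokens: std::vec::IntoIter<String>,
}

impl ShapeDeserializer for TokenReader {
    fn read_string(&mut self) -> Option<String> {
        self.tokens.next()
    }
}

/// Overrides the helper to hand back the whole list in one step.
struct BulkReader {
    tokens: Vec<String>,
    bulk_calls: usize,
}

impl ShapeDeserializer for BulkReader {
    fn read_string(&mut self) -> Option<String> {
        if self.tokens.is_empty() {
            None
        } else {
            Some(self.tokens.remove(0))
        }
    }

    fn read_string_list(&mut self) -> Vec<String> {
        self.bulk_calls += 1;
        std::mem::take(&mut self.tokens)
    }
}

/// Generated code only sees the trait object; the override is still used.
fn deserialize_members(de: &mut dyn ShapeDeserializer) -> Vec<String> {
    de.read_string_list()
}

fn main() {
    let words = vec!["a".to_string(), "b".to_string()];
    let mut plain = TokenReader { tokens: words.clone().into_iter() };
    let mut bulk = BulkReader { tokens: words, bulk_calls: 0 };
    println!("{:?}", deserialize_members(&mut plain));
    println!("{:?}", deserialize_members(&mut bulk));
}
```

Both readers are driven through the same &mut dyn call site, which is the property the helper methods need in generated code.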
For personal code review aid to build mental model (take it with a grain of salt)

Smithy Model into Static Schemas

Codec Selection Configuration

There are three levels where the protocol (and thus codec) gets selected:

Level 1: Service default (generated code)

    // Generated in sdk/dynamodb/src/config.rs by SchemaProtocolCustomization
    let mut cfg = Layer::new("DynamoDB_20120810");
    // Only set the default if the customer hasn't already set one
    if _service_config.protocol().is_none() {
        cfg.store_put(SharedClientProtocol::new(
            AwsJsonRpcProtocol::aws_json_1_0("DynamoDB_20120810"),
        ));
    }

The codec is baked inside the protocol.

    // rust-runtime/aws-smithy-json/src/protocol/aws_json_rpc.rs
    let codec = JsonCodec::new(
        JsonCodecSettings::builder()
            .use_json_name(false) // awsJson ignores @jsonName
            .default_timestamp_format(EpochSeconds) // awsJson default
            .build(),
    );
    HttpRpcProtocol::new(protocol_id, codec, "application/x-amz-json-1.0")

Level 2: Customer override via service config builder

    let conf = aws_sdk_dynamodb::config::Builder::new()
        .protocol(AwsJsonRpcProtocol::aws_json_1_1("DynamoDB_20120810"))
        .build();

Because this stores it in the service config, the Level 1 check

Level 3: Customer override via SdkConfig

    let sdk_config = aws_config::defaults(BehaviorVersion::latest())
        .protocol(AwsRestJsonProtocol::new())
        .load()
        .await;
    let client = aws_sdk_dynamodb::Client::new(&sdk_config);

Selection order:

There's no separate "codec selection": the codec is an implementation detail of the protocol. You pick a protocol, and the protocol knows which codec to use and how to configure it.

Who decides to use HttpRpcProtocol? Can it be swapped?

The

    pub struct AwsJsonRpcProtocol {
        inner: HttpRpcProtocol<JsonCodec>, // ← hardcoded at the type level
        target_prefix: String,
    }

The architecture:

    pub struct AwsRestJsonProtocol {
        inner: HttpBindingProtocol<JsonCodec>, // ← different base: handles @httpHeader, @httpQuery, etc.
    }

A customer can't swap the inner base at runtime, but they can swap the entire protocol by implementing

    #[derive(Debug)]
    struct MyCustomProtocol;
    impl ClientProtocol for MyCustomProtocol {
        fn serialize_request(&self, input, schema, endpoint, cfg) -> Result<Request, SerdeError> {
            // Use MessagePack, Protobuf, carrier pigeon, whatever you want
        }
        fn deserialize_response(&self, response, schema, cfg) -> Result<Box<dyn ShapeDeserializer>, SerdeError> {
            // ...
        }
        // ...
    }
    let client = aws_sdk_dynamodb::config::Builder::new()
        .protocol(MyCustomProtocol)
        .build();

Runtime Serialization Sequence

Concrete scenario:
Inside
Inside
Back up the stack, after

Runtime Deserialization Sequence

Concrete scenario: DynamoDB PutItem response with awsJson1_0.

Key difference from serialization: deserialization is a two-phase process. deserialize_with_response reads HTTP-bound members (headers, status code) directly from the response first, then delegates body-only parsing to the JsonDeserializer. This avoids runtime trait checks on every member, since the generated code knows at compile time which members are HTTP-bound.

Component | Responsibility
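The Level 1 and Level 2 interplay from the Codec Selection Configuration part of this comment (the generated default is stored only when the customer has not set a protocol) can be sketched as follows. This is a toy model under stated assumptions: Protocol, ServiceConfig, and runtime_protocol are hypothetical names, and the real code stores a SharedClientProtocol in a config Layer rather than returning a value.

```rust
/// Hypothetical stand-in for a protocol; only the content type is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Protocol {
    content_type: &'static str,
}

/// Hypothetical stand-in for the service config a customer builds.
#[derive(Default)]
struct ServiceConfig {
    protocol: Option<Protocol>,
}

/// Mirrors the `is_none()` check in the generated code: the service default
/// is used only if the customer did not set a protocol themselves.
fn runtime_protocol(service_config: &ServiceConfig) -> Protocol {
    let service_default = Protocol {
        content_type: "application/x-amz-json-1.0",
    };
    match &service_config.protocol {
        Some(custom) => custom.clone(),
        None => service_default,
    }
}

fn main() {
    // No override: the generated default applies.
    let default_cfg = ServiceConfig::default();
    println!("{:?}", runtime_protocol(&default_cfg));

    // Override via the (hypothetical) service config.
    let custom_cfg = ServiceConfig {
        protocol: Some(Protocol {
            content_type: "application/x-amz-json-1.1",
        }),
    };
    println!("{:?}", runtime_protocol(&custom_cfg));
}
```

The sketch deliberately leaves out the SdkConfig level, since its precedence relative to the service config is not spelled out in the comment.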
Failing CI tasks are all related to: Need to merge main to get the changes to fix that. Going to merge this back to the feature branch first so I don't have to pull the changes forward through a bunch of different branches.
Schema-Based Protocol Serialization and Deserialization

This PR implements the runtime protocol layer from the Serialization and Schema Decoupling SEP, building on the schema and codec foundations already merged to feature/schema.

It enables protocol-agnostic request serialization and response deserialization driven by runtime Schema objects and a ClientProtocol trait, replacing the per-shape/per-protocol code generation with a single generic path for restJson1, awsJson1_0, and awsJson1_1 protocols.

What's implemented
Runtime crates (Rust)
- ClientProtocol trait (aws-smithy-schema/src/schema/protocol.rs): the top-level object-safe trait for serializing requests and deserializing responses. Wrapped as SharedClientProtocol (via Arc<dyn ClientProtocol>) for storage in ConfigBag.
- HttpBindingProtocol<C> (aws-smithy-schema/src/schema/http_protocol/binding.rs): REST-style protocol implementation that routes members to HTTP headers, query params, URI labels, or body based on HTTP binding traits (@httpHeader, @httpQuery, @httpLabel, @httpPayload, @httpPrefixHeaders, @httpQueryParams). Includes an HttpBindingSerializer that intercepts ShapeSerializer calls and an HttpBindingDeserializer for responses.
- HttpRpcProtocol<C> (aws-smithy-schema/src/schema/http_protocol/rpc.rs): RPC-style protocol that serializes everything to the body, ignoring HTTP bindings.
- AwsRestJsonProtocol (aws-smithy-json/src/protocol/aws_rest_json_1.rs): thin wrapper constructing HttpBindingProtocol<JsonCodec> with use_json_name: true and epoch-seconds default timestamps.
- AwsJsonRpcProtocol (aws-smithy-json/src/protocol/aws_json_rpc.rs): unified type for both 1.0 and 1.1, wrapping HttpRpcProtocol<JsonCodec> with use_json_name: false. Sets the X-Amz-Target header from Metadata in ConfigBag.
- FinishSerializer trait: separated from ShapeSerializer to preserve object safety, since finish(self) -> Vec<u8> consumes self.
- ShapeDeserializer made object-safe: read_struct/read_list/read_map changed from generic F: FnMut(T, ...) to &mut dyn FnMut(...). This enables composite deserializers (HTTP binding + body) to transparently delegate without the consumer knowing the concrete type, which is essential for runtime protocol swapping.
- ShapeSerializer/ShapeDeserializer collection helpers: write_string_list, read_string_list, read_blob_list, read_integer_list, read_long_list, read_string_string_map, etc. These reduce generated code size by ~43% for collection-heavy models (DynamoDB: 19,235 → 10,953 deserialize body lines).
- deserialize_with_response: generated method on output types that reads HTTP-bound members (headers, status code, prefix headers) directly from the response, then delegates body members to the ShapeDeserializer. This avoids the overhead of checking HTTP binding traits at runtime for every body member.
- serialize_body and supports_http_bindings on ClientProtocol: for REST protocols that return supports_http_bindings() == true, serialize_body() serializes only body members via the codec, returning a request that generated code then populates with HTTP-bound members directly. Protocols that return false (the default) cause generated code to fall back to serialize_request(), giving the protocol full control. This enables correct runtime protocol swapping.
- SdkConfig and per-service config: customers can override the default protocol at runtime via client.config().protocol(MyCustomProtocol). The service runtime plugin only stores the default protocol if the customer hasn't set one, ensuring the override takes effect.
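As an illustration of why the consuming finish step sits on its own trait, here is a minimal sketch. It assumes heavily simplified signatures: the real ShapeSerializer methods take a schema argument, and ToySerializer is a hypothetical type that does no string escaping. Member writes go through &mut dyn ShapeSerializer, while finish(self) is called on the concrete, sized serializer because a by-value receiver cannot be invoked through the trait object.

```rust
/// Hypothetical, simplified writer trait used through a trait object.
trait ShapeSerializer {
    fn write_string(&mut self, name: &str, value: &str);
}

/// Consuming step kept on a separate trait; it is only called on the
/// concrete serializer type, never through `dyn ShapeSerializer`.
trait FinishSerializer {
    fn finish(self) -> Vec<u8>;
}

/// Toy JSON-like serializer (no escaping; illustration only).
#[derive(Default)]
struct ToySerializer {
    out: String,
}

impl ShapeSerializer for ToySerializer {
    fn write_string(&mut self, name: &str, value: &str) {
        if !self.out.is_empty() {
            self.out.push(',');
        }
        self.out.push_str(&format!("\"{name}\":\"{value}\""));
    }
}

impl FinishSerializer for ToySerializer {
    fn finish(self) -> Vec<u8> {
        format!("{{{}}}", self.out).into_bytes()
    }
}

/// Stand-in for a generated `serialize_members`: sees only the trait object.
fn serialize_members(ser: &mut dyn ShapeSerializer) {
    ser.write_string("TableName", "orders");
}

/// Stand-in for protocol code: generic over the concrete serializer, so it
/// can borrow it as `dyn` for member writes and then consume it.
fn serialize_with<S: ShapeSerializer + FinishSerializer>(mut ser: S) -> Vec<u8> {
    serialize_members(&mut ser);
    ser.finish()
}

fn main() {
    let bytes = serialize_with(ToySerializer::default());
    println!("{}", String::from_utf8_lossy(&bytes));
}
```

The bound S: ShapeSerializer + FinishSerializer mirrors the Codec requirement described in this PR; everything else in the sketch is illustrative.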
Code generation (Kotlin)
- SchemaDecorator expanded: stores SharedClientProtocol in both the service config layer and each operation's runtime plugin config layer. Adds protocol() getter/setter to service config builders.
- SchemaSerdeAllowlist: controls which protocols use the schema-based path exclusively (currently restJson1, awsJson1_0, awsJson1_1). Services on the allowlist generate no legacy protocol_serde code.
- RequestSerializerGenerator: for allowlisted protocols, generates a SerializeRequest impl that loads SharedClientProtocol from ConfigBag and calls protocol.serialize_request() or protocol.serialize_body(). For REST protocols with HTTP bindings, generates direct header/query/label writes at compile time instead of routing through HttpBindingSerializer at runtime.
- ResponseDeserializerGenerator: generates schema-based deserialization that calls protocol.deserialize_response() then Output::deserialize_with_response(). Error deserialization uses the protocol to deserialize error bodies, with error code dispatch matching the legacy path.
- SchemaGenerator: extended with union serialization/deserialization, synthetic member support (e.g., _request_id from the x-amzn-requestid header), streaming member handling, @httpPayload blob/string support, and deserialize_with_response generation.
- OperationGenerator: adds INPUT_SCHEMA and OUTPUT_SCHEMA constants to each operation type.
- DeserializeResponse trait: deserialize_nonstreaming now takes &ConfigBag so the deserializer can load SharedClientProtocol.

JSON deserializer optimizations
- container_size pre-scan: was scanning the entire JSON token stream to count elements before deserializing, effectively parsing data twice.
- read_string, read_boolean, read_integer, read_float, read_struct key parsing, and read_map key parsing all replaced json_token_iter() iterator construction with direct byte-level scanning.
- parse_key returns Cow<'a, str>: avoids heap allocation for JSON keys without escape sequences (the common case).
- read_struct: null values are skipped before calling the consumer, eliminating the need for per-field is_null() checks in generated code.
- JsonFieldMapper made non-locking: removed LazyLock overhead from the @jsonName reverse mapping.

How this differs from the SEP
The SEP is written with Java idioms in mind. Key Rust-specific adaptations:

- Object-safe ClientProtocol: the SEP uses associated types for Request/Response. We hardcode aws_smithy_runtime_api::http::{Request, Response} because HTTP assumptions are deeply baked into the SDK. The trait is dyn-compatible, stored as a SharedClientProtocol containing an Arc<dyn ClientProtocol> in ConfigBag.
- Object-safe ShapeDeserializer: the SEP's consumer pattern uses generics (<T, F: FnMut(T, ...)>). We use &mut dyn FnMut(...) with external state capture instead, trading some monomorphization for the ability to compose deserializers at runtime (HTTP binding wrapper → body deserializer).
- FinishSerializer separated from ShapeSerializer: finish(self) consumes the serializer, which is incompatible with object safety. The Codec trait requires Serializer: ShapeSerializer + FinishSerializer, but protocol code calls finish() on the concrete type after using &mut dyn ShapeSerializer for member writes.
- Hybrid codegen for REST HTTP bindings: the SEP envisions the protocol handling all HTTP binding routing at runtime. For performance, when the protocol reports supports_http_bindings() == true, generated code uses serialize_body() for the payload and writes HTTP-bound members (headers, query params, URI labels) directly at compile time, avoiding per-member runtime trait checks. When a customer swaps in a protocol that returns false (e.g., an RPC protocol), generated code falls back to serialize_request(), giving the protocol full control over member placement. This preserves the performance optimization for the default protocol while maintaining correct runtime protocol swapping.
- deserialize_with_response instead of HttpBindingDeserializer: rather than wrapping the body deserializer in an HTTP binding deserializer that checks traits on every member, we generate a method that reads HTTP-bound members directly from the response, then delegates body-only deserialization to the codec. This eliminates the per-member overhead for responses.
- No ClientTransport abstraction: the SEP defines a ClientTransport interface. We skip this since HTTP is deeply embedded in the existing orchestrator.
- update_endpoint takes &Endpoint: the SEP takes a URI string. We take the full Endpoint object (which includes headers) and handle EndpointPrefix from the ConfigBag, matching the existing SRA endpoint resolution flow.
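The "&mut dyn FnMut(...) with external state capture" adaptation can be sketched as below. This is a hedged illustration, not the PR's real API: the trait is cut down to two methods with invented signatures, and SliceDeserializer is a hypothetical in-memory implementation. The point is that the consumer closure has no generic result type, so results are accumulated in a variable the closure captures from the caller.

```rust
/// Hypothetical, simplified dyn-compatible deserializer trait.
trait ShapeDeserializer {
    /// Calls `consumer` once per list element, passing the deserializer
    /// positioned at that element.
    fn read_list(&mut self, consumer: &mut dyn FnMut(&mut dyn ShapeDeserializer));

    /// Reads the integer at the current position and advances.
    fn read_integer(&mut self) -> i32;
}

/// Toy implementation backed by an in-memory list of integers.
struct SliceDeserializer {
    items: Vec<i32>,
    pos: usize,
}

impl ShapeDeserializer for SliceDeserializer {
    fn read_list(&mut self, consumer: &mut dyn FnMut(&mut dyn ShapeDeserializer)) {
        // Assumes the consumer reads exactly one element per call.
        while self.pos < self.items.len() {
            consumer(&mut *self);
        }
    }

    fn read_integer(&mut self) -> i32 {
        let value = self.items[self.pos];
        self.pos += 1;
        value
    }
}

/// Stand-in for generated code: it only knows the trait object, and the
/// closure pushes into `out`, which lives outside the closure.
fn read_int_list(de: &mut dyn ShapeDeserializer) -> Vec<i32> {
    let mut out = Vec::new();
    de.read_list(&mut |elem| out.push(elem.read_integer()));
    out
}

fn main() {
    let mut de = SliceDeserializer { items: vec![1, 2, 3], pos: 0 };
    println!("{:?}", read_int_list(&mut de));
}
```

Because read_int_list takes a trait object, any wrapper that itself implements the trait could be passed in its place, which is the runtime composition the bullet on ShapeDeserializer describes.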
Benchmarks
Benchmarks were run on macOS arm64 with rustc 1.91.1. Two benchmark suites were used:

DynamoDB-specific benchmarks (Criterion, aws-sdk-dynamodb crate):

(results table comparing main and schema-protocol)

Standardized cross-service benchmarks (custom harness, SerdeBenchmarkTestGenerator):

Generated from Smithy protocol test models for awsJson1_0 and restJson1, covering S/M/L payloads with strings, binary, nested maps, shallow maps, and mixed types. Latest results:
Per-benchmark highlights:
Key observations:
direct byte parsing optimizations in the JSON deserializer.
the trait interface is offset by the JSON parsing improvements.
schema path avoids the legacy per-shape deserialization functions.
many HTTP-bound response members, and the deserialize_with_response path has overhead from header parsing that the legacy codegen avoided by inlining.
construction overhead has been eliminated by creating the protocol once at
service config time.
Binary size (aws-sdk-dynamodb):
The size increase comes from schema constants (274 statics), serialize_members methods (263), and deserialize methods (320) that now live on the types themselves. The legacy protocol_serde module is eliminated (reduced to 8 lines).
Build time (aws-sdk-dynamodb, clean builds):
Runtime protocol swapping integration tests (aws/sdk/integration-tests/dynamodb/tests/protocol-swap.rs):

Three tests verify that protocols can be swapped at runtime via the .protocol() config setter on a DynamoDB client:

- default_protocol_serializes_correctly: the baseline. The default awsJson1_0 protocol produces the correct Content-Type, X-Amz-Target, and JSON body.
- swapped_protocol_changes_content_type: swapping to awsJson1_1 changes the Content-Type to application/x-amz-json-1.1 while preserving the same request body.
- swap_to_rest_json_protocol: swapping to restJson1 (a fundamentally different protocol class) produces Content-Type: application/json, no X-Amz-Target header, and a correctly serialized JSON body. This exercises the supports_http_bindings() fallback path.

Breaking changes
- DeserializeResponse::deserialize_nonstreaming now takes &ConfigBag as a second parameter.
- SharedClientProtocol must be present in the ConfigBag for schema-based protocols to function. This is automatically set by generated service runtime plugins.
What's not yet implemented
- AwsRestXmlProtocol
- RpcV2CborProtocol (currently falls back to legacy error parsing)
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.