fix: DH-22265: increment Kafka column index for complex spec#7887
fix: DH-22265: increment Kafka column index for complex spec#7887devinrsmith merged 5 commits intodeephaven:mainfrom
Conversation
This fixes a bug where the column indexing is incorrectly calculated; it only manifests when the key spec is "complex" and the value spec is "simple". Fixes deephaven#7884
No docs changes detected for 5dc6e69 |
There was a problem hiding this comment.
Pull request overview
Fixes Kafka consume column-index calculation when combining a complex key spec with a simple value spec, addressing the “Chunk sizes don’t match” ingestion error reported in #7884.
Changes:
- Add
KeyOrValueSpec.ingestDataAndIncrement(...)to adjustnextColumnIndexafter complex ingest specs. - Switch
getConsumeStructto use the new helper for both key and value ingest-data creation. - Add a small helper (
KeyOrValueIngestData.isSimple()) to distinguish simple vs non-simple ingest specs.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }); | ||
|
|
||
| final KeyOrValueIngestData keyIngestData = keySpec.getIngestData(KeyOrValue.KEY, | ||
| final KeyOrValueIngestData keyIngestData = keySpec.ingestDataAndIncrement(KeyOrValue.KEY, | ||
| schemaRegistryClient, configs, nextColumnIndex, columnDefinitions); | ||
| final KeyOrValueIngestData valueIngestData = valueSpec.getIngestData(KeyOrValue.VALUE, | ||
| final KeyOrValueIngestData valueIngestData = valueSpec.ingestDataAndIncrement(KeyOrValue.VALUE, | ||
| schemaRegistryClient, configs, nextColumnIndex, columnDefinitions); |
There was a problem hiding this comment.
This change fixes subtle column-indexing behavior that only shows up with a complex key spec + simple value spec (per DH-22265 / #7884), but there doesn’t appear to be regression coverage exercising getConsumeStruct/consumeToTable with that spec combination. Adding a unit/integration test that builds a TableDefinition via getTableDefinition (or runs a minimal ingest path) for complex-key + simple-value and asserts the resulting indices/column ordering would help prevent future regressions.
There was a problem hiding this comment.
Testing will be in a follow-up PR.
This adds KafkaTools integration testing via TestContainers. There are four different Kafka images we are testing against: `cp-kafka`, `kafka`, `kafka-native`, and `redpanda`. This includes a test case for DH-22265 / deephaven#7887
rcaudy
left a comment
There was a problem hiding this comment.
I think we're being a bit brittle.
| TableDefinition tableDef, | ||
| KeyOrValueIngestData data); | ||
|
|
||
| private KeyOrValueIngestData ingestDataAndIncrement( |
There was a problem hiding this comment.
Maybe:
protected KeyOrValueIngestData getIngestData(
KeyOrValue keyOrValue,
SchemaRegistryClient schemaRegistryClient,
Map<String, ?> configs,
int nextColumnIndex,
List<ColumnDefinition<?>> columnDefinitionsOut) {
return getIngestData(keyOrValue, schemaRegistryClient, configs, new MutableInt(nextColumnIndex),
columnDefinitionsOut);
}
private KeyOrValueIngestData getIngestDataAndIncrementColumnIndex(
final KeyOrValue keyOrValue,
final SchemaRegistryClient schemaRegistryClient,
Map<String, ?> configs,
MutableInt nextColumnIndexMut,
List<ColumnDefinition<?>> columnDefinitionsOut) {
final int ixPre = nextColumnIndexMut.get();
final int sizePre = columnDefinitionsOut.size();
// It's unfortunate we have to do this; but the getIngestData signature and design of KeyOrValueSpec
// would need a breaking change to improve in these regards.
final KeyOrValueIngestData ingestData = getIngestData(keyOrValue, schemaRegistryClient, configs,
ixPre, columnDefinitionsOut);
final int addedColumns = columnDefinitionsOut.size() - sizePre;
if (ingestData == null) {
// ignore case (or, possible a complex impl that chooses to return null when it's empty):
// expecting no modifications
if (addedColumns != 0) {
throw new IllegalStateException(
"KeyOrValueSpec getIngestData is modifying out-variables without returning ingest data");
}
} else {
nextColumnIndexMut.set(ixPre + addedColumns);
}
return ingestData;
}
}
rcaudy
left a comment
There was a problem hiding this comment.
Approving, but I liked my simpler version that absolved the implementations of any responsibility for mutating the index quite a lot more. This version has an awful lot of validation and warning trace that only benefits anyone if the current nonsense implementation is desirable long-term.
rcaudy
left a comment
There was a problem hiding this comment.
I think the commented-out warnings are ugly, but I'm willing to compromise here.
This fixes a bug where the column indexing is incorrectly calculated; it only manifests when the key spec is "complex" and the value spec is "simple".
Fixes #7884