#2879 highlighted a bunch of bugs/glitches in how the Cardinality aggregation handles the
missing key.
Right now, we collect numbers with a different salt to decrease the chance of collisions.
The trouble is that before #2879, we DID accept a missing key that could be a number for a str column.
That missing key was then added to the cardinality sketch, but without going through the salt treatment used for other values.
As a result, merging
- a segment with a str column and missing value 1
- a segment with an int column and value 1

would count value 1 twice.
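
To make the double count concrete, here is a minimal sketch. It is not the actual implementation: the hash function, the salt value, and the helper names (`hash_value`, `encode_i64`, `SALT_NUMBER`) are all hypothetical, and a plain set of hashes stands in for the cardinality sketch.

```python
import hashlib
import struct

# Hypothetical salt for numeric values; the real salt is not stated in this description.
SALT_NUMBER = b"\x01"


def hash_value(payload: bytes, salt: bytes = b"") -> int:
    """Hash `payload`, prefixed by an optional salt, into a 64-bit integer."""
    digest = hashlib.blake2b(salt + payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def encode_i64(value: int) -> bytes:
    """Little-endian 8-byte encoding of a signed integer."""
    return struct.pack("<q", value)


# Segment A: str column, every doc is missing, missing key = 1.
# Assumed pre-#2879 behaviour: the missing key is inserted WITHOUT the salt.
segment_a = {hash_value(encode_i64(1))}

# Segment B: int column holding the value 1, inserted WITH the number salt.
segment_b = {hash_value(encode_i64(1), SALT_NUMBER)}

# Merging the two segments: the same logical value 1 yields two distinct hashes.
buggy = segment_a | segment_b

# If the missing key went through the same salt treatment, the hashes would agree.
fixed = {hash_value(encode_i64(1), SALT_NUMBER)} | segment_b
```

Here `buggy` holds two entries for a single logical value, while `fixed` holds one.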
Similarly (but there are probably very few good workarounds there), the number type detection strongly impacts cardinality.
If a column is an i64 in one segment and a u64 in another, the computed cardinality will be inflated.
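
The inflation can be illustrated the same way. This is only a sketch under an assumption: that the hash fed to the sketch depends on the detected numeric type. The type tags, `sketch_hash`, `hash_i64` and `hash_u64` are hypothetical names, and a plain set again stands in for the cardinality sketch.

```python
import hashlib
import struct


def sketch_hash(type_tag: bytes, payload: bytes) -> int:
    """Hash a value together with a tag for its detected type (assumed behaviour)."""
    digest = hashlib.blake2b(type_tag + payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def hash_i64(value: int) -> int:
    return sketch_hash(b"i64", struct.pack("<q", value))


def hash_u64(value: int) -> int:
    return sketch_hash(b"u64", struct.pack("<Q", value))


values = [1, 2, 3]

# The same logical values, detected as i64 in one segment and u64 in another.
segment_i64 = {hash_i64(v) for v in values}
segment_u64 = {hash_u64(v) for v in values}

merged = segment_i64 | segment_u64
true_cardinality = len(set(values))
inflated_cardinality = len(merged)
```

With three distinct values present in both segments, `inflated_cardinality` comes out at twice `true_cardinality`, because no hash from the i64 segment matches one from the u64 segment.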