Add Exemplar support to Metrics proto #159
Closing in favor of #162 (which implements exemplars).
Updated to not make structural changes and to add raw_value_data_points. I still need to specify the type of RawValue in some way, which will be unblocked after #168.
repeated DoubleDataPoint double_data_points = 3;
repeated HistogramDataPoint histogram_data_points = 4;
repeated SummaryDataPoint summary_data_points = 5;
repeated RawValue raw_data_points = 6;
I think this should be removed for the moment. This is not an exemplar; it is support for "RawMeasurements", which is out of scope for this PR.
The generated code was removed, so please remove it here as well. Please also rebase the PR.
message RawValue {
  // The set of labels that were dropped by the aggregator, but recorded
  // alongside the original measurement. Only labels that were dropped by the aggregator should be included
  repeated opentelemetry.proto.common.v1.StringKeyValue labels = 1;
What is the relationship between these labels and the labels in the DataPoint:
- Do we duplicate them?
- Do we extract these labels, so that the actual set of labels is the combination of these + datapoint.labels?
For exemplars, these labels are only the labels not included in the DataPoint's labels. I would rename this field to dropped_labels, but if RawValue is used as a data point itself, the labels would include all labels.
@jmacd I know you tried to share the messages, can you help define the behavior here?
I think you recommended calling these "dropped_labels". This sounds good to me.
// (Optional) List of exemplars collected from
// measurements that were used to form the data point
repeated RawValue exemplars = 7;
Would this just be a list of random samples from the whole window? Open question in the OTEP:
We don’t have a strong grasp on how the sketch aggregator works in terms of implementation - so we don’t have enough information to design how exemplars should work properly.
The proto does not define how the exemplars were sampled, so I'm not sure I understand your question.
At the recent OTLP discussion meeting, we agreed to remove the sample_count field from the current proposal. We also agreed to move the Exemplars into the DataPoints so that they can refer to dropped labels, not include full label sets in each point.
@bogdandrutu does this sound right to you, for now?
We also agreed to move the Exemplars into the DataPoints so that they can refer to dropped labels, not include full label sets in each point.
I think we agreed that you will evaluate what is better for performance/semantics:
- Having a `repeated RawValue exemplars` in the `Metric` that applies to all data points (the user may need to do another remapping to every data point) vs. `repeated RawValue exemplars` in every point (if we go with every point, then "dropped_labels" is a better name for that).
My point is that I don't have a strong opinion between the two, and I was asking you to investigate and decide which way to go. Here are my thoughts:
- Having `repeated RawValue exemplars` in the Metrics:
  - Pros:
    - Saves some memory in the internal representation (the per-point alternative has an extra 24 bytes per data point).
    - The same message may be reusable for raw measurements, because the labels don't represent dropped labels.
  - Cons:
    - Duplicate labels on the wire.
    - The user needs to re-map every exemplar to its data point by matching labels.
I feel the cons are more significant than the pros, so personally I would go with exemplars in every DataPoint, as you suggested.
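To make the trade-off above concrete, here is a minimal proto3 sketch of the two placements being compared. It is only an illustration under stated assumptions, not the schema this PR or #162 adopted: the package name, the `MetricA`/`MetricB`, `PointA`/`PointB`, `MetricLevelExemplar`, and `PointLevelExemplar` messages, and all field numbers are hypothetical, and `StringKeyValue` is a local stand-in for `opentelemetry.proto.common.v1.StringKeyValue`.

```proto
syntax = "proto3";

// Hypothetical package used only for this sketch.
package sketch.exemplar_placement.v1;

// Local stand-in for opentelemetry.proto.common.v1.StringKeyValue.
message StringKeyValue {
  string key = 1;
  string value = 2;
}

// Option A: exemplars live on the metric and apply to all data points.
// Each exemplar must carry its full label set, so labels are duplicated
// on the wire and a consumer has to match labels to find the owning point.
message MetricLevelExemplar {
  repeated StringKeyValue labels = 1;  // full label set of the measurement
  double value = 2;
}

message PointA {
  repeated StringKeyValue labels = 1;
  double value = 2;
}

message MetricA {
  repeated PointA data_points = 1;
  repeated MetricLevelExemplar exemplars = 2;
}

// Option B: exemplars live inside each data point.
// The point already holds the aggregated labels, so an exemplar only needs
// the labels the aggregator dropped; no re-mapping step is required.
message PointLevelExemplar {
  repeated StringKeyValue dropped_labels = 1;  // labels not on the point
  double value = 2;
}

message PointB {
  repeated StringKeyValue labels = 1;
  double value = 2;
  repeated PointLevelExemplar exemplars = 3;
}

message MetricB {
  repeated PointB data_points = 1;
}
```

Under Option B, the original measurement's label set can be recovered as the point's `labels` plus the exemplar's `dropped_labels`, which matches the answer given earlier in the thread about how the two label sets relate.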
If we go with exemplars in every DataPoint I would say to rename the message to "Exemplar" :)
This discussion makes me want a way to intern label sets to avoid the "re-map every exemplar" problem.
I don't feel inclined to invest time in this now, so we should probably choose "repeated RawValue exemplars in every point".
This discussion makes me want a way to intern label sets to avoid the "re-map every exemplar" problem.
Even with an "intern label" you still need to map every exemplar to a point (that mapping may be faster if we have "intern label" but still needs some work)
I left a lengthy remark on this topic here. I am worried that my request for @cnnradams to explore and implement statistical sampling for exemplars has led to some confusion, and (with my apologies) I am willing to omit it. As noted in the comment, though, there are many related questions, and even if we take the statistical question out of it, we're left with tough questions.
@cnnradams I know your internship is over, but let us know if you're willing to make these changes.
Sure. From reading the discussion, it seems like the only things I need to change are
Done with all 3.
* Add exemplars to proto
* handle just exemplars, nit fixes
* comments
* rawvalue -> exemplar, remove sample_count
Adds support for OTEP#113. This should also handle duplicate labels in exemplars by bringing labels up a level and making exemplars only hold `additional_labels`.
Proto question (since I'm new to this): I defined a `measurement_type` enum for the RawValue type, but couldn't create a new enum with just INT64 and DOUBLE (because I can't share names with other enums). So right now I'm using `Type`, which means things other than INT64 and DOUBLE can be picked for `measurement_type`. What is the right solution for this?
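On the enum question: protobuf enum value names are scoped as siblings of the enum type rather than as children of it, which is why two enums declared in the same scope cannot both contain `INT64`. One possible way around this, sketched below as an assumption and not as what this PR ended up doing, is to nest the new enum inside the message that uses it so its values get their own scope. The package name, the `MetricDescriptor` contents, and the `MeasurementType` enum shown here are hypothetical.

```proto
syntax = "proto3";

// Hypothetical package used only for this sketch.
package sketch.enum_scoping.v1;

// Stand-in for the existing descriptor with a broad Type enum.
message MetricDescriptor {
  enum Type {
    UNSPECIFIED = 0;
    INT64 = 1;
    DOUBLE = 2;
    HISTOGRAM = 3;
  }
  Type type = 1;
}

message RawValue {
  // Nested enum: its values are scoped to RawValue, so INT64 and DOUBLE
  // do not collide with MetricDescriptor.Type's values of the same name.
  enum MeasurementType {
    UNSPECIFIED = 0;
    INT64 = 1;
    DOUBLE = 2;
  }
  MeasurementType measurement_type = 1;
}
```

An alternative that keeps the enum at file level is to prefix each value with the enum name (for example `MEASUREMENT_TYPE_INT64`), which avoids the clash without nesting.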