Add validation for Trace ID #1992
Conversation
|
How does this work with pluggable ID generators? Will we be locking users out of using ID generators that do not strictly follow Otel spec? //cc @NathanielRN |
|
If we focus only on the current implementation and other pluggable Python ID generators, this may introduce some breaking changes. But as I mentioned in #1991, other languages such as Go and Java have already introduced strict validation of the Trace ID format, and the current Python implementation is incompatible with them in that respect. |
| trace_id != INVALID_TRACE_ID | ||
| and span_id != INVALID_SPAN_ID |
Should we move these two checks into _validate_trace_id() so it's all together?
Instead, we should just inline the code of _validate_trace_id here rather than having a single-use function. Keep in mind that the only thing _validate_trace_id does is check trace_id < 2 ** 128.
| """ | ||
| if not trace_id: | ||
| return False | ||
| if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH: |
Would this work and be a bit faster?
# constant somewhere
_MAX_TRACE_ID = (1 << 128) - 1
| if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH: | |
| if trace_id > _MAX_TRACE_ID: |
Yes, I also prefer checking that the trace ID value is less than a certain value. Also, instead of (1 << 128) - 1 we can write 2 ** 128 - 1, which is subjectively easier to understand.
|
@owais Thanks for the ping! Based on the PR right now, I think this should be fine, since it's only being strict about the length of the ID. It isn't placing restrictions on the bytes that actually make up the ID. For example, in the
@staticmethod
def generate_trace_id() -> int:
    trace_time = int(time.time())
    trace_identifier = random.getrandbits(96)
    return (trace_time << 96) + trace_identifier
We remove some randomness from the ID in order to use some bits for the timestamp, but as far as OTel Python is concerned, this trace ID is just as valid as an all-random trace ID. Then on the AWS backend, we can parse out this "OTel" ID as an "AWS ID" having expected the user used
I think pluggable ID generators should follow the restrictions on ID length (so that we can count on this in the rest of OTel), but how systems encode/decode those bits should not be restricted so that they can add information they want. |
| if not trace_id: | ||
| return False | ||
| if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH: | ||
| return False | ||
|
|
||
| return True |
Building off of what @aabmass said. I'm okay with constant/no constant since it's only used once but having a _MAX_TRACE_ID_LENGTH is probably easier to read!
| if not trace_id: | |
| return False | |
| if len(format(trace_id, "032x")) != _TRACE_ID_HEX_LENGTH: | |
| return False | |
| return True | |
| return trace_id and trace_id < (1 << 128) - 1 and trace_id != INVALID_TRACE_ID |
|
As far as I could tell, Go only validates the ID during propagated context injection/extraction and I'm sure Python does it too already. Do we need this additional check? |
ocelotl
left a comment
Please avoid single use functions and single use constants. Remember that the validation of trace_id is just making sure it is not greater than a certain specific value.
|
Thank you for reviewing. I have now changed the code accordingly. |
I believe we only convert it to a base 16 int but do not actually check the length. |
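To illustrate the concern, here is a small sketch showing that a bare base-16 conversion accepts a hex string of any length. `parse_trace_id_loosely` is a hypothetical helper, not the propagator's actual code.

```python
def parse_trace_id_loosely(hex_value: str) -> int:
    # Hypothetical helper: converts the string but never checks its length.
    return int(hex_value, 16)


too_long = "f" * 40  # 40 hex digits, i.e. 160 bits
parsed = parse_trace_id_loosely(too_long)

# The conversion succeeds even though the value exceeds 128 bits.
assert parsed > 2 ** 128 - 1
assert parsed.bit_length() == 160
```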
I think this should be a requirement as it is in the OTel spec, OTLP proto semantics, and the W3C trace context. Whether or not it was wise to lock down the spec, idk. @ymotongpoo pointed out that Go API uses a fixed 16 byte array here. @owais did you understand something different from the Go code? |
@lzchen we check it on extract() in the regex only, not inject: |
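For illustration only, a fixed-width pattern of the following kind rejects over-long IDs before any conversion happens. The pattern is a simplified sketch that follows the W3C traceparent layout (version, 32-hex trace ID, 16-hex parent ID, flags); it is an assumption for this example and is not copied from the Python propagator.

```python
import re

# Simplified traceparent pattern: version - trace-id - parent-id - flags.
_TRACEPARENT_PATTERN = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def extract_trace_id(header: str):
    # Hypothetical helper: returns the trace ID as an int, or None on mismatch.
    match = _TRACEPARENT_PATTERN.fullmatch(header)
    if match is None:
        return None
    return int(match.group(2), 16)


good = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
bad = "00-" + "f" * 40 + "-00f067aa0ba902b7-01"  # trace ID is 40 hex digits

assert extract_trace_id(good) == 0x4BF92F3577B34DA6A3CE929D0E0E4736
assert extract_trace_id(bad) is None
```

A check like this only covers extraction; an ID built directly in code never passes through the pattern, which is the gap the constructor-level validation is meant to close.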
| is_valid = ( | ||
| trace_id != INVALID_TRACE_ID | ||
| and span_id != INVALID_SPAN_ID | ||
| and trace_id < 2 ** 128 - 1 |
@ocelotl nit regarding the single use constant, I think it's worth having the constant as it's easier to understand what the magic number means and for the speedup of not calculating the value every time in this hot code path.
(I was curious so checked and CPython is not smart enough to optimize this into a constant on its own (funny enough, it does do it for the bit shifting approach)):
In [2]: def f(trace_id):
   ...:     return trace_id < 2 ** 128 - 1
   ...:

In [3]: dis(f)
  2           0 LOAD_FAST                0 (trace_id)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (128)
              6 BINARY_POWER
              8 LOAD_CONST               3 (1)
             10 BINARY_SUBTRACT
             12 COMPARE_OP               0 (<)
             14 RETURN_VALUE
Ok, that's a good point. In that case, the constant should be added as a private attribute of the class to keep it as close as possible to where it is used.
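A rough sketch of the two options being weighed: a private class attribute that is evaluated once when the class body runs, versus spelling the bound out inside the function. The class and function names are hypothetical, and the timings depend on the interpreter version, so no particular result is claimed.

```python
import timeit


class SketchContext:
    # Hypothetical class; the attribute is computed once, at class creation.
    _TRACE_ID_MAX_VALUE = 2 ** 128 - 1

    @classmethod
    def in_range(cls, trace_id: int) -> bool:
        return 0 < trace_id <= cls._TRACE_ID_MAX_VALUE


def in_range_inline(trace_id: int) -> bool:
    # Spells the bound out in the expression itself.
    return 0 < trace_id <= 2 ** 128 - 1


sample = 0x0AF7651916CD43DD8448EB211C80319C
inline_seconds = timeit.timeit(lambda: in_range_inline(sample), number=100_000)
constant_seconds = timeit.timeit(lambda: SketchContext.in_range(sample), number=100_000)
print(f"inline: {inline_seconds:.4f}s, class constant: {constant_seconds:.4f}s")
```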
Not sure if we have specs somewhere, but my 2 cents is that I think (1 << 128) - 1 is easier to read.
I can go either way on the constant though!
OK, I added the private constant for readability. As for bit shift vs. exponentiation, I'll leave the decision to you.
Please try again, @ymotongpoo |
|
@ocelotl Thanks! Now EasyCLA is fine. |
Description
This adds a validator to the constructor of SpanContext so that an invalid trace ID is detected as early as possible.
Fixes #1991
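A minimal sketch of what constructor-time validation can look like, assuming a simplified tuple-based context. `ValidatedContext` is a stand-in for illustration and is not the real `SpanContext`; it uses `<=` so the largest 128-bit value is accepted, whereas the diff in this PR uses a strict `<`.

```python
INVALID_TRACE_ID = 0x0
INVALID_SPAN_ID = 0x0


class ValidatedContext(tuple):
    """Simplified stand-in, not the real opentelemetry SpanContext."""

    _TRACE_ID_MAX_VALUE = 2 ** 128 - 1

    def __new__(cls, trace_id: int, span_id: int):
        # Validity is decided once, at construction time.
        is_valid = (
            trace_id != INVALID_TRACE_ID
            and span_id != INVALID_SPAN_ID
            and trace_id <= cls._TRACE_ID_MAX_VALUE
        )
        return tuple.__new__(cls, (trace_id, span_id, is_valid))

    @property
    def is_valid(self) -> bool:
        return self[2]


ok = ValidatedContext(0x0AF7651916CD43DD8448EB211C80319C, 0xB7AD6B7169203331)
too_big = ValidatedContext(1 << 128, 0xB7AD6B7169203331)
assert ok.is_valid
assert not too_big.is_valid
```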
Type of change
How Has This Been Tested?
I added a section to TestSpanContext in opentelemetry-api/tests/trace/test_span_context.py to validate the trace ID.
Does This PR Require a Contrib Repo Change?
Checklist: