Skip to content

Enforce cursor be serializable#73

Merged
sambostock merged 1 commit into
masterfrom
enforce-string-cursor
Apr 19, 2021
Merged

Enforce cursor be serializable#73
sambostock merged 1 commit into
masterfrom
enforce-string-cursor

Conversation

@sambostock

@sambostock sambostock commented Apr 7, 2021

Copy link
Copy Markdown
Contributor

Context & Problem

The cursor_position is serialized by the job adapter (Resque/Sidekiq) and bypasses Active Job's fancy serialization. Both Resque and Sidekiq effectively use JSON.dump to serialize it and JSON.parse to deserialize it.

In cases where a non-JSON-compatible cursor is used, it is silently turned into a String via to_s, which can lead to weird errors when the job resumes after interruption. These are not always straightforward to debug, and don't reliably show up in tests, as Active Support adds implicit coercions to some object's == method.

time = Time.at(0).utc               # =>  1970-01-01 00:00:00 UTC  (Time)
time.to_s                           # => "1970-01-01 00:00:00 UTC" (String)
require 'json'
JSON.parse(JSON.dump(time))         # => "1970-01-01 00:00:00 UTC" (String)
time == JSON.parse(JSON.dump(time)) # => false                     (no coercion)
require 'active_support/all'
JSON.parse(JSON.dump(time))         # => "1970-01-01 00:00:00 UTC" (String)
time == JSON.parse(JSON.dump(time)) # => true                      (implicitly coerced)
Real world example In one of our repos, we use a custom enumerator around a collection of events fetched from an API and ordered by time. We used the time the event occurred at as a cursor, which was "corrupted" by being serialized into a String. The API expected an ISO8601 timestamp (its wrapper gem converts `Time` objects automatically), but our improperly deserialized time strings were in the wrong format, so it blew up.

This went unnoticed for some time, due to the job not being interrupted, and was tricky to debug due to the reasons outlined above.

What this PR does about it

This prevents these errors by enforcing that the cursors be composed of basic Ruby objects.

This is enforced by (recursively) analyzing the each cursor yielded by the enumerator before each_iteration. If it is composed of anything other than the following classes, a descriptive CursorError is raised:

  • Array
  • Hash
  • String
  • Integer
  • Float
  • NilClass (nil)
  • TrueClass (true)
  • FalseClass (false)

Rationale

  • It is better to verify each cursor and ensure we'd be able to interrupt before having started any work, than to finding out the cursor is unsafe after interruption.
  • It is fastest and most reliable to simply analyze the component classes of the cursor (rather than check if serializing and deserializing yields the same object, due to implicit conversions).
  • Since most cursors are simple Strings, Integers, or nil, the performance penalty of checking each cursor is negligible.

Original Description The cursor ends up being serialized as a `String`, regardless of if it is a `String` or not (by calling `to_s`).

This can lead to subtle bugs often not found during testing.

For instance, if a Time is used as a cursor and the job is requeued, it instead receives the Time#to_s representation as its new input cursor.

Let's be explicit and save developers the headache of debugging that: let's enforce that the cursor be an instance of String or NilClass.

@sambostock

Copy link
Copy Markdown
Contributor Author

Turns out both Sidekiq and Resque encode job params as JSON, so technically as long as

cursor == JSON.load(JSON.dump(cursor))

it should be a (de-)serializable cursor.

However, we need to watch out for arbitrary Ruby objects as noted in Sidekiq's docs, that validation isn't quite good enough.

@sambostock sambostock force-pushed the enforce-string-cursor branch from a3a6db5 to 0b38ffd Compare April 14, 2021 02:24
@sambostock sambostock changed the title Raise if cursor not a String Enforce cursor be serializable Apr 14, 2021
@sambostock sambostock force-pushed the enforce-string-cursor branch 3 times, most recently from a95c464 to 6da0b85 Compare April 14, 2021 03:15
@sambostock sambostock added bug Something isn't working enhancement New feature or request labels Apr 14, 2021
@sambostock sambostock marked this pull request as ready for review April 14, 2021 03:50

@kirs kirs left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting the efforts to describe the problem and how you've run into it in your app.
Code LGTM!

Comment thread lib/job-iteration/iteration.rb Outdated
Comment thread test/integration/sidekiq_test.rb Outdated
@sambostock

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback @GustavoCaso! I have

  • gone with the simple CLASSES.any? { |klass| object.instance_of?(klass) } approach
  • added a Resque cursor corruption test

I also noticed the integration tests are basically identical for Resque and Sidekiq, so I extracted a common test module in #75, to keep it independent of this change.

@sambostock sambostock requested a review from GustavoCaso April 17, 2021 18:18
TL;DR: This enforces that cursors can only be composed of Objects of
built-in JSON compatible classes: Array, FalseClass, Float, Hash,
Integer, NilClass, String, and TrueClass.

Both Resque and Sidekiq serialize job arguments as JSON. Because
`cursor_position` is not part of the ActiveJob arguments, it is subject
to this serialization method.

Objects other than the basic types JSON supports are serialized by
calling `.to_s`, which means they are subject to one way serialization;
once serialized, there is no way (out of the box) for them to be
deserialized.

One such example is Time objects, which are serialized to a reasonable
String representation, but when deserialized simply result in a String
containing that textual representation.

Sneakily enough, because ActiveSupport "augments" `Time#==` adding
automatic coercion, `time == JSON.parse(JSON.dump(time))`, which means
it isn't always noticed in tests.

Nonetheless, this can result in unexpected errors, especially since it
only "matters" if the job is interrupted (as otherwise the cursor is
never serialized.

Therefore, we should enforce that the (de)serialization round trip
results in the same cursor it was given without corruption. The simplest
way to do this is to traverse the cursor looking for unsupported
classes. Since cursors should typically be very small, this should not
add any significant performance penalty.
@sambostock sambostock force-pushed the enforce-string-cursor branch from 95a7f38 to 28e74ee Compare April 19, 2021 18:17
@sambostock sambostock merged commit c3c5242 into master Apr 19, 2021
@sambostock sambostock deleted the enforce-string-cursor branch April 19, 2021 18:24
@rafaelfranca

Copy link
Copy Markdown
Member

Should not Symbols be allowed?

@sambostock

Copy link
Copy Markdown
Contributor Author

Symbol serialization is "lossy" as they are turned into string JSON keys, which means they are deserialized into Strings.

$ ruby -rjson -e 'p JSON.parse(JSON.dump(:symbol))'
"symbol"

Therefore they are disallowed.

@rafaelfranca

Copy link
Copy Markdown
Member

Ok, that makes sense but this is a breaking change, so it should not released as a patch release.

@sambostock sambostock mentioned this pull request Apr 21, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants