This epic-style issue tracks our work on migrating the internals of Hypothesis from a bytestring to the typed choice sequence.
This is a huge change that touches every part of Hypothesis. The short version is that we have encountered several shortcomings of Hypothesis internals over time, which have limited the efficiency of input generation and shrinking, and made alternative backends infeasible. We expect this migration to give moderate efficiency gains and pave a clear path forward to future integration with alternative backends or other powerful features which are currently intractable.
Bytestring
Hypothesis currently works at the level of a bytestring (a sequence of bytes).
- To generate inputs, strategies act as parsers of the bytestring, interpreting its bytes as a series of random choices.
- Inputs are stored in the database as their bytestring representation.
- Inputs are shrunk by shrinking their corresponding bytestring ("internal shrinking").
- Novel inputs are generated by generating a novel prefix of the bytestring (via DataTree internally) and randomly generating the remaining bytes.
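To make the "strategies as parsers" view concrete, here is a toy sketch that is not Hypothesis's actual encoding. A hypothetical ToyByteParser consumes bytes from a fixed buffer, and a "strategy" is simply a function that parses choices out of it. The names ToyByteParser and toy_lists_of_small_ints, and the byte layout (one byte per coin flip, eight bytes per integer), are illustrative assumptions.

```python
class ToyByteParser:
    """Hypothetical, simplified stand-in for bytestring-backed test data."""

    def __init__(self, buffer: bytes) -> None:
        self.buffer = buffer
        self.index = 0  # position of the next unread byte

    def draw_bytes(self, n: int) -> bytes:
        # Consume the next n bytes; running out of bytes ends the test case.
        if self.index + n > len(self.buffer):
            raise EOFError("buffer exhausted")
        chunk = self.buffer[self.index : self.index + n]
        self.index += n
        return chunk

    def draw_boolean(self) -> bool:
        # One byte per coin flip; only the lowest bit matters.
        return bool(self.draw_bytes(1)[0] & 1)

    def draw_integer(self, min_value: int, max_value: int) -> int:
        # Read eight bytes as an unsigned integer and fold it into the range.
        raw = int.from_bytes(self.draw_bytes(8), "big")
        return min_value + raw % (max_value - min_value + 1)


def toy_lists_of_small_ints(data: ToyByteParser) -> list[int]:
    # A "strategy" parses its value out of the buffer, one choice at a time.
    result = []
    while data.draw_boolean():  # coin flip: add another element?
        result.append(data.draw_integer(0, 10))
    return result


buffer = bytes([1]) + (3).to_bytes(8, "big") + bytes([0])
print(toy_lists_of_small_ints(ToyByteParser(buffer)))  # [3]
```

In this toy model, shrinking works on buffer itself: deleting its first nine bytes leaves bytes([0]), which parses to the empty list.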
However, the bytestring has its limitations.
Redundancy. The bytes ↦ input mapping is not injective, so a single input may have many byte representations. For instance, 0 is represented by many different bytestrings, so any strategy using st.integers() effectively wastes some inputs on duplicates, which are useful only for detecting flakiness. See also "generate_novel_prefix interacts poorly with biased_coin (and lists)" #1574.
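The non-injectivity shows up even in a toy encoding. The sketch below assumes a hypothetical decoder that reads a single byte and folds it into [0, 9] with a modulo; Hypothesis's real integer encoding is different, but any many-to-one folding like this has the same effect. It counts how many of the 256 possible one-byte buffers decode to each integer.

```python
from collections import Counter


def toy_bounded_integer(buffer: bytes, min_value: int, max_value: int) -> int:
    # Hypothetical decoding: one byte, folded into [min_value, max_value].
    return min_value + buffer[0] % (max_value - min_value + 1)


# Decode every possible one-byte buffer and tally the resulting integers.
counts = Counter(toy_bounded_integer(bytes([b]), 0, 9) for b in range(256))
print(counts[0])  # 26: that many distinct buffers all decode to the input 0
```

If the draw were instead recorded as the typed choice 0, there would be exactly one representation of that input to explore or deduplicate.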
Precision. Making predictable changes to the input through changes to the bytestring, as shrinking needs to do, is difficult. For instance, we would like to shrink floats "naturally", e.g. by truncating them or by dividing by exponentially increasing amounts. Doing this in the byte representation is quite nasty. In fact, we currently hack around it by parsing bytes that look like they could represent a float into a float, shrinking that, and serializing the result back into the bytestring.
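The contrast can be sketched with Python's struct module. In the byte representation, zeroing a single byte of a big-endian IEEE-754 double clears the sign bit and the high bits of the exponent, turning 1234.5678 into a number on the order of 1e-305 rather than anything "simpler". Working on the float value directly makes natural candidates easy to produce. The function typed_shrink_candidates is hypothetical and only illustrates truncation and exponentially increasing division; it is not Hypothesis's float shrinker.

```python
import math
import struct


def float_to_bytes(x: float) -> bytes:
    # Big-endian IEEE-754 double: sign and exponent bits come first.
    return struct.pack(">d", x)


def bytes_to_float(b: bytes) -> float:
    return struct.unpack(">d", b)[0]


def typed_shrink_candidates(x: float) -> list[float]:
    # Hypothetical "natural" shrink steps computed on the value itself.
    if not math.isfinite(x):
        return []
    candidates = [float(math.trunc(x))]  # drop the fractional part
    candidates += [x / 2**k for k in (1, 2, 4, 8)]  # exponentially ramp up division
    # Keep only candidates strictly smaller in magnitude than x.
    return [c for c in candidates if abs(c) < abs(x)]


x = 1234.5678
raw = float_to_bytes(x)
byte_edit = bytes_to_float(b"\x00" + raw[1:])  # zero the first byte
print(byte_edit)  # a tiny positive number, not a "simpler" float
print(typed_shrink_candidates(x)[0])  # 1234.0
```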
These limitations apply equally to alternative backends built on the bytestring. For example, for CrossHair (#3086) integration, bytes are simply too low-level to support efficient SMT solving.
Typed Choice Sequence
Enter the typed choice sequence, which replaces the bytestring. We lift the representation from bytes to a slightly higher level consisting of five types, each drawn by a method of PrimitiveProvider: boolean, integer, float, string, and bytes.
```python
import abc
import math
from collections.abc import Sequence

from hypothesis.internal.intervalsets import IntervalSet


class PrimitiveProvider(abc.ABC):
    @abc.abstractmethod
    def draw_boolean(
        self,
        p: float = 0.5,
    ) -> bool:
        ...

    @abc.abstractmethod
    def draw_integer(
        self,
        min_value: int | None = None,
        max_value: int | None = None,
        *,
        # weights are for choosing an element index from a bounded range
        weights: Sequence[float] | None = None,
        shrink_towards: int = 0,
    ) -> int:
        ...

    @abc.abstractmethod
    def draw_float(
        self,
        *,
        min_value: float = -math.inf,
        max_value: float = math.inf,
        allow_nan: bool = True,
        smallest_nonzero_magnitude: float,
    ) -> float:
        ...

    @abc.abstractmethod
    def draw_string(
        self,
        intervals: IntervalSet,
        *,
        min_size: int = 0,
        max_size: int | None = None,
    ) -> str:
        ...

    @abc.abstractmethod
    def draw_bytes(
        self,
        min_size: int = 0,
        max_size: int | None = None,
    ) -> bytes:
        ...
```
This reduces redundancy (since DataTree operates at this higher level) and improves precision (since we retain type and shape information about what were previously just spans of the bytestring). The end goal is to rebuild input generation, shrinking, targeted PBT, the explain phase, the database, etc. on top of the typed choice sequence, and we expect performance gains in each area except the database.
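One way to see why the higher level helps deduplication: if choices are recorded by type and value, each distinct input corresponds to exactly one path. The sketch below is a hypothetical, heavily simplified stand-in for DataTree (the real one does considerably more): a trie keyed by typed choices that reports whether a choice sequence is new. Keying on the type keeps True distinct from 1, and keying floats on their bit pattern keeps 0.0 distinct from -0.0; both are assumptions of this sketch.

```python
import struct
from typing import Iterable, Union

Choice = Union[bool, int, float, str, bytes]

_END = ("end",)  # hypothetical marker: a sequence finished at this node


class ToyChoiceTree:
    """Hypothetical trie over typed choice sequences (not Hypothesis's DataTree)."""

    def __init__(self) -> None:
        self.root: dict = {}

    @staticmethod
    def _key(choice: Choice) -> tuple:
        if isinstance(choice, float):
            # Compare floats by bit pattern so 0.0 and -0.0 stay distinct.
            return ("float", struct.pack(">d", choice))
        # Include the type so that True and 1 (equal in Python) stay distinct.
        return (type(choice).__name__, choice)

    def record(self, choices: Iterable[Choice]) -> bool:
        """Insert a choice sequence and return True if it was not seen before."""
        node = self.root
        novel = False
        for choice in choices:
            key = self._key(choice)
            if key not in node:
                node[key] = {}
                novel = True
            node = node[key]
        # Mark where the sequence ends, so a prefix differs from a longer sequence.
        if _END not in node:
            node[_END] = {}
            novel = True
        return novel


tree = ToyChoiceTree()
print(tree.record([True, 0, "a"]))  # True: a new input
print(tree.record([True, 0, "a"]))  # False: the same input again
```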
Roadmap
- float, integer, string, byte, and boolean drawing logic into PrimitiveProvider #3788
- DataTree to the new IR #3818
- crosshair backend #3806
- Float shrinker to the ir #3899
- generate_mutations_from)
- weights= #4138
- extend for cached_test_function_ir #4159
- explain phase to the typed choice sequence #4162
- Optimiser to the typed choice sequence #4163
- generate_novel_prefix to the typed choice sequence #4172
- choice_{to, from}_index #4209
- sort_key_ir #4215
- ConjectureData.for_choices #4219
- @reproduce_failure #4220
- lower_blocks_together -> lower_integers_together #4235
- BytestringProvider in fuzz_one_input #4221
- extend: int = 0 and NodeTemplate in terms of choice count #4249
- ConjectureData.buffer #4250
- TODO_IR