feat: DH-18351: Add CumCountWhere() and RollingCountWhere() features to UpdateBy #6566
lbooker42 merged 12 commits into deephaven:main from lbooker42:nightly/DH-18351-lab-countwhere
```java
assertEquals(6L, counts.get(0));

// Get a static set table for use in dynamic where filters (contains 0-3)
final QueryTable setTable = (QueryTable) TableTools.newTable(col("sym", 1, 2, 3));
```
I noticed that AggCountWhere was not testing DynamicWhereFilter in CI; corrected here.
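To make concrete what a set-table filter contributes to a count, here is a minimal sketch in plain Python. It assumes whereIn-style semantics (a row passes when its `sym` value appears in the set table's `sym` column); `count_where_in` is a hypothetical helper, not part of the Deephaven API, and unlike a real dynamic filter it does not react to changes in the set table.

```python
from typing import Hashable, Iterable


def count_where_in(values: Iterable[Hashable], set_values: Iterable[Hashable]) -> int:
    """Count the entries of `values` that appear in `set_values`.

    Hypothetical helper: models a membership filter against a static
    "set table" column. A dynamic filter would also update when the set
    table changes; this sketch does not.
    """
    allowed = frozenset(set_values)  # build the lookup set once
    return sum(1 for v in values if v in allowed)


# Mirrors the test's set table, whose "sym" column holds 1, 2, 3.
set_table_sym = [1, 2, 3]
source_sym = [0, 1, 2, 3, 4, 1, 2, 5]
print(count_where_in(source_sym, set_table_sym))  # 5
```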
```java
 * values that pass the supplied {@code filters}.
 *
 * @param resultColumn The {@link Count#column() output column} name
 * @param filters The filters to apply to the input columns
```
Added the missing `@param`.
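The Javadoc above describes counting the rows whose values pass the supplied filters. As an illustration only, the sketch below models a grouped count-where in plain Python. It assumes that multiple filters are combined with AND and that every group in the input appears in the result (with 0 when none of its rows pass); `agg_count_where` is a hypothetical name, not the Deephaven aggregation API.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Mapping, Sequence

Row = Mapping[str, Any]
Filter = Callable[[Row], bool]


def agg_count_where(rows: Iterable[Row], key: str, filters: Sequence[Filter]) -> Dict[Hashable, int]:
    """Per-group count of rows that pass every filter.

    Hypothetical model: filters are ANDed, and groups whose rows all fail
    still appear in the result with a count of 0.
    """
    counts: Dict[Hashable, int] = {}
    for row in rows:
        group = row[key]
        counts.setdefault(group, 0)  # keep groups whose rows all fail
        if all(f(row) for f in filters):
            counts[group] += 1
    return counts


rows = [
    {"Sym": "A", "Price": 5},
    {"Sym": "A", "Price": 15},
    {"Sym": "B", "Price": 20},
    {"Sym": "B", "Price": 25},
    {"Sym": "C", "Price": 1},
]
filters = [lambda r: r["Price"] > 10, lambda r: r["Price"] < 22]
print(agg_count_where(rows, "Sym", filters))  # {'A': 1, 'B': 1, 'C': 0}
```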
cpwright left a comment
What does code coverage look like?
```diff
         return super(Table, self).agg_all_by(agg, by)
 
-    def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]], by: Union[str, List[str]]) -> Table:
+    def update_by(self, ops: Union[UpdateByOperation, List[UpdateByOperation]],
```
Do we have coverage of the None case?
We might also need to update the docstring for `by` with something like "defaults to None, meaning..."
@cpwright Yes, this is tested in test_updateby.py on both the client and server sides.
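The `None` case for `by` matters because it decides whether the cumulative count runs over the whole table or restarts per group. The following is a minimal plain-Python sketch, assuming `by=None` means no grouping (a single zero-key bucket) and that a `str` or list of `str` names the key columns; `cum_count_where` here is a hypothetical model, not the `deephaven.updateby` implementation.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional, Sequence, Tuple, Union

Row = Mapping[str, Any]


def cum_count_where(rows: Sequence[Row], predicate: Callable[[Row], bool],
                    by: Optional[Union[str, List[str]]] = None) -> List[int]:
    """Running count of rows that pass `predicate`, optionally per group.

    Hypothetical model: by=None means no grouping, so all rows share one
    bucket (a zero-key update); a str or list of str names key columns.
    """
    if by is None:
        key_cols: List[str] = []
    elif isinstance(by, str):
        key_cols = [by]
    else:
        key_cols = list(by)
    counts: Dict[Tuple[Any, ...], int] = {}
    out: List[int] = []
    for row in rows:
        key = tuple(row[c] for c in key_cols)  # () for the zero-key case
        counts[key] = counts.get(key, 0) + (1 if predicate(row) else 0)
        out.append(counts[key])
    return out


rows = [
    {"Sym": "A", "X": 1},
    {"Sym": "B", "X": -1},
    {"Sym": "A", "X": 3},
    {"Sym": "B", "X": 2},
    {"Sym": "A", "X": -5},
]


def positive(row: Row) -> bool:
    return row["X"] > 0


print(cum_count_where(rows, positive))            # [1, 1, 2, 3, 3]
print(cum_count_where(rows, positive, by="Sym"))  # [1, 0, 2, 1, 2]
```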
```python
col (str): the column to hold the counts of rows that pass the filter condition columns.
filters (Union[str, Filter, List[str], List[Filter]], optional): the filter condition
    expression(s) or Filter object(s)
rev_ticks (int): the look-behind window size (in rows/ticks)
```
The default matches formula, but it is not listed as a default in the docstring.
What is the intended behavior for `rev_ticks = 0` and `fwd_ticks = 0`?
`rev_ticks` should not have a default; corrected.
For timed windows, rev == fwd == 0 means that the window contains all rows whose timestamps match exactly (to the nanosecond). For tick windows, rev == fwd == 0 is undefined: it describes a zero-size window and will probably break in interesting ways depending on the operator.
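The difference between the two window types at rev == fwd == 0 can be illustrated with a brute-force model. This sketch assumes a tick window of `[i - rev_ticks + 1, i + fwd_ticks]` (so `rev_ticks` counts the current row) and an inclusive time window of `[t - rev_time, t + fwd_time]`; both function names are hypothetical. In this model the zero-size tick window simply yields 0, whereas the real operators may misbehave, as noted above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def rolling_count_where_tick(values: Sequence[T], predicate: Callable[[T], bool],
                             rev_ticks: int, fwd_ticks: int) -> List[int]:
    """Count passing rows in the tick window [i - rev_ticks + 1, i + fwd_ticks].

    Assumption: rev_ticks counts the current row, so rev_ticks == fwd_ticks == 0
    gives an empty window (lo > hi) and a count of 0 in this model.
    """
    n = len(values)
    out: List[int] = []
    for i in range(n):
        lo = max(0, i - rev_ticks + 1)
        hi = min(n - 1, i + fwd_ticks)
        out.append(sum(1 for j in range(lo, hi + 1) if predicate(values[j])))
    return out


def rolling_count_where_time(timestamps: Sequence[int], values: Sequence[T],
                             predicate: Callable[[T], bool],
                             rev_time: int, fwd_time: int) -> List[int]:
    """Count passing rows whose timestamp lies in [t - rev_time, t + fwd_time].

    With rev_time == fwd_time == 0, the window holds exactly the rows whose
    timestamp equals t (to the nanosecond, when timestamps are nanoseconds).
    """
    out: List[int] = []
    for t in timestamps:
        out.append(sum(1 for ts, v in zip(timestamps, values)
                       if t - rev_time <= ts <= t + fwd_time and predicate(v)))
    return out


def positive(v: int) -> bool:
    return v > 0


timestamps = [100, 100, 200, 300, 300, 300]  # nanoseconds
values = [1, -1, 2, 3, 4, -2]

print(rolling_count_where_time(timestamps, values, positive, 0, 0))  # [1, 1, 1, 2, 2, 2]
print(rolling_count_where_tick(values, positive, 3, 0))              # [1, 1, 2, 2, 3, 2]
print(rolling_count_where_tick(values, positive, 0, 0))              # [0, 0, 0, 0, 0, 0]
```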
I tested Boolean column sources to see whether re-interpretation affects performance; the impact appears mostly in CumSum:
Labels indicate that documentation is required. Documentation issues have been opened: Community: deephaven/deephaven-docs-community#373
Groovy Examples
Python Examples
Performance Notes
TL;DR: Performance compares very well.
`RollingCountWhere()` has nearly identical performance to the comparison benchmarks (and can be faster, depending on the complexity of the filter). `CumCountWhere()` also compares well to `Ema()`, but cannot catch up to zero-key `CumSum()`, which is remarkably fast.

Comparing `CumCountWhere` to `CumSum` and `Ema`:

Comparing `RollingCountWhere` to `RollingCount` and `RollingSum`:
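One way to read these comparisons: once the filter has been evaluated per row, a cumulative count-where is a cumulative sum over a 0/1 mask, and a rolling count-where is an integer that changes only as rows enter and leave the window. The plain-Python sketch below illustrates that reasoning under stated assumptions (tick window `[i - rev_ticks + 1, i]` with `fwd_ticks = 0`). It is a hypothetical model, not Deephaven's operator code, and it does not explain the measured numbers on its own.

```python
from itertools import accumulate
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cum_count_where_as_cumsum(values: Sequence[T], predicate: Callable[[T], bool]) -> List[int]:
    # Evaluate the filter once per row, then take a plain cumulative sum of the 0/1 mask.
    mask = [1 if predicate(v) else 0 for v in values]
    return list(accumulate(mask))


def rolling_count_where_incremental(values: Sequence[T], predicate: Callable[[T], bool],
                                    rev_ticks: int) -> List[int]:
    # Hypothetical tick window [i - rev_ticks + 1, i]: rev_ticks counts the current row, fwd_ticks = 0.
    if rev_ticks < 1:
        raise ValueError("rev_ticks must be >= 1 in this sketch")
    mask = [1 if predicate(v) else 0 for v in values]
    out: List[int] = []
    running = 0
    for i, passed in enumerate(mask):
        running += passed        # row i enters the window
        leaving = i - rev_ticks  # row that has just fallen out of the window
        if leaving >= 0:
            running -= mask[leaving]
        out.append(running)
    return out


def positive(v: int) -> bool:
    return v > 0


values = [1, -1, 2, 3, 4, -2]
print(cum_count_where_as_cumsum(values, positive))           # [1, 1, 2, 3, 4, 4]
print(rolling_count_where_incremental(values, positive, 3))  # [1, 1, 2, 2, 3, 2]
```

Each row costs O(1) work after the filter is applied, which is the same shape of work as a rolling sum or count over the mask.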