---
title: Vectorization and the recipe paradigm
---
Deephaven's query engine uses vectorized operations and a declarative "recipe" paradigm to achieve high performance on both static and real-time data. This guide explains the technical foundations of this approach and why it matters for your queries.
In traditional programming languages like Python, C, or Java, you write imperative code that specifies step-by-step instructions. This is Single Instruction, Single Data (SISD) - one instruction processes one piece of data at a time:
```python
# Traditional Python: Imperative, SISD
data = [1, 2, 3, 4, 5]
results = []
for value in data:  # One iteration at a time
    result = value * value  # One calculation
    results.append(result)  # One append
```

This approach:
- Executes instructions sequentially.
- Processes one data element per instruction.
- Requires explicit loops for multiple elements.
- Creates intermediate Python objects for each value.
Deephaven uses a declarative approach based on Single Instruction, Multiple Data (SIMD) - one instruction processes multiple data elements simultaneously:
```python
from deephaven import empty_table

# Deephaven: Declarative, SIMD-capable
result = empty_table(5).update(["X = i + 1", "XSquared = X * X"])
```

This approach:
- Specifies what to compute, not how.
- Processes data in optimized chunks (vectorization).
- Enables CPU-level SIMD instructions when available.
- Avoids intermediate Python objects.
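NumPy follows the same vectorized model, which makes for a compact analogy (this snippet is plain NumPy, not Deephaven):

```python
import numpy as np

# One expression squares every element - no explicit Python loop
data = np.array([1, 2, 3, 4, 5])
squares = data * data
print(squares.tolist())  # [1, 4, 9, 16, 25]
```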
Let's compare the approaches with timing:
```python
import time
import numpy as np
from deephaven import empty_table

# Create test data
size = 1_000_000

# Time the loop approach (via NumPy arrays for a fair comparison)
start = time.time()
x_array = np.arange(size)
result_array = np.empty(size)
for i in range(size):
    result_array[i] = x_array[i] * x_array[i]
loop_time = time.time() - start
print(f"Loop approach: {loop_time:.4f} seconds")

# Time the Deephaven recipe approach
start = time.time()
dh_result = empty_table(size).update(["X = (long)i", "XSquared = X * X"])
# Force computation by reading a value
_ = dh_result.head(1)
recipe_time = time.time() - start
print(f"Recipe approach: {recipe_time:.4f} seconds")
print(f"Speedup: {loop_time / recipe_time:.2f}x")

# Store results for display
loop_result = f"Loop: {loop_time:.4f}s"
```

The recipe approach is typically much faster because:
- Vectorization - Processes multiple values per CPU instruction.
- No Python overhead - Computation stays in compiled code.
- Better memory access - Sequential columnar reads are cache-friendly.
- Parallelization - Engine can split work across cores.
Modern CPUs have special instructions that operate on multiple data elements simultaneously. For example, instead of adding two numbers at a time:
```text
Regular: A + B = C  (one addition)
```

Vectorized CPUs can do:

```text
SIMD: [A1, A2, A3, A4] + [B1, B2, B3, B4] = [C1, C2, C3, C4]  (four additions in one instruction)
```
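The same shape is visible at the array level in NumPy (an analogy for the concept, not literal SIMD registers):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
c = a + b  # one expression performs all four additions
print(c.tolist())  # [11, 22, 33, 44]
```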
Deephaven's engine is designed to enable CPU vectorization:
- Columnar storage - Data for a column is stored contiguously in memory.
- Chunk-oriented processing - Operations work on blocks of data at once.
- Type-specific operations - Specialized code for each data type avoids type checks in inner loops.
- JIT compilation - The JVM can optimize and vectorize hot code paths.
By structuring our engine operations as chunk-oriented kernels, we allow the JVM's JIT compiler to vectorize computations where possible.
Deephaven moves data using a structure called a Chunk:
```text
Chunk = contiguous block of typed data (e.g., 4096 doubles)
```
When you write:
```python
t.update("Y = X * 2")
```

The engine:

- Reads column `X` in chunks (e.g., 4096 values at a time).
- Applies the operation to each chunk (vectorized multiplication).
- Writes results to column `Y` in chunks.
This approach:
- Amortizes memory access costs.
- Enables vectorization.
- Reduces per-element overhead.
- Works efficiently with CPU caches.
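A minimal sketch of the idea in Python/NumPy, assuming a fixed 4096-element chunk size as in the example above (illustrative only - not Deephaven's actual `Chunk` classes):

```python
import numpy as np

CHUNK_SIZE = 4096

def apply_chunked(column, kernel):
    """Apply a kernel to a column one chunk at a time."""
    out = np.empty_like(column)
    for start in range(0, len(column), CHUNK_SIZE):
        chunk = slice(start, start + CHUNK_SIZE)
        out[chunk] = kernel(column[chunk])  # vectorized within each chunk
    return out

x = np.arange(10_000, dtype=np.float64)
y = apply_chunked(x, lambda c: c * 2)  # like "Y = X * 2"
```

Each kernel call touches one contiguous, cache-friendly block, which is what amortizes the per-element overhead.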
When you write a Deephaven query:
```python
from deephaven import time_table

t1 = time_table("PT1s").update("X = i")
t2 = t1.update("Y = X * 2")
```

You're creating a specification (recipe) that says "Y should always equal X times 2". You're not executing a loop or directly computing values.
The engine builds a Directed Acyclic Graph (DAG) of dependencies:
```text
t1 (source) → t2 (derived)
     ↓             ↓
     X    →   Y = X * 2
```
When data ticks:
- New rows arrive in `t1`.
- Engine detects that `t2` depends on `t1`.
- Engine automatically computes `Y` for the new rows.
- Updates propagate through the DAG.
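A toy model of that propagation in plain Python (names like `DerivedColumn` and `on_added` are invented for illustration; the real update graph machinery is far more involved):

```python
class DerivedColumn:
    """Toy model: a derived column that recomputes only for new rows."""

    def __init__(self, formula):
        self.formula = formula
        self.values = []

    def on_added(self, new_rows):
        # Only the newly arrived rows are computed
        self.values.extend(self.formula(x) for x in new_rows)

y = DerivedColumn(lambda x: x * 2)  # "Y = X * 2"
y.on_added([1, 2, 3])               # first tick
y.on_added([4])                     # later tick: only the new row
print(y.values)  # [2, 4, 6, 8]
```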
This is fundamentally impossible with imperative loops - a loop executes once and stops!
```python
from deephaven import time_table
from deephaven.updateby import cum_sum

# Create a ticking table (adds a row every second)
source = time_table("PT1s").update(["X = i", "XSquared = X * X"])

# Add a cumulative sum - updates automatically!
result = source.update_by(cum_sum("SumX = X"))
```

Watch this table in the UI. Every second:
- A new row arrives in `source`.
- `XSquared` is computed for the new row.
- `SumX` is updated for the new row.
- You wrote the recipe once; it runs forever.
Here's a more complex example that demonstrates multiple concepts working together - time manipulation, chained operations, and Java function integration:
```python
from deephaven import time_table

# Create a table that ticks every second -- add a column that is nanos since the epoch
t1 = time_table("PT1s").update(["TsEpochNs = epochNanos(Timestamp)"])

# Create a new table that adds a Java Instant column from the TsEpochNs column
# epochNanosToInstant is a Java function from DateTimeUtils
t2 = t1.update("TS2 = epochNanosToInstant(TsEpochNs)")

# Do some time operations
t3 = t2.update(
    [
        "TS3 = epochNanosToInstant(TsEpochNs + 2 * SECOND)",
        "TS4 = Timestamp + 'PT2s'",
        "D3 = TS3 - Timestamp",
        "D4 = TS4 - Timestamp",
    ]
)
```

This example illustrates several key concepts:
- Declarative recipes - Each `.update()` specifies what to compute, not how to loop.
- Automatic propagation - All three tables (`t1`, `t2`, `t3`) update every second.
- Chained operations - Tables build on each other through the DAG.
- Real-time execution - New rows trigger automatic recomputation.
- Java integration - Using `epochNanosToInstant()` from DateTimeUtils.
- Type conversions - Converting between epoch nanos, Instants, and timestamps.
Every second, a new row arrives and all formulas execute automatically. The engine handles:
- Dependency tracking between `t1` → `t2` → `t3`.
- Type conversions and time arithmetic.
- Efficient execution of all operations.
Under the hood, Deephaven:
- Parses your query string into an Abstract Syntax Tree (AST).
- Analyzes the AST to determine dependencies and types.
- Generates optimized Java code (or uses pre-compiled classes for simple operations).
- Compiles the generated code.
- Executes the compiled code on chunks of data.
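Step 1 can be illustrated with Python's own `ast` module (Deephaven actually parses its formula language on the JVM; this is only an analogy for what "parse into an AST" means):

```python
import ast

# Parse the formula text into an abstract syntax tree
tree = ast.parse("Y = X * 2")
assign = tree.body[0]

assert isinstance(assign, ast.Assign)        # Y = ...
assert isinstance(assign.value, ast.BinOp)   # the X * 2 expression
assert isinstance(assign.value.op, ast.Mult)
```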
For example, `"Y = X * 2"` might become:

```java
// Generated Java code (simplified)
class GeneratedFormula {
    void apply(DoubleChunk input, WritableDoubleChunk output) {
        for (int i = 0; i < input.size(); i++) {
            output.set(i, input.get(i) * 2.0);
        }
    }
}
```

This compiled code:
- Has no Python overhead.
- Can be JIT-optimized by the JVM.
- Can be vectorized by the CPU.
- Runs at native speed.
The recipe paradigm makes real-time processing trivial. Compare:
```python
# ❌ This only runs once!
results = []
for row in source.iter_tuple():
    results.append(row.X * 2)
# What happens when new data arrives? Nothing!
```

```python
# ✓ This updates automatically!
result = source.update("Y = X * 2")
# New data arrives? Y is computed for new rows automatically!
```

The engine is smart about updates. It doesn't recompute everything - it only processes what changed:
```python
from deephaven import time_table

# Ticking source with multiple operations
source = time_table("PT1s").update(["X = i", "Y = X * X", "Z = Y + 10", "W = Z * 2"])
```

When a new row arrives:
- Only the new row is processed.
- All formulas are evaluated for that row.
- Results are appended to output columns.
- Nothing else is recomputed.
For updates or modifications:
- Only affected rows are recomputed.
- Dependencies are tracked automatically.
- Downstream tables update accordingly.
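A toy model of that delta-only evaluation for the `X`/`Y`/`Z`/`W` chain above (plain Python for illustration; not engine internals):

```python
# Formulas evaluated in dependency order, one new row at a time
formulas = {
    "Y": lambda row: row["X"] * row["X"],
    "Z": lambda row: row["Y"] + 10,
    "W": lambda row: row["Z"] * 2,
}

table = []  # rows already processed are never revisited

def on_new_row(x):
    row = {"X": x}
    for name, formula in formulas.items():
        row[name] = formula(row)  # evaluated only for this row
    table.append(row)

on_new_row(3)
print(table[-1])  # {'X': 3, 'Y': 9, 'Z': 19, 'W': 38}
```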
```python
from deephaven import time_table
from deephaven.updateby import rolling_avg_tick

# Streaming data with rolling statistics
trades = time_table("PT0.1s").update(
    [
        "Symbol = (i % 3 == 0) ? `AAPL` : (i % 3 == 1) ? `GOOGL` : `MSFT`",
        "Price = 100 + randomGaussian(0, 5)",
        "Size = randomInt(1, 100)",
    ]
)

# Calculate a 10-row rolling average - updates in real time!
result = trades.update_by(
    rolling_avg_tick("AvgPrice = Price", rev_ticks=10), by="Symbol"
)
```

This query:
- Processes streaming trade data.
- Maintains separate rolling averages per symbol.
- Updates automatically as new data arrives.
- Would be extremely difficult to implement with loops.
Loop approach creates Python objects:
```python
# Creates 1,000,000 Python int objects!
results = [x * x for x in range(1_000_000)]
```

Recipe approach stays in native memory:

```python
# No Python objects created for data!
result = empty_table(1_000_000).update("XSquared = i * i")
```

Deephaven uses smart memory management:
```python
from deephaven import empty_table

t1 = empty_table(1_000_000).update("X = i")

# t2 shares the X column with t1 - no copy!
t2 = t1.update("Y = X * 2")

# t3 also shares the X column - still no copy!
t3 = t1.where("X > 500000")
```

A table may share its RowSet with any other table in its update graph that contains the same row keys... This sharing capability represents an important optimization that avoids some data processing or copying work.
Row-oriented (like Python lists of dicts):
```text
[{X: 1, Y: 2}, {X: 3, Y: 4}, {X: 5, Y: 6}]
```
- Accessing column X requires skipping Y values.
- Poor cache locality for column operations.
- Can't vectorize efficiently.
Columnar (like Deephaven):
```text
X: [1, 3, 5]
Y: [2, 4, 6]
```
- Column X is contiguous in memory.
- Excellent cache locality.
- Enables vectorization.
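The difference can be sketched in plain Python, with rows as a list of dicts versus columns as a dict of lists (a sketch of the two layouts, not Deephaven's storage format):

```python
# Row-oriented: summing X must walk every row dict
rows = [{"X": 1, "Y": 2}, {"X": 3, "Y": 4}, {"X": 5, "Y": 6}]
row_sum = sum(r["X"] for r in rows)

# Columnar: X is a single contiguous sequence
cols = {"X": [1, 3, 5], "Y": [2, 4, 6]}
col_sum = sum(cols["X"])

print(row_sum, col_sum)  # 9 9
```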
```python
from deephaven import empty_table

result = empty_table(10).update(
    [
        "X = i",
        "Y = i * 10",
        "Z = sqrt(X * X + Y * Y)",  # Pythagorean theorem
    ]
)
```

Engine execution:

- Reads `X` and `Y` columns in chunks.
- Applies vectorized operations chunk-by-chunk.
- Writes results to the `Z` column.
- No Python overhead, no intermediate objects.
```python
from deephaven import empty_table

result = empty_table(10).update(
    ["X = i", "Category = (X < 3) ? `Small` : (X < 7) ? `Medium` : `Large`"]
)
```

The ternary operator compiles to:
- Efficient branch prediction.
- No Python if/else overhead.
- Vectorized where possible.
```python
from deephaven import empty_table
from deephaven.updateby import cum_sum, rolling_avg_tick

result = (
    empty_table(10)
    .update("X = i + 1")
    .update_by([cum_sum("CumSum = X"), rolling_avg_tick("RollingAvg = X", rev_ticks=3)])
)
```

These operations:
- Maintain state efficiently.
- Update incrementally when data ticks.
- Cannot be expressed with simple loops.
- Are highly optimized in the engine.
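Conceptually, `cum_sum` is a running total; Python's `itertools.accumulate` shows the semantics (the engine's incremental, tick-aware implementation is very different):

```python
from itertools import accumulate

x = [1, 2, 3, 4, 5]        # X = i + 1 for the first five rows
cum = list(accumulate(x))  # running total, like cum_sum("CumSum = X")
print(cum)  # [1, 3, 6, 10, 15]
```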
```python
from deephaven import empty_table

source = empty_table(5).update(["X = i", "Y = X * 2"])

# Extracting data to Python - loops are fine here
for row in source.iter_tuple():
    # Process in Python, make API calls, etc.
    print(f"X={row.X}, Y={row.Y}")
```

This is extraction, not transformation. The data is leaving Deephaven.
```python
from deephaven import empty_table

source = empty_table(100).update(
    [
        "Symbol = (i % 3 == 0) ? `AAPL` : (i % 3 == 1) ? `GOOGL` : `MSFT`",
        "Price = 100 + randomGaussian(0, 5)",
    ]
)

# Using loops for control flow - fine!
tables = {}
for symbol in ["AAPL", "GOOGL", "MSFT"]:
    tables[symbol] = source.where(f"Symbol = `{symbol}`")
```

You're using loops to control table creation, not to transform table data.
```python
# ❌ NEVER do this!
results = []
for row in source.iter_tuple():
    results.append(row.X * row.Y)
# Now what? How do you get this back into a table?
# And what happens when data ticks?
```

Use `.update()` instead!
✅ Good - Vectorizable:

```python
from deephaven import empty_table

good = empty_table(10).update(
    [
        "X = i",
        "Y = X * 2 + 5",  # Simple arithmetic - vectorizes well
    ]
)
```

⚠️ Careful - Python functions are called per value, not vectorized:

```python
from deephaven import empty_table

def complex_calculation(x):
    # Complex Python function - called per value, not vectorized
    result = 0
    for i in range(int(x)):
        result += i**2
    return result

careful = empty_table(10).update(
    [
        "X = i + 1",
        "Y = complex_calculation(X)",  # Python function - not vectorized
    ]
)
```

❌ Slow - Calls Python for every row:

```python
def python_func(x):
    return x * 2

t.update("Y = python_func(X)")  # Python call per row - slow!
```

✅ Fast - Stays in compiled code:

```python
t.update("Y = X * 2")  # Compiled code - fast!
```

For rolling calculations, use `update_by`:
```python
from deephaven import empty_table
from deephaven.updateby import rolling_avg_tick

result = (
    empty_table(100)
    .update("X = i")
    .update_by(rolling_avg_tick("AvgX = X", rev_ticks=10))
)
```

For aggregations, use dedicated methods:
```python
from deephaven import empty_table

result = (
    empty_table(100)
    .update(["Group = i % 5", "Value = randomDouble(0, 100)"])
    .avg_by("Group")
)
```

Filter early to minimize the data processed:

```python
from deephaven import empty_table

# Filter first to minimize data processed
result = (
    empty_table(1_000_000)
    .update("X = i")
    .where("X > 900000")  # Filter early!
    .update("Y = X * X")  # Only processes 100k rows
)
```

The Java Virtual Machine (JVM) uses Just-In-Time (JIT) compilation to optimize hot code paths. For Deephaven queries:
- Initial execution - Code is interpreted.
- Profiling - JVM identifies hot methods.
- Compilation - Hot methods are compiled to native code.
- Optimization - Compiler applies vectorization, loop unrolling, etc.
This means:
- First execution may be slower (compilation overhead).
- Subsequent executions are much faster.
- Long-running queries benefit most.
- Think declaratively - Specify what to compute, not how to iterate.
- Recipes enable real-time - Declarative queries update automatically.
- Vectorization = performance - SIMD operations process multiple elements at once.
- No Python overhead - Computation stays in compiled code.
- Use loops for extraction, not transformation - Get data out, don't transform inside loops.
The paradigm shift:
- Old way: "For each row, multiply X by 2 and store in Y".
- Deephaven way: "Y should always equal X times 2".
This shift unlocks:
- High performance through vectorization.
- Automatic real-time updates.
- Cleaner, more maintainable code.
- Efficient memory usage.