Arc's Line Protocol endpoints now use the same paths as InfluxDB, enabling drop-in compatibility with all official InfluxDB client libraries.
Endpoint changes:
| Old Path | New Path | Compatible With |
|---|---|---|
| /api/v1/write | /write | InfluxDB 1.x clients, Telegraf |
| /api/v1/write/influxdb | /api/v2/write | InfluxDB 2.x clients |
Supported clients (no code changes required):
- Go: github.com/influxdata/influxdb-client-go
- Python: influxdb-client
- JavaScript/Node.js: @influxdata/influxdb-client
- Java: influxdb-client-java
- C#: InfluxDB.Client
- PHP, Ruby, and other official InfluxDB clients
- Telegraf (InfluxDB output plugin)
- Node-RED: node-red-contrib-influxdb
Usage: Point your existing InfluxDB client at Arc's URL - it just works.
# Python example - works unchanged
from influxdb_client import InfluxDBClient
client = InfluxDBClient(url="http://localhost:8000", token="your-token", org="myorg")
write_api = client.write_api()
write_api.write(bucket="mydb", record="cpu,host=server01 usage=90.5")
Authentication methods supported:
| Method | Example | Notes |
|---|---|---|
| Bearer token | Authorization: Bearer <token> | Standard OAuth2 style |
| Token header | Authorization: Token <token> | InfluxDB 2.x style |
| API key header | x-api-key: <token> | Simple header auth |
| Query parameter | ?p=<token> | InfluxDB 1.x compatibility (username u= is ignored) |
# InfluxDB 1.x style auth with token in password parameter
curl -X POST "http://localhost:8000/write?db=mydb&u=ignored&p=your-token" \
-d 'cpu,host=server01 usage=90.5'
Arc-native endpoint preserved:
- /api/v1/write/line-protocol - Uses the x-arc-database header (no query params)
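The authentication styles listed above differ only in where the token travels. The sketch below builds the URL and headers for a write to /write without sending anything. The helper name build_write_request and its arguments are hypothetical, and the db query parameter is taken from the curl example rather than from a full API reference.

```python
from urllib.parse import urlencode


def build_write_request(base_url, db, token, auth="bearer"):
    """Return (url, headers) for a Line Protocol write using one auth style."""
    params = {"db": db}
    headers = {}
    if auth == "bearer":
        headers["Authorization"] = f"Bearer {token}"
    elif auth == "token":
        headers["Authorization"] = f"Token {token}"
    elif auth == "api_key":
        headers["x-api-key"] = token
    elif auth == "query":
        # InfluxDB 1.x style: the token travels in p=, the username in u= is ignored
        params["u"] = "ignored"
        params["p"] = token
    else:
        raise ValueError(f"unknown auth style: {auth}")
    return f"{base_url}/write?{urlencode(params)}", headers


url, headers = build_write_request("http://localhost:8000", "mydb", "your-token", "query")
print(url)
```

The returned pair can be passed to any HTTP client together with a Line Protocol body such as cpu,host=server01 usage=90.5.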
Arc now supports native MQTT subscription for IoT and edge data ingestion. Connect directly to MQTT brokers to ingest time-series data without requiring additional infrastructure.
Key features:
- Subscribe to multiple MQTT topics with wildcard support (+, #)
- Dynamic subscription management via REST API
- TLS/SSL connections with certificate validation
- Authentication via username/password or client certificates
- Connection auto-reconnect with exponential backoff
- Per-subscription statistics and monitoring
- Passwords encrypted at rest
- Auto-start subscriptions on server restart
Enable MQTT in arc.toml:
[mqtt]
enabled = true
REST API for subscription management:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/mqtt/subscriptions | Create a new subscription |
| GET | /api/v1/mqtt/subscriptions | List all subscriptions |
| GET | /api/v1/mqtt/subscriptions/:id | Get subscription details |
| PUT | /api/v1/mqtt/subscriptions/:id | Update a subscription |
| DELETE | /api/v1/mqtt/subscriptions/:id | Delete a subscription |
| POST | /api/v1/mqtt/subscriptions/:id/start | Start a subscription |
| POST | /api/v1/mqtt/subscriptions/:id/stop | Stop a subscription |
| POST | /api/v1/mqtt/subscriptions/:id/restart | Restart a subscription |
| GET | /api/v1/mqtt/subscriptions/:id/stats | Get subscription statistics |
| GET | /api/v1/mqtt/stats | Aggregate stats for all subscriptions |
| GET | /api/v1/mqtt/health | MQTT service health check |
Example - Create subscription via API:
curl -X POST "http://localhost:8000/api/v1/mqtt/subscriptions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "factory-sensors",
"broker": "tcp://mqtt.example.com:1883",
"topics": ["sensors/+/temperature", "sensors/+/humidity"],
"qos": 1,
"database": "iot",
"username": "arc",
"password": "secret",
"auto_start": true,
"topic_mapping": {
"sensors/+/temperature": "temperature",
"sensors/+/humidity": "humidity"
}
}'
Example - List subscriptions:
curl -H "Authorization: Bearer $TOKEN" \
http://localhost:8000/api/v1/mqtt/subscriptions
Subscription options:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique subscription name |
| broker | string | Yes | Broker URL (tcp://, ssl://, ws://) |
| topics | array | Yes | Topics to subscribe (supports wildcards) |
| database | string | Yes | Target Arc database |
| qos | int | No | QoS level: 0, 1, or 2 (default: 1) |
| client_id | string | No | MQTT client ID (auto-generated if not set) |
| username | string | No | MQTT username |
| password | string | No | MQTT password (encrypted at rest) |
| tls_enabled | bool | No | Enable TLS/SSL |
| auto_start | bool | No | Start on creation and server restart |
| topic_mapping | object | No | Map topics to measurement names |
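To make the wildcard and topic_mapping options concrete, here is a minimal sketch of standard MQTT filter matching (+ matches exactly one level, # matches all remaining levels) and a lookup of the measurement name for an incoming topic. The function names are hypothetical, and picking the first matching mapping entry is an assumption; these notes do not describe how Arc resolves overlapping or unmapped topics.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic matches a filter containing + or # wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches everything that remains
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    return len(filter_levels) == len(topic_levels)


def measurement_for(topic, topic_mapping):
    """Return the measurement of the first mapping entry whose filter matches (assumed rule)."""
    for topic_filter, measurement in topic_mapping.items():
        if topic_matches(topic_filter, topic):
            return measurement
    return None


mapping = {
    "sensors/+/temperature": "temperature",
    "sensors/+/humidity": "humidity",
}
print(measurement_for("sensors/line1/temperature", mapping))
```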
Arc now supports optional in-memory caching of S3 Parquet files via DuckDB's cache_httpfs community extension. This can significantly improve query performance (5-10x) for workloads with repeated file access.
When this helps:
- CTEs (Common Table Expressions) that read the same table multiple times
- Subqueries accessing the same data
- Grafana dashboards with multiple panels querying similar time ranges
Configuration:
[query]
# Enable S3 file caching (disabled by default)
enable_s3_cache = true
# Cache size (default: 128MB)
s3_cache_size = "256MB"
# Cache TTL in seconds (default: 3600 = 1 hour)
s3_cache_ttl_seconds = 3600
Environment variables:
- ARC_QUERY_ENABLE_S3_CACHE - Enable/disable caching
- ARC_QUERY_S3_CACHE_SIZE - Cache size (e.g., "128MB", "256MB")
- ARC_QUERY_S3_CACHE_TTL_SECONDS - TTL in seconds
Key features:
- In-memory only - No disk caching, preserves Arc's stateless compute philosophy
- Opt-in - Disabled by default, no impact unless enabled
- Configurable - Tune cache size and TTL for your workload
- Graceful degradation - If extension fails to load, Arc continues without caching
Trade-off: Increases memory usage by up to the configured cache size (default 128MB).
Contributed by @khalid244
Queries that use relative time expressions such as NOW() - INTERVAL '20 days' now benefit from partition pruning, which sharply reduces query times for time-filtered queries.
Previously: Queries with relative time filters scanned ALL parquet files because the partition pruner only recognized literal timestamp strings.
Now: Relative time expressions are evaluated at query time and converted to absolute timestamps for proper partition pruning.
Supported expressions:
| Expression | Status |
|---|---|
| time >= '2024-03-15' | ✓ Works (existing) |
| time > NOW() - INTERVAL '20 days' | ✓ Now works |
| time >= CURRENT_TIMESTAMP - INTERVAL '24 hours' | ✓ Now works |
| time < NOW() - INTERVAL '7 days' | ✓ Now works |
| time > NOW() + INTERVAL '1 day' | ✓ Now works |
Supported time units: seconds, minutes, hours, days, weeks, months
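As an illustration of the conversion, the following sketch resolves a relative expression to an absolute timestamp at a given evaluation time. It is not Arc's implementation (Arc is not written in Python): the regex and function name are hypothetical, and months are left out because this sketch only handles fixed-length units.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed-length units only; months are omitted because their length varies.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}

RELATIVE = re.compile(
    r"(?:NOW\(\)|CURRENT_TIMESTAMP)\s*([+-])\s*INTERVAL\s*'(\d+)\s+([a-z]+?)s?'",
    re.IGNORECASE,
)


def resolve_relative(expr, now):
    """Return the absolute timestamp for the first relative expression in expr, or None."""
    match = RELATIVE.search(expr)
    if match is None:
        return None
    sign, amount, unit = match.group(1), int(match.group(2)), match.group(3).lower()
    if unit not in UNIT_SECONDS:
        return None  # unsupported unit in this sketch (e.g. month)
    delta = timedelta(seconds=amount * UNIT_SECONDS[unit])
    return now - delta if sign == "-" else now + delta


now = datetime(2026, 1, 7, 17, 22, 2, tzinfo=timezone.utc)
print(resolve_relative("time > NOW() - INTERVAL '4 minutes'", now))
```

Once both bounds are absolute, the pruner can treat them the same way as literal timestamp strings.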
Example - Before vs After:
-- This query now prunes to only relevant partitions
SELECT * FROM production.cpu
WHERE time > NOW() - INTERVAL '4 minutes'
AND time < NOW() - INTERVAL '2 minutes'
-- EXPLAIN ANALYZE shows proper time bounds:
-- Filters: time>'2026-01-07 17:18:02'::TIMESTAMP WITH TIME ZONE
-- AND time<'2026-01-07 17:20:02'::TIMESTAMP WITH TIME ZONE
Fixed an issue where measurement names containing ASCII control characters (0x01-0x08, 0x0B-0x0C, 0x0E-0x1F) would cause S3 ListObjectsV2 operations to fail with XML parsing errors.
Cause: S3 returns XML 1.0 responses which forbid certain control characters. Measurement names from Line Protocol, MsgPack, and Continuous Query endpoints were not validated, allowing invalid characters to be used as S3 key prefixes.
Fix: Added strict validation for measurement names across all ingestion endpoints:
- Must start with a letter (a-z, A-Z)
- May only contain alphanumeric characters, underscores, and hyphens
- Maximum length of 128 characters
Affected endpoints:
- /write and /api/v2/write (Line Protocol)
- /api/v1/write/msgpack (MsgPack)
- /api/v1/continuous-queries (Continuous Query create/update)
Invalid measurement names now return a 400 Bad Request with a descriptive error message.
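The three rules above fit in a single regular expression. This is a minimal sketch for checking names client-side before sending data; the pattern and function name are assumptions derived from the listed rules, not Arc's validation code.

```python
import re

# One letter, then up to 127 more letters, digits, underscores, or hyphens (128 max).
MEASUREMENT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,127}")


def is_valid_measurement(name):
    """Return True if name satisfies the measurement naming rules listed above."""
    return MEASUREMENT_NAME.fullmatch(name) is not None


for candidate in ["cpu", "cpu_usage-1", "1cpu", "cpu\x01"]:
    print(repr(candidate), is_valid_measurement(candidate))
```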
Fixed an issue where time-filtered queries would fail with "No files found" errors when the requested time range included partitions that don't exist in S3 storage. This particularly affected queries for recent data (< 24 hours) before daily compaction has run.
Cause: The partition pruner generated paths for all hours in a time range without verifying existence. For local storage, it used filepath.Glob() to filter paths, but for S3/Azure storage, paths were passed directly to DuckDB which threw errors for missing partitions.
Additionally, for day-level paths (year/month/day/*.parquet), the pruner only checked if the directory existed (which passes when hourly subdirectories exist), but didn't verify that actual .parquet files exist at the day level.
Fix: Extended filterExistingPaths() to handle S3/Azure storage:
- Uses ListDirectories() to verify which partition paths actually exist
- For day-level paths (5 segments), verifies that .parquet files exist directly at that level (not just in subdirectories)
- Filters out non-existent partitions before passing to DuckDB
- Also fixed a pre-existing bug where filepath.Join() was mangling S3 URLs (s3://bucket → s3:/bucket)
Result: Queries on sparse datasets (with gaps in time partitions) now succeed and return data from existing partitions instead of failing. Grafana dashboards querying recent data (< 24 hours) on S3 now work correctly.
Day-level file verification contributed by @khalid244
Fixed an issue where server.read_timeout and server.write_timeout configuration values were ignored, with the server always using hardcoded 30-second timeouts.
Cause: The timeout values in cmd/arc/main.go were hardcoded to 30 seconds instead of using the loaded configuration values.
Fix: Now uses cfg.Server.ReadTimeout and cfg.Server.WriteTimeout from the configuration, allowing users to customize timeouts via arc.toml or environment variables (ARC_SERVER_READ_TIMEOUT, ARC_SERVER_WRITE_TIMEOUT).
Note: Default values remain at 30 seconds for backward compatibility.
Fixed an issue where large ingestion requests (>4MB) would fail with 413 Request Entity Too Large even though server.max_payload_size was configured to allow larger payloads (default: 1GB).
Cause: The MaxPayloadSize config value was not being passed to Fiber's BodyLimit setting, so Fiber used its default 4MB limit.
Fix: Now passes the configured MaxPayloadSize to Fiber's BodyLimit, allowing payloads up to the configured limit (default 1GB).
Fixed a bug where time.Time values in query results could be returned with the server's local timezone instead of UTC. This caused timestamp inconsistencies when servers were running in non-UTC timezones.
Before: Timestamps in query results used the server's local timezone, potentially causing mismatches with stored UTC data.
After: All timestamps in query results are now explicitly normalized to UTC via .UTC() before formatting, ensuring consistency regardless of server timezone.
Impact: Users querying data will now always receive UTC timestamps. To display in local timezone, convert client-side or use DuckDB's AT TIME ZONE in queries:
SELECT time AT TIME ZONE 'America/New_York' as local_time, value
FROM mydb.cpu
Fixed SSL certificate validation errors when querying data from Azure Blob Storage on Linux. The DuckDB azure extension was failing to find CA certificates due to path resolution issues with static linking.
Fix: On Linux, Arc now sets azure_transport_option_type = 'curl' which uses the system's curl library for SSL handling instead of DuckDB's built-in implementation.
Contributed by @schotime
Fixed timezone inconsistency in compacted file naming. Files were being named using the server's local timezone instead of UTC, which could cause issues when servers in different timezones processed the same data.
Changes:
- Compacted filenames now use UTC timestamps consistently (time.Now().UTC())
- Updated daily file parsing to handle the full timestamp format (YYYYMMDD_HHMMSS)
- Removed unused GetCompactedFilename methods from tier implementations
Contributed by @schotime
Fixed compaction failures on S3-compatible storage services (Hetzner, MinIO, etc.) due to missing subprocess configuration.
Issue 1: Credentials not forwarded
- Subprocess fell back to AWS EC2 IMDS for credentials, timing out on non-AWS environments
- Error: "operation error S3: GetObject, exceeded maximum number of attempts"
- Fix: S3 credentials now passed via AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
Issue 2: SSL config not forwarded
- Subprocess defaulted to HTTP when main process used HTTPS (port mismatch)
- Fix: use_ssl config now included in subprocess configuration
Fixed query failures caused by non-UTF-8 characters in ingested data. Users ingesting rsyslog messages or data containing binary/non-UTF-8 content would encounter DuckDB query errors like "is not valid UTF8" when querying the data.
Root cause: Arc had no UTF-8 validation in the ingestion pipeline. Non-UTF-8 bytes passed through to Parquet files, which DuckDB then rejected at query time.
Fix: Added automatic UTF-8 sanitization during ingestion:
- Invalid UTF-8 sequences are replaced with the Unicode replacement character (U+FFFD)
- Applies to both MessagePack and Line Protocol string fields
- Optimized fast-path: valid UTF-8 (99%+ of data) adds only ~6-25ns overhead with zero allocations
- Batch-level logging warns when sanitization occurs (avoids log spam)
Impact: Data with invalid UTF-8 is now queryable. Users see a warning log when sanitization occurs, making it visible without breaking the ingestion flow.
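The fast-path idea can be sketched as follows: try a strict decode first and fall back to replacement only when it fails. This Python version is an illustration under that assumption, not Arc's code, and the number of replacement characters produced for a run of invalid bytes can differ between implementations.

```python
def sanitize_utf8(raw: bytes):
    """Return (text, was_sanitized); invalid sequences become U+FFFD."""
    try:
        # Fast path: valid input decodes strictly with no replacement work
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Slow path: substitute the Unicode replacement character
        return raw.decode("utf-8", errors="replace"), True


text, changed = sanitize_utf8(b"bad\xffbyte")
print(repr(text), changed)
```

Returning the flag lets a caller log one warning per batch rather than one per value.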
Added support for nanosecond-precision timestamps in the MessagePack ingestion endpoint. This is particularly important for users migrating from InfluxDB, which uses nanosecond timestamps by default.
Root cause: The timestamp auto-detection logic only handled seconds, milliseconds, and microseconds. Nanosecond timestamps (19-digit values like 1737298800000000000) were incorrectly treated as microseconds, resulting in dates far in the future (year ~57,000).
Fix: Extended timestamp detection to recognize nanosecond precision:
- Seconds: < 1e10 (10 digits) → multiply by 1,000,000
- Milliseconds: < 1e13 (13 digits) → multiply by 1,000
- Microseconds: < 1e16 (16 digits) → no conversion
- Nanoseconds: >= 1e16 (19 digits) → divide by 1,000
Note: Line Protocol already correctly handles nanoseconds per the InfluxDB specification.
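The magnitude thresholds above translate directly into code. The sketch below normalizes any of the four precisions to microseconds; the function name is hypothetical and it assumes non-negative integer timestamps.

```python
def to_microseconds(ts: int) -> int:
    """Normalize an epoch timestamp of unknown precision to microseconds."""
    if ts < 10**10:        # seconds
        return ts * 1_000_000
    if ts < 10**13:        # milliseconds
        return ts * 1_000
    if ts < 10**16:        # microseconds
        return ts
    return ts // 1_000     # nanoseconds


print(to_microseconds(1737298800000000000))
```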
Fixed an issue where the partition pruner failed to extract time ranges from multi-line SQL queries, causing full table scans instead of targeted partition access.
Root cause: The whereClausePattern regex used .+? which does not match newlines in Go's regex engine. Multi-line queries with WHERE clauses spanning multiple lines would fail to extract time bounds.
Symptoms:
- Queries with newlines in the WHERE clause skipped partition pruning entirely
- Debug logs showed "No time range found in query, skipping partition pruning"
- Resulted in /**/*.parquet glob patterns (full table scan) instead of targeted partitions
- Increased S3 costs and query latency
Fix: Changed .+? to [\s\S]+? in the WHERE clause pattern to explicitly match any character including newlines.
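The same behavior can be reproduced in Python, whose regex engine also excludes newlines from . by default. The two patterns below are simplified stand-ins for whereClausePattern (the real pattern is not shown in these notes), so treat this as a sketch of the failure mode rather than Arc's code.

```python
import re

# Simplified stand-ins for the old and new WHERE clause patterns (assumed shapes).
OLD = re.compile(r"WHERE\s+(.+?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|$)", re.IGNORECASE)
NEW = re.compile(r"WHERE\s+([\s\S]+?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|$)", re.IGNORECASE)

query = (
    "SELECT region, COUNT(*)\n"
    "FROM metrics\n"
    "WHERE time >= '2026-01-21T07:00:00Z'\n"
    "  AND time < '2026-01-21T08:00:00Z'\n"
    "GROUP BY region"
)

# .+? cannot cross the newline after the first condition, so nothing matches
print(OLD.search(query))
# [\s\S]+? spans lines and captures both time bounds
print(NEW.search(query).group(1).strip())
```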
Example query now working:
SELECT region, COUNT(*)
FROM metrics
WHERE time >= '2026-01-21T07:00:00Z'
AND time < '2026-01-21T08:00:00Z'
GROUP BY region
Contributed by @khalid244
Fixed an issue where string literals containing SQL keywords (GROUP BY, ORDER BY, LIMIT) would cause the WHERE clause regex to terminate prematurely, potentially missing time range extraction.
Root cause: The whereClausePattern regex stopped at SQL keywords without checking if they were inside string literals. A query like WHERE time >= '2024-01-01' AND message LIKE '%GROUP BY%' would only capture time >= '2024-01-01' AND message LIKE '% before hitting the embedded GROUP BY.
Fix: Added string literal masking before regex matching. String literals are replaced with placeholders (__STR_0__, etc.) during WHERE clause boundary detection, then restored for time value extraction.
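A minimal sketch of the masking approach, assuming single-quoted SQL literals with '' as the escape. The regexes and function names are hypothetical stand-ins; only the placeholder format (__STR_0__, etc.) comes from the description above.

```python
import re

LITERAL = re.compile(r"'(?:[^']|'')*'")
WHERE = re.compile(
    r"WHERE\s+([\s\S]+?)(?:\bGROUP\s+BY\b|\bORDER\s+BY\b|\bLIMIT\b|$)", re.IGNORECASE
)


def mask_literals(sql):
    """Replace each string literal with __STR_n__ and return (masked_sql, literals)."""
    literals = []

    def replace(match):
        literals.append(match.group(0))
        return f"__STR_{len(literals) - 1}__"

    return LITERAL.sub(replace, sql), literals


def where_clause(sql):
    """Find the WHERE clause on masked SQL, then restore the original literals."""
    masked, literals = mask_literals(sql)
    match = WHERE.search(masked)
    if match is None:
        return None
    clause = match.group(1).strip()
    for i, literal in enumerate(literals):
        clause = clause.replace(f"__STR_{i}__", literal)
    return clause


sql = ("SELECT * FROM logs WHERE time >= '2024-03-15' "
       "AND error LIKE '%GROUP BY%' AND time < '2024-03-16' GROUP BY host")
print(where_clause(sql))
```

Without masking, the same WHERE pattern stops at the GROUP BY inside the LIKE literal and never reaches the upper time bound.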
Example queries now working:
-- String containing GROUP BY
SELECT * FROM logs
WHERE time >= '2024-03-15' AND error LIKE '%GROUP BY%' AND time < '2024-03-16'
GROUP BY host
-- String containing ORDER BY
SELECT * FROM logs
WHERE time >= '2024-03-15' AND query = 'SELECT * ORDER BY id' AND time < '2024-03-16'
ORDER BY time
-- String containing LIMIT
SELECT * FROM logs
WHERE time >= '2024-03-15' AND msg LIKE '%LIMIT 100%' AND time < '2024-03-16'
LIMIT 50
Fixed an issue where buffers configured with max_buffer_age_ms would flush significantly later than configured under high throughput scenarios.
Root cause: Under heavy load, the periodic flush goroutine could be starved or delayed by intensive I/O operations and lock contention. With the ticker firing every max_buffer_age_ms, buffers created between ticker fires had to wait for the next check cycle, and under load this delay compounded.
Symptoms:
- Buffers configured with max_buffer_age_ms=5000 flushing at 6700-7000ms
- Buffers configured with max_buffer_age_ms=1000 flushing at 2000ms+
- More pronounced under high throughput with frequent size-based flushes
Fix: The ticker now fires every max_buffer_age_ms / 2 (e.g., every 2500ms for 5000ms config) while keeping the age threshold at max_buffer_age_ms. This gives the periodic flush goroutine more opportunities to run even under heavy load, without flushing buffers prematurely.
Impact:
- Before: age=6700-7000ms (with max_buffer_age_ms=5000)
- After: age=5100-5700ms (with max_buffer_age_ms=5000)
- Improvement: ~25% faster flush timing
- Throughput: Minimal overhead (~1%)
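The effect of halving the tick interval can be checked with a little arithmetic. The sketch below models an idealized checker whose ticks fire at exact multiples of the interval with no scheduling delay (an assumption, since real delays under load are what the measurements above capture) and returns the age at which a buffer is first flushed.

```python
def flush_age_ms(created_at_ms, tick_interval_ms, max_age_ms):
    """Age at which an idealized periodic checker first flushes the buffer."""
    due = created_at_ms + max_age_ms
    # Round the due time up to the next tick boundary (integer ceiling division)
    first_tick = -(-due // tick_interval_ms) * tick_interval_ms
    return first_tick - created_at_ms


# A buffer created 100ms after a tick, with max_buffer_age_ms=5000
print(flush_age_ms(100, 5000, 5000))  # tick every max_buffer_age_ms
print(flush_age_ms(100, 2500, 5000))  # tick every max_buffer_age_ms / 2
```

In this model the worst-case age is bounded by max_buffer_age_ms plus one tick interval, so a shorter interval tightens the bound without lowering the age threshold.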
Fixed a panic that occurred during high-concurrency writes when batches had different column sets (schema evolution).
Symptoms:
- Panic: runtime error: index out of range [N] with length M
- Occurred in sliceColumnsByIndices() during flush operations
- More likely with high write concurrency (many workers, large batches)
Root cause: When batches with different schemas were merged (e.g., some records had cpu field, others had temperature), the mergeBatches() function created columns of different lengths. When groupByHour() generated indices from the time column, accessing those indices on shorter columns caused an out-of-bounds panic.
Fix:
- mergeBatches() now normalizes all columns to the same length, using zero values for sparse positions
- sliceColumnsByIndices() now includes defensive bounds checking as an additional safety layer
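The normalization step can be pictured with a small columnar merge. This is a sketch of the idea only: batches are modeled as dicts of equal-length lists, and the zero value is inferred from the first value of each column, both of which are assumptions for illustration.

```python
def merge_batches(batches):
    """Merge columnar batches so every output column has the same length."""
    merged = {}
    total_rows = 0
    for batch in batches:
        batch_rows = len(next(iter(batch.values()))) if batch else 0
        for name, values in batch.items():
            column = merged.setdefault(name, [])
            if len(column) < total_rows:
                # Column is new in this batch: back-fill rows from earlier batches
                zero = type(values[0])() if values else None
                column.extend([zero] * (total_rows - len(column)))
            column.extend(values)
        total_rows += batch_rows
        for column in merged.values():
            if len(column) < total_rows:
                # Column is missing from this batch: pad with its zero value
                column.extend([type(column[0])()] * (total_rows - len(column)))
    return merged


merged = merge_batches([
    {"time": [1, 2], "cpu": [0.5, 0.6]},
    {"time": [3], "temperature": [21.0]},
])
print(merged)
```

With every column at the same length, any row index derived from the time column is valid for all other columns.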
Fixed an issue where empty hour-level partition directories were left behind after daily compaction consolidated files into day-level partitions.
Before: After daily compaction deleted files from hour folders (database/measurement/YYYY/MM/DD/HH/), the empty directories remained, accumulating over time.
After: Empty directories are now automatically cleaned up after compaction:
- Removes empty hour directories after daily compaction
- Walks up the directory tree (hour → day → month → year) removing empty parents
- Stops at measurement level to preserve database structure
- Only applies to local filesystem storage (S3/Azure don't have physical folders)
- Best-effort cleanup - errors don't fail the compaction job
Fixed critical memory issues in the compactor that caused OOM kills and DuckDB segfaults when compacting partitions with large datasets (2B+ rows, thousands of files).
Root causes:
- Memory loading: Downloads and uploads loaded entire files into memory instead of streaming
- DuckDB memory limit: The subprocess wasn't using the configured database.memory_limit
- Too many files: DuckDB segfaults when processing 8000+ files in a single read_parquet() call
Fixes applied:
- Streaming I/O: Downloads now use ReadTo() and uploads use WriteReader() to stream directly to/from disk without loading files into memory
- Memory limit passthrough: Compaction subprocess now applies the configured database.memory_limit to DuckDB
- File batching: Partitions with >1000 files are automatically split into batches of 1000 files each, processed sequentially to avoid DuckDB limitations
- Adaptive batch sizing on failure: When compaction fails with recoverable errors (segfault, OOM kill, memory allocation errors), the batch is automatically split in half and retried. This continues recursively until either success or minimum batch size (2 files) is reached. Non-recoverable errors (permission denied, file not found) are not retried.
Result: Compaction now handles tables with billions of rows without OOM or segfaults. Even when individual batches fail due to memory pressure, adaptive splitting ensures eventual success with smaller batches. Query performance improved ~2x after successful compaction due to reduced file count.
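The file batching and adaptive splitting described above can be sketched as a small recursive helper. All names here (RecoverableError, compact_batch, compact_partition) are hypothetical, and the stopping rule (give up once a batch is already at or below the minimum size) is one reading of the description, not confirmed behavior.

```python
class RecoverableError(Exception):
    """Stand-in for segfault, OOM-kill, or memory allocation failures."""


def chunk(files, size):
    """Split files into consecutive batches of at most size entries."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def compact_adaptive(files, compact_batch, min_batch=2):
    """Compact one batch, halving and retrying on recoverable errors."""
    try:
        return [compact_batch(files)]
    except RecoverableError:
        if len(files) <= min_batch:
            raise  # cannot split further
        mid = len(files) // 2
        return (compact_adaptive(files[:mid], compact_batch, min_batch)
                + compact_adaptive(files[mid:], compact_batch, min_batch))


def compact_partition(files, compact_batch, batch_size=1000, min_batch=2):
    """Process a partition sequentially in batches of batch_size files."""
    outputs = []
    for batch in chunk(files, batch_size):
        outputs.extend(compact_adaptive(batch, compact_batch, min_batch))
    return outputs


def fake_compact(batch):
    # Pretend the engine runs out of memory above 250 files per call
    if len(batch) > 250:
        raise RecoverableError("simulated OOM")
    return len(batch)


print(compact_partition(list(range(2500)), fake_compact))
```

Any other exception type raised by compact_batch propagates immediately, which mirrors the rule that non-recoverable errors are not retried.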
Optional profiling: Set ARC_COMPACTION_PROFILE=1 to enable heap profiling during compaction (writes to /tmp/arc_compaction_heap.pprof).
Fixed an issue where compaction temp directories (./data/compaction/{job_id}/) accumulated on disk when compaction subprocesses crashed or were OOM-killed.
Root cause: Compaction runs in a subprocess for memory isolation. Each subprocess has a defer cleanupTemp() for cleanup, but when the subprocess is killed (SIGKILL, OOMKilled), the defer never executes.
Fix: Two-layer cleanup strategy:
- Startup cleanup: CleanupOrphanedTempDirs() runs on Arc startup to remove orphaned directories from previous runs/crashes
- Parent-side cleanup: After each subprocess completes (success or failure), the parent process removes the job's temp directory
Result: Temp directories are now cleaned up even when:
- Subprocess crashes or is OOM-killed mid-operation
- Pod crashes and restarts
- Arc is restarted after abnormal shutdown
Fixed an issue where compaction crashes could cause data duplication. If a pod crashed after uploading the compacted file but before deleting the source files, restarting compaction would re-compact the same data.
Root cause: No tracking mechanism existed to know which compaction operations were in progress. After a crash, Arc had no way to determine if a compacted file had been successfully uploaded.
Fix: Manifest-based tracking stored in S3 at _compaction_state/:
- Before compaction starts, a manifest is written with input files and expected output
- After successful upload, the manifest tracks what needs to be deleted
- On startup, orphaned manifests are recovered - either completing deletions or retrying compaction
- Stale manifests (older than 7 days) are automatically deleted with a warning
Features:
- Manifests stored in S3, preserving compute/storage separation
- Size validation detects partial uploads
- Files tracked by manifests are excluded from re-compaction candidates
- New metric: arc_compaction_manifests_recovered_total
Contributed by @khalid244
Fixed data loss during S3 outages. Previously, when S3 was unavailable, data in the Arrow buffer was lost because the WAL callback, which is meant to preserve data for recovery, was a no-op: ArrowBuffer wasn't available at that point.
Root cause: The WAL recovery callback was registered before ArrowBuffer was initialized, making it impossible to replay records through the normal ingestion path.
Fix: Complete WAL recovery implementation:
- Startup recovery: On startup, any WAL files from previous runs are replayed through ArrowBuffer.WriteColumnarDirect()
- Periodic recovery: Background goroutine runs every recovery_interval_seconds (default: 300s) to recover from transient S3 failures
- Active file protection: Periodic recovery skips the currently active WAL file to avoid reading uncommitted entries
- Backpressure handling: Records are replayed in configurable batches (recovery_batch_size, default: 10000) to avoid overwhelming the buffer
New configuration options:
[wal]
recovery_interval_seconds = 300 # How often to check for WAL files to recover (default: 5 minutes)
recovery_batch_size = 10000 # Max records per recovery batch (default: 10000)
New metrics:
- arc_wal_records_preserved_total - Records written to WAL for potential recovery
- arc_wal_recovery_total - Number of WAL recovery operations
- arc_wal_recovery_records_total - Records successfully recovered from WAL
Contributed by @khalid244
Fixed X-Arc-Database header queries not routing to cold tier data and database listing not showing cold-only databases.
Problem: When data was fully migrated to cold tier (S3/Azure) and no longer existed in hot tier (local), queries using the X-Arc-Database header would fail with "No files found" errors. Additionally, GET /api/v1/databases and SHOW DATABASES wouldn't list databases that only existed in cold storage.
Fix:
- Query routing now checks tiering metadata and builds multi-tier read_parquet() expressions when cold tier data exists
- Database and measurement listing APIs now merge results from both hot and cold tiers
- SHOW DATABASES and SHOW TABLES SQL commands now include cold tier data
- Added new tier column to SHOW DATABASES output showing 'local', 'hot', 'cold', or 'hot,cold'
New API methods in tiering metadata:
- GetAllDatabases() - List all databases across tiers
- GetMeasurementsByDatabase() - List measurements in a database across tiers
- GetTiersForDatabase() - Get which tiers contain data for a database
Retention policies now work with all storage backends (local, S3, Azure) instead of just local filesystem.
Problem: When S3 or Azure was configured as the primary storage backend, retention policies silently did nothing because the implementation used filesystem-only operations (filepath.Walk, os.Remove).
Fix:
- Refactored deleteOldFiles() to use storage backend interface (List(), Delete())
- Refactored getMeasurementsToProcess() to use storage.List() for measurement discovery
- Added buildParquetPath() helper to construct correct DuckDB paths for each backend type
- Fixed getFileMaxTimeAndRowCount() to handle time.Time directly from DuckDB driver
Supported backends:
- Local filesystem: /path/to/base/database/measurement/...
- S3: s3://bucket/database/measurement/...
- Azure: azure://container/database/measurement/...
Retention policies now clean up empty directories after deleting parquet files on local filesystem storage.
Problem: When retention policies deleted old parquet files, the empty YYYY/MM/DD/HH directory structure was left behind, causing directory clutter over time.
Fix:
- Added cleanupEmptyDirectories() using existing DirectoryRemover interface
- Recursively removes empty parent directories up to measurement level
- Only applies to local filesystem (S3/Azure don't have physical directories)
- Follows same pattern as compaction cleanup
Added configurable query timeout to prevent indefinite hangs when S3 becomes unavailable during query execution.
Problem: When S3 connectivity was lost mid-query, DuckDB would hang waiting for its internal HTTP timeout (120+ seconds), causing queries to appear frozen and client connections to time out unpredictably.
Fix: New query.timeout configuration with context-based cancellation:
- All query endpoints (JSON, Arrow, Estimate) now respect the timeout
- Returns HTTP 504 Gateway Timeout when exceeded
- Profiled queries also support timeout via new QueryWithProfileContext method
Configuration:
[query]
timeout = 300 # Query timeout in seconds (default: 300s, 0 = no timeout)
Environment variable: ARC_QUERY_TIMEOUT
New metric: arc_query_timeouts_total - Counter of queries that exceeded the timeout
Note: The context cancellation signals the timeout but doesn't immediately stop DuckDB's internal HTTP operations. The query will return 504 to the client while DuckDB completes in the background.
Contributed by @khalid244
Server idle timeout and graceful shutdown timeout are now configurable instead of being hardcoded.
New configuration options:
| Setting | Config Key | Environment Variable | Default |
|---|---|---|---|
| Idle Timeout | server.idle_timeout | ARC_SERVER_IDLE_TIMEOUT | 120 seconds |
| Shutdown Timeout | server.shutdown_timeout | ARC_SERVER_SHUTDOWN_TIMEOUT | 30 seconds |
Example configuration:
[server]
idle_timeout = 300 # 5 minutes
shutdown_timeout = 60 # 1 minute
This completes the server timeout configuration options alongside the existing read_timeout and write_timeout settings.
Queries using time_bucket() and date_trunc() are now automatically rewritten to epoch-based arithmetic, making GROUP BY queries roughly 1.8-2.6x faster in the benchmarks below without any code changes. This optimization is particularly impactful for time-series aggregation queries that group data by time intervals.
How it works:
- time_bucket('1 hour', time) → to_timestamp((epoch(time)::BIGINT / 3600) * 3600)
- time_bucket('30 minutes', time) → to_timestamp((epoch(time)::BIGINT / 1800) * 1800)
- date_trunc('day', time) → to_timestamp((epoch(time)::BIGINT / 86400) * 86400)
- date_trunc('hour', time) → to_timestamp((epoch(time)::BIGINT / 3600) * 3600)
Performance results:
| Query | Before | After | Improvement |
|---|---|---|---|
| date_trunc('day', time) GROUP BY | 4000ms | 1560ms | 2.6x faster |
| date_trunc('hour', time) GROUP BY | 4000ms | 1560ms | 2.6x faster |
| time_bucket('1 hour', time) | 2814ms | 1560ms | 1.8x faster |
| time_bucket('30 minutes', time) | 2894ms | 1173ms | 2.5x faster |
Supported patterns:
- time_bucket() with all interval types
- date_trunc() with second, minute, hour, day, week
- 3-argument time_bucket form with origin timestamp
- Multiple time function calls in the same query
Note: time_bucket('1 month', time) and date_trunc('month', time) are preserved as-is because months have variable length.
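The arithmetic behind the rewrite is integer division of the epoch by the bucket width, followed by multiplication. The sketch below reproduces it in Python for fixed-width buckets, using // for the integer division; the function name is hypothetical and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timezone


def bucket_start(ts: datetime, width_seconds: int) -> datetime:
    """Floor a timestamp to the start of its fixed-width bucket via epoch arithmetic."""
    epoch = int(ts.timestamp())
    floored = (epoch // width_seconds) * width_seconds
    return datetime.fromtimestamp(floored, tz=timezone.utc)


sample = datetime(2024, 1, 1, 10, 37, 12, tzinfo=timezone.utc)
print(bucket_start(sample, 3600))   # hourly bucket
print(bucket_start(sample, 1800))   # 30-minute bucket
print(bucket_start(sample, 86400))  # daily bucket
```

A single divisor cannot express a month, since month lengths vary, which is why those forms are left untouched.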
Fast-path optimization (PR #99): Queries that don't use time_bucket or date_trunc now skip regex processing entirely via a simple strings.Contains check. This eliminates ~21 unnecessary allocations (~44KB) per query, providing an 8.8x speedup for the SQL transformation step on queries without time functions.
Queries spanning multiple time partitions now execute in parallel, providing 2-4x speedup for time-range queries on partitioned data.
How it works:
- When partition pruning identifies 3+ partition paths (configurable), queries execute concurrently
- Each partition query runs in its own goroutine with semaphore-based concurrency control
- Results are merged via a streaming iterator that presents partitions as a single result set
- Default: 4 concurrent partition queries (bounded by DuckDB connection pool)
Example - Query spanning 24 hourly partitions:
SELECT host, AVG(cpu) FROM metrics
WHERE time >= '2024-01-01' AND time < '2024-01-02'
GROUP BY host
| Execution Mode | Time |
|---|---|
| Sequential (before) | ~2400ms |
| Parallel 4x (after) | ~600ms |
Configuration:
- MinPartitionsForParallel: Minimum partitions to trigger parallel execution (default: 3)
- MaxConcurrentPartitions: Maximum concurrent partition queries (default: 4)
Benefits:
- Transparent to clients - no query changes required
- Bounded memory usage - results stream incrementally
- Works with existing partition pruning (time-range WHERE clauses)
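The execution model can be approximated with a bounded worker pool. This sketch swaps Arc's goroutines and semaphore for Python's ThreadPoolExecutor, and run_query is a hypothetical callable that returns the rows for one partition path; only the two default values come from the configuration above.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PARTITIONS_FOR_PARALLEL = 3
MAX_CONCURRENT_PARTITIONS = 4


def query_partitions(paths, run_query):
    """Yield rows from all partitions in path order, querying in parallel when worthwhile."""
    if len(paths) < MIN_PARTITIONS_FOR_PARALLEL:
        # Too few partitions: run sequentially
        for path in paths:
            yield from run_query(path)
        return
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_PARTITIONS) as pool:
        # map() keeps at most MAX_CONCURRENT_PARTITIONS queries in flight
        # and returns results in the same order as paths
        for rows in pool.map(run_query, paths):
            yield from rows


def fake_query(path):
    return [(path, 1)]


print(list(query_partitions(["h00", "h01", "h02", "h03"], fake_query)))
```

Callers iterate over a single stream of rows either way, so the parallelism stays invisible to them.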
Aggregation queries on distributed clusters now use two-stage execution, providing 5-20x speedup for cross-shard aggregations.
How it works:
- Scatter phase: Query is rewritten to compute partial aggregates on each shard
- Gather phase: Coordinator merges partial results into final aggregates
Aggregation transformations:
| Function | Shard Query | Coordinator |
|---|---|---|
| SUM(x) | SUM(x) | SUM(partial_sums) |
| COUNT(*) | COUNT(*) | SUM(partial_counts) |
| AVG(x) | SUM(x), COUNT(x) | SUM(sums)/SUM(counts) |
| MIN(x) | MIN(x) | MIN(partial_mins) |
| MAX(x) | MAX(x) | MAX(partial_maxes) |
Example - 4-shard cluster:
-- Original query
SELECT region, AVG(latency), COUNT(*) FROM requests GROUP BY region
-- Shard query (sent to each shard)
SELECT region, SUM(latency) AS __partial_sum_0, COUNT(latency) AS __partial_count_0
FROM requests GROUP BY region
-- Coordinator merges: AVG = SUM(__partial_sum_0) / SUM(__partial_count_0)
Benefits:
- Reduces network transfer (partial aggregates vs full rows)
- Enables aggregations on datasets larger than any single shard
- Automatic query detection - no hints required
- Falls back to standard execution for unsupported patterns (HAVING, window functions, subqueries)
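The gather phase for AVG can be sketched in a few lines: sum the partial sums and partial counts per group, then divide once at the end. The data layout and function name are assumptions for illustration; the point is that averaging per-shard averages would weight shards incorrectly.

```python
from collections import defaultdict


def merge_partial_avgs(shard_results):
    """Merge per-shard (group, partial_sum, partial_count) rows into final averages."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rows in shard_results:
        for group, partial_sum, partial_count in rows:
            sums[group] += partial_sum
            counts[group] += partial_count
    # Divide once, after all partials are combined
    return {g: (sums[g] / counts[g] if counts[g] else None) for g in sums}


shards = [
    [("eu", 300.0, 3)],                    # shard 1: eu averages 100
    [("eu", 500.0, 1), ("us", 40.0, 2)],   # shard 2: eu averages 500
]
print(merge_partial_avgs(shards))
```

Here the correct result for eu is 800 / 4 = 200, whereas the mean of the two shard averages would be 300.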
Added DuckDB configuration settings that improve query performance across all query types. All settings now use SET GLOBAL to ensure they apply consistently across all connections in the DuckDB pool (PR #172).
Settings enabled:
- parquet_metadata_cache=true - Caches Parquet file metadata (schema, row group info) to reduce I/O on repeated access
- prefetch_all_parquet_files=true - Prefetches all Parquet files for S3 queries, reducing latency
- preserve_insertion_order=true - Ensures deterministic results for LIMIT queries
Pool-wide consistency (PR #172):
Previously, DuckDB settings were applied with SET, which only affects the connection that executes the statement. With connection pooling (SetMaxOpenConns), other connections in the pool would not inherit these settings. All settings are now applied with SET GLOBAL to ensure consistent behavior across the entire pool.
Performance impact:
- Aggregation queries (SUM, COUNT, AVG): 18-24% faster
- Full table scans: 2-3% faster
- Repeated queries on same data: benefit from metadata caching
SET GLOBAL fix contributed by @khalid244
Queries using REGEXP_REPLACE or REGEXP_EXTRACT for URL domain extraction are now automatically rewritten to use native string functions, making them up to 2.2x faster without any code changes.
How it works:
Arc detects common URL domain extraction patterns and rewrites them to equivalent CASE expressions using split_part() and substr():
-- Original query (slow - regex engine overhead)
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS domain
FROM requests
-- Automatically rewritten to (fast - native string functions)
SELECT CASE
WHEN Referer LIKE 'https://www.%' THEN split_part(substr(Referer, 13), '/', 1)
WHEN Referer LIKE 'http://www.%' THEN split_part(substr(Referer, 12), '/', 1)
WHEN Referer LIKE 'https://%' THEN split_part(substr(Referer, 9), '/', 1)
WHEN Referer LIKE 'http://%' THEN split_part(substr(Referer, 8), '/', 1)
ELSE split_part(Referer, '/', 1)
END AS domain
FROM requests
Performance results:
| Pattern | Before | After | Improvement |
|---|---|---|---|
| REGEXP_REPLACE domain extraction | 5700ms | 2600ms | 2.2x faster |
| REGEXP_EXTRACT domain extraction | 2600ms | 2100ms | 24% faster |
Supported patterns:
- REGEXP_REPLACE(column, '^https?://(?:www\.)?([^/]+)/.*$', '\1') - Full URL to domain
- REGEXP_EXTRACT(column, '^https?://(?:www\.)?([^/]+)', 1) - Extract domain capture group
Key features:
- Transparent to applications - no query changes required
- Fast-path check skips transformation for queries without regex functions
- Non-matching patterns pass through unchanged (safe fallback)
- Handles all protocol variations: http://, https://, with/without www.
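To make the equivalence concrete, here is a small Python mirror of the two forms. It is only an illustration of the idea, not Arc's rewriter: `domain_regex` imitates the REGEXP_REPLACE pattern and `domain_native` imitates the CASE expression, with the prefix lengths matching the substr() offsets shown above. The comparison is limited to URLs that contain a path, which the REGEXP_REPLACE pattern requires in order to match.

```python
import re

PATTERN = re.compile(r"^https?://(?:www\.)?([^/]+)/.*$")

# Longest prefixes first, in the same order as the CASE branches.
PREFIXES = ("https://www.", "http://www.", "https://", "http://")


def domain_regex(url: str) -> str:
    """Regex form: replace the whole URL with its first capture group."""
    return PATTERN.sub(r"\1", url)


def domain_native(url: str) -> str:
    """String-function form: strip a known prefix, then split on '/'."""
    for prefix in PREFIXES:
        if url.startswith(prefix):
            return url[len(prefix):].split("/", 1)[0]
    return url.split("/", 1)[0]


samples = [
    "https://www.example.com/a/b",
    "http://www.example.org/",
    "https://sub.example.net/x?y=1",
    "http://example.io/path",
]
results = [(domain_regex(u), domain_native(u)) for u in samples]
```

The native form avoids the regex engine entirely: each row costs a few prefix comparisons and one split.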
Query endpoints now support the x-arc-database header for specifying the database context, providing 4-17% performance improvement by skipping database extraction regex patterns.
How it works:
- When x-arc-database header is set, queries use simple table names (SELECT * FROM cpu) instead of db.table syntax
- The optimized parsing path skips 2 regex pattern matches (patternDBTable, patternJoinDBTable)
- Cross-database queries (db.table syntax) are rejected when header is set to enforce single-database context
Performance results:
| Query Type | Improvement |
|---|---|
| COUNT(*) | 5-6% faster |
| SELECT with LIMIT | 3-7% faster |
| Aggregations (AVG/MIN/MAX) | 10.6% faster |
| GROUP BY queries | 17.3% faster |
| Overall throughput | +5-7.5% |
Usage:
# Header mode (optimized) - simpler SQL, faster parsing
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "x-arc-database: production" \
-d '{"sql": "SELECT * FROM cpu LIMIT 100"}'
# Legacy mode (still supported) - db.table syntax in SQL
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT * FROM production.cpu LIMIT 100"}'
Key features:
- Backward compatible - header is optional, existing db.table syntax continues to work
- Works with both /api/v1/query (JSON) and /api/v1/query/arrow (Arrow IPC) endpoints
- Cache key includes database from header to ensure correct results
- Follows same pattern as ingestion endpoints which already use x-arc-database header
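The same request can be built from Python with only the standard library. This is a sketch: `build_query_request` is a hypothetical helper, the endpoint path, header name, and Bearer scheme come from these notes, and the request is constructed but not sent.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    base_url: str,
    sql: str,
    database: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST to /api/v1/query, using header mode when a database is given."""
    headers = {"Content-Type": "application/json"}
    if database is not None:
        # Header mode: the SQL can then use bare table names.
        headers["x-arc-database"] = database
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/query", data=body, headers=headers, method="POST"
    )


req = build_query_request(
    "http://localhost:8000", "SELECT * FROM cpu LIMIT 100", database="production"
)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. Omitting `database` gives the legacy mode, where the SQL itself carries the `db.table` reference.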
When client_id is not specified in the MQTT configuration, Arc now auto-generates a unique client ID using the format arc-{random-suffix}. This prevents client ID collisions when running multiple Arc instances.
Added /api/v1/restart endpoint to restart the MQTT client, allowing configuration changes to be applied without restarting the entire Arc server.
Arc uses a defense-in-depth approach for API token security:
- Storage: All new tokens are hashed with bcrypt (cost factor 10) before storage
- Lookup optimization: SHA256-based prefixes enable O(1) database lookups without exposing tokens
- Cache keys: In-memory cache uses SHA256 for fast key derivation (not security-sensitive)
- Legacy support: Pre-v26 tokens using SHA256 hashes continue to work for backward compatibility
New tokens created since v26 use bcrypt exclusively for storage. The SHA256 usage for cache keys and database indexes is a performance optimization - security is provided by the bcrypt-hashed storage, not the lookup indexes.
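The split between a fast lookup index and a slow storage hash can be sketched as follows. This is an assumption-laden illustration, not Arc's code: PBKDF2 from the standard library stands in for bcrypt, and the prefix length, `TokenStore` class, and method names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Optional, Tuple

PREFIX_LEN = 16  # hypothetical: hex characters of SHA256 kept as the index key


def lookup_prefix(token: str) -> str:
    """Fast, non-secret index key derived from the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:PREFIX_LEN]


def slow_hash(token: str, salt: bytes) -> bytes:
    """Stand-in for bcrypt: a deliberately slow, salted hash used for storage."""
    return hashlib.pbkdf2_hmac("sha256", token.encode("utf-8"), salt, 100_000)


class TokenStore:
    def __init__(self) -> None:
        # prefix -> list of (salt, slow hash, token name); the raw token is never stored.
        self._by_prefix: Dict[str, List[Tuple[bytes, bytes, str]]] = {}

    def add(self, token: str, name: str) -> None:
        salt = os.urandom(16)
        entry = (salt, slow_hash(token, salt), name)
        self._by_prefix.setdefault(lookup_prefix(token), []).append(entry)

    def verify(self, token: str) -> Optional[str]:
        # Only candidates sharing the prefix pay for a slow hash comparison.
        for salt, digest, name in self._by_prefix.get(lookup_prefix(token), []):
            if hmac.compare_digest(digest, slow_hash(token, salt)):
                return name
        return None
```

The prefix narrows the search to one or two candidates, while the slow hash remains the only thing that proves possession of the token.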
The Line Protocol write endpoints have been renamed to match InfluxDB's API paths for client compatibility:
| Old Path (removed) | New Path | Action Required |
|---|---|---|
| /api/v1/write | /write | Update client config |
| /api/v1/write/influxdb | /api/v2/write | Update client config |
Impact: If you were using the old Arc-specific paths directly, update your client configuration to use the new paths. If you're using official InfluxDB client libraries, no changes are needed - the new paths are what those clients expect.
Unaffected: /api/v1/write/line-protocol and /api/v1/write/msgpack remain unchanged.
- MQTT feature: MQTT ingestion is disabled by default. Set mqtt.enabled = true in your configuration to enable it.
- Empty directory cleanup: The compaction cleanup is automatic and requires no configuration. Existing empty directories from previous compaction runs will not be automatically cleaned up - only new compaction runs will clean up after themselves.
- Added github.com/eclipse/paho.mqtt.golang for MQTT client support
Bugfix release addressing Azure Blob Storage backend issues and authentication configuration.
- Fix queries failing with Azure backend - Queries were incorrectly using local filesystem paths (./data/...) instead of Azure blob paths (azure://...) when using Azure Blob Storage as the storage backend.
- Fix compaction subprocess Azure authentication - Compaction subprocess was failing with "DefaultAzureCredential: failed to acquire token" because credentials weren't being passed to the subprocess. Now passes AZURE_STORAGE_KEY via environment variable.
- Authentication enabled by default - auth.enabled is now true by default in arc.toml for improved security out of the box.
- internal/api/query.go - Add Azure case to getStoragePath()
- internal/database/duckdb.go - Add configureAzureAccess() for DuckDB azure extension
- internal/compaction/manager.go - Pass Azure credentials to subprocess via env var
- internal/compaction/subprocess.go - Read Azure credentials from env var
- internal/storage/azure.go - Add GetAccountKey() method
- arc.toml - Set auth.enabled = true by default
- If you were relying on authentication being disabled by default, you'll need to explicitly set auth.enabled = false in your arc.toml.
The official Python SDK for Arc is now available on PyPI as arc-tsdb-client.
Installation:
pip install arc-tsdb-client
# With DataFrame support
pip install arc-tsdb-client[pandas] # pandas
pip install arc-tsdb-client[polars] # polars
pip install arc-tsdb-client[all] # all optional dependencies
Key features:
- High-performance MessagePack columnar ingestion (10M+ records/sec)
- Query support with JSON, Arrow IPC, pandas, polars, and PyArrow responses
- Full async API with httpx
- Buffered writes with automatic batching (size and time thresholds)
- Complete management API (retention policies, continuous queries, delete operations, authentication)
- DataFrame integration for pandas, polars, and PyArrow
Quick example:
from arc_client import ArcClient
with ArcClient(host="localhost", token="your-token") as client:
    # Write data (columnar format - fastest)
    client.write.write_columnar(
        measurement="cpu",
        columns={
            "time": [1633024800000000, 1633024801000000],
            "host": ["server01", "server01"],
            "usage_idle": [95.0, 94.5],
        },
    )
    # Query to pandas DataFrame
    df = client.query.query_pandas("SELECT * FROM default.cpu")
Documentation: https://docs.basekick.net/arc/sdks/python
Arc now supports Azure Blob Storage as a storage backend, enabling deployment on Microsoft Azure infrastructure.
Configuration options:
- storage_backend = "azure" or "azblob"
- Connection string authentication
- Account key authentication
- SAS token authentication
- Managed Identity support (recommended for Azure deployments)
Example configuration:
[storage]
backend = "azure"
azure_container = "arc-data"
azure_account_name = "mystorageaccount"
# Use one of: connection_string, account_key, sas_token, or managed identity
azure_use_managed_identity = true
Arc now supports native HTTPS/TLS without requiring a reverse proxy, ideal for users running Arc from native packages (deb/rpm) on bare metal or VMs.
Configuration options:
- server.tls_enabled - Enable/disable native TLS
- server.tls_cert_file - Path to certificate PEM file
- server.tls_key_file - Path to private key PEM file
Environment variables:
- ARC_SERVER_TLS_ENABLED
- ARC_SERVER_TLS_CERT_FILE
- ARC_SERVER_TLS_KEY_FILE
Example configuration:
[server]
port = 443
tls_enabled = true
tls_cert_file = "/etc/letsencrypt/live/example.com/fullchain.pem"
tls_key_file = "/etc/letsencrypt/live/example.com/privkey.pem"
Key features:
- Uses Fiber's built-in ListenTLS() for direct HTTPS support
- Automatic HSTS header (Strict-Transport-Security) when TLS is enabled
- Certificate and key file validation on startup
- Backward compatible - TLS disabled by default
Ingestion concurrency settings are now configurable to support high-concurrency deployments with many simultaneous clients (e.g., 50+ Telegraf agents).
Configuration options:
- ingest.flush_workers - Async flush worker pool size (default: 2x CPU cores, min 8, max 64)
- ingest.flush_queue_size - Pending flush queue capacity (default: 4x workers, min 100)
- ingest.shard_count - Buffer shards for lock distribution (default: 32)
Environment variables:
- ARC_INGEST_FLUSH_WORKERS
- ARC_INGEST_FLUSH_QUEUE_SIZE
- ARC_INGEST_SHARD_COUNT
Example configuration for high concurrency:
[ingest]
flush_workers = 32 # More workers for parallel I/O
flush_queue_size = 200 # Larger queue for burst handling
shard_count = 64 # More shards to reduce lock contention
Key features:
- Defaults scale dynamically with CPU cores (similar to QuestDB and InfluxDB)
- Previously hardcoded values now tunable for specific workloads
- Helps prevent flush queue overflow under high concurrent load
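The default rules listed above (2x CPU cores clamped to 8-64 workers, queue of 4x workers with a floor of 100) work out as in this small sketch. `default_ingest_settings` is a hypothetical helper written for illustration, not a function exposed by Arc.

```python
import os
from typing import Optional


def default_ingest_settings(cpu_cores: Optional[int] = None) -> dict:
    """Compute the documented defaults for a machine with the given core count."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    # 2x CPU cores, but never fewer than 8 or more than 64 workers.
    flush_workers = max(8, min(64, 2 * cores))
    # 4x workers, but never fewer than 100 queue slots.
    flush_queue_size = max(100, 4 * flush_workers)
    return {
        "flush_workers": flush_workers,
        "flush_queue_size": flush_queue_size,
        "shard_count": 32,
    }
```

Under these rules a 2-core VM gets 8 workers and a 100-slot queue, a 16-core host gets 32 workers and 128 slots, and anything with 32 or more cores is capped at 64 workers and 256 slots.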
Parquet files are now organized by the data's timestamp instead of ingestion time, enabling proper backfill of historical data.
Key features:
- Historical data lands in correct time-based partitions (e.g., December 2024 data goes to 2024/12/ folders, not today's folder)
- Batches spanning multiple hours are automatically split into separate files per hour
- Data is sorted by timestamp within each Parquet file for optimal query performance
- Enables accurate partition pruning for time-range queries
How it works:
- Single-hour batches: sorted and written to one file
- Multi-hour batches: split by hour boundary, each hour sorted independently
Example: Backfilling data from December 1st, 2024:
# Before: All data went to ingestion date
data/mydb/cpu/2025/01/04/... (wrong - today's partition)
# After: Data goes to correct historical partition
data/mydb/cpu/2024/12/01/14/... (correct - data's timestamp)
data/mydb/cpu/2024/12/01/15/...
Contributed by @schotime
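The split-and-sort step can be sketched in a few lines of Python. This is an illustration rather than Arc's writer: `split_by_hour` is a hypothetical name, timestamps are assumed to be microseconds since the Unix epoch, and the path layout follows the example above.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def split_by_hour(database: str, measurement: str, rows: List[dict]) -> Dict[str, List[dict]]:
    """Group rows by the UTC hour of their own timestamp, sorted within each hour."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # Integer division keeps the conversion exact for microsecond timestamps.
        ts = datetime.fromtimestamp(row["time"] // 1_000_000, tz=timezone.utc)
        buckets[f"data/{database}/{measurement}/{ts:%Y/%m/%d/%H}"].append(row)
    for partition_rows in buckets.values():
        partition_rows.sort(key=lambda r: r["time"])
    return dict(buckets)


hour_14 = int(datetime(2024, 12, 1, 14, 0, tzinfo=timezone.utc).timestamp()) * 1_000_000
minute = 60 * 1_000_000
batch = [
    {"time": hour_14 + 65 * minute},  # 15:05
    {"time": hour_14 + 40 * minute},  # 14:40
    {"time": hour_14 + 10 * minute},  # 14:10
]
partitions = split_by_hour("mydb", "cpu", batch)
```

The out-of-order batch ends up in two partitions (hours 14 and 15 of 2024/12/01), each sorted by timestamp, regardless of when the batch was ingested.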
Hourly and daily compaction now have separate schedules and can be triggered manually via API.
API Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/compaction/hourly | Trigger hourly compaction |
| POST | /api/v1/compaction/daily | Trigger daily compaction |
Configuration:
[compaction]
hourly_schedule = "0 * * * *" # Every hour
daily_schedule = "0 2 * * *" # Daily at 2 AM
Contributed by @schotime
The maximum request payload size for write endpoints is now configurable, with the default increased from 100MB to 1GB.
Configuration options:
- server.max_payload_size - Maximum payload size (e.g., "1GB", "500MB")
- Environment variable: ARC_SERVER_MAX_PAYLOAD_SIZE
Example configuration:
[server]
max_payload_size = "2GB"
Key features:
- Applies to both compressed and decompressed payloads
- Supports human-readable units: B, KB, MB, GB
- Improved error messages suggest batching when limit is exceeded
- Default increased 10x from 100MB to 1GB to support larger bulk imports
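A rough sketch of how a human-readable size such as "1GB" might be turned into a byte limit. The notes do not say whether units are binary or decimal, so the binary multiples below are an assumption, and `parse_size` and `check_payload` are hypothetical names.

```python
import re

# Assumption: binary multiples (1KB = 1024 bytes); the notes do not specify this.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE_RE = re.compile(r"\s*(\d+)\s*(B|KB|MB|GB)\s*", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert strings like '500MB' or '1GB' to a number of bytes."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def check_payload(size_bytes: int, limit: str = "1GB") -> None:
    """Reject oversized payloads with a hint to batch, as the notes describe."""
    if size_bytes > parse_size(limit):
        raise ValueError(
            f"payload of {size_bytes} bytes exceeds {limit}; consider sending smaller batches"
        )
```

Since the limit applies to both compressed and decompressed payloads, a check like `check_payload` would need to run once on the received body and again after decompression.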
New REST API endpoints for managing databases programmatically, enabling pre-creation of databases before agents send data.
Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/databases | List all databases with measurement counts |
| POST | /api/v1/databases | Create a new database |
| GET | /api/v1/databases/:name | Get database info |
| GET | /api/v1/databases/:name/measurements | List measurements in a database |
| DELETE | /api/v1/databases/:name | Delete a database (requires delete.enabled=true) |
Example usage:
# List databases
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/api/v1/databases
# Create a database
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "production"}' \
http://localhost:8000/api/v1/databases
# Delete a database (requires confirmation)
curl -X DELETE -H "Authorization: Bearer $TOKEN" \
"http://localhost:8000/api/v1/databases/old_data?confirm=true"
Key features:
- Database name validation (alphanumeric, underscore, hyphen; must start with letter; max 64 characters)
- Reserved names protected (system, internal, _internal)
- DELETE respects delete.enabled configuration for safety
- DELETE requires ?confirm=true query parameter
- Works with all storage backends (local, S3, Azure)
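The naming rules above translate into a short client-side check, which can be handy before calling the create endpoint. This is a sketch of the stated rules only: `is_valid_database_name` is a hypothetical helper, and whether the server compares reserved names case-insensitively is not stated, so an exact match is assumed.

```python
import re

RESERVED_NAMES = {"system", "internal", "_internal"}

# A letter first, then up to 63 more letters, digits, underscores, or hyphens.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,63}")


def is_valid_database_name(name: str) -> bool:
    """Apply the documented rules: allowed characters, leading letter, max length 64."""
    return _NAME_RE.fullmatch(name) is not None and name not in RESERVED_NAMES
```

For example, `production` and `old_data` pass, while `9lives`, `my db`, and `system` do not.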
Arc now configures the DuckDB httpfs extension automatically, enabling direct queries against Parquet files stored in S3.
Key improvements:
- Automatic httpfs extension installation and configuration
- S3 credentials passed to DuckDB for authenticated access
- SET GLOBAL used to persist credentials across connection pool
- Works with standard S3 buckets (note: S3 Express One Zone uses different auth and is not supported by httpfs)
Configuration:
[storage]
backend = "s3"
s3_bucket = "my-bucket"
s3_region = "us-east-2"
# Credentials via environment variables recommended:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- Added ListDirectories() method for efficient partition discovery
- Added ListObjects() method for listing files within partitions
- Both local and S3 backends implement the enhanced interface
- Fixed "argument list too long" error when compacting partitions with many files
- Job configuration now passed via stdin instead of command-line arguments
- Supports compaction of partitions with 15,000+ files
- Added row-to-columnar conversion for efficient data ingestion
- Improved buffer management for high-throughput scenarios
- Zstd compression support: Added Zstd decompression for MessagePack payloads. Zstd achieves 9.57M rec/sec with only 5% overhead vs uncompressed (compared to 12% overhead with GZIP at 8.85M rec/sec). Auto-detected via magic bytes - no client configuration required.
- Consolidated type conversion helpers: Extracted common toInt64(), toFloat64(), firstNonNil() functions, eliminating ~100 lines of duplicate code across the ingestion pipeline.
- O(n log n) column sorting: Replaced O(n²) bubble sort with sort.Slice() for column ordering in schema inference.
- Single-pass timestamp normalization: Reduced from 2-3 passes to single pass for timestamp type conversion and unit normalization.
- Result: 7% throughput improvement (9.47M → 10.1M rec/s), 63% p50 latency reduction (8.40ms → 3.09ms), 84% p99 latency reduction (42.29ms → 6.73ms).
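Magic-byte detection of the kind mentioned for Zstd can be sketched as below. The byte sequences are the standard Zstandard frame and gzip member signatures; `detect_compression` and its return values are illustrative names, not Arc's API.

```python
import gzip

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame magic number (little-endian 0xFD2FB528)
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header


def detect_compression(payload: bytes) -> str:
    """Classify a request body by its leading bytes."""
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    return "none"


compressed = gzip.compress(b"\x82\xa1m\xa3cpu")
kind = detect_compression(compressed)
```

Because the server inspects the payload itself, a client can switch compression schemes without changing headers or configuration.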
- Token lookup index: Added token_prefix column with database index for O(1) token lookup instead of O(n) full table scan. Reduces bcrypt comparisons from O(n/2) average to O(1-2) per cache miss.
- Atomic cache counters: Replaced mutex-protected counters with atomic.Int64 operations, eliminating lock contention on cache hit/miss tracking.
- Auth metrics integration: Added Prometheus metrics for authentication requests, cache hits/misses, and auth failures for better observability.
- Consolidated token extraction: Extracted common ExtractTokenFromRequest() helper eliminating duplicate token header parsing between middleware and auth handler.
- Arrow IPC throughput boost: Arrow IPC query responses now deliver 5.2M rows/sec (80% improvement from 2.88M rows/sec). Full table scans achieve 927M rows/sec (596M records in 685ms).
- SQL transform caching: Added 60-second TTL cache for SQL-to-storage-path transformations. This caches the result of converting table references (e.g., FROM mydb.cpu) to DuckDB read_parquet() calls (e.g., FROM read_parquet('./data/mydb/cpu/**/*.parquet')). Benchmark shows 49-104x speedup on cache hits (~300ns vs 13-37μs per transformation). Particularly beneficial for dashboard refresh scenarios where the same queries are executed repeatedly.
- Partition path caching: Added 60-second TTL cache for OptimizeTablePath() results. Saves 50-100ms per recurring query pattern (significant for dashboard refresh scenarios).
- Glob result caching: Added 30-second TTL cache for filepath.Glob() results. Saves 5-10ms per query for large partition sets by avoiding repeated filesystem operations.
- Cache statistics available via pruner.GetAllCacheStats() for monitoring hit rates.
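The caching pattern behind these three items is a plain time-to-live cache keyed by the query text or path. The following is a minimal, single-threaded sketch with hypothetical names (`TTLCache`, `get_or_compute`); Arc's Go implementation would also need to be safe for concurrent use.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache computed values for a fixed number of seconds, with hit/miss counters."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Missing or expired: recompute and restart the TTL window.
        self.misses += 1
        value = compute(key)
        self._entries[key] = (now, value)
        return value
```

A dashboard that re-runs the same query every few seconds pays for the transformation once per TTL window and gets the cached result on every other refresh.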
- Fixed N+1 query pattern in database listing: Listing databases with measurement counts now uses 2 storage calls instead of N+1 (90% reduction for 20 databases).
- Optimized database existence checks: Direct marker file lookup via storage.Exists() instead of listing all databases (O(1) vs O(n)).
- Removed redundant existence checks: handleListMeasurements now combines marker file check with measurement listing in a single flow.
- Batch row counting in delete handler: Replaced N individual COUNT queries with single batch query using read_parquet() with file list.
- Combined before/after row counts: Single query with COUNT(*) FILTER replaces two separate COUNT queries during delete operations.
- Extracted partition pruning helper: Reduced ~190 lines of duplicated code to ~90 lines with buildReadParquetExpr() helper.
- Fixed DuckDB S3 credentials not persisting across connection pool (changed SET to SET GLOBAL)
- Fixed compaction subprocess failing with large file counts
- Fixed CTE (Common Table Expressions) support - CTEs now work correctly in queries. Previously, CTE names like WITH campaign AS (...) were incorrectly converted to physical storage paths, causing "No files found" errors. CTE names are now properly recognized and preserved as virtual table references.
- Fixed JOIN clause table resolution - JOIN database.table syntax now correctly converts to read_parquet() paths. Previously only FROM clauses were handled.
- Fixed string literal corruption in queries - String literals containing SQL keywords (e.g., WHERE msg = 'SELECT * FROM mydb.cpu') are no longer incorrectly rewritten. String content is now protected during SQL-to-storage-path conversion.
- Fixed SQL comment handling - Comments containing table references (e.g., -- FROM mydb.cpu) are no longer incorrectly converted to storage paths. Both single-line (--) and multi-line (/* */) comments are now properly stripped before processing.
- Added LATERAL JOIN support - LATERAL JOIN, CROSS JOIN LATERAL, and other LATERAL join variants now correctly convert table references to storage paths.
- Fixed UTC consistency in path generation - Storage paths now consistently use UTC time instead of local timezone, preventing partition misalignment across different server timezones.
Tested at 10.1M records/second with:
- p50 latency: 3.09ms
- p95 latency: 5.16ms
- p99 latency: 6.73ms
- p999 latency: 9.29ms
None
- S3 credentials: For S3 storage backend, credentials are now also passed to DuckDB for httpfs queries. Ensure AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set, or configure s3_access_key and s3_secret_key in the config file.
- Azure backend: New storage backend option. No changes required for existing S3 or local deployments.
- Token prefix migration: Existing API tokens will be automatically migrated on startup. Legacy tokens are marked with a special prefix and continue to work normally. New tokens and rotated tokens benefit from O(1) lookup performance. No action required.
Thanks to the following contributors for this release:
- @schotime (Adam Schroder) - Data-time partitioning, compaction API triggers, UTC fixes, Azure SSL certificate fix
- @khalid244 - Multi-line WHERE clause regex fix (Issue #146, PR #148), S3 day-level file verification (Issue #144, PR #145), S3 file caching (PR #149), Manifest-based compaction recovery (Issue #157, PR #163), WAL-based S3 recovery (Issue #159, PR #162), Query timeout for S3 disconnection (Issue #151, PR #152)
- Added github.com/Azure/azure-sdk-for-go/sdk/storage/azblob for Azure Blob Storage support
- Added github.com/Azure/azure-sdk-for-go/sdk/azidentity for Azure authentication