Commit 5107057

tests/multi-server: switch broker from redpanda to apache kafka

Redpanda's seastar reactor aborts during init with "close() syscall failed: Invalid argument" on the self-hosted CI runners, regardless of:

- redpanda image version (v26.1.6 via :latest and v24.3.15 pinned both fail the same way);
- sandbox configuration (default, seccomp:unconfined + apparmor:unconfined, and privileged:true all hit the same error);
- seastar tuning (--mode dev-container, explicit --overprovisioned / --unsafe-bypass-fsync / --reserve-memory=0M).

This is a seastar + runner-kernel interaction we can't unblock from the compose side.

apache/kafka:3.9.1 is the official Apache Kafka Docker image, runs the JVM implementation (not seastar), and starts cleanly in the same DinD environment. The wire protocol is identical so kcat on the consumer side and rlm_kafka via librdkafka on the producer side don't care which broker is serving.
1 parent a25df22 commit 5107057

1 file changed

Lines changed: 31 additions & 40 deletions

src/tests/multi-server/environments/kafka.yml.j2

@@ -30,53 +30,44 @@ x-common-config: &id001
   TEST_SUBNET: {{ test_subnet | default('172.16.0.0/12') }}
 services:
   kafka:
-    # Pinned to a specific 24.3.x release instead of :latest - the
-    # current :latest (v26.1.6) hits a close() EINVAL during seastar
-    # reactor init on the self-hosted CI runners even when the
-    # container runs privileged. Bump when a newer tag is verified
-    # to start cleanly on the runner.
-    image: docker.redpanda.com/redpandadata/redpanda:v24.3.15
-    # Redpanda's seastar reactor aborts during init under the default
-    # Docker sandbox on self-hosted CI runners (close() EINVAL on an
-    # internal fd). Run privileged so the broker starts reliably; it's
-    # only exposed on this compose network.
-    privileged: true
-    # Override the default command to advertise the broker under its
-    # compose service name. Without this the broker tells clients to
-    # reconnect at 127.0.0.1:9092 which only works when client and
-    # broker share a network namespace.
-    #
-    # `--mode dev-container` is Redpanda's blessed single-node dev
-    # preset: it relaxes IO/memory probing so the broker boots on a
-    # shared CI runner without needing root, hugepages, or a tuned
-    # seastar profile. Everything else is the minimum we need on top.
-    #
-    command:
-      - redpanda
-      - start
-      - --mode
-      - dev-container
-      - --kafka-addr
-      - PLAINTEXT://0.0.0.0:9092
-      - --advertise-kafka-addr
-      - PLAINTEXT://kafka:9092
-      - --smp=1
-      - --memory=1G
-      - --node-id=0
+    # Apache Kafka (KRaft mode, no ZooKeeper). Previously we used
+    # redpanda here, but its seastar reactor aborts during init with
+    # close() EINVAL on the self-hosted CI runners under every
+    # sandbox/privilege combination we tried, across multiple image
+    # versions. The apache/kafka JVM image starts cleanly in the
+    # same environment.
+    image: apache/kafka:3.9.1
+    environment:
+      # Single-node KRaft combined controller + broker. `kafka` is
+      # the compose service name and is what the other containers
+      # connect to.
+      KAFKA_NODE_ID: "0"
+      KAFKA_PROCESS_ROLES: controller,broker
+      KAFKA_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093
+      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093
+      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
+      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
+      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
+      # Single-node, so replicas and the transaction log must all
+      # fit on one broker.
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: "1"
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: "1"
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: "1"
+      # Auto-create topics on first produce (kcat + rlm_kafka both
+      # rely on this in the tests).
+      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
     restart: unless-stopped
     #
-    # Redpanda needs longer than you'd think on a busy CI runner: the
-    # first-launch data directory scaffolding can take 30-60s before
-    # rpk admin is reachable, and the cluster-health probe only returns
-    # OK once the controller topic has settled. Give it up to ~4 min
-    # before declaring the container unhealthy.
+    # JVM startup is faster than seastar's probing but still needs
+    # a grace period before the controller topic is ready.
     #
     healthcheck:
-      test: ["CMD-SHELL", "rpk cluster health --exit-when-healthy"]
+      test: ["CMD-SHELL", "/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list >/dev/null 2>&1"]
       interval: 5s
       timeout: 10s
       retries: 24
-      start_period: 120s
+      start_period: 60s
 
   kafka-producer1:
     image: freeradius-build:latest
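
The KAFKA_* variables in the new environment block are not broker settings in themselves. As the Apache Kafka Docker image documentation describes it, the image translates each one into a server.properties key: the KAFKA_ prefix is dropped, a single underscore becomes a dot, a double underscore becomes a literal underscore, and a triple underscore becomes a dash. The Python sketch below illustrates that naming convention only. It is an assumption about the image's behaviour, not code taken from it, and env_to_property is a hypothetical helper name.

```python
def env_to_property(name: str) -> str:
    """Map a KAFKA_* environment variable name to a property key.

    Assumed convention (from the image documentation, not verified
    against its startup script): "___" -> "-", "__" -> "_", "_" -> ".".
    """
    prefix = "KAFKA_"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    body = name[len(prefix):]
    # Replace the longest runs first so "___" is not consumed as "__" + "_".
    body = body.replace("___", "\x00").replace("__", "\x01").replace("_", ".")
    body = body.replace("\x00", "-").replace("\x01", "_")
    return body.lower()


if __name__ == "__main__":
    # Variable names copied from the compose service in the diff above.
    for var in (
        "KAFKA_NODE_ID",
        "KAFKA_PROCESS_ROLES",
        "KAFKA_CONTROLLER_QUORUM_VOTERS",
        "KAFKA_ADVERTISED_LISTENERS",
        "KAFKA_TRANSACTION_STATE_LOG_MIN_ISR",
        "KAFKA_AUTO_CREATE_TOPICS_ENABLE",
    ):
        print(f"{var} -> {env_to_property(var)}")
```

Read this way, KAFKA_ADVERTISED_LISTENERS takes over the job of the removed --advertise-kafka-addr flag: both tell clients to reconnect at kafka:9092 rather than at a loopback address.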

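The healthcheck change shortens the window a broker gets before it is declared unhealthy. The Python sketch below works through that arithmetic under a simplified model of Docker's documented healthcheck behaviour: failures inside start_period are not counted, and each probe starts interval seconds after the previous one finishes. The Healthcheck class and its method names are hypothetical, and the results are rough estimates, not measurements from the CI runners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Healthcheck:
    """Timing fields of a compose healthcheck, in seconds."""
    interval_s: int
    timeout_s: int
    retries: int
    start_period_s: int

    def fast_fail_budget_s(self) -> int:
        # Every probe fails immediately: only the interval separates probes.
        return self.start_period_s + self.retries * self.interval_s

    def slow_fail_budget_s(self) -> int:
        # Every probe runs into the timeout before it is counted as failed.
        return self.start_period_s + self.retries * (self.interval_s + self.timeout_s)


# Values taken from the removed and added lines of the diff.
redpanda = Healthcheck(interval_s=5, timeout_s=10, retries=24, start_period_s=120)
kafka = Healthcheck(interval_s=5, timeout_s=10, retries=24, start_period_s=60)

if __name__ == "__main__":
    for label, hc in (("redpanda", redpanda), ("kafka", kafka)):
        print(label, hc.fast_fail_budget_s(), hc.slow_fail_budget_s())
```

Under this model the old settings allow 240 s when probes fail quickly, which matches the "~4 min" in the removed comment, and the new settings allow 180 s.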