Skip to content

Replace ConnectBox 2 with 3, use DHCP range 192.168.0.0/16#673

Merged
attilaolah merged 22 commits intomainfrom
network-upgrade
Jan 7, 2026
Merged

Replace ConnectBox 2 with 3, use DHCP range 192.168.0.0/16#673
attilaolah merged 22 commits intomainfrom
network-upgrade

Conversation

@attilaolah
Copy link
Copy Markdown
Owner

@attilaolah attilaolah commented Jan 6, 2026

Change uplinx from Sunrise ConnectBox 2 to ConnectBox 3.

The new router is better but less configurable, so the local network is fourced to a /24 block under 192.168../16. Since DHCP reservations actually work now, we can use that, but since the same IP range now has to contain other network equipment, and we are still limited to a /24 range, I've put in an offset so the locker nodes all start at 192.168.1.100..., instead of fragmenting the network even further.

Another improvement is that the ConnectBox supports DuckDNS out of the box. The Inadyn deployment will still keep petting the duck, but the ConectBox will now do so as well, which is an extra layer of redundancy, meaning I get a CNAME that points home even in case the Inadyn deployment fails.

Summary by CodeRabbit

  • Chores

    • Switched primary interfaces to DHCP/automatic IPv6 and removed per-node static interface declarations.
    • Simplified network file handling by using static copies instead of rendered templates; removed template-generated nameserver entries.
    • Revised IPv4/IPv6 addressing and routable ranges, added a node-level offset for host addressing, and updated external/uplink addresses and gateways.
    • Control-plane service host now uses IPv4; DNS selection order adjusted.
  • Documentation

    • Updated networking docs to describe the new IPv4 scheme and global IPv6 prefix.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

Walkthrough

Network config moved from template-driven static addressing to DHCP/auto-config: Ansible now deploys a static /etc/network/interfaces (loopback + eth0 DHCP + IPv6 RA), templates generating static IPs and resolv entries were removed, Nix addressing helpers and node offset were added, and Talos per-node static interfaces were deleted.

Changes

Cohort / File(s) Summary
Ansible network files & tasks
ansible/roles/network/files/etc/network/interfaces, ansible/roles/network/tasks/main.yaml, ansible/roles/network/templates/etc/network/interfaces.j2, ansible/roles/network/templates/etc/resolv.conf.j2
Replaced template-driven static interfaces/resolv entries with a copied /etc/network/interfaces snippet (loopback + eth0 DHCP + IPv6 RA). Removed Jinja2 static IPv4/IPv6 blocks and nameserver loop; tasks changed from template to ansible.builtin.copy and removed backup logic.
Ansible hosts template
ansible/roles/network/templates/etc/hosts.j2
Appended FQDN component ({{ index }}.{{ cluster.name }}.{{ cluster.domain }}) to localhost/host lines.
Nix network addressing & helpers
cluster/network.nix, lib/nodes.nix
Introduced offset in node record and local addressing helper; moved node IPv4 range to 192.168.x.y style, adjusted net4/net4Len/net4LenRoutable/net6, re-parameterized external/uplink IPs to local scheme; lib/nodes.nix imports unqualified substring/stringLength, applies node.offset to IPv4 index, and exposes index in extras.
Talos configuration
talos/talconfig.yaml.nix
Removed per-node networkInterfaces static configuration; changed machine.kubelet.network.nameservers ordering from dns4.two ++ dns6.one to dns4.one ++ dns6.two (two occurrences).
Kubernetes / Cilium values
manifests/kube-system/cilium/app/values.yaml.nix
Switched k8sServiceHost selection from control-plane ipv6 to ipv4.
Secrets / domain scoping
manifests/infrastructure/inadyn/app/external-secret.yaml.nix
Introduced with cluster; scoping and replaced direct cluster.domain usage with a local domain variable in inadyn/Cloudflare blocks.
Task files / secret paths
modules/task/taskfiles/flux.yaml.nix
Updated SOPS/GCP secret decryption path to DEVENV_ROOT/bootstrap/${name}/${secret.name}.sops.json.
Documentation
README.md
Rewrote IPv6/networking section to describe a concrete GUA prefix and updated IPv4/IPv6 topology wording to match new addressing scheme.
SOPS rules
.sops.yaml
Adjusted creation_rules path_regex to match bootstrap/external-secrets/.*\.sops\.json instead of previous manifests path.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I nibble configs in the night,

DHCP sings and addresses take flight,
Nix counts offsets, hops just right,
Talos drops its static plight,
A rabbit cheers the network's light.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title accurately summarizes the main change: replacing the ConnectBox 2 router with ConnectBox 3 and changing to DHCP with 192.168.0.0/16 addressing, which is the primary objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@attilaolah
Copy link
Copy Markdown
Owner Author

There's also a Cilium issue with IPv6, again, but I need to update Cilium before I can report that, will re-try the bootstrap after the Cilium update.

kubectl -n kube-system logs -f pod/cilium-6b8m8 config
time=2026-01-06T11:34:53.110630034Z level=info msg=Running subsys=cilium-dbg
time=2026-01-06T11:34:53.11549263Z level=error msg="Invoke failed" subsys=cilium-dbg error="unable to create slim k8s client: host must be a URL or a host:port pair: \"https://[2001:1708:2601:d900::468a:5bff:feca:9618]:6443\"" function="client.registerMappingsUpdater (.../k8s/client/rest_config_provider.go:434)"
time=2026-01-06T11:34:53.116445304Z level=info msg="Stopping hive" subsys=cilium-dbg
time=2026-01-06T11:34:53.116783997Z level=info msg="Stopped hive" subsys=cilium-dbg duration=18.765µs
Error: Build config failed: failed to start: failed to populate object graph: unable to create slim k8s client: host must be a URL or a host:port pair: "https://[2001:1708:2601:d900::468a:5bff:feca:9618]:6443"

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI Agents
In @cluster/network.nix:
- Around line 32-33: Fix the typo in the comment: change "nework" to "network"
in the line that reads "and the distance between the nework length and the pod
mask cannot be more than /16" so it correctly reads "and the distance between
the network length and the pod mask cannot be more than /16".
- Line 21: The comment describing net6 as "ULA set by the modem" is incorrect
because net6 = "2001:1708:2601:d900::" is a Global Unicast Address (GUA); update
the comment near the net6 assignment to reflect that it is a GUA (or state the
correct ULA prefix if you intended a ULA, e.g., fd00::/8), and ensure the symbol
net6 and any accompanying comment text are consistent about using a GUA vs ULA.

In @modules/task/taskfiles/flux.yaml.nix:
- Line 222: Update the .sops.yaml rules to include the new secret location by
replacing the old regex manifest path pattern with one that matches the moved
file; specifically change the rule that currently matches
manifests/kube-system/external-secrets/config/.*\.sops\.json to
bootstrap/external-secrets/.*\.sops\.json so files like
bootstrap/external-secrets/gcp-secrets-service-account.sops.json are covered by
the encryption config.
🧹 Nitpick comments (3)
ansible/roles/network/tasks/main.yaml (1)

12-18: Consider restoring backup: true for safer deployments.

The switch from template to copy is appropriate given the static nature of the new interfaces file, and the explicit permissions (0o644) are correct. However, removing backup: true increases risk during network configuration changes—a misconfiguration could cause connectivity loss without an easy rollback path.

🔎 Suggested restoration of backup
 - name: Configure /etc/network/interfaces
   ansible.builtin.copy:
     src: etc/network/interfaces
     dest: /etc/network/interfaces
     owner: root
     group: root
     mode: 0o644
+    backup: true
manifests/kube-system/cilium/app/values.yaml.nix (1)

5-5: Document the IPv6 to IPv4 switch for the Kubernetes service host.

The PR summary mentions this is a workaround for a Cilium bug, but there's no inline comment explaining why IPv4 is used instead of IPv6. Consider adding a comment referencing the Cilium issue or bug tracker for future maintainers.

📝 Suggested documentation
  # Enable KubeProxy replacement.
+  # Workaround: Use IPv4 instead of IPv6 due to Cilium bug.
+  # See: https://github.com/siderolabs/talos/issues/8836
  k8sServiceHost = (builtins.head cluster.nodes.by.controlPlane).ipv4;
lib/nodes.nix (1)

18-19: IPv4 address construction assumes last octet is always ".0".

The logic substring 0 (stringLength node.net4 - 2) node.net4 assumes the base network address always ends with .0 (single-digit zero). While this works for the current configuration (192.168.1.0), it would fail if the last octet were multi-digit (e.g., .10 would incorrectly become .1).

Consider documenting this assumption or using a more robust approach like splitting on "." and replacing the last component.

🔎 More robust alternative
- pre4 = substring 0 (stringLength node.net4 - 2) node.net4;
- ipv4 = "${pre4}.${toString ((toIntBase10 index) + node.offset)}";
+ # Split the network address and replace the last octet
+ parts4 = lib.splitString "." node.net4;
+ pre4 = lib.concatStringsSep "." (lib.take 3 parts4);
+ ipv4 = "${pre4}.${toString ((toIntBase10 index) + node.offset)}";
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b03b647 and 56fc559.

📒 Files selected for processing (11)
  • ansible/roles/network/files/etc/network/interfaces
  • ansible/roles/network/tasks/main.yaml
  • ansible/roles/network/templates/etc/hosts.j2
  • ansible/roles/network/templates/etc/network/interfaces.j2
  • ansible/roles/network/templates/etc/resolv.conf.j2
  • cluster/network.nix
  • lib/nodes.nix
  • manifests/infrastructure/inadyn/app/external-secret.yaml.nix
  • manifests/kube-system/cilium/app/values.yaml.nix
  • modules/task/taskfiles/flux.yaml.nix
  • talos/talconfig.yaml.nix
💤 Files with no reviewable changes (2)
  • ansible/roles/network/templates/etc/resolv.conf.j2
  • ansible/roles/network/templates/etc/network/interfaces.j2
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Snyk scan
  • GitHub Check: Run SonarQube scan
🔇 Additional comments (8)
manifests/infrastructure/inadyn/app/external-secret.yaml.nix (2)

11-20: LGTM! Clean refactoring using with scope.

The introduction of with cluster; simplifies the domain references while maintaining identical functionality. The interpolated domain values in the comments will provide helpful documentation in the generated inadyn configuration.


22-26: LGTM! DuckDNS configuration correctly implements redundancy.

The updated comment accurately reflects the CNAME relationship, and the hostname dornhaus.duckdns.org is intentionally separate from the main domain to provide the redundant DNS update path described in the PR objectives.

ansible/roles/network/templates/etc/hosts.j2 (1)

1-2: No changes needed. The template variables {{ index }}, {{ cluster.name }}, and {{ cluster.domain }} are properly defined through the Nix-based inventory generation system. The cluster.name and cluster.domain variables are set in the inventory group variables (modules/ansible/inventory.yaml.nix), and index is derived per-host from node filenames (lib/nodes.nix). These variables will be available at template rendering time.

Likely an incorrect or invalid review comment.

lib/nodes.nix (1)

6-6: LGTM: Cleaner imports and extended extras.

Simplifying the builtins imports and adding index to the inherited extras improves code clarity and provides additional context for node consumers.

Also applies to: 23-23

cluster/network.nix (3)

4-4: LGTM: Helper function improves maintainability.

The local helper function provides a clean way to construct 192.168.x.y addresses and reduces duplication throughout the network configuration.


7-18: LGTM: Network restructuring aligns with PR objectives.

The changes correctly implement the transition to a 192.168.1.0/24 network with /16 L2 routing and a 100-address offset for DHCP reservations, matching the ConnectBox 3 requirements described in the PR summary.


50-72: LGTM: External addresses and gateways updated correctly.

The external service addresses and uplink gateways have been correctly updated to use the new 192.168.x.y addressing scheme via the local helper function.

talos/talconfig.yaml.nix (1)

98-98: Verify DNS resilience with single IPv4 nameserver.

The nameserver configuration now uses only one IPv4 DNS server (Cloudflare) instead of two, while increasing IPv6 DNS servers from one to two. This reduces IPv4 DNS redundancy.

While the PR summary indicates this change ensures Talos and Alpine share nameservers, please verify that relying on a single IPv4 DNS server meets your availability requirements, especially if IPv6 connectivity is not yet fully reliable in your environment.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @cluster/network.nix:
- Line 35: The comment in cluster/network.nix contains a typo: change "nework"
to "network" in the comment string "# and the distance between the nework length
and the pod mask cannot be more than /16." so it reads "# and the distance
between the network length and the pod mask cannot be more than /16." to correct
the spelling; no code changes required beyond editing that comment.

In @README.md:
- Line 30: Update the incorrect IPv6 prefix string "fd80::/10" to the correct
link-local prefix "fe80::/10" wherever it appears (e.g., the README sentence
mentioning the automatic link-local subnet) so the text correctly identifies
IPv6 link-local addresses as using fe80::/10 instead of the ULA range.
- Line 29: Replace the misleading IPv4 CIDR `192.168.1.0/16` with the correct
network base `192.168.0.0/16` in the README (update the phrase "using
`192.168.1.0/16`" to "using `192.168.0.0/16`") so the CIDR notation accurately
reflects the /16 address range.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56fc559 and f26b115.

📒 Files selected for processing (2)
  • README.md
  • cluster/network.nix
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run SonarQube scan
  • GitHub Check: Run Snyk scan
🔇 Additional comments (5)
cluster/network.nix (5)

4-6: LGTM: Clean helper function for IPv4 addressing.

The local helper function provides a concise way to generate IPv4 addresses within the 192.168.x.x range, making the configuration more maintainable.


9-20: LGTM: Node addressing changes align with PR objectives.

The updated IPv4 addressing scheme correctly reflects the ConnectBox 3 constraints:

  • Node network uses 192.168.1.0/24
  • Routable CIDR correctly spans 192.168.0.0/16
  • Offset of 100 ensures nodes start at 192.168.1.100 as intended

22-23: LGTM: IPv6 comment now correctly identifies GUA prefix.

The comment correctly identifies the prefix as a GUA (Global Unicast Address) rather than ULA, addressing the previous review feedback.


52-68: LGTM: External addresses correctly updated.

The external service addresses now correctly use the local helper function, maintaining consistency with the new 192.168.x.x addressing scheme.


73-74: Confirm the IPv4 gateway address matches the ConnectBox 3 configuration.

The code sets gw4 = local 1 1 (evaluates to 192.168.1.1), but typical ConnectBox 3 devices use 192.168.0.1. Verify this is intentional (e.g., custom network configuration) and confirm the IPv6 link-local address fe80::200:5eff:fe00:103 against the actual device's network status.

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Jan 7, 2026

@attilaolah attilaolah changed the title Network upgrade Replace ConnectBox 2 with 3, use DHCP range 192.168.0.0/16 Jan 7, 2026
@attilaolah attilaolah merged commit 8c94309 into main Jan 7, 2026
6 checks passed
@attilaolah attilaolah deleted the network-upgrade branch January 7, 2026 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant