
Proper TLS termination option #7927

@mazmar

Description


Proposed change

Leave TLS termination to the components designed to handle TLS, and let the client/server behave as if TLS is not required.

Why should the server configuration even include tls_available, and why should the NATS client care about this option?

TLS handshakes should be performed properly at the proxy layer. TLS should then be completely out of scope for the internal components.

When TLS termination is implemented correctly, the backend server does not need to know anything about it.

Currently, if tls: {} and allow_non_tls are set, the server advertises tls_available to non-TLS clients, which then attempt to upgrade to TLS anyway. This eventually fails with a bad request error because no TLS certificate is provided on the client side.
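A minimal server configuration reproducing this behavior might look as follows (option names taken from the setup described here; the port value is illustrative):

```
# nats-server.conf — sketch of the problematic configuration
port: 4222

tls {}                 # empty tls block
allow_non_tls: true    # accept plain-TCP clients on the same port
```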

In my setup, I have a Kubernetes cluster where external connections are properly handled through a TLS-terminated port 4221 with valid certificates. That port routes traffic to the internal service nats.nats.svc.cluster.local.

However, I also have multiple local clients connecting directly to nats.nats.svc.cluster.local. I do not want to enable TLS for those connections, since it adds unnecessary overhead. I also don't want to manage an internal CA just to issue cluster.local certificates, distribute that CA, or configure local clients to ignore certificate validation.

This separation is standard in other messaging systems such as AMQP implementations, where TLS handling is independent from the messaging solution itself.

Use case

Scenario
A company runs a Kubernetes-based microservices platform that relies on NATS as the internal messaging backbone. The cluster hosts dozens of internal services that communicate through NATS for event streaming, service coordination, and background job processing.

The platform exposes a public endpoint for external clients (partners, SaaS integrations, and remote services) that must connect securely over the internet.

Architecture
The system is designed with TLS termination at the cluster edge, typically handled by a reverse proxy or ingress component such as NGINX, HAProxy, or Traefik.

The architecture looks like this:

External Clients
       │
       │ TLS (valid public certificate)
       ▼
Ingress / Proxy (TLS termination, port 4221)
       │
       │ Plain TCP
       ▼
NATS Service (nats.nats.svc.cluster.local:4222)
       │
       ├── Internal microservices
       ├── Workers
       └── Local tooling / scripts

External Access
External systems connect securely:

tls://nats.example.com:4221

The proxy:
1. Performs the TLS handshake
2. Validates certificates
3. Terminates TLS
4. Forwards plain TCP traffic to the internal NATS service

This ensures secure communication over the public internet without exposing the internal cluster.
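The proxy side of this architecture can be sketched with an NGINX stream block (listener port and upstream address from the setup above; certificate paths are placeholders):

```nginx
stream {
    server {
        listen 4221 ssl;                             # public TLS endpoint
        ssl_certificate     /etc/tls/fullchain.pem;  # valid public certificate
        ssl_certificate_key /etc/tls/privkey.pem;

        # After terminating TLS, forward decrypted traffic as plain TCP
        proxy_pass nats.nats.svc.cluster.local:4222;
    }
}
```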

Internal Access
Inside the Kubernetes cluster, services connect directly to:

nats://nats.nats.svc.cluster.local:4222

These connections remain non-TLS because:
• Traffic never leaves the cluster network.
• Kubernetes networking is already isolated.
• Performance is better without unnecessary encryption overhead.
• No internal PKI or certificate distribution is required.

Internal clients include:
• microservices
• background workers
• scheduled jobs
• local debugging tools

Problem Encountered
When TLS is configured on the NATS server with:

tls: {}
allow_non_tls: true

the server advertises tls_available to clients.

This causes clients to attempt upgrading to TLS even when connecting internally without TLS. Since no TLS certificates are configured for internal connections, the handshake fails and results in bad request errors.
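The failing upgrade can be illustrated with a short sketch of the client-side decision (the tls_required and tls_available fields appear in the server's INFO line per the NATS client protocol; the decision function below is a simplification of the opportunistic-upgrade behavior described above, not any particular client's code):

```python
import json

# INFO line as sent by a server configured with `tls: {}` and
# `allow_non_tls: true` (field values here are illustrative)
INFO_LINE = 'INFO {"server_id":"X","tls_required":false,"tls_available":true}\r\n'

def client_will_try_tls(info_line: str) -> bool:
    """Return True when the client would start a TLS handshake: either the
    server requires TLS, or it merely advertises TLS as available."""
    info = json.loads(info_line[len("INFO "):].strip())
    return bool(info.get("tls_required") or info.get("tls_available"))

print(client_will_try_tls(INFO_LINE))  # True: the upgrade is attempted even
                                       # for a client that wanted plain TCP
```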

Desired Behavior
In this architecture, TLS should be handled entirely by the proxy layer, and the NATS server should behave as if TLS does not exist.

The desired behavior:
• External connections → TLS handled by proxy
• Internal connections → plain NATS
• NATS server unaware of TLS
• Clients not forced into TLS negotiation
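Under this behavior, the server configuration would simply omit TLS entirely (a sketch; the proxy from the architecture above carries all TLS responsibilities):

```
# nats-server.conf — no tls block, no allow_non_tls:
# the server never advertises TLS and clients never attempt an upgrade
port: 4222
```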

Benefits

  1. Simpler Infrastructure
    No internal certificate authority, certificate distribution, or trust configuration is required.

  2. Better Performance
    Internal services avoid unnecessary TLS overhead.

  3. Clear Separation of Responsibilities
    • Edge proxy → security and TLS
    • Messaging system → message transport

  4. Standard Architecture
    This approach mirrors how many RabbitMQ or other AMQP deployments handle TLS: encryption is often terminated at the network edge rather than enforced inside the messaging layer.

Summary
This use case demonstrates a common Kubernetes deployment pattern:
• TLS at the edge
• plain communication internally

In such architectures, messaging systems like NATS should ideally remain TLS-agnostic internally, allowing secure external access without complicating internal service communication.

Contribution

No response
