-
Adding nodes to a cluster does not require restarts. Just as clients learn about new members via seed nodes, so do the servers in a cluster. You can start a new node, point it at a few existing nodes, and it will join the fully meshed cluster with no restarts needed. So the pattern would be to keep a small, stable core of servers around and scale out by starting new ones that point to those as seeds.
-
I don't think spinning up another cluster is necessarily any better than starting a new server in the existing cluster, because your main issue is rebalancing the existing client connections, regardless of how you add servers. Since you are only using core NATS, there is no problem adding or removing servers in the cluster on the fly. To trigger the rebalancing, you would probably have to create a process that runs on a regular basis and looks at how the connections are distributed among the servers (so it has to know how many servers are currently in the cluster). It can then 'kick' connections from the most crowded servers. When those applications reconnect, they should already have learned the complete list of current servers from the gossip and will pick from that list, and that's one way to rebalance. In any case, you want to throttle the rebalancing so that you don't create a 'thundering herd' of thousands of client applications suddenly reconnecting.
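As a rough illustration of the planning step described above, here is a minimal sketch in Rust. Everything in it is an assumption for illustration: `plan_kicks` and `max_kicks` are hypothetical names, the per-server connection counts are assumed to come from whatever monitoring you already have, and the actual disconnection of clients is left out entirely. It only decides how many connections to drop from each crowded server, capped per round to avoid a thundering herd.

```rust
use std::collections::BTreeMap;

/// Hypothetical planner: given the current connection count per server,
/// decide how many connections to drop from each overloaded server.
/// `max_kicks` caps the total for this round (the throttle).
fn plan_kicks(conns: &BTreeMap<String, usize>, max_kicks: usize) -> BTreeMap<String, usize> {
    let mut plan = BTreeMap::new();
    if conns.is_empty() {
        return plan;
    }

    let total: usize = conns.values().sum();
    // Ceiling of the mean: servers at or below this are left alone.
    let target = (total + conns.len() - 1) / conns.len();

    // Visit the most crowded servers first.
    let mut servers: Vec<(&String, &usize)> = conns.iter().collect();
    servers.sort_by(|a, b| b.1.cmp(a.1));

    let mut budget = max_kicks;
    for (name, &count) in servers {
        // Sorted descending, so once one server is at or under target
        // (or the budget is spent) there is nothing left to do.
        if budget == 0 || count <= target {
            break;
        }
        let kicks = (count - target).min(budget);
        plan.insert(name.clone(), kicks);
        budget -= kicks;
    }
    plan
}
```

With four servers holding 900, 100, 100 and 100 connections, the target is 300 per server, so a round with `max_kicks = 200` would drop 200 connections from the crowded server and leave the rest for later rounds. Running this on a timer with a small budget spreads the reconnects out over time.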
-
Hello everyone,
We are running core NATS (no JetStream) for a very high-traffic, low-latency component. Our current setup is a single cluster of 4 seed nodes on AWS Fargate, with Rust applications connecting via the async_nats client.
Our traffic profile is highly bursty and unpredictable: we occasionally see very intense spikes over short periods of time.
To give a sense of scale:
Context and constraints
We understand that horizontal hot scaling within a single NATS cluster is intentionally limited, due to the full-mesh route topology and the associated N(N-1)/2 cost. In practice, adding nodes under load often increases CPU/memory pressure before it helps.
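To make the N(N-1)/2 figure concrete, here is a small sketch of the arithmetic. The function name `full_mesh_routes` is ours, and it counts server pairs only; how many TCP connections the servers open per pair depends on the server version and configuration, which we do not model here.

```rust
/// Number of server pairs in a full mesh of `n` servers: n(n-1)/2.
/// Each pair needs at least one route between the two servers.
fn full_mesh_routes(n: u64) -> u64 {
    n * n.saturating_sub(1) / 2
}
```

For our current 4 nodes this gives 6 pairs; doubling to 8 nodes gives 28, so the route count grows much faster than the node count.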
Today:
Approach we are exploring
We are currently evaluating superclusters with gateways as a way to absorb bursts:
This seems aligned with gateway scaling characteristics with a better connection topology cost (Ni(M-1)), but we are still early in testing.
Questions for the community
On the client side:
We are looking for best practices and real world patterns.
If this scenario has not been explored much yet, we would be happy to share feedback once we gain more experience with hot scaling NATS clusters.
Thanks in advance for your insights.