Use HAProxy to load balance 300k concurrent TCP socket connections: Port Exhaustion, Keep-alive and others

I’ve been building a push system recently. To increase the scalability of the system, the best practice is to make each connection as stateless as possible, so that when a bottleneck appears, the capacity of the whole system can be expanded simply by adding more machines. Speaking of load balancing and reverse proxying, Nginx is probably the most famous and widely acknowledged choice. However, its TCP proxying support is a rather recent thing: Nginx only introduced TCP load balancing and reverse proxying in v1.9, released in late May this year, and still with a lot of missing features. On the other hand, HAProxy, as the pioneer of TCP load balancing, is mature and stable. I chose HAProxy to build the system and eventually reached 300k concurrent TCP socket connections. I could have achieved a higher number if it were not for my rather outdated client PC.

Step 1. Tuning the Linux system

300k concurrent connections is not an easy job even for a high-end server. To begin with, we need to tune the Linux kernel configuration to make the most of our server.

File Descriptors

Since sockets are treated as files from the system’s perspective, the default file descriptor limits are far too small for our 300k target. Modify /etc/sysctl.conf to add the following lines:

fs.file-max = 10000000 
fs.nr_open = 10000000

These lines raise the file descriptor limits to 10 million: fs.file-max caps the whole system, while fs.nr_open caps what a single process may be granted.
Next, modify /etc/security/limits.conf to add the following lines:

* soft nofile 10000000
* hard nofile 10000000
root soft nofile 10000000
root hard nofile 10000000

If you run HAProxy as a non-root user, the first two lines should do the job. However, if you run HAProxy as root, you need to declare the limits for root explicitly, since the wildcard does not apply to the root user.
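
To apply the sysctl changes without a reboot and confirm the new limits, a quick check looks like this (a sketch; note that the limits.conf values only take effect on a fresh login session):

sysctl -p                    # reload /etc/sysctl.conf
cat /proc/sys/fs/file-max    # system-wide limit
cat /proc/sys/fs/nr_open     # per-process ceiling
ulimit -n                    # limit in the current shell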

TCP Buffer

Holding such a huge number of connections costs a lot of memory. To reduce memory use, modify /etc/sysctl.conf to add the following lines. For tcp_rmem and tcp_wmem, the three values are the minimum, default, and maximum buffer size per socket; keeping the default at 4 KB keeps per-connection memory low while still letting busy sockets grow to 16 MB. tcp_mem is measured in memory pages and sets the thresholds at which the kernel starts applying memory pressure to TCP as a whole.

net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
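
As a sanity check on the arithmetic: with a 4 KB default buffer in each direction, 300k sockets start at roughly 300,000 × 8 KB ≈ 2.4 GB of buffer memory, whereas the stock defaults (87380-byte read, 16384-byte write) would budget on the order of 30 GB for the same load. You can verify the active values with:

sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_mem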

Step 2. Tuning HAProxy

With the Linux kernel tuned, we need to tune HAProxy itself to fit our requirements.

Increase Max Connections

HAProxy enforces a “max connection” cap both globally and per backend. To raise the global cap, add a line of configuration to the global section:

maxconn 2000000

Then we add the same line to our backend scope, which makes our backend look like this (note that recent HAProxy versions only honor maxconn in the defaults, frontend, and listen sections; in a backend you would instead cap connections per server with a maxconn argument on each server line):

backend pushserver
        mode tcp
        balance roundrobin
        maxconn 2000000
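
For context, here is how the pieces fit together in one file (a sketch; the frontend name and bind port are assumptions for illustration, and the server lines are added in the port exhaustion step below):

global
        maxconn 2000000

frontend push_in
        bind *:8080
        mode tcp
        maxconn 2000000
        default_backend pushserver

backend pushserver
        mode tcp
        balance roundrobin
        maxconn 2000000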

Tuning Timeouts

By default, HAProxy detects dead connections and closes inactive ones. However, the default timeouts are too low for a setup where connections are held open in a long-polling fashion. My client implementation sends a heartbeat every 4 minutes, so my long-lived socket connections to the push server were always closed by HAProxy first; a more frequent heartbeat would be a heavy burden for both the client (actually an Android device) and the server. To raise these limits, add the following lines to your backend; plain numbers are interpreted as milliseconds:

 timeout connect 5000
 timeout client 50000
 timeout server 50000
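
Note that timeout client and timeout server must exceed your heartbeat interval (240,000 ms here), or idle connections will still be cut. For the 4-minute heartbeat described above, values like these would keep idle connections alive (a sketch; HAProxy accepts explicit time units, and the 5-minute margin is an assumption to fit that heartbeat):

 timeout connect 5s
 timeout client 5m
 timeout server 5m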

Configuring Source IPs to Solve Port Exhaustion

At around 30k simultaneous connections, you will encounter “port exhaustion”. It results from the fact that each reverse-proxied connection occupies an ephemeral port on one of the proxy’s local IPs. The default range available for outgoing connections is roughly 32768–61000, which leaves only about 28k usable ports per source IP. This is not enough. We can widen the range by modifying /etc/sysctl.conf to add the following line:

net.ipv4.ip_local_port_range = 1000 65535
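
To see how close you are to exhaustion, you can inspect the current range and count outgoing connections per local IP (a sketch using standard tools; the awk split assumes IPv4 addresses):

cat /proc/sys/net/ipv4/ip_local_port_range
ss -tan | awk 'NR>1 {split($4,a,":"); print a[1]}' | sort | uniq -c | sort -rn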

But this does not solve the root problem; we will still run out of ports once the ~64k per-IP cap is reached.

The ultimate solution to the port exhaustion issue is to increase the number of available source IPs. First of all, we bind a new IP to a new virtual network interface:

ifconfig eth0:1 192.168.8.1

This command binds an intranet address to the virtual network interface eth0:1, whose underlying hardware interface is eth0. The command can be executed several times to add an arbitrary number of virtual interfaces. Just remember that the IPs should be in the same subnet as your real application server; in other words, there must be no NAT on the link between HAProxy and the application server, or this will not work.
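
On modern distributions where ifconfig is deprecated, the iproute2 equivalent would be along these lines (the /24 prefix is an assumption; adjust it and the addresses to your network):

ip addr add 192.168.8.1/24 dev eth0 label eth0:1
ip addr add 192.168.8.2/24 dev eth0 label eth0:2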

Next, we need to configure HAProxy to use these fresh IPs. There is a source keyword that can be used either in the backend scope or as an argument on a server line. In our experiments the backend-scope form did not seem to work, so we chose the server argument. This is how the HAProxy config looks:

backend mqtt
        mode tcp
        balance roundrobin
        maxconn 2000000
        server app1 127.0.0.1:1883 source 192.168.8.1
        server app2 127.0.0.1:1883 source 192.168.8.2
        server app3 127.0.0.1:1883 source 192.168.8.3
        server app4 127.0.0.1:1884 source 192.168.8.4
        server app5 127.0.0.1:1884 source 192.168.8.5
        server app6 127.0.0.1:1884 source 192.168.8.6

Here is the trick: you need to declare multiple server entries and give each one a different name. If you set the same name for all entries, HAProxy will simply not work. If you look at the HAProxy status report, you will see that even though these entries share the same backend address, HAProxy treats them as distinct servers. With six source IPs at roughly 60k usable ports each, this setup can hold about 360k proxied connections, comfortably above the 300k target.
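
To confirm that connections really do spread across all six entries, you can query the stats socket, assuming you have enabled one in the global section (e.g. with "stats socket /var/run/haproxy.sock mode 600 level admin"):

echo "show stat" | socat /var/run/haproxy.sock stdio | cut -d, -f1,2,5,8

Fields 1, 2, 5 and 8 of the CSV output are the proxy name, the server name, the current session count (scur) and the cumulative session total (stot).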

That’s all for the configuration! Your HAProxy should now be able to handle over 300k concurrent TCP connections, just as mine does.

30 thoughts on “Use HAProxy to load balance 300k concurrent TCP socket connections: Port Exhaustion, Keep-alive and others”

  1. Ralf Wenzel

    I'm not sure about the IP source exhaustion solution:
    the "net.ipv4.ip_local_port_range = 1000 65535" tweak makes sense.
    This will allow ~60,000 conns targeting a single backend server (which has its own IP in a real-world scenario).

    The next 60,000 conns can target the next backend server (which has a different IP than the first), and so on.
    Adding additional IPs to the local network interface is only required when targeting a single backend.

    1. admin Post author

      Yeah, it's just as you said.

      Our backend server can handle more than 60,000 connections; that's why we have to do this, to maximize the capacity of the backend server.

  2. Slawek

    Hi there,

    Thanks for a great tutorial. I studied it twice trying to fix an issue we are having with our Chrome extension and a PHP Ratchet backend server. The problem is that there is a limit in HAProxy or PHP itself (or Debian?) that caps the number of concurrent connections.

    We had a PHP WebSocket server running on port 8080 and the limit of concurrent connections was around 1000 (1024?), so we implemented HAProxy and it now load-balances traffic from 8080 to 8081, 8082, 8083 and so on (we run multiple instances of the WebSocket server on different ports to handle more clients) … unfortunately, after hours of digging around (a few things from your tutorial were already implemented) and configuration changes, 2000 (2048?) is the highest number we can reach!

    Do you have any idea what might be wrong? Would you have time to have a look at our setup and infrastructure?

    Thanks!

      1. Exocomp

        I don't understand the significance of this comment:

        "Note that if you're connecting to 127.0.0.1, you don't need to bind to a "public" address, just use 127.X.Y.Z, they're all yours!"

        Can you explain in more detail?

      1. hos7ein

        Hi

        I use haproxy-1.5.14-3.el7.x86_64 on CentOS 7.2 with kernel 3.10.0-327.18.2.el7.x86_64.

        I set two IPs on the HAProxy server, for example eth0 = 10.10.10.1 and virtual interface eth0:1 = 10.10.10.2, and use one backend server with IP 10.10.10.11.

        I use "source" in the HAProxy configuration file to send requests from both IP addresses (eth0 = 10.10.10.1 and eth0:1 = 10.10.10.2) to the backend side. Please see this config:

        backend test
        mode tcp
        log global
        option tcplog
        option tcp-check
        balance roundrobin

        server myapp-A 10.10.10.11:9999 check source 10.10.10.1
        server myapp-B 10.10.10.11:9999 check source 10.10.10.2

        With this scenario I get 120k connections on the backend side (10.10.10.11) and everything is OK.
        To get more connections I added another backend server, for example 10.10.10.12. Please see this config:

        backend test
        mode tcp
        log global
        option tcplog
        option tcp-check
        balance roundrobin

        server myapp-A 10.10.10.11:9999 check source 10.10.10.1
        server myapp-B 10.10.10.11:9999 check source 10.10.10.2

        server myapp-C 10.10.10.12:9999 check source 10.10.10.1
        server myapp-D 10.10.10.12:9999 check source 10.10.10.2

        In this scenario I expected to get 120k on each backend server, but no! Each backend server only gets 60k connections!

        What went wrong?
        Can you help me?
        Thanks

  3. Haven

    backend mqtt
    mode tcp
    balance roundrobin
    maxconn 2000000
    server app1 127.0.0.1:1883 source 192.168.8.1
    server app2 127.0.0.1:1883 source 192.168.8.2
    server app3 127.0.0.1:1883 source 192.168.8.3
    server app4 127.0.0.1:1884 source 192.168.8.4
    server app5 127.0.0.1:1884 source 192.168.8.5
    server app6 127.0.0.1:1884 source 192.168.8.6

    In the above configuration, does it mean that we have two MQTT nodes running on ports 1883 and 1884?

  4. Tom

    Setting the hard and soft limits to 10 million like you posted will result in a broken system – this is too much even for our Dell R630s running CentOS 6.7 (128GB memory)!

    1 million is the maximum that you can set these to – I think you have a typo.

    1. Petrkr

      You need to raise the file descriptor sysctls to be able to set more than 1 million. I solved this the other day and it is hard to google. Take a look at sysctl fs.nr_open, which is set to 1 million by default, and fs.file-max. Then you will be able to set ulimit to more than 1 million.

      Petr

  5. Sushil

    Hello,

    We have two Redis web servers behind HAProxy, but I need all traffic to go to Redis-web1 only, and HAProxy should divert traffic to Redis-web2 only when Redis-web1 is down.

    Is this possible? Please advise.

    Thanks
    Sushil R

  6. n00b Sys

    What happens if one uses HAProxy to proxy traffic to remote servers?

    Will the virtual network interface still work? I noticed you are using localhost, which means the apps run locally where HAProxy is, but in cases where the apps run on another server, is this still possible?

    If it is possible, does that mean I have to create the virtual interfaces on the remote servers? I am guessing that will not be possible, right?

    Please let me know if you understand my question.
    Thanks!!!

    1. admin Post author

      It's definitely doable; just creating the virtual interface will be more complicated. Meanwhile, your remote server should be configured to accept multiple connections from the same host.

  7. usergoodvery

    Hi,

    What's the significance of having the server listen on two different port numbers in this setup? The server won't have any port exhaustion issues, because it is not initiating outbound connections the way HAProxy is.

    regards,

    1. admin Post author

      I kind of forgot. It's just a normal server configuration, like 16 physical cores with 64GB RAM IIRC.

  8. Pingback: How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections | Cong Nghe Thong Tin - Quang Tri He Thong

  9. phil

    These lines raise the file descriptor limits to 10 million.
    Next, modify /etc/security/limits.conf to add the following lines:

    * soft nofile 10000000
    * hard nofile 10000000
    root soft nofile 10000000
    root hard nofile 10000000

    The above setting is harmful; it can prevent you from logging into your server. Apply it with caution.

  10. Arihant

    I am using HAProxy to load-balance my MQTT broker cluster. Each MQTT broker can easily handle up to 100,000 connections. But the problem I am facing with HAProxy is that it only handles up to 30k connections per node. Whenever any node gets near 32k connections, the HAProxy CPU suddenly spikes to 100% and all connections start dropping.

    The result is that for every 30k connections, I have to roll out another MQTT broker. How can I increase this to at least 60k connections per MQTT broker node?

    Note: I cannot add virtual network interfaces in a DigitalOcean VPC.

    My config:

    bind 0.0.0.0:1883
    maxconn 1000000
    mode tcp

    #sticky session load balancing – new feature
    balance source
    stick-table type string len 32 size 200k expire 30m
    stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
    option clitcpka # For TCP keep-alive
    option tcplog

    timeout client 600s
    timeout server 2h
    timeout check 5000

    server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
    server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5

