SD-WAN

  HA Failover Settings

    Posted 01-16-2019 13:50
    I was recently asked whether there is a way to reduce node failover time in router HA (High Availability) deployments, so I'm answering here in case others have the same question.

    For background: each node of a 128T router HA pair sends a periodic heartbeat to the other node over SSH, independently of what the other node does. If a node does not receive a response within the timeout period, it concludes that the other node must be down and switches the system to standalone mode. It then keeps periodically trying to reach the other node, and once it succeeds, it switches back to redundant mode. By default, the timeout is 5 seconds and the heartbeat is sent every second.
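    To make the standalone/redundant switching concrete, here is a toy model of the behavior described above. It is only a sketch under stated assumptions, not 128T's implementation: the class name PeerMonitor is hypothetical, time advances in discrete 1-second heartbeat ticks (the default rate), each tick either gets a reply or not, SSH transport is not modeled, and "no reply within the timeout" is assumed to mean "at least timeout_s seconds since the last reply".

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Hypothetical model of one HA node watching its peer (not 128T code)."""
    timeout_s: float = 5.0   # stands in for highAvailabilityDisconnectTime
    mode: str = "redundant"
    last_reply_s: float = 0.0
    transitions: list = field(default_factory=list)

    def on_tick(self, now_s: float, peer_replied: bool) -> str:
        if peer_replied:
            self.last_reply_s = now_s
            if self.mode == "standalone":
                # Peer is reachable again: switch back to redundant mode.
                self.mode = "redundant"
                self.transitions.append((now_s, self.mode))
        elif self.mode == "redundant" and now_s - self.last_reply_s >= self.timeout_s:
            # No reply within the timeout: assume the peer is down.
            self.mode = "standalone"
            self.transitions.append((now_s, self.mode))
        return self.mode


# Peer answers at t=1..3 s, goes silent at t=4..10 s, then answers again.
mon = PeerMonitor(timeout_s=5.0)
for t in range(1, 13):
    mon.on_tick(float(t), peer_replied=(t <= 3 or t >= 11))
print(mon.transitions)  # [(8.0, 'standalone'), (11.0, 'redundant')]
```

    With the default 5-second timeout, the model declares the peer down 5 seconds after its last reply; lowering timeout_s shortens that window, which is exactly the failover-time trade-off discussed below.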

    The timeout can be changed in either the local or the global init file. It must be at least 1 second, and both nodes of the same router HA pair must use the same value.
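    Because a mismatched or too-small value is easy to miss, a pre-change check can help. The sketch below encodes only the two rules stated above (at least 1 second, identical on both nodes); the function name validate_ha_pair_timeouts is hypothetical, and how you obtain each node's value is left to you.

```python
def validate_ha_pair_timeouts(node_a_s: float, node_b_s: float) -> None:
    """Hypothetical check: raise ValueError if the pair breaks the stated rules."""
    for name, value in (("node A", node_a_s), ("node B", node_b_s)):
        if value < 1:
            raise ValueError(f"{name}: timeout must be at least 1 second, got {value}")
    if node_a_s != node_b_s:
        raise ValueError(f"timeouts differ between nodes: {node_a_s} vs {node_b_s}")


validate_ha_pair_timeouts(5, 5)  # passes silently
```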

    We do not recommend reducing the value below 5 seconds if the nodes are under heavy CPU load, or if the connection between the nodes may often fail for more than a few seconds. The risk of setting the timeout too low is that one or both nodes may incorrectly conclude that the other node is down when it is not, leading to split-brain behavior or frequent switching between standalone and redundant modes.

    To change the timeout, add an "haTimers" object to the JSON init file, containing a "highAvailabilityDisconnectTime" field whose numeric value is the timeout in seconds. The heartbeat is then sent every one-fifth of the timeout (for example, every second with the default 5-second timeout).

    As with other init file settings, this will require a reboot to take effect.

    Example:
    {
      "init": {
        ...
      },
      "haTimers": {
        "highAvailabilityDisconnectTime": 5
      }
    }
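    If you want to confirm what a given init file will produce before rebooting, the sketch below reads the JSON text and reports the timeout and the derived heartbeat interval (one-fifth of the timeout, falling back to the 5-second default when "haTimers" is absent). The function name ha_timers_from_init is hypothetical, and it takes the file contents as a string because the post does not give init file paths.

```python
import json

DEFAULT_DISCONNECT_TIME_S = 5  # default timeout stated in the post


def ha_timers_from_init(init_text: str) -> tuple[float, float]:
    """Hypothetical helper: return (timeout_s, heartbeat_interval_s)."""
    doc = json.loads(init_text)
    timeout = doc.get("haTimers", {}).get(
        "highAvailabilityDisconnectTime", DEFAULT_DISCONNECT_TIME_S
    )
    # Reject booleans and non-numeric values; the field is a number of seconds.
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        raise TypeError("highAvailabilityDisconnectTime must be a number of seconds")
    return float(timeout), timeout / 5


print(ha_timers_from_init('{"init": {}, "haTimers": {"highAvailabilityDisconnectTime": 3}}'))  # (3.0, 0.6)
print(ha_timers_from_init('{"init": {}}'))  # (5.0, 1.0)
```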

    #HA #HighAvailability
    ------------------------------
    Hadriel S Kaplan
    hadriel@128technology.com
    ------------------------------