Server has lost contact with failover partner server

If you see multiple DHCP Server events with ID 20255 “Server has lost contact with failover partner server”, this article may be able to help you.
I’ll concentrate on the network settings, specifically the MTU.

Usually, when you see multiple events per minute stating “Server has lost contact with failover partner server”, each followed by “Server has established contact with failover partner server”, the culprit is the MTU setting.

First of all, make sure the network card’s MTU is set to 1500 on both DHCP servers. You can check it with the following command, which lists every interface with its Idx and MTU: netsh interface ipv4 show interfaces

As you can see, the interface’s MTU in the screenshot is already set to 1500. If yours isn’t, you can change it from an elevated command prompt with netsh interface ipv4 set subinterface 12 mtu=1500 store=persistent, where 12 is the Idx of your network card as listed by netsh interface ipv4 show interfaces.

If the DHCP servers are virtualized, make sure the virtual switch’s MTU is also set to 1500. Here’s how it looks in the vSphere (HTML5) client.

What if the DHCP servers are running on two separate hypervisors (as they should be) and you’re still facing the same issue? Then it is most likely a problem with the underlying network, so you may want your network admin to check it. However, you can still run a couple of tests yourself.
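First of all, try a ping with a 1500-byte payload. In Windows you set the payload size with the option -l, for example ping dhcp02 -l 1500. For a stricter test, also add -f, which sets the Don’t Fragment flag: the packet is then not fragmented, so the ping fails if it is too big for the path. Keep in mind that -l counts only the payload, so with -f the largest payload that fits a 1500-byte MTU is 1472 bytes (1472 + 20-byte IPv4 header + 8-byte ICMP header = 1500).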
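The stricter Don’t Fragment test can also be scripted. Below is a minimal Python sketch under a few assumptions: it runs on Windows, the host name dhcp02 resolves (substitute your partner server), and a genuine echo reply contains “TTL=”, as it does in English-language ping output. The helper names are hypothetical.

```python
"""Send a Don't Fragment ping from Windows and report whether it got through (sketch)."""
import subprocess
import sys

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_payload(mtu=1500):
    """Largest ping -l payload whose packet still fits in `mtu` without fragmenting."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_args(host, payload, count=2):
    """Windows ping: -n = number of echoes, -f = Don't Fragment, -l = payload bytes."""
    return ["ping", "-n", str(count), "-f", "-l", str(payload), host]


def got_reply(output):
    """Echo replies contain 'TTL='; 'Packet needs to be fragmented but DF set.' does not."""
    return "TTL=" in output


def df_ping(host, payload):
    out = subprocess.run(ping_args(host, payload),
                         capture_output=True, text=True, errors="replace").stdout
    return got_reply(out)


# ping flags above are Windows-specific, so only run the live test there.
if __name__ == "__main__" and sys.platform == "win32":
    host = "dhcp02"  # hypothetical partner name; adjust to your environment
    size = max_payload(1500)
    print(f"{size}-byte DF ping to {host}:", "OK" if df_ping(host, size) else "FAILED")
```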

Ping from dhcp01 to dhcp02 and vice versa. In one case, I noticed that I was able to ping the DHCP servers with more than 1500 bytes from a different network, but not from within the same network, and the DHCP servers weren’t able to ping each other with more than roughly 1450 bytes.
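To pin down the exact limit instead of a “1450ish” estimate, you can binary-search the payload size. This Python sketch takes any function works(n) that returns True when a Don’t Fragment ping with an n-byte payload gets a reply (for example a small wrapper around ping -f -l n). The function name and the default upper bound of 1472 bytes (the largest payload that fits a 1500-byte MTU) are my own choices, and the search assumes that if a payload gets through, every smaller one does too.

```python
"""Find the largest payload that still gets a reply, by binary search (sketch)."""


def largest_working_payload(works, low=0, high=1472):
    """Return the largest n in [low, high] with works(n) True, or None if low fails.

    Assumes works() is monotonic: if n bytes get through, so does anything smaller.
    """
    if not works(low):
        return None
    # Invariant: works(low) is True and the answer lies in [low, high].
    while low < high:
        mid = (low + high + 1) // 2  # round up so `low = mid` always makes progress
        if works(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated link for illustration only: payloads up to 1450 bytes pass.
    print(largest_working_payload(lambda n: n <= 1450))  # prints 1450
```

Each probe is a real ping, so the search needs about a dozen of them instead of one per payload size.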

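You can also test directly at the ESXi level with vmkping from the ESXi shell, for example vmkping -d -s 1472 followed by the IP address of the other host’s VMkernel interface (-d sets the Don’t Fragment flag, -s sets the payload size).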

If these are virtualized servers, you can also temporarily migrate both of them to the same hypervisor, just to rule out (or confirm) an issue with the virtual network configuration.
