More good (should I say brilliant?) information from our colleague Elianne van der Kamp.
Yesterday we discovered an issue with Windows 2008 clusters: manually added persistent routes disappear from the active routes table when a cluster group containing an IP address resource is taken offline (or fails over).
This issue is documented here. The same article also describes a workaround for when you have multiple gateways on multiple NICs.
The workaround is to change your route add command from e.g. <route add 10.1.0.0 mask 255.255.255.0 10.1.0.1 -p> to <route add 10.1.0.0 mask 255.255.255.0 0.0.0.0 if 25>.
With this second command you bind the route to the interface instead of an IP address. Since it is now bound to a local device, a cluster failover will leave the route in the routing table.
However, this does not solve the issue we discovered yesterday: we are using two gateways ‘behind’ the same interface, so binding the route to the interface alone will not help here.
Example: interface 18 has address 192.168.251.36, mask 255.255.255.0, gateway 192.168.251.1, with an added route 192.168.250.0 mask 255.255.255.0 192.168.251.3 -p.
When an IP address resource is taken offline (or fails over), the active route 192.168.250.0 255.255.255.0 192.168.251.3 is removed.
By accident we found out that specifying both the gateway and the interface in the route solves this new issue (thanks to our colleague Enrico). So our new route command has to look like this:
<Route add 192.168.250.0 mask 255.255.255.0 192.168.251.3 if 18>. This will leave the route in the active routes table.
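To make the difference between the command variants easier to see, here is a minimal Python sketch that only assembles the command line as a string. The helper name build_route_add and its parameters are hypothetical and not part of any Windows tooling. It places -p directly after "route", as in the documented "route [-p] add" form, whereas this post writes -p at the end of the line.

```python
def build_route_add(destination, mask, gateway, interface_index=None, persistent=False):
    """Assemble a Windows `route add` command line (hypothetical helper).

    Nothing is executed here; the function only returns the string.
    """
    parts = ["route"]
    if persistent:
        # Documented placement of the persistence switch: route -p add ...
        parts.append("-p")
    parts += ["add", destination, "mask", mask, gateway]
    if interface_index is not None:
        # Binding to an interface index is what kept the route alive in our tests.
        parts += ["if", str(interface_index)]
    return " ".join(parts)


# Gateway only: the variant that lost its active route on failover.
print(build_route_add("192.168.250.0", "255.255.255.0", "192.168.251.3"))
# Gateway plus interface 18: the variant that survived.
print(build_route_add("192.168.250.0", "255.255.255.0", "192.168.251.3", interface_index=18))
```

The interface index (18 in our case) differs per server, so look it up in the interface list at the top of the route print output on each node.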
Why does this work? And is it reliable?
Since we couldn’t find any Google/Microsoft hits on this particular issue, we had to do a little registry digging.
The standard command <Route add 192.168.250.0 mask 255.255.255.0 192.168.251.3> just adds the persistent route to the registry, which triggers the ‘bug’.
However, the new command <Route add 192.168.250.0 mask 255.255.255.0 192.168.251.3 if 18> also makes 14 changes in the cluster part of the registry, telling it that this route is bound to the adapter and should be left behind on the local server in case of a failover.
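One generic way to do this kind of registry digging is to export the relevant keys before and after running the route command and compare the two exports. The sketch below is a suggestion, not necessarily how we did it. It assumes you created two text exports with reg export (which keys to export is your call; the cluster hive is presumably HKLM\Cluster, and persistent routes normally live under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\PersistentRoutes). The function name changed_lines is hypothetical.

```python
import difflib


def changed_lines(before, after):
    """Return the lines that were added ('+ ') or removed ('- ') between two exports.

    `before` and `after` are the full text of two registry exports.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Typical use (file names are placeholders; .reg exports are usually UTF-16):
# before = open("before.reg", encoding="utf-16").read()
# after = open("after.reg", encoding="utf-16").read()
# for line in changed_lines(before, after):
#     print(line)
```

Keep the system as quiet as possible between the two exports, otherwise unrelated changes end up in the comparison.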
So I think it looks pretty reliable. We did lots of reboots and failovers on the cluster, and the routes seem pretty persistent now.
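If you want to repeat the failover test without reading the routing table by eye every time, a small check like the following could help. It is a sketch under assumptions: it expects the English-language text of route print, looks only at the "Active Routes:" section (so the entry under "Persistent Routes:" does not count), and the function name route_is_active is hypothetical.

```python
def route_is_active(route_print_output, destination, mask, gateway):
    """Return True if the route appears in an 'Active Routes' section of the text."""
    in_active = False
    for line in route_print_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Active Routes"):
            in_active = True
            continue
        # A separator line or the persistent section ends the active section.
        if stripped.startswith("=") or stripped.startswith("Persistent Routes"):
            in_active = False
            continue
        # Active rows start with: destination, netmask, gateway.
        if in_active and stripped.split()[:3] == [destination, mask, gateway]:
            return True
    return False


# Typical use on a cluster node after a failover:
# import subprocess
# text = subprocess.run(["route", "print"], capture_output=True, text=True).stdout
# print(route_is_active(text, "192.168.250.0", "255.255.255.0", "192.168.251.3"))
```

Run it on the node that just lost the cluster group, since that is where the active route used to disappear.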