Release Found: Red Hat Enterprise Linux 4 and later running Red Hat Cluster Suite
When using a qdisk to bolster quorum in a cluster, it is often useful to have multipath backing the device for redundancy and to avoid single points of failure. However, several settings should be adjusted in order to allow each component (cman, qdiskd, and multipathd) enough time to determine if a failure has occurred before passing the error to the next layer.
In general, each layer should be allowed more time to detect and recover from a failure than the layer beneath it: the multipath failover time should be less than the qdisk failover time, which in turn should be less than the cman failover time. The failover time of each component is calculated as follows:
Multipath failover time is controlled by the following settings in /etc/multipath.conf:
polling_interval 5
no_path_retry 3
The above settings mean that multipath will check all paths to all multipath devices every 5 seconds and will retry I/O to a particular map 3 times before failing it (for a maximum total of 20 seconds). Possible values for no_path_retry are 'queue' (never fail I/O to the map but queue it in memory and re-submit later), 'fail' (immediately fail I/O to the map when all paths fail), or an integer n > 0 (queue I/O to the map in memory and retry n times before failing it). Using 'queue' is not recommended for a multipathed qdisk.
Thus when no_path_retry is set to an integer n > 0, the multipath failover is defined as polling_interval * (no_path_retry + 1). Otherwise if no_path_retry is set to 'fail' the multipath failover is equal to polling_interval.
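The arithmetic above can be sketched in a short Python snippet using the example values from the multipath.conf fragment (the variable names are illustrative, not actual multipath settings):

```python
# Illustrative calculation of the multipath failover time, in seconds,
# using the example values from /etc/multipath.conf above.
polling_interval = 5   # seconds between path checks
no_path_retry = 3      # integer n > 0: retries before failing I/O to the map

# With an integer no_path_retry, per the formula above:
multipath_failover = polling_interval * (no_path_retry + 1)
print(multipath_failover)  # 20 seconds, the maximum total mentioned above

# With no_path_retry 'fail', the failover would simply equal polling_interval.
```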
The qdisk failover time can be calculated with the interval and tko attributes from the quorumd tag in /etc/cluster/cluster.conf:
<quorumd device="/dev/mapper/mpath5p1" interval="3" min_score="2" tko="9" votes="1"/>
The interval controls the length of each qdisk cycle, while tko determines how many cycles may fail before qdisk declares a failure. The formula for qdisk failover is: qdisk failover = interval * tko.
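Using the values from the quorumd example above, the same formula can be sketched as:

```python
# Illustrative qdisk failover calculation, using the quorumd example above.
interval = 3  # seconds per qdisk cycle
tko = 9       # failed cycles tolerated before qdisk declares a failure

qdisk_failover = interval * tko
print(qdisk_failover)  # 27 seconds
```

Note that this 27-second qdisk failover exceeds the 20-second multipath failover from the earlier example, as recommended.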
The cman failover time is controlled differently in Red Hat Enterprise Linux 4 and 5. For version 4, it can be adjusted with the cman deadnode_timeout setting (in seconds) in /etc/cluster/cluster.conf:
<cman deadnode_timeout="54"></cman>
In version 5 it can be adjusted with the totem token setting (in milliseconds):
<totem token="54000"></totem>
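Putting the three layers together, the recommended ordering can be checked with a quick sketch (all values taken from the configuration examples above):

```python
# Failover times, in seconds, from the example configurations above.
multipath_failover = 5 * (3 + 1)   # polling_interval * (no_path_retry + 1) = 20
qdisk_failover = 3 * 9             # interval * tko = 27
cman_failover = 54                 # deadnode_timeout (RHEL 4) or token / 1000 (RHEL 5)

# Each layer is given more time than the layer beneath it:
assert multipath_failover < qdisk_failover < cman_failover
```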
On Red Hat Enterprise Linux 5, qdisk acts much like another node in the cluster. However, the connection to qdisk is made only locally: each node runs its own qdiskd, which connects to cman and contributes the quorum device's votes through libcman. Thus qdisk behaves like an extra cluster node, though in a slightly different manner.
Qdisk is composed of two main threads. The main thread is responsible for the main loop and for the I/O operations, while the second thread is responsible for the heuristics. One of the responsibilities of the main thread is to send hello messages to cman and ais to signal that qdisk is still present. If qdisk takes longer than quorum_dev_poll to send a hello to cman, cman will declare qdisk dead and log a message that connectivity to the quorum device was lost. For example:
Oct 21 10:43:31 node1 openais[2677]: [CMAN ] lost contact with quorum device
Because the qdisk thread that sends the hello to cman is the same thread that performs the I/O to disk, qdisk can become hung in state 'D' if the storage is unavailable for a number of seconds and the requests do not return errors. If multipath is being used, its failover time must be taken into consideration so that paths are allowed to fail over properly and recover. Therefore quorum_dev_poll should be configured to a value greater than the multipath failover time. The same applies when multipath is used with the queue_if_no_path feature. If the qdisk process stays in 'D' state for longer than quorum_dev_poll, CMAN/AIS will disconnect it, but only on the node where the event happened.
Note: quorum_dev_poll must be less than the totem token.
For example, to increase quorum_dev_poll to 50 seconds (note that ais parameters are specified in milliseconds):
<cman quorum_dev_poll="50000"></cman>
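A quick sanity check of the constraint from the note above, using the example values (both in milliseconds):

```python
# Both parameters are specified in milliseconds in cluster.conf.
quorum_dev_poll = 50000  # from the cman example above (50 seconds)
totem_token = 54000      # from the totem example above (54 seconds)

# quorum_dev_poll must be less than the totem token.
assert quorum_dev_poll < totem_token
```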
Also note that on Red Hat Enterprise Linux 5 these values cannot be changed while the cluster is running. To make a new totem token or cman quorum_dev_poll value take effect, the entire cluster must be rebooted.