The Linux bonding driver provides a method for aggregating multiple network interfaces into a single logical "bonded" interface. The behavior of the bonded interfaces depends upon the mode; generally speaking, modes provide either hot standby or load balancing services. Additionally, link integrity monitoring may be performed.
The ARP monitor operates as its name suggests: it sends ARP queries to one or more designated peer systems on the network, and uses the response as an indication that the link is operating. This gives some assurance that traffic is actually flowing to and from one or more peers on the local network. Generic received network traffic can also be used to infer that the link is operating.
The MII monitor monitors only the carrier state of the local network interface. It accomplishes this in one of three ways: by depending upon the device driver to maintain its carrier state, by querying the device's MII registers, or by making an ethtool query to the device. (See use_carrier parameter for more information).
The interfaces aggregated are called slave interfaces.
It is critical that either the miimon or arp_interval and arp_ip_target parameters be specified, otherwise serious network degradation will occur during link failures. Very few devices do not support at least miimon, so there is really no reason not to use it.
max_bonds - Specifies the number of bonding devices to create for this instance of the bonding driver. E.g., if max_bonds is 3, and the bonding driver is not already loaded, then bond0, bond1 and bond2 will be created. The default value is 1.
arp_interval - Specifies the ARP link monitoring frequency in milliseconds.
arp_ip_target - Specifies the IP addresses to use as ARP monitoring peers when arp_interval is > 0. Multiple IP addresses must be separated by a comma. At least one IP address must be given for ARP monitoring to function. The maximum number of targets that can be specified is 16.
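As an illustration, ARP monitoring could be enabled through module options; the interface name and target addresses below are placeholders, not recommendations:

```
# Hypothetical /etc/modprobe.conf fragment: probe two ARP targets
# (comma-separated, up to 16) every 2000 ms.
alias bond0 bonding
options bonding mode=active-backup arp_interval=2000 arp_ip_target=192.168.1.1,192.168.1.254
```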
arp_validate - Specifies whether or not ARP probes and replies should be validated in the active-backup mode. This causes the ARP monitor to examine the incoming ARP requests and replies, and only consider a slave to be up if it is receiving the appropriate ARP traffic (recently introduced).
Possible values are:
- none or 0 - No validation is performed. This is the default.
- active or 1 - Validation is performed only for the active slave.
- backup or 2 - Validation is performed only for backup slaves.
- all or 3 - Validation is performed for all slaves.
For the active slave, the validation checks ARP replies to confirm that they were generated by an arp_ip_target. Since backup slaves do not typically receive these replies, the validation performed for backup slaves is on the ARP request sent out via the active slave. It is possible that some switch or network configurations may result in situations wherein the backup slaves do not receive the ARP requests; in such a situation, validation of backup slaves must be disabled.
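On kernels that expose bonding options through sysfs, arp_validate can also be inspected and set at runtime; a sketch, assuming a bond0 device already exists:

```
# Hypothetical sysfs configuration sketch (run as root on a real system):
# validate ARP traffic on the active slave only.
echo active > /sys/class/net/bond0/bonding/arp_validate
cat /sys/class/net/bond0/bonding/arp_validate
```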
miimon - Specifies the MII link monitoring frequency in milliseconds. This determines how often the link state of each slave is inspected for link failures. A value of zero disables MII link monitoring. A value of 100 is a good starting point. The use_carrier option, below, affects how the link state is determined. The default value is 0.
updelay - Specifies the time, in milliseconds, to wait before enabling a slave after a link recovery has been detected.
downdelay - Specifies the time, in milliseconds, to wait before disabling a slave after a link failure has been detected.
use_carrier - Specifies whether miimon should use MII/ETHTOOL ioctls or netif_carrier_ok() to determine the link status. A value of 1 enables the use of netif_carrier_ok() (faster and more accurate, but not supported by all drivers); a value of 0 will use the deprecated MII/ETHTOOL ioctls. The default value is 1.
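A hedged example tying the MII monitoring parameters together; the values are illustrative only (updelay and downdelay are typically chosen as multiples of miimon):

```
# Hypothetical /etc/modprobe.conf fragment: poll link state every 100 ms,
# wait 200 ms before re-enabling a recovered slave (updelay) or
# disabling a failed one (downdelay).
alias bond0 bonding
options bonding mode=active-backup miimon=100 updelay=200 downdelay=200 use_carrier=1
```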
Modes can be specified by either name or number; see below:
Round-robin policy: Transmit packets in sequential order from the first available slave through the last.
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails.
Additional Parameters:
fail_over_mac - Specifies whether active-backup mode should set all slaves to the same MAC address (the traditional behavior), or, when enabled, change the bond's MAC address when changing the active interface (i.e., fail over the MAC address itself).
Fail over MAC is useful for devices that cannot ever alter their MAC address, or for devices that refuse incoming broadcasts with their own source MAC (which interferes with the ARP monitor).
The down side of fail over MAC is that every device on the network must be updated via gratuitous ARP, vs. just updating a switch or set of switches (which often takes place for any traffic, not just ARP traffic, if the switch snoops incoming traffic to update its tables) for the traditional method. If the gratuitous ARP is lost, communication may be disrupted.
When fail over MAC is used in conjunction with the MII monitor, devices which assert link up prior to being able to actually transmit and receive are particularly susceptible to loss of the gratuitous ARP, and an appropriate updelay setting may be required.
A value of 0 disables fail over MAC, and is the default. A value of 1 enables fail over MAC. This option is enabled automatically if the first slave added cannot change its MAC address. This option may be modified via sysfs only when no slaves are present in the bond.
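A sketch of toggling fail over MAC through sysfs; as noted above, this is only permitted while the bond has no slaves (bond0 is a placeholder):

```
# Hypothetical sysfs configuration sketch; must be done before any
# slaves are added to the bond.
echo 1 > /sys/class/net/bond0/bonding/fail_over_mac   # enable
echo 0 > /sys/class/net/bond0/bonding/fail_over_mac   # disable (default)
```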
XOR policy: Transmit based on the selected transmit hash policy.
Additional parameters:
xmit_hash_policy - Selects the transmit hash policy to use for slave selection in balance-xor and 802.3ad modes.
Possible values are:
layer2 - Uses XOR of hardware MAC addresses to generate the hash.
This algorithm will place all traffic to a particular network peer on the same slave.
This algorithm is 802.3ad compliant.
layer2+3 - This policy uses a combination of layer2 and layer3 protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to generate the hash.
This algorithm will place all traffic to a particular network peer on the same slave. For non-IP traffic, the formula is the same as for the layer2 transmit hash policy.
This policy is intended to provide a more balanced distribution of traffic than layer2 alone, especially in environments where a layer3 gateway device is required to reach most destinations.
This algorithm is 802.3ad compliant.
layer3+4 - This policy uses upper layer protocol information, when available, to generate the hash. This allows traffic to a particular network peer to span multiple slaves, although a single connection will not span multiple slaves.
This policy is intended to mimic the behavior of certain switches, notably Cisco switches with PFC2 as well as some Foundry and IBM products.
This algorithm is not fully 802.3ad compliant.
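For example, the hash policy for balance-xor or 802.3ad could be selected at load time; the values below are illustrative (pick the policy that matches your switch and compliance needs):

```
# Hypothetical /etc/modprobe.conf fragment: XOR mode hashing on
# MAC + IP addresses (layer2+3).
alias bond0 bonding
options bonding mode=balance-xor miimon=100 xmit_hash_policy=layer2+3
```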
Broadcast policy: Transmits everything on all slave interfaces.
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.
Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option, documented above. Note that not all transmit policies may be 802.3ad compliant, particularly with regard to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard. Differing peer implementations will have varying tolerances for noncompliance.
Prerequisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation.
Most switches will require some type of configuration to enable 802.3ad mode.
Additional parameters:
lacp_rate - Specifies the rate at which we ask our link partner to transmit LACPDU packets in 802.3ad mode.
Possible values are:
slow or 0 (default) - Request partner to transmit LACPDUs every 30 seconds
fast or 1 - Request partner to transmit LACPDUs every 1 second
xmit_hash_policy (see the description above)
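Putting the 802.3ad parameters together, a hypothetical /etc/modprobe.conf fragment might look like this (it assumes the switch ports are already configured for 802.3ad/LACP):

```
# Illustrative values only; lacp_rate=fast requests LACPDUs every second.
alias bond0 bonding
options bonding mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4
```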
Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
Prerequisite:
Ethtool support in the base drivers for retrieving the speed of each slave.
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.
Receive traffic from connections created by the server is also balanced. When the local system sends an ARP Request the bonding driver copies and saves the peer's IP information from the ARP packet. When the ARP Reply arrives from the peer, its hardware address is retrieved and the bonding driver initiates an ARP reply to this peer assigning it to one of the slaves in the bond. A problematic outcome of using ARP negotiation for balancing is that each time that an ARP request is broadcast it uses the hardware address of the bond. Hence, peers learn the hardware address of the bond and the balancing of receive traffic collapses to the current slave. This is handled by sending updates (ARP Replies) to all the peers with their individually assigned hardware address such that the traffic is redistributed. Receive traffic is also redistributed when a new slave is added to the bond and when an inactive slave is re-activated. The receive load is distributed sequentially (round robin) among the group of highest speed slaves in the bond.
When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected MAC address to each of the clients. The updelay parameter must be set to a value equal or greater than the switch's forwarding delay so that the ARP Replies sent to the peers will not be blocked by the switch.
Prerequisites:
1. Ethtool support in the base drivers for retrieving the speed of each slave.
2. Base driver support for setting the hardware address of a device while it is open. This is required so that there will always be one slave in the team using the bond hardware address (the curr_active_slave) while having a unique hardware address for each slave in the bond. If the curr_active_slave fails its hardware address is swapped with the new curr_active_slave that was chosen.
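A sketch of a balance-alb configuration; the values are illustrative, and per the note above updelay should be at least the switch's forwarding delay:

```
# Hypothetical /etc/modprobe.conf fragment: adaptive load balancing
# with an updelay chosen to outlast a typical 15 s STP forwarding delay,
# so redistributed ARP Replies are not dropped while the port converges.
alias bond0 bonding
options bonding mode=balance-alb miimon=100 updelay=15000
```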
To configure one or more bonding channels, set up the ifcfg-bond<N> and ifcfg-eth<N> files as if there were only one bonding channel.
Set up /etc/modprobe.conf. If the bonding channels share the same bonding options, such as bonding mode and monitoring frequency, add the max_bonds option. For example:
alias bond0 bonding
alias bond1 bonding
options bonding max_bonds=2 mode=balance-rr miimon=100
If the two bonding channels have different bonding options (for example, one is using round-robin mode and one is using active-backup mode), the bonding module has to be loaded twice with different options. For example, in /etc/modprobe.conf:
install bond0 /sbin/modprobe --ignore-install bonding -o bonding0 mode=0 miimon=100 primary=eth0
install bond1 /sbin/modprobe --ignore-install bonding -o bonding1 mode=1 miimon=50 primary=eth2
If you have more bonding channels, add one "install bondN /sbin/modprobe --ignore-install bonding -o bondingN <options>" line per bonding channel.
This is an example of how a slave interface is configured in ifcfg-eth<N>:
DEVICE=eth<N>
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
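The slave stanza above is identical for every slave except the device name, so it can be generated. A minimal sketch, writing into a scratch directory (the OUT variable and interface list are placeholders; on a real system you would point OUT at /etc/sysconfig/network-scripts):

```shell
#!/bin/sh
# Sketch: emit one ifcfg-eth<N> slave file per interface.
OUT=$(mktemp -d)
for dev in eth0 eth1; do
    cat > "$OUT/ifcfg-$dev" <<EOF
DEVICE=$dev
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
EOF
done
ls "$OUT"
```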
After the files are configured, restart the network service.
Note: The use of -o bondingX to get different options for multiple bonds was not possible in Red Hat Enterprise Linux 4 GA and 4 Update 1.
Refer to the Red Hat Enterprise Linux 4 Reference guide Section 8.2.3 "Channel Bonding Interface" for more information.
The initscripts package was updated to fix several bonding problems, so if you are using Red Hat Enterprise Linux 5.3 (or have updated to initscripts-8.45.25-1.el5), configuring multiple bonding channels is very similar to configuring a single bonding channel. You can set up the ifcfg-bond<N> and ifcfg-eth<X> files as if there were only one bonding channel, and specify different BONDING_OPTS for different bonding channels so that they can have different modes and other settings.
For example, you can add the following line to /etc/modprobe.conf:
alias bond0 bonding
alias bond1 bonding
And here is an example for ifcfg-bond0 and ifcfg-bond1:
ifcfg-bond0:
DEVICE=bond0
IPADDR=192.168.50.111
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=0 miimon=100"
ifcfg-bond1:
DEVICE=bond1
IPADDR=192.168.30.111
NETMASK=255.255.255.0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 miimon=50"
This is an example of how a slave interface is configured in ifcfg-eth<N>:
DEVICE=eth<N>
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no
After the files are configured, restart the network service.
Refer to the Red Hat Enterprise Linux 5 Deployment Guide "14.2.3. Channel Bonding Interfaces" for more information.
A bonding interface is virtual but appears to the system as a real interface, so you can configure VLANs on top of it, capture traffic with tcpdump, or apply firewall rules to it as with any other network interface.
Many reported bonding issues turn out to be real NIC problems: if the physical NIC device isn't working, bonding will react to that failure, and how it reacts depends on the bonding setup.
1. Check if the configuration is correct.
2. Understand which bonding mode and monitoring method are in use.
3. Check the /proc/net/bonding/bond<N> file.
   In this file you can see bonding device information: for example, the driver version, mode, MII status, MII polling interval, up delay, down delay, ARP polling interval, ARP IP target(s), and so on.
   If it is a transient problem, check the file a few more times.
4. Check /var/log/messages and 'dmesg' output for bonding or slave interface messages, or any other network messages.
5. Check 'ifconfig' output. You should be able to see the bond devices and slave devices.
6. Use ethtool to check whether the slaves are healthy.
7. Capture a traffic dump (tcpdump) on the slaves and on the bonding interface.
   (There was a bug in older kernels where tcpdump on bonded interfaces wasn't reliable.)
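Parts of these checks can be scripted. A minimal sketch that pulls the interesting fields out of a /proc/net/bonding/bond<N> dump; the sample text below is a stand-in for `cat /proc/net/bonding/bond0` on a real system, copying the real file's field names:

```shell
#!/bin/sh
# Sketch: summarize a /proc/net/bonding/bond<N> dump.
# "sample" stands in for the real file's contents.
sample='Bonding Mode: fault-tolerance (active-backup)
MII Status: up
MII Polling Interval (ms): 100
Currently Active Slave: eth0

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down'

# Report the bonding mode, the active slave, and any slave whose
# MII status is down.
report=$(printf '%s\n' "$sample" | awk -F': ' '
    /^Bonding Mode/           { print "mode: " $2 }
    /^Currently Active Slave/ { print "active: " $2 }
    /^Slave Interface/        { slave = $2 }
    slave != "" && /^MII Status: down/ { print slave " is down" }
')
printf '%s\n' "$report"
```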
Problem 1: The bonding interface isn't working at all.
Step 1 results: configuration is okay! mode=active-backup miimon=100 primary=eth0
Step 2 results: only one slave should be active; if it fails, a backup should take over. Bonding is configured to use the MII monitor every 100 ms.
Step 3 results: reports that no current active slave and all slaves have MII status down.
Step 4 results:
bonding: bond0: link status definitely down for interface eth0
bonding: bond0: link status definitely down for interface eth1
bonding: bond0: now running without any active interface!
Step 5 results: normal - no errors, no drops, no overruns, no collisions.
Step 6 results:
ethtool eth0 shows 'Link detected: no'
ethtool eth1 shows 'Link detected: no'
This means both slaves (the real NICs eth0 and eth1) aren't detecting a physical network link, so it isn't a bonding issue.
Step 7 results: No traffic at all
Problem 2: The same as problem 1
Step 1 to 3 results: same as problem 1
Step 4 results:
e1000_clean_tx_irq: Detected Tx Unit Hang on Intel NIC 82546EB Gigabit Ethernet
The slaves were e1000 and both had Tx Unit problem, so this is a real NIC problem
Problem 3: bonding is unstable when failover happens.
Step 1 results:
configuration ok,
mode=1 primary=eth2 arp_interval=3000 arp_ip_target=198.140.48.126
arp_validate=active
Step 2 results: Only one interface active; ARP monitoring with a 3-second interval; validation only on the active slave.
Step 3 results: reports no current active slave and all slaves with MII status down; the eth2 interface bounces between up and down.
Step 4 results:
During the layer 2 switch failure, this was observed in /var/log/messages:
18:42:17 bonding: bond0: backup interface eth2 is now up
18:42:44 bonding: bond0: backup interface eth2 is now down
18:42:47 bonding: bond0: backup interface eth2 is now up
18:43:05 bonding: bond0: backup interface eth2 is now down
Step 5 results: normal - no errors, no drops, no overruns, no collisions.
Step 6 results: normal, packets coming on eth2 and on others.
Step 7 results: normal, the ethtool reports that eth2 has link and it's stable.
At this point you need to recall how the selected bonding mode works and which parameters are in use. In this case ARP monitoring is used, so the active slave interface should send ARP probes and see ARP replies from 198.140.48.126 every 3 seconds. Note that arp_validate is set to active, meaning that only ARP replies will validate the link status as OK. The tcpdump output shows this happening, which confirms a real bonding driver problem.
Possible workaround: remove the arp_validate=active option and let any traffic activity validate the link status.