http://people.redhat.com/nayfield/storage/ - RHEL 4 Storage Quickstart
A multipath.conf file for the EMC Clariion: multipath.conf
Use multipath -ll to see a list of paths. On ITD's EMC you'll see similar output to this:
# multipath -ll
3600601600872100054c5748fcca7da11
[size=300 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [prio=2][active]
 \_ 1:0:1:0 sdc 8:32  [active][ready]
 \_ 2:0:1:0 sde 8:64  [active][ready]
\_ round-robin 0 [enabled]
 \_ 1:0:0:0 sdb 8:16  [active][ready]
 \_ 2:0:0:0 sdd 8:48  [active][ready]
I'm using a Dell 2650 with two QLogic cards to write this documentation. Most of our servers are built with 2 QLogic cards for redundant and load balanced paths to the storage array. If you only have one QLogic card you'll see half as many paths. However, multipathing is still needed as you will still see a path to an active EMC controller and a secondary controller. I am not covering booting from the EMC or having / on the EMC. I assume that you have a redundant, local system disk where the OS is installed and your server is configured to boot from the local disks.
The EMC has redundant controllers. If the primary fails, the secondary takes its place. Note that only one controller will let you access the storage volumes at a time. As Linux boots and discovers all the paths and devices, the devices that map to an inactive controller will give IO errors. This is normal and does not happen after the individual devices have been merged together to create the multipathed device that you will be mounting. The multipathed device will handle controller failover automagically.
I installed the test server with a very simple test configuration while my FC connections were not plugged in. The main advantage of this method is that you know you have not formatted, configured, erased, or otherwise put any part of the system on the EMC. My WebKickstart config looked like this:
version as4
use itd-cls/use/cls
package @ NCSU Realm Kit Server
package device-mapper-multipath
package vim-enhanced
part / --size 10240
part swap --recommended
part /boot --size 128
part /tmp --size 2048
part /var --size 5120 --grow
part /var/cache --size 1024
If you are using RHEL or Realm Linux 5, device-mapper-multipath is included by default in the server install. The smallest workable system for 5 can be installed with a single packages line:
package @ Realm Linux Server
This machine's system disk is based on a PERC RAID 5 setup. Please use RAID and LVM where appropriate for your production servers' system disks. Note the package device-mapper-multipath line. That package isn't installed by default and contains the utilities we will use.
Next run up2date -fu (or yum update on RHEL 5) to install the latest kernel and OpenAFS packages. Power off the system and connect the FC to the EMC. Assuming that volumes have been created and set up on the EMC, on the next boot you'll find at least 4 new SCSI devices per storage volume: each path sees both EMC controllers. You'll also see IO errors as Linux tries to read the partition tables from the devices that map to the inactive controllers. Note which devices these are; there should be one per storage volume per path. They are not guaranteed to be the same /dev/sdX device every boot.
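To sort out which of the new /dev/sdX devices are paths to the same storage volume, you can run the same scsi_id call that multipath itself uses (see the getuid_callout in the multipath.conf below) against each device by hand. This is only a sketch using the RHEL 4 scsi_id syntax; sdb through sde are example device names from my machine and will differ on yours.

# Print the WWID of each new SCSI device; devices that report the same
# WWID are paths to the same EMC storage volume.
for dev in sdb sdc sdd sde; do
    echo -n "$dev: "
    /sbin/scsi_id -g -u -s /block/$dev
done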
Also, if the volumes have been used before, be aware of ext3 filesystem labels and LVM volume names that may conflict with what you have on the system disk. The e2label program is an easy way to inspect and change ext2/3/4 labels.
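For example, assuming /dev/sdb1 is a partition seen on one of the paths (the device name and label here are just illustrations), checking and changing a label looks like this:

# Show the current ext2/ext3 label, if any
e2label /dev/sdb1

# Set a new label so it no longer collides with a label on the system disk
# ("emcdata01" is only an example name)
e2label /dev/sdb1 emcdata01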
Next modify the /etc/multipath.conf file. A more verbose version is attached to this page above. However, the following has most of the comments removed. Please note that this file is specific to the EMC Clariion and has the device information set very carefully.
defaults {
        udev_dir                /dev
        user_friendly_names     yes
}

devnode_blacklist {
#       wwid 26353900f02796769
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*"

        # blacklist the system disk
        devnode "^sda[0-9]*"
}

devices {
        device {
                vendor                  "DGC "
                product                 "*"
                path_grouping_policy    group_by_prio
                getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
                prio_callout            "/sbin/mpath_prio_emc /dev/%n"
                path_checker            emc_clariion
                path_selector           "round-robin 0"
                features                "1 queue_if_no_path"
                no_path_retry           300
                hardware_handler        "1 emc"
                failback                immediate
        }
}
Verbose version of the EMC Clariion multipath.conf: multipath.conf
If you are attached to a simpler SAN device you probably need a much less complex multipath.conf file. A good starting point is the following, which allows the multipath tools to scan your block devices and figure out what to do.
blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9]*"

        # blacklist the system disk
        devnode "^sda[0-9]*"
}

defaults {
        user_friendly_names yes
}
See the above external documentation links for more details on this file. Also /usr/share/doc/device-mapper-multipath-<VERSION>/multipath.conf.annotated contains great detail.
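Before creating any maps it can be worth sanity-checking the configuration. A rough sketch: multipath's verbose mode shows how every block device is evaluated, including which ones the blacklist rejects. The -d dry-run flag should be available in the multipath-tools shipped with RHEL 4 and 5, but check multipath -h if your version differs.

# Show how each device is evaluated (blacklisted, WWID, priority) without
# creating or changing any maps
multipath -v3 -d | less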
Now we can create the multipath devices manually with the multipath tool.
[root@uni00ldev ~]# multipath -v2
create: mpath0 (3600601602d09110032389268b2d4da11)
[size=250 GB][features="1 queue_if_no_path"][hwhandler="1 emc"]
\_ round-robin 0 [prio=2]
 \_ 1:0:0:0 sdb 8:16  [ready]
 \_ 2:0:0:0 sdf 8:80  [ready]
\_ round-robin 0
 \_ 1:0:1:0 sdc 8:32  [ready]
 \_ 2:0:1:0 sdg 8:96  [ready]

create: mpath1 (3600601600872100036674909b6cad811)
[size=200 GB][features="1 queue_if_no_path"][hwhandler="1 emc"]
\_ round-robin 0 [prio=2]
 \_ 1:0:2:0 sdd 8:48  [ready]
 \_ 2:0:2:0 sdh 8:112 [ready]
\_ round-robin 0
 \_ 1:0:3:0 sde 8:64  [ready]
 \_ 2:0:3:0 sdi 8:128 [ready]
Once the devices have been created future runs of multipath -v2 will not produce any output. Use the multipath -ll command to view the currently configured paths. It will produce similar output.
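If you later change multipath.conf and want the maps rebuilt without a reboot, something like the following should work, as long as nothing on the multipathed devices is mounted (a sketch, not gospel):

# Flush all unused multipath maps, then recreate them with the new config;
# this only works when the maps are not in use (unmount filesystems first)
multipath -F
multipath -v2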
So far so good. Start the multipathd service which monitors the pathing. Make sure it starts each boot.
[root@uni00ldev ~]# service multipathd start
Starting multipathd daemon:                                [  OK  ]
[root@uni00ldev ~]# chkconfig multipathd on
At this point you should see device nodes in /dev/mapper/. There will be nodes that are unmountable and map to the inactive EMC controller. (No you don't have to remount on a controller failure. The daemon knows what to do.) There will be nodes for any partitions that have been previously created.
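To see exactly what the device mapper has created, something like the following works (the output will obviously differ per machine):

# List the multipath device nodes the device mapper created
ls -l /dev/mapper/

# Show the device-mapper maps and their tables; a "multipath" target in the
# table confirms the node is managed by device-mapper-multipath
dmsetup ls
dmsetup table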
You can easily convert a server from the EMC drivers to the Linux device-mapper multipath tools. The EMC storage volumes are just block devices, and the PowerPath drivers treat them as such, so those volumes can be mounted with the Linux multipathing tools with the data intact.
First remove the PowerPath package. Make sure that it removes its tweaks from /etc/rc.d/rc.sysinit; search for "BEGINPP" in the file to find the modifications that should be removed. Next, set up the multipath tools as described above. As documented below, you should be able to access the partitions or raw devices via the device nodes in /dev/mapper/.
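A sketch of that removal follows. The PowerPath package name varies by version, so query for it rather than assuming one; the placeholder below is not a real package name.

# Find the installed PowerPath package (name varies by version)
rpm -qa | grep -i emcpower

# Remove it, substituting the package name found above
rpm -e <EMCpower-package-name>

# Confirm the rc.sysinit modifications are gone; this should print nothing
grep -n BEGINPP /etc/rc.d/rc.sysinit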
All these devices get confusing fast. Devices are not guaranteed to always map to the same storage volume or active controller. There are some best practices to make sure that the storage volumes are persistent and flexible to your needs.
Do not partition. This just makes things more confusing to you, me, and even some of the tools, and the node name for a partition may shift. If you need to create or remove partitions, work with one of the underlying devices rather than the multipathed devices. Using the same example machine as above (look at the multipath tool output), I'm going to remove the partition table that I've found on mpath0.
[root@uni00ldev ~]# fdisk /dev/sdb
I then entered o to create a new table and w to write the table to disk. Afterwards, I needed to reboot so the device mapper would recreate the device nodes for the paths.
[root@uni00ldev ~]# cd /dev/mapper
[root@uni00ldev ~]# ls mpath*
mpath0  mpath1
Those are the device nodes for the two storage volumes I have access to on the EMC.
If you choose to partition, or have existing partitions on the EMC volume, you will have entries in /dev/mapper like the following.
mpath0 mpath0p0 mpath0p1 mpath0p2
In this case (which does not follow the above disk layout) mpath0 is not mountable. The mpath0p0, mpath0p1... devices refer to the partitions on the volume.
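If the partition nodes don't show up on their own after a partition table change, kpartx (part of the same tool set) can create them from the multipath map. A sketch:

# Create device nodes for the partitions found in the mpath0 map; they
# appear alongside mpath0 in /dev/mapper (use kpartx -d to remove them)
kpartx -a /dev/mapper/mpath0
ls /dev/mapper/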
Mounting the raw device is not a best practice but it, of course, works as you'd expect. There are /dev/dm-*, /dev/mapper/*, and /dev/mpath/* device nodes. Always use the nodes in /dev/mapper/ to mount the filesystems. If you use the other nodes in your fstab your system may not boot properly. The nodes in /dev/mapper are created by the device mapper when the drivers are loaded. The other nodes are created by udev and may not exist early enough to be mountable during boot. For example, here are the lines from the /etc/fstab on my test machine:
/dev/mapper/mpath0      /srv/mpath0     ext3    defaults        1 2
/dev/mapper/mpath1      /srv/mpath1     ext3    defaults        1 2
A best practice is to use LVM with the multipathed volumes. This ensures device node persistence, which can be a problem with multipathing, and it gives you all the advantages of virtualized disk space. Edit the /etc/lvm/lvm.conf file so that LVM will scan /dev/mapper for devices and ignore the underlying SCSI devices and /dev/dm-X. This is done by replacing the filter line with something similar to the following. Notice that I've specifically left my system disk to be scanned by LVM.
filter = [ "a|/dev/sda|", "r|/dev/sd.*|", "r|/dev/dm-.*|", "a|/dev/mapper/mpath.*|" ]
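After editing the filter it's worth confirming that LVM now only looks at the devices you intended. The exact device list will vary, but roughly:

# Show every block device LVM considers after the filter change; only
# /dev/sda and the /dev/mapper/mpath* nodes should be listed
lvmdiskscan

# pvscan should likewise report PVs on the mpath nodes, not on sdb, sdc, ...
pvscan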
Using my setup from above I created my physical volumes and put both PVs into the same volume group.
# pvcreate /dev/mapper/mpath0
# pvcreate /dev/mapper/mpath1
# vgcreate sms_group /dev/mapper/mpath0 /dev/mapper/mpath1
Create your logical volumes. I'll create two with the following commands.
[root@uni00ldev mapper]# lvcreate -L300G -n lv300 sms_group
  Logical volume "lv300" created
[root@uni00ldev mapper]# lvcreate -L149G -n lv150 sms_group
  Logical volume "lv150" created
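The standard LVM reporting commands will confirm the new layout:

# Summary of the volume group built on the two multipathed PVs
vgs sms_group

# The two logical volumes just created
lvs sms_group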
Now /dev/mapper contains the following device nodes:
[root@uni00ldev mapper]# ls
control  mpath0  mpath1  sms_group-lv150  sms_group-lv300
Format the devices.
[root@uni00ldev mapper]# mke2fs -j /dev/mapper/sms_group-lv150
[root@uni00ldev mapper]# mke2fs -j /dev/mapper/sms_group-lv300
Finally, add them to the fstab as normal. Reboot to ensure that the system will boot up normally. On my test box I now have this disk layout.
[root@uni00ldev ~]# df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/sda2                    9.9G  1.2G  8.3G  13% /
/dev/sda1                    122M   22M   94M  19% /boot
none                        1014M     0 1014M   0% /dev/shm
/dev/sda5                    2.0G   36M  1.9G   2% /tmp
/dev/sda3                     52G  124M   50G   1% /var
/dev/sda6                   1012M   34M  927M   4% /var/cache
/dev/mapper/sms_group-lv150  147G   93M  140G   1% /srv/lv150
/dev/mapper/sms_group-lv300  296G   97M  281G   1% /srv/lv300
AFS                          8.6G     0  8.6G   0% /afs
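For reference, the matching /etc/fstab entries on this box would look something like the following (a sketch assuming the /srv/lv150 and /srv/lv300 mount points shown in the df output above):

/dev/mapper/sms_group-lv150  /srv/lv150  ext3  defaults  1 2
/dev/mapper/sms_group-lv300  /srv/lv300  ext3  defaults  1 2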
The device-mapper-multipath driver in the Linux kernel is at least as featureful as the EMC PowerPath drivers. Certain failures on the EMC can cause both drivers to mark paths as being offline. In that or similar situations you may need to manually bring a path back online. For device-mapper-multipath you can do that as follows, using the underlying device nodes such as sdb, sdc, etc.
# cat "running" > /sys/block/<device>/device/state # cat /sys/block/<device>/device/state running
Much of this knowledge, especially the Best Practices section, came from various folks on the Red Hat Enterprise Linux 4 (Nahant) Discussion List. There are several interesting threads in the archives regarding issues, errors, and understanding this technology.
EMC-specific information came from Native Multipath Failover Based on DM-MPIO for v2.6.x Linux Kernel and EMC Storage Arrays and from our storage expert Steven Stewart.
Information also comes from the Multipath Tools Wiki.
(1) The SMS is an EMC Clariion.
RealmLinuxServers/Multipathing (last modified 2008-05-02 20:17:04 by jjneely)