Introduction:
Multipath: this multipathing software is widely used on Linux. It aggregates the multiple paths to a single block device into one multipath device, with two main goals:
Path redundancy: when configured in Active/Passive mode, only half of the paths carry I/O. If anything on an I/O path fails (a switch, a cable, the backend storage, etc.), I/O automatically fails over to a standby path, largely transparently to upper-layer applications.
Better performance: when configured in Active/Active mode, all paths can carry I/O (e.g. in round-robin fashion), improving throughput or latency.
Multipath itself is not the focus of this article; for more detail, see: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/setup_overview
Installation and usage:
Multipath can be installed with sudo apt-get install multipath-tools on Debian/Ubuntu, and with sudo yum install device-mapper-multipath on RedHat/CentOS.
multipath.conf: multipath ships with default configurations for the mainstream storage arrays and supports many of their built-in features, such as ALUA. Users can also create /etc/multipath.conf manually after installation.
Below is a reference configuration for VNX/Unity (from the VNX Cinder driver best practices: https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/dell-emc-vnx-driver.html#best-practice ):
```
blacklist {
    # Skip the files under /dev that are definitely not FC/iSCSI devices
    # Different system may need different customization
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][0-9]*"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    # Skip LUNZ device from VNX
    device {
        vendor "DGC"
        product "LUNZ"
    }
}
defaults {
    user_friendly_names no
    flush_on_last_del yes
}
devices {
    # Device attributes for EMC CLARiiON and VNX series ALUA
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        path_selector "round-robin 0"
        path_checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio alua
        failback immediate
    }
}
```
Multipath in OpenStack and how faulty devices arise:
In OpenStack, multipath can be used on both Nova and Cinder nodes to provide highly available access to backend storage. This code originally lived in the Nova and Cinder projects separately; to ease maintenance, it was eventually extracted into a dedicated project: https://github.com/openstack/os-brick
Two important interfaces in os-brick are: connect_volume, which connects a LUN (disk) on the storage array to the host, and disconnect_volume, which disconnects a LUN from the host.
What is a faulty device
When the multipath software on a host finds that a path has become inaccessible, it shows that path in the faulty state.
For a description of all path states, see: https://en.wikipedia.org/wiki/Linux_DM_Multipath
For the code walkthrough I deliberately picked a fairly early version of os-brick that readily produces faulty devices: https://github.com/openstack/os-brick/blob/liberty-eol/os_brick/initiator/connector.py
The main logic of connect_volume is as follows:
```python
@synchronized('connect_volume')
def connect_volume(self, connection_properties):
    """Attach the volume to instance_name.

    connection_properties for iSCSI must include:
    target_portal(s) - ip and optional port
    target_iqn(s) - iSCSI Qualified Name
    target_lun(s) - LUN id of the volume
    Note that plural keys may be used when use_multipath=True
    """
    device_info = {'type': 'block'}

    if self.use_multipath:
        # Multipath installed, discovering other targets if available
        try:
            ips_iqns = self._discover_iscsi_portals(connection_properties)
        except Exception:
            raise exception.TargetPortalNotFound(
                target_portal=connection_properties['target_portal'])

        if not connection_properties.get('target_iqns'):
            # There are two types of iSCSI multipath devices. One which
            # shares the same iqn between multiple portals, and the other
            # which use different iqns on different portals.
            # Try to identify the type by checking the iscsiadm output
            # if the iqn is used by multiple portals. If it is, it's
            # the former, so use the supplied iqn. Otherwise, it's the
            # latter, so try the ip,iqn combinations to find the targets
            # which constitutes the multipath device.
            main_iqn = connection_properties['target_iqn']
            all_portals = set([ip for ip, iqn in ips_iqns])
            match_portals = set([ip for ip, iqn in ips_iqns
                                 if iqn == main_iqn])
            if len(all_portals) == len(match_portals):
                ips_iqns = zip(all_portals, [main_iqn] * len(all_portals))

        for ip, iqn in ips_iqns:
            props = copy.deepcopy(connection_properties)
            props['target_portal'] = ip
            props['target_iqn'] = iqn
            self._connect_to_iscsi_portal(props)

        self._rescan_iscsi()
        host_devices = self._get_device_path(connection_properties)
    else:
        target_props = connection_properties
        for props in self._iterate_all_targets(connection_properties):
            if self._connect_to_iscsi_portal(props):
                target_props = props
                break
            else:
                LOG.warning(_LW(
                    'Failed to connect to iSCSI portal %(portal)s.'),
                    {'portal': props['target_portal']})

        host_devices = self._get_device_path(target_props)

    # The /dev/disk/by-path/... node is not always present immediately
    # TODO(justinsb): This retry-with-delay is a pattern, move to utils?
    tries = 0
    # Loop until at least 1 path becomes available
    while all(map(lambda x: not os.path.exists(x), host_devices)):
        if tries >= self.device_scan_attempts:
            raise exception.VolumeDeviceNotFound(device=host_devices)

        LOG.warning(_LW("ISCSI volume not yet found at: %(host_devices)s. "
                        "Will rescan & retry. Try number: %(tries)s."),
                    {'host_devices': host_devices, 'tries': tries})

        # The rescan isn't documented as being necessary(?), but it helps
        if self.use_multipath:
            self._rescan_iscsi()
        else:
            if tries:
                host_devices = self._get_device_path(target_props)
            self._run_iscsiadm(target_props, ("--rescan",))

        tries = tries + 1
        if all(map(lambda x: not os.path.exists(x), host_devices)):
            time.sleep(tries ** 2)
        else:
            break

    if tries != 0:
        LOG.debug("Found iSCSI node %(host_devices)s "
                  "(after %(tries)s rescans)",
                  {'host_devices': host_devices, 'tries': tries})

    # Choose an accessible host device
    host_device = next(dev for dev in host_devices if os.path.exists(dev))

    if self.use_multipath:
        # We use the multipath device instead of the single path device
        self._rescan_multipath()
        multipath_device = self._get_multipath_device_name(host_device)
        if multipath_device is not None:
            host_device = multipath_device
        else:
            LOG.debug("Unable to find multipath device name for "
                      "volume. Only using path %(device)s "
                      "for volume.", {'device': host_device})

    device_info['path'] = host_device
    return device_info
```
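The retry loop above follows a rescan-then-back-off pattern, sleeping tries ** 2 seconds between attempts. The idea can be isolated into a small self-contained sketch; wait_for_any_path and its rescan callback are hypothetical names for illustration, not os-brick API:

```python
import os
import time


def wait_for_any_path(host_devices, max_attempts=3, rescan=None):
    """Poll until at least one candidate device path exists.

    Mirrors connect_volume's retry loop: optionally trigger a rescan,
    then back off quadratically (1s, 4s, 9s, ...) between attempts.
    """
    tries = 0
    while not any(os.path.exists(dev) for dev in host_devices):
        if tries >= max_attempts:
            raise RuntimeError('Volume device not found: %s' % host_devices)
        if rescan is not None:
            rescan()  # e.g. an iSCSI or multipath rescan
        tries += 1
        time.sleep(tries ** 2)
    # Return the first accessible path, as connect_volume does.
    return next(dev for dev in host_devices if os.path.exists(dev))
```

The quadratic backoff keeps the first retries fast while avoiding a tight rescan loop when the array is slow to expose the LUN.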
The key steps of this logic are the iSCSI login and rescan calls that discover the LUN's block devices on the host.
The logic of disconnect_volume is as follows:
```python
@synchronized('connect_volume')
def disconnect_volume(self, connection_properties, device_info):
    """Detach the volume from instance_name.

    connection_properties for iSCSI must include:
    target_portal(s) - IP and optional port
    target_iqn(s) - iSCSI Qualified Name
    target_lun(s) - LUN id of the volume
    """
    if self.use_multipath:
        self._rescan_multipath()
        host_device = multipath_device = None
        host_devices = self._get_device_path(connection_properties)
        # Choose an accessible host device
        for dev in host_devices:
            if os.path.exists(dev):
                host_device = dev
                multipath_device = self._get_multipath_device_name(dev)
                if multipath_device:
                    break
        if not host_device:
            LOG.error(_LE("No accessible volume device: %(host_devices)s"),
                      {'host_devices': host_devices})
            raise exception.VolumeDeviceNotFound(device=host_devices)

        if multipath_device:
            device_realpath = os.path.realpath(host_device)
            self._linuxscsi.remove_multipath_device(device_realpath)
            return self._disconnect_volume_multipath_iscsi(
                connection_properties, multipath_device)

    # When multiple portals/iqns/luns are specified, we need to remove
    # unused devices created by logging into other LUNs' session.
    for props in self._iterate_all_targets(connection_properties):
        self._disconnect_volume_iscsi(props)
```
The key calls above (remove_multipath_device and _disconnect_volume_multipath_iscsi) remove the LUN's host paths from the kernel and from the multipath mapper.
Note that both interfaces are guarded by the same lock, named connect_volume (a Linux file lock implemented with flock).
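In os-brick this decorator comes from oslo.concurrency's lockutils. A minimal sketch of how such a flock-based lock decorator works (an illustration of the mechanism only, not the actual oslo code) could look like:

```python
import fcntl
import functools
import os
import tempfile


def synchronized(name, lock_dir=None):
    """Serialize calls across processes with an exclusive file lock.

    Sketch of the idea behind @synchronized('connect_volume'); the
    real implementation lives in oslo.concurrency's lockutils.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, name + '.lock')

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with open(lock_path, 'w') as lock_file:
                # Blocks until the exclusive lock is acquired.
                fcntl.flock(lock_file, fcntl.LOCK_EX)
                try:
                    return func(*args, **kwargs)
                finally:
                    fcntl.flock(lock_file, fcntl.LOCK_UN)
        return wrapper
    return decorator
```

Because the lock is a file lock, it serializes connect_volume and disconnect_volume even across different processes on the same host.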
To make it easier to describe how a faulty device is produced, I drew the following diagram showing the relationship between the two interfaces.
This flow works fine without concurrency: the devices on the host are connected and cleaned up correctly.
However, the implementation has a flaw: under high concurrency it can produce faulty devices. Consider the following execution order:
1) The disconnect_volume on the right finishes: the LUN's device paths (visible under /dev/disk/by-path) and its multipath descriptor (visible with multipath -l) are removed from the host.
2) The connect_volume lock is now released and the connect_volume on the left starts to execute, but the terminate_connection on the right has not run yet. In other words, the storage backend has not revoked the host's access to the LUN, so any SCSI rescan on the host will still discover the LUN's devices.
3) connect_volume proceeds normally: the iSCSI rescan and multipath rescan both run, so the devices already deleted in step 1) are scanned back in.
4) Finally, terminate_connection completes on the storage backend and removes the host's access to the LUN, producing the so-called faulty device. The multipath output then looks like this (both multipath descriptors are faulty):
```
$ sudo multipath -ll
3600601601290380036a00936cf13e711 dm-30 DGC,VRAID
size=2.0G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 11:0:0:151 sdef 128:112 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:0:151 sdeg 128:128 failed faulty running
3600601601bd032007c097518e96ae411 dm-2 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
  `- #:#:#:# - #:# active faulty running
```
In general, a multipath map that shows #:#:#:# paths can be removed directly with sudo multipath -f 3600601601bd032007c097518e96ae411.
That concludes part one, on how faulty devices are produced. In a later post I will describe how os-brick tries to avoid creating faulty devices.
References:
RedHat's official multipath documentation: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/mpio_description
EMC VNX driver documentation: https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/dell-emc-vnx-driver.html
A block-device connection tool implemented in Go: https://github.com/peter-wangxu/goock
iSCSI faulty device cleanup script for VNX: https://github.com/emc-openstack/vnx-faulty-device-cleanup
Source: https://www.cnblogs.com/sting2me/p/8849689.html