Introduction:
Multipath: this multipathing software is widely used on Linux. It aggregates the multiple paths to a single block device into one multipath device, with two main goals:
Path redundancy: when configured in Active/Passive mode, only half of the paths carry I/O. If anything on an I/O path fails (a switch, a cable, the backend storage, etc.), I/O automatically fails over to a standby path, largely transparently to upper-layer applications.
Better performance: when configured in Active/Active mode, all paths can carry I/O (e.g. in round-robin fashion), improving throughput or latency.
Multipath itself is not the focus of this article; for more detail, see: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/setup_overview
Installation and usage:
Multipath can be installed with sudo apt-get install multipath-tools on Debian/Ubuntu, and with sudo yum install device-mapper-multipath on RedHat/CentOS.
multipath.conf: multipath ships with default configurations for the mainstream storage arrays and supports many of their built-in features, such as ALUA. Users can also create /etc/multipath.conf manually after installation.
Below is a reference configuration for VNX/Unity (from the VNX Cinder driver best practices: https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/dell-emc-vnx-driver.html#best-practice ):
```
blacklist {
    # Skip the files under /dev that are definitely not FC/iSCSI devices
    # Different system may need different customization
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][0-9]*"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
    # Skip LUNZ device from VNX
    device {
        vendor "DGC"
        product "LUNZ"
    }
}
defaults {
    user_friendly_names no
    flush_on_last_del yes
}
devices {
    # Device attributes for EMC CLARiiON and VNX series ALUA
    device {
        vendor "DGC"
        product ".*"
        product_blacklist "LUNZ"
        path_grouping_policy group_by_prio
        path_selector "round-robin 0"
        path_checker emc_clariion
        features "1 queue_if_no_path"
        hardware_handler "1 alua"
        prio alua
        failback immediate
    }
}
```
Multipath in OpenStack and how faulty devices arise:
In OpenStack, multipath can be used on both Nova and Cinder nodes to provide highly available access to backend storage. This code originally lived in the Nova and Cinder projects separately; to ease maintenance, it was eventually extracted into a dedicated project: https://github.com/openstack/os-brick
Two important interfaces in os-brick are: connect_volume, which connects a LUN (disk) on the storage array to the host, and disconnect_volume, which disconnects a LUN from the host.
What is a faulty device
When the multipath software on a host finds that a path has become inaccessible, it shows that path in the faulty state.
For a description of all path states, see: https://en.wikipedia.org/wiki/Linux_DM_Multipath
For the code walkthrough I deliberately picked a fairly early version of os-brick that readily produces faulty devices: https://github.com/openstack/os-brick/blob/liberty-eol/os_brick/initiator/connector.py
The main logic of connect_volume is as follows:
```python
@synchronized('connect_volume')
def connect_volume(self, connection_properties):
    """Attach the volume to instance_name.

    connection_properties for iSCSI must include:
    target_portal(s) - ip and optional port
    target_iqn(s) - iSCSI Qualified Name
    target_lun(s) - LUN id of the volume
    Note that plural keys may be used when use_multipath=True
    """
    device_info = {'type': 'block'}

    if self.use_multipath:
        # Multipath installed, discovering other targets if available
        try:
            ips_iqns = self._discover_iscsi_portals(connection_properties)
        except Exception:
            raise exception.TargetPortalNotFound(
                target_portal=connection_properties['target_portal'])

        if not connection_properties.get('target_iqns'):
            # There are two types of iSCSI multipath devices. One which
            # shares the same iqn between multiple portals, and the other
            # which use different iqns on different portals.
            # Try to identify the type by checking the iscsiadm output
            # if the iqn is used by multiple portals. If it is, it's
            # the former, so use the supplied iqn. Otherwise, it's the
            # latter, so try the ip,iqn combinations to find the targets
            # which constitutes the multipath device.
            main_iqn = connection_properties['target_iqn']
            all_portals = set([ip for ip, iqn in ips_iqns])
            match_portals = set([ip for ip, iqn in ips_iqns
                                 if iqn == main_iqn])
            if len(all_portals) == len(match_portals):
                ips_iqns = zip(all_portals, [main_iqn] * len(all_portals))

        for ip, iqn in ips_iqns:
            props = copy.deepcopy(connection_properties)
            props['target_portal'] = ip
            props['target_iqn'] = iqn
            self._connect_to_iscsi_portal(props)

        self._rescan_iscsi()
        host_devices = self._get_device_path(connection_properties)
    else:
        target_props = connection_properties
        for props in self._iterate_all_targets(connection_properties):
            if self._connect_to_iscsi_portal(props):
                target_props = props
                break
            else:
                LOG.warning(_LW(
                    'Failed to connect to iSCSI portal %(portal)s.'),
                    {'portal': props['target_portal']})

        host_devices = self._get_device_path(target_props)

    # The /dev/disk/by-path/... node is not always present immediately
    # TODO(justinsb): This retry-with-delay is a pattern, move to utils?
    tries = 0
    # Loop until at least 1 path becomes available
    while all(map(lambda x: not os.path.exists(x), host_devices)):
        if tries >= self.device_scan_attempts:
            raise exception.VolumeDeviceNotFound(device=host_devices)

        LOG.warning(_LW("ISCSI volume not yet found at: %(host_devices)s. "
                        "Will rescan & retry. Try number: %(tries)s."),
                    {'host_devices': host_devices, 'tries': tries})

        # The rescan isn't documented as being necessary(?), but it helps
        if self.use_multipath:
            self._rescan_iscsi()
        else:
            if tries:
                host_devices = self._get_device_path(target_props)
            self._run_iscsiadm(target_props, ("--rescan",))

        tries = tries + 1
        if all(map(lambda x: not os.path.exists(x), host_devices)):
            time.sleep(tries ** 2)
        else:
            break

    if tries != 0:
        LOG.debug("Found iSCSI node %(host_devices)s "
                  "(after %(tries)s rescans)",
                  {'host_devices': host_devices, 'tries': tries})

    # Choose an accessible host device
    host_device = next(dev for dev in host_devices if os.path.exists(dev))

    if self.use_multipath:
        # We use the multipath device instead of the single path device
        self._rescan_multipath()
        multipath_device = self._get_multipath_device_name(host_device)
        if multipath_device is not None:
            host_device = multipath_device
        else:
            LOG.debug("Unable to find multipath device name for "
                      "volume. Only using path %(device)s "
                      "for volume.", {'device': host_device})

    device_info['path'] = host_device
    return device_info
```
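The retry loop above follows a rescan-then-back-off pattern, sleeping tries ** 2 seconds between attempts. The idea can be isolated into a small self-contained sketch; wait_for_any_path and its rescan callback are hypothetical names for illustration, not os-brick API:

```python
import os
import time


def wait_for_any_path(host_devices, max_attempts=3, rescan=None):
    """Poll until at least one candidate device path exists.

    Mirrors connect_volume's retry loop: optionally trigger a rescan,
    then back off quadratically (1s, 4s, 9s, ...) between attempts.
    """
    tries = 0
    while not any(os.path.exists(dev) for dev in host_devices):
        if tries >= max_attempts:
            raise RuntimeError('Volume device not found: %s' % host_devices)
        if rescan is not None:
            rescan()  # e.g. an iSCSI or multipath rescan
        tries += 1
        time.sleep(tries ** 2)
    # Return the first accessible path, as connect_volume does.
    return next(dev for dev in host_devices if os.path.exists(dev))
```

The quadratic backoff keeps the first retries fast while avoiding a tight rescan loop when the array is slow to expose the LUN.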
The key steps of this logic are the iSCSI login and rescan calls that discover the LUN's block devices on the host.
The logic of disconnect_volume is as follows:
```python
@synchronized('connect_volume')
def disconnect_volume(self, connection_properties, device_info):
    """Detach the volume from instance_name.

    connection_properties for iSCSI must include:
    target_portal(s) - IP and optional port
    target_iqn(s) - iSCSI Qualified Name
    target_lun(s) - LUN id of the volume
    """
    if self.use_multipath:
        self._rescan_multipath()
        host_device = multipath_device = None
        host_devices = self._get_device_path(connection_properties)
        # Choose an accessible host device
        for dev in host_devices:
            if os.path.exists(dev):
                host_device = dev
                multipath_device = self._get_multipath_device_name(dev)
                if multipath_device:
                    break
        if not host_device:
            LOG.error(_LE("No accessible volume device: %(host_devices)s"),
                      {'host_devices': host_devices})
            raise exception.VolumeDeviceNotFound(device=host_devices)

        if multipath_device:
            device_realpath = os.path.realpath(host_device)
            self._linuxscsi.remove_multipath_device(device_realpath)
            return self._disconnect_volume_multipath_iscsi(
                connection_properties, multipath_device)

    # When multiple portals/iqns/luns are specified, we need to remove
    # unused devices created by logging into other LUNs' session.
    for props in self._iterate_all_targets(connection_properties):
        self._disconnect_volume_iscsi(props)
```
The key calls above (remove_multipath_device and _disconnect_volume_multipath_iscsi) remove the LUN's host paths from the kernel and from the multipath mapper.
Note that both interfaces are guarded by the same lock, named connect_volume (a Linux file lock implemented with flock).
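In os-brick this decorator comes from oslo.concurrency's lockutils. A minimal sketch of how such a flock-based lock decorator works (an illustration of the mechanism only, not the actual oslo code) could look like:

```python
import fcntl
import functools
import os
import tempfile


def synchronized(name, lock_dir=None):
    """Serialize calls across processes with an exclusive file lock.

    Sketch of the idea behind @synchronized('connect_volume'); the
    real implementation lives in oslo.concurrency's lockutils.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, name + '.lock')

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with open(lock_path, 'w') as lock_file:
                # Blocks until the exclusive lock is acquired.
                fcntl.flock(lock_file, fcntl.LOCK_EX)
                try:
                    return func(*args, **kwargs)
                finally:
                    fcntl.flock(lock_file, fcntl.LOCK_UN)
        return wrapper
    return decorator
```

Because the lock is a file lock, it serializes connect_volume and disconnect_volume even across different processes on the same host.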
To make it easier to describe how a faulty device is produced, I drew the following diagram showing the relationship between the two interfaces.
This flow works fine without concurrency: the devices on the host are connected and cleaned up correctly.
However, the implementation has a flaw: under high concurrency it can produce faulty devices. Consider the following execution order:
1) The disconnect_volume on the right finishes: the LUN's device paths (visible under /dev/disk/by-path) and its multipath descriptor (visible with multipath -l) are removed from the host.
2) The connect_volume lock is now released and the connect_volume on the left starts to execute, but the terminate_connection on the right has not run yet. In other words, the storage backend has not revoked the host's access to the LUN, so any SCSI rescan on the host will still discover the LUN's devices.
3) connect_volume proceeds normally: the iSCSI rescan and multipath rescan both run, so the devices already deleted in step 1) are scanned back in.
4) Finally, terminate_connection completes on the storage backend and removes the host's access to the LUN, producing the so-called faulty device. The multipath output then looks like this (both multipath descriptors are faulty):
```
$ sudo multipath -ll
3600601601290380036a00936cf13e711 dm-30 DGC,VRAID
size=2.0G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| `- 11:0:0:151 sdef 128:112 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:0:151 sdeg 128:128 failed faulty running
3600601601bd032007c097518e96ae411 dm-2 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
  `- #:#:#:# - #:# active faulty running
```
In general, a multipath map that shows #:#:#:# paths can be removed directly with sudo multipath -f 3600601601bd032007c097518e96ae411.
That concludes part one, on how faulty devices are produced. In a later post I will describe how os-brick tries to avoid creating faulty devices.
References:
RedHat's official multipath documentation: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/dm_multipath/mpio_description
EMC VNX driver documentation: https://docs.openstack.org/cinder/queens/configuration/block-storage/drivers/dell-emc-vnx-driver.html
A block-device connection tool implemented in Go: https://github.com/peter-wangxu/goock
iSCSI faulty device cleanup script for VNX: https://github.com/emc-openstack/vnx-faulty-device-cleanup
Source: https://www.cnblogs.com/sting2me/p/8849689.html