The beginning of the problem

I have a dedicated server at a hosting provider, and recently my node exporter detected high disk I/O saturation on my RAID 1 array /dev/md3. I checked the drives with smartctl and found that both drives in the array show a large number of read errors:

[root@ovh-ds03 ~]# smartctl /dev/sda -a | grep Err
Error logging capability:        (0x01) Error logging supported.
     SCT Error Recovery Control supported.
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       65538
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0



[root@ovh-ds03 ~]# smartctl /dev/sdb -a | grep Err
Error logging capability:        (0x01) Error logging supported.
     SCT Error Recovery Control supported.
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       65536
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
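
As a side note, a quick way to eyeball the same error counters on every member disk is a small loop like the one below (the device list is an assumption; adjust it to your layout):

for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do   # assumed device list, adjust as needed
    echo "== $d =="
    smartctl -A "$d" | grep -E 'Raw_Read_Error_Rate|Seek_Error_Rate|UDMA_CRC_Error_Count'
done

Keep in mind that the raw value of Raw_Read_Error_Rate is vendor-specific and hard to compare between drives; the normalized value dropping toward its threshold (here 099 against 016) is usually the more meaningful signal.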

I asked support to replace the 2 disks, but instead of replacing them they added 2 more disks and rebuilt the array onto the 2 new ones. That is all well and good, but the array is now in a degraded state, so I am getting an alert named NodeRAIDDegraded. Checking on the server: yes, it is degraded:

[root@ovh-ds03 ~]# mdadm --detail /dev/md3
/dev/md3:
           Version : 1.2
     Creation Time : Sat Mar 30 18:18:26 2024
        Raid Level : raid1
        Array Size : 1951283200 (1860.89 GiB 1998.11 GB)
     Used Dev Size : 1951283200 (1860.89 GiB 1998.11 GB)
      Raid Devices : 4
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Sep 14 19:30:44 2024
             State : active, degraded
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : md3
              UUID : 939ad077:07c22e9e:ae62fbf9:4df58cf9
            Events : 55337

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
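
A quicker way to spot this state is /proc/mdstat, where missing members show up as underscores in the status line (for this array it would read something like [4/2] [__UU]):

cat /proc/mdstat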

How can I fix it?

I have tried various solutions, such as re-assembling the array from scratch:

mdadm --assemble --scan


Best answer

To fix it and get rid of the removed slots, you need to reduce the number of devices in the RAID array. In your case it was grown from 2 devices to 4, so now you need to shrink it from 4 back to 2, which can be done like this:

mdadm --grow --raid-devices=2 /dev/md3
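
In this case the two old slots were already shown as removed, so shrinking is enough. If the old drives had still been attached and merely marked faulty, you would first drop them from the array before shrinking; mdadm accepts the keywords failed and detached for this (a sketch for that situation only; the partition name is an assumption):

mdadm /dev/md3 --fail /dev/sda3      # mark an old member faulty, if not already (assumed device)
mdadm /dev/md3 --remove failed       # drop all members marked faulty
mdadm /dev/md3 --remove detached     # or drop members whose disks are physically gone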

After the fix the RAID array looks like this:

[root@ovh-ds03 ~]# mdadm --detail /dev/md3
/dev/md3:
           Version : 1.2
     Creation Time : Sat Mar 30 18:18:26 2024
        Raid Level : raid1
        Array Size : 1951283200 (1860.89 GiB 1998.11 GB)
     Used Dev Size : 1951283200 (1860.89 GiB 1998.11 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Sep 14 19:33:15 2024
             State : active
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : md3
              UUID : 939ad077:07c22e9e:ae62fbf9:4df58cf9
            Events : 55484

    Number   Major   Minor   RaidDevice State
       2       8       35        0      active sync   /dev/sdc3
       3       8       51        1      active sync   /dev/sdd3
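
Depending on the distribution image, it is also worth checking that the stored mdadm configuration and the initramfs still agree with the new geometry, since older configs may pin the device count (paths and the rebuild command vary per distro; treat this as a sketch):

mdadm --detail --scan                # compare with /etc/mdadm.conf (or /etc/mdadm/mdadm.conf)
dracut -f                            # RHEL-like systems; on Debian/Ubuntu use: update-initramfs -u

A final cat /proc/mdstat should then show md3 with all members present and no underscores in the status line.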