Data recovery and disk repair questions and discussions related to old-fashioned SATA, SAS, SCSI, IDE, MFM hard drives - any type of storage device that has moving parts
February 27th, 2025, 16:09
This case just came to me:
I have a TrueNAS Core server running 24 SAS disks (900 GB each) connected to an HP P822 RAID controller, configured as RAID 50. The problem started after a power outage, during which one of the disks failed due to an electronic fault. I was able to repair the disk by using a donor board and transferring the ROM from the original disk. However, even after the repair, the controller did not recognize the array until the donor disk was in place.
After the failed disk was replaced, the array was recognized again and TrueNAS could detect the volume. However, the pool still shows as "degraded," even though all the physical disks are now healthy. A scrub of the volume reported 1,066,810 errors. I get I/O read errors when copying large files, and specific LBAs (Logical Block Addresses) return errors when accessed.
TrueNAS uses ZFS, and running ZFS on top of hardware RAID is not ideal; this is likely causing problems with data integrity and disk management. ZFS needs direct access to the individual drives for its own redundancy and error-correction features, but here the RAID controller abstracts the disks and hides them from TrueNAS.
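To illustrate why that matters: ZFS keeps a checksum for every block and verifies it on reads and scrubs, but it can only repair a bad block when ZFS itself manages redundancy (a mirror, RAID-Z parity, or the `copies` dataset property). On a single hardware-RAID volume there is no second copy for ZFS to fall back on, so a scrub can report errors but not fix them. The toy sketch below models only the "try another copy" case; it is not ZFS code, uses SHA-256 as a stand-in for ZFS's checksum algorithms, and `self_healing_read` is a hypothetical helper.

```python
import hashlib
from typing import List, Optional


def self_healing_read(copies: List[bytes], expected_checksum: bytes) -> Optional[bytes]:
    """Return the first copy whose checksum matches, or None if every copy is bad.

    Toy model of a checksummed read: with only one copy (e.g. a single
    hardware-RAID LUN), a mismatch can be detected but never repaired.
    """
    for data in copies:
        if hashlib.sha256(data).digest() == expected_checksum:
            return data
    return None


good = b"block contents"
checksum = hashlib.sha256(good).digest()
print(self_healing_read([b"corrupted!!!!!", good], checksum))  # second copy heals the read
print(self_healing_read([b"corrupted!!!!!"], checksum))        # single copy: detect only
```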
I am currently performing a bit-by-bit copy of the RAID array before making any changes to TrueNAS, to ensure I have a backup in case anything goes wrong. I’m hoping someone might be able to shed some light on this issue. I know the HP P822 cards are quite problematic in general, so any advice or insights would be greatly appreciated.
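A bit-by-bit copy of suspect storage is normally made with a tool that tolerates read errors (GNU ddrescue for general devices, PC-3000 at the drive level), so a single unreadable LBA does not abort the whole copy. As a minimal sketch of that idea only, assuming a Unix-like system and Python 3, the hypothetical `image_with_skips` function below copies a device or file in fixed-size blocks, zero-fills any block whose read raises `OSError`, and returns the skipped byte ranges so they can be retried later.

```python
import os
from typing import List, Tuple


def image_with_skips(src_path: str, dst_path: str,
                     block_size: int = 64 * 1024) -> List[Tuple[int, int]]:
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns a list of (byte_offset, length) ranges that could not be read.
    Real imaging tools also retry and split bad blocks; this sketch does neither.
    """
    bad: List[Tuple[int, int]] = []
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src_fd, 0, os.SEEK_END)  # works for files and block devices
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                length = min(block_size, size - offset)
                try:
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    data = b""  # read error: treat the whole block as missing
                if len(data) < length:
                    # Record the unread tail and pad so later offsets stay aligned.
                    bad.append((offset + len(data), length - len(data)))
                    data += b"\x00" * (length - len(data))
                dst.write(data)
                offset += length
    finally:
        os.close(src_fd)
    return bad
```

Keeping the destination the same size as the source, with zero-filled holes, means every block stays at its original offset, which matters if the images are later reassembled.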
February 27th, 2025, 16:18
One failed drive in a RAID 50 shouldn't have made any difference. I'd start by looking for issues with the controller rather than at the file system level.
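The reasoning behind this: RAID 50 stripes data (RAID 0) across several RAID 5 sets, and each RAID 5 set can lose one member because its parity block is the XOR of the data blocks in the same stripe. The minimal Python sketch below uses hypothetical 4-byte blocks (real stripes use much larger chunks, and this is not how the P822 firmware is implemented) to show a missing member being recomputed from the survivors.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (this is RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(stripe: List[bytes], missing: int) -> bytes:
    """Recompute stripe[missing] (data or parity) from the other members."""
    return xor_blocks([blk for i, blk in enumerate(stripe) if i != missing])


# Hypothetical stripe: three data blocks plus their parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
stripe = data + [xor_blocks(data)]
assert rebuild_missing(stripe, 1) == data[1]  # lost data member recovered
```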
February 27th, 2025, 16:27
Lardman wrote: One failed drive in a RAID 50 shouldn't have made any difference. I'd start by looking for issues with the controller rather than at the file system level.
I even connected the 24 disks directly to a workstation with an LSI card, since the disk shelf has a Mini-SAS connector. I get the same errors with every tool I've tried: PC-3000, UFS Explorer...
February 27th, 2025, 16:51
There's no value in working at the array volume level; it could be faulty. Image each drive individually and try reassembling the array in software.
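Software reassembly comes down to mapping each logical sector of the volume to a member disk and an offset on that disk. The sketch below is hypothetical: it assumes the top-level RAID 0 rotates one chunk at a time across `groups` RAID 5 sets, the same chunk size everywhere, no metadata offset at the start of each disk, left-symmetric parity rotation, 512-byte sectors, and `images` ordered group by group. The P822's real layout, chunk size, group count and disk order are not given in this thread and would have to be determined from the drive images before trusting any output.

```python
from typing import BinaryIO, List, Tuple


def map_lba(lba: int, groups: int, disks_per_group: int,
            chunk_sectors: int) -> Tuple[int, int]:
    """Map a logical sector of a (hypothetical) RAID 50 volume to (disk, disk_lba)."""
    chunk, offset = divmod(lba, chunk_sectors)
    # Top-level RAID 0: consecutive chunks rotate across the RAID 5 groups.
    group, group_chunk = chunk % groups, chunk // groups
    # RAID 5 inside the group: (disks_per_group - 1) data chunks per row.
    row, k = divmod(group_chunk, disks_per_group - 1)
    # Left-symmetric: parity starts on the last member and moves left each row;
    # data starts on the member just after parity and wraps around.
    parity = (disks_per_group - 1) - (row % disks_per_group)
    member = (parity + 1 + k) % disks_per_group
    return group * disks_per_group + member, row * chunk_sectors + offset


def read_logical(images: List[BinaryIO], lba: int, count: int, groups: int,
                 disks_per_group: int, chunk_sectors: int,
                 sector_size: int = 512) -> bytes:
    """Read `count` logical sectors starting at `lba` from per-disk image files.

    `images` must be ordered group 0 members first, then group 1, and so on.
    """
    out = bytearray()
    for s in range(lba, lba + count):
        disk, disk_lba = map_lba(s, groups, disks_per_group, chunk_sectors)
        f = images[disk]
        f.seek(disk_lba * sector_size)
        block = f.read(sector_size)
        if len(block) != sector_size:
            raise ValueError(f"short read on disk {disk} at LBA {disk_lba}")
        out += block
    return bytes(out)
```

With an illustrative split of the 24 disks into two 12-disk groups and 512-sector chunks, `map_lba(512, 2, 12, 512)` returns `(12, 0)`: the second chunk lands on the first member of the second group. Trying candidate layouts this way and checking whether file system structures line up is one way to confirm the parameters.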
February 27th, 2025, 16:57
Lardman wrote: There's no value in working at the array volume level; it could be faulty. Image each drive individually and try reassembling the array in software.
Thanks for the input. Will do.