Hi Everybody,
I have two of these. Both were running for about 50K hours, and both are dead:
HDD #1: SMART status was failed (high read error rate), there were about 2000 reallocated sectors in the GLIST, and sometimes the RAID controller reported that there were no spare sectors left to reallocate bad blocks.
After performing a low-level (?) format with the "sg_format" utility, there were only 10 defects in the GLIST and the SMART status changed to "OK".
But it still occasionally reports read errors and the GLIST is growing (this drive has the ARRE bit set, so it automatically reallocates sectors that are almost dead but can still be read; there are now 378 reallocated sectors).
The number of 'read errors corrected with delay' is also growing.
Looks like it is completely dead and can't be repaired.
Please correct me if I'm wrong and there are magic procedures in the firmware which may bring it back to life (selfscan? but I have no idea how to start it).
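For anyone who wants to track the same counters over time, here is a minimal Python sketch that compares two snapshots. It assumes the text comes from smartmontools (`smartctl -a`) on a SCSI/SAS drive and that the "Elements in grown defect list" line and the "read:" row of the error counter log look like the samples below; the layout may differ between versions. The function names and the sample numbers are made up for illustration (only 10 and 378 echo the GLIST sizes mentioned above).

```python
import re

def parse_scsi_counters(smartctl_text):
    """Pull the GLIST size and read-error counters out of smartctl text.

    Assumed layout: a line 'Elements in grown defect list: N' and a
    'read:' row whose columns are ECC fast, ECC delayed, rereads/rewrites,
    total corrected, algorithm invocations, gigabytes, total uncorrected.
    """
    result = {}
    m = re.search(r"Elements in grown defect list:\s*(\d+)", smartctl_text)
    if m:
        result["glist"] = int(m.group(1))
    m = re.search(r"^read:\s+(.*)$", smartctl_text, re.MULTILINE)
    if m:
        cols = m.group(1).split()
        result["read_ecc_fast"] = int(cols[0])
        result["read_ecc_delayed"] = int(cols[1])
        result["read_uncorrected"] = int(cols[-1])
    return result

def growth(before, after):
    """Difference of every counter present in both snapshots."""
    return {key: after[key] - before[key] for key in before if key in after}

# Illustrative snapshots, NOT real output from my drives.
SAMPLE_BEFORE = """\
Elements in grown defect list: 10

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       1200       35         0      1235       1235        512.000           2
write:         0        0         0         0          0        498.000           0
"""

SAMPLE_AFTER = """\
Elements in grown defect list: 378

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       1500       90         0      1590       1590        530.000           5
write:         0        0         0         0          0        510.000           0
"""

if __name__ == "__main__":
    delta = growth(parse_scsi_counters(SAMPLE_BEFORE),
                   parse_scsi_counters(SAMPLE_AFTER))
    for name, value in sorted(delta.items()):
        print(f"{name}: +{value}")
```

Any counter that keeps rising between snapshots taken hours apart would back up the impression that the drive is still degrading.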
HDD #2: this one is much more interesting. It had about 1000 reallocated sectors and a failed SMART status:
"DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]".
After formatting it with sg_format, there are no defects in the GLIST, and the number of errors corrected with delay doesn't change. The SMART error was not cleared.
But after a few hours of reads and writes, 8 sectors (two groups of 4 consecutive sectors, so it seems that physical sectors in this drive are 4K) could not be read.
After overwriting these sectors, the read errors were gone and the GLIST is still empty, so these were 'soft' bad blocks.
Currently this drive works perfectly well, and even when I decreased its 'Read Retry Count' to 1 (which limits the sector read time to 60 ms per the Savvio 15K.2 manual), there are no read errors.
It also passes short and long self-tests without any issues.
But SMART status has not changed: "DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]"
I have no idea which "data channel" it refers to. If this is the SAS channel, I would assume a controller failure (the drive is now running on another controller, and I have no information about the controller it was running on before).
Or is it an internal HDD interface? Maybe those 'soft' bad blocks were caused by this mysterious error rate?
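For reference, this is how I read the code itself. The sketch below splits the ASCQ of an ASC 5Dh error into a "family" (high nibble) and a "reason" (low nibble). The two tables are my transcription of the 5Dh entries in the T10 ASC/ASCQ list, so please check them against the current list before relying on them; the function name is made up. It only covers ASCQ 10h through 6Ch plus 00h and FFh.

```python
# Assumption: tables transcribed from the T10 ASC/ASCQ assignments for ASC 5Dh.
FAMILIES = {
    0x1: "HARDWARE IMPENDING FAILURE",
    0x2: "CONTROLLER IMPENDING FAILURE",
    0x3: "DATA CHANNEL IMPENDING FAILURE",
    0x4: "SERVO IMPENDING FAILURE",
    0x5: "SPINDLE IMPENDING FAILURE",
    0x6: "FIRMWARE IMPENDING FAILURE",
}

REASONS = {
    0x0: "GENERAL HARD DRIVE FAILURE",
    0x1: "DRIVE ERROR RATE TOO HIGH",
    0x2: "DATA ERROR RATE TOO HIGH",
    0x3: "SEEK ERROR RATE TOO HIGH",
    0x4: "TOO MANY BLOCK REASSIGNS",
    0x5: "ACCESS TIMES TOO HIGH",
    0x6: "START UNIT TIMES TOO HIGH",
    0x7: "CHANNEL PARAMETRICS",
    0x8: "CONTROLLER DETECTED",
    0x9: "THROUGHPUT PERFORMANCE",
    0xA: "SEEK TIME PERFORMANCE",
    0xB: "SPIN-UP RETRY COUNT",
    0xC: "DRIVE CALIBRATION RETRY COUNT",
}

def decode_5d(ascq):
    """Return a text description for ASC 5Dh with the given ASCQ."""
    if ascq == 0x00:
        return "FAILURE PREDICTION THRESHOLD EXCEEDED"
    if ascq == 0xFF:
        return "FAILURE PREDICTION THRESHOLD EXCEEDED (FALSE)"
    family = FAMILIES.get(ascq >> 4)    # high nibble: which subsystem
    reason = REASONS.get(ascq & 0x0F)   # low nibble: what went wrong
    if family is None or reason is None:
        return "not covered by this sketch (ascq=0x%02x)" % ascq
    return family + " " + reason

if __name__ == "__main__":
    print(decode_5d(0x32))
```

So "data channel" sits in the same list as hardware, controller, servo, spindle and firmware, which makes me suspect it is something inside the drive rather than the SAS link, but the code alone does not tell me what exactly.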
Does it make sense to replace the PCB on this drive (both drives have the same boards, so I know where I can get one working PCB)?
Based on a photo I found on the Internet, there is a Marvell 88i8062-BHC2 controller and a 4Mbit SPI flash on the PCB.
Is it enough to move only the SPI flash chip over from the old PCB, or does this Marvell controller have its own flash?
Both drives are running the same firmware, but I'm concerned about adaptives and other unique stuff.
How could I reset that SMART error after replacing the PCB?
PS: I know it looks unreasonable to spend time repairing this drive, but this is more of a hobby. I want to bring this drive back to life even if it takes me more time than I would need to earn the money for a bunch of similar working drives.