All times are UTC - 5 hours [ DST ]




 Post subject: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 11th, 2017, 19:53

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
Hi Everybody,

I have two of these. Both were running for about 50K hours, and both are dead:

HDD #1: The SMART status was failed (high read error rate), there were about 2000 reallocated sectors in the GLIST, and the RAID controller sometimes reported that there were no spare sectors left to reallocate bad blocks.
After a low-level (?) format with the "sg_format" utility, there were only 10 defects in the GLIST and the SMART status changed to "OK".
But it still reports occasional read errors and the GLIST keeps growing (this drive has the ARRE bit set, so it automatically reallocates sectors that are almost dead but can still be read; there are now 378 of them).
The number of 'read errors corrected with delay' is also growing.
It looks like it is completely dead and can't be repaired.
Please correct me if I'm wrong and there are magic procedures in the firmware that could bring it back to life (selfscan? I have no idea how to start it).


HDD #2: this one is much more interesting. It had about 1000 reallocations and a failed SMART status:
"DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]".

After formatting it with sg_format, there were no defects in the GLIST, and the number of errors corrected with delay stopped changing. The SMART error was not cleared.
But after a few hours of reads and writes, 8 sectors (two groups of 4 consecutive sectors; it seems that the physical sectors in this drive are 4K) could not be read.
After I overwrote these sectors, the read errors were gone and the GLIST was still empty, so these were 'soft' bad blocks.
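
A pattern like "two groups of 4 consecutive sectors" is easier to spot when the failing LBAs are grouped into runs. A minimal sketch of that grouping follows; the function name and the LBA values are made up for illustration and are not the addresses seen on this drive.

```python
def group_consecutive(lbas):
    """Group LBAs into (first, last) runs of consecutive numbers."""
    runs = []
    for lba in sorted(set(lbas)):
        if runs and lba == runs[-1][1] + 1:
            # Extends the current run by one sector.
            runs[-1] = (runs[-1][0], lba)
        else:
            # Gap found: start a new run.
            runs.append((lba, lba))
    return runs


# Made-up LBAs, for illustration only (e.g. collected from a read scan).
failed = [1000003, 1000000, 1000001, 1000002,
          5000008, 5000009, 5000010, 5000011]

runs = group_consecutive(failed)
for first, last in runs:
    print(f"LBA {first}..{last}: {last - first + 1} sectors")
```

Runs of equal length would be consistent with a larger physical unit behind the logical sectors, but they do not prove it.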

Currently this drive works perfectly well, and even when I decreased its 'Read Retry Count' to 1 (which limits the sector read time to 60 ms per the Savvio 15K.2 manual), there are no read errors.
It also passes short and long self-tests without any issues.
But SMART status has not changed: "DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH [asc=5d, ascq=32]"
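
For reference, the [asc=5d, ascq=32] pair can be decoded mechanically. The sketch below assumes the layout of the T10 ASC/ASCQ assignments under ASC 5Dh (failure prediction threshold exceeded), where the high nibble of the ASCQ names the subsystem and the low nibble names the reason. The two tables were typed from that list and should be checked against the current T10 document; the function name is hypothetical.

```python
# Assumption: ASCQ high nibble = subsystem, low nibble = reason (ASC 0x5D only).
CATEGORIES = {
    0x1: "HARDWARE",
    0x2: "CONTROLLER",
    0x3: "DATA CHANNEL",
    0x4: "SERVO",
    0x5: "SPINDLE",
    0x6: "FIRMWARE",
}

REASONS = {
    0x0: "GENERAL HARD DRIVE FAILURE",
    0x1: "DRIVE ERROR RATE TOO HIGH",
    0x2: "DATA ERROR RATE TOO HIGH",
    0x3: "SEEK ERROR RATE TOO HIGH",
    0x4: "TOO MANY BLOCK REASSIGNS",
    0x5: "ACCESS TIMES TOO HIGH",
    0x6: "START UNIT TIMES TOO HIGH",
    0x7: "CHANNEL PARAMETRICS",
    0x8: "CONTROLLER DETECTED",
    0x9: "THROUGHPUT PERFORMANCE",
    0xA: "SEEK TIME PERFORMANCE",
    0xB: "SPIN-UP RETRY COUNT",
    0xC: "DRIVE CALIBRATION RETRY COUNT",
}


def decode_5d(ascq):
    """Return the text for an ASC 0x5D sense code, or None if not covered."""
    category = CATEGORIES.get(ascq >> 4)
    reason = REASONS.get(ascq & 0x0F)
    if category is None or reason is None:
        return None
    return f"{category} IMPENDING FAILURE {reason}"


print(decode_5d(0x32))
```

Under that reading, "data channel" is one of several drive subsystem categories in the same family, next to servo, spindle and firmware.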

I have no idea which "data channel" it refers to. If this is the SAS channel, I would assume a controller failure (the drive is now running on another controller, and I have no information about the controller it was running on before).
Or is it an internal HDD interface? Could those 'soft' bad blocks have been caused by this mysterious error rate?
Does it make sense to replace the PCB on this drive (both drives have the same boards, so I know where I can get one working PCB :) )?

Based on a photo I found on the Internet, the PCB carries a Marvell 88i8062-BHC2 controller and a 4 Mbit SPI flash.
Is it enough to move only the SPI flash chip from the old PCB, or does this Marvell controller have its own flash?
Both drives are running the same firmware, but I'm concerned about adaptives and other unique stuff.

How could I reset that SMART error after replacing the PCB?


PS: I know it looks unreasonable to spend time repairing this drive, but this is rather a hobby. I want to bring this drive back to life even if it takes more time than I would spend earning the money to buy a bunch of similar working drives.


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 12th, 2017, 1:52

Joined: April 3rd, 2011, 0:19
Posts: 2003
Location: Providence, RI
Your platter surfaces are developing bad sectors. Think of this like old chipping paint on a house. It'll only continue to get worse until you remove the loose paint and re-paint the house. Your data is literally stored in paint coated on the platters. That paint is degrading. Unless you have multi-million dollar equipment for coating hard drive platters at your disposal (to repaint them, as it were), they will just continue to degrade.

What you are asking is like asking for the cure to aging. Best you're going to get is some suggestions on how to mask symptoms for a bit. The underlying problem isn't going away.

The drives are rubbish.

_________________
Data Medics - Hard Drive, SSD, and RAID Data Recovery Service Company


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 12th, 2017, 11:54

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
I agree that the first one can't be repaired due to faulty platters and/or heads (perhaps, only 1 surface/head from 4?).

But the platters of the second HDD seem to be OK after I did 'sg_format':
- no defects in G-list
- no read delays anymore
- no read errors with RRC=1 (60 ms per sector limit)
- the drive passes long self test and multiple 'badblocks' scans without errors and remaps

But it still has the "DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH" SMART error,
and 2 groups of 4 soft bad blocks appeared once. I can't reproduce the soft bad blocks anymore,
but this looks like a problem with the main chip (or the write heads?), so I want to replace the PCB.

But I'm not sure whether it is enough to move only the 8-pin 4 Mbit SPI flash from the original PCB.
Perhaps the Marvell chip has its own flash as well?


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 13th, 2017, 1:56

Joined: June 17th, 2017, 18:30
Posts: 37
Location: Russia
Data channel in SAS/SCSI sense codes, or End-to-end error in SATA, means that at some point between the platter and the host, the checksum did not match. It may be an in-drive cache issue or a cabling issue. The simplest common troubleshooting step is to swap the suspect cable onto a different drive and see if the error follows the cable.


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 13th, 2017, 22:41

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
This drive was running with another SAS controller, backplane, and cable when the SMART status changed to failed.
I swapped the PCBs and formatted both drives with sg_format, but nothing changed.
The error was not cleared automatically.


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 13th, 2017, 22:46

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
PS: I always wanted to know what 15K drives are made of, so there is no HDD #1 anymore :)


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 15th, 2017, 11:50

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
Well... I did some research on HDD #1, which I disassembled recently:
- The heads and platters were probably good (based on statistics of the LBA numbers associated with read errors, fetched from the SCSI logs).
- The spindle motor and its bearing were OK (the windings had good L and Q, and the hydrodynamic bearing showed minimal wear for the 53,000,000,000 revolutions it made during its life).
- But the actuator bearing was worn out, and I believe this was the root cause of the failure.
This explains why these disks were having phantom read errors and soft bad blocks.
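
The revolution count is simple arithmetic on spindle speed and running time. A small sketch, assuming the platters spin at a constant 15,000 rpm for the whole power-on time (the function names are made up for this example):

```python
def revolutions(rpm, hours):
    """Total spindle revolutions at a constant speed."""
    return rpm * 60 * hours


def hours_for(total_revolutions, rpm):
    """Running time in hours implied by a revolution count."""
    return total_revolutions / (rpm * 60)


# "About 50K hours" at 15,000 rpm:
print(f"{revolutions(15_000, 50_000):,} revolutions")

# Running time implied by 53,000,000,000 revolutions:
print(f"{hours_for(53_000_000_000, 15_000):,.0f} hours")
```

Both figures land in the same range, which is all that a rounded "about 50K hours" allows one to check.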

Perhaps, the second disk has the same issue :(


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 15th, 2017, 14:07

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
BTW: I cleared the SCSI logs of HDD #2, but the 'data rate' error still persists. It seems to be latched and will never be cleared by the drive itself.

Anyway, there is not much sense in resetting this error on a drive with a bad actuator bearing assembly.
Despite its perfect platters and heads, there is very little chance that I can do anything about the actuator.

Two drives with the same problem... Shame on you, Seagate!


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 25th, 2017, 14:56

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
I did several iterations of 'sg_format' followed by a 'badblocks' check and reproduced the soft bad blocks several times (neither random nor sequential zero-fill caused bad blocks, but sg_format did).

Assuming actuator bearing failure (I disassembled the first HDD, which had similar symptoms, and its actuator bearing was definitely bad), I started an end-to-end seek test and let it run for more than 10,000,000 seeks.
To be honest, I expected that this drive wouldn't survive such a high load, so I ran sg_format followed by badblocks in a loop to check how it was doing, and... I didn't see any G-list defects, unrecoverable errors, or even read errors recovered with delay anymore.
The read curve is smooth and the seek time diagram looks healthy.
It also passed a 2-hour SeaChest random read test.
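
The format-then-scan loop could be scripted roughly as below. This is only a sketch: the exact options used in the runs above are not stated, so `sg_format --format` and `badblocks -w -s -v` are assumptions, and the device paths are placeholders. Both commands destroy all data on the drive, so the sketch only prints the plan unless `dry_run` is turned off.

```python
import subprocess


def build_cycle(sg_device, block_device):
    """Return the two commands of one format-then-scan cycle."""
    return [
        ["sg_format", "--format", sg_device],            # SCSI FORMAT UNIT
        ["badblocks", "-w", "-s", "-v", block_device],   # destructive write test
    ]


def run_cycles(sg_device, block_device, cycles, dry_run=True):
    """Collect (and optionally execute) the commands for several cycles."""
    executed = []
    for _ in range(cycles):
        for cmd in build_cycle(sg_device, block_device):
            executed.append(cmd)
            if not dry_run:
                # Stops the loop at the first command that fails.
                subprocess.run(cmd, check=True)
    return executed


# Placeholder device names; replace them only after double-checking the target.
plan = run_cycles("/dev/sgX", "/dev/sdX", cycles=2)
for cmd in plan:
    print(" ".join(cmd))
```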

The [asc=5d, ascq=32] error is still there, but everything else works surprisingly well.
It looks like the Savvio doesn't support a UART terminal, because there is no activity on the pin that is supposed to be the drive's TX output.

I would appreciate it if somebody could help me enable the UART terminal or clear the SMART error by any other means.


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 26th, 2017, 17:28

Joined: September 8th, 2009, 18:21
Posts: 15440
Location: Australia
Perhaps the UART port needs to be enabled in some way? Maybe with jumpers? It stands to reason that the PCB would be manufactured with blank flash, in which case there would need to be some way to program it.

As for adaptives, the firmware update file (B62C_4F0.LOD) contains the text string, "SER NUM MISMATCH", which would imply that the flash does indeed contain adaptive data.


Attachments:
b62c_4f0.rar [553.74 KiB]
Downloaded 494 times

_________________
A backup a day keeps DR away.

Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 27th, 2017, 19:10

Joined: December 11th, 2017, 18:19
Posts: 10
Location: Ukraine
There is a 4-pin connector similar to those on Seagate SATA drives.

All pins are routed on the PCB, and the supposed pinout is as follows (I counted the pins starting from the SAS data connector):
1 - routed to the (probably) main chip, has a 10K pull-up to +2.5V. Only a low-side clamp diode is present, so this pin is supposed to be a 3.3 V or 5 V tolerant RX. It is also connected to pin 4 of the 10-pin board edge connector (the one with a notch in the middle, near the heads connector).
2 - routed directly to the main chip; no pull-up, and only a low-side clamp diode is present. Supposed to be a 2.5V UART TX. Also connected to pin 3 of the board edge connector.
3 - GND
4 - routed to the main chip through a 200 Ohm series resistor, has a 10K pull-up to +2.5V. As with the other pins, there is no high-side clamp diode for this input (?) inside the main chip. Maybe I should short this pin to GND and see what changes, but I haven't tried this yet.

I monitored pin #2 (as well as the other 2 pins) and it is always high. No activity at all.
It looks like the UART is physically present, but the terminal commands are either disabled or not implemented in the firmware.
I'll try to add a jumper between pin 4 and GND, but probably no earlier than this weekend.


Attachments:
IMG_20171227_233801.jpg [ 585.33 KiB | Viewed 13253 times ]

Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 28th, 2017, 0:39

Joined: September 8th, 2009, 18:21
Posts: 15440
Location: Australia
You could invalidate the flash memory (by shorting an appropriate pin), and then see whether an error message appears at the UART port.

For example, here is the error message from a 7200.11 SATA PCB:

Quote:
TetonST Boot ROM 2.0.
Copyright Seagate 2006.
Serial FLASH boot code checksum failure!

_________________
A backup a day keeps DR away.


Post subject: Re: Repairing Seagate ST9146852SS (Savvio 15K.2, 146GB, SAS)
Posted: December 30th, 2017, 16:20

Joined: September 8th, 2009, 18:21
Posts: 15440
Location: Australia
I've examined Lenovo's B62C firmware update for the 146GB SAVVIO/HORNET 15K.2 SAS drive. The architecture looks very similar to F3 SATA architecture.

In SATA drives, SA module 0x3D is the "virtual boot loader" (see attachment). It contains numerous text strings pertaining to terminal commands. However, your SAS drive's 3D module has relatively little plain text. I don't know what this implies.

ST DM series Virtual Start Tutorial:
viewtopic.php?f=1&t=35634

Code:
Directory of F:\Lenovo_Firmware_Updates\ST9146852SS

B62C_4F0 LOD     1,107,456  11-08-12 10:48a b62c_4f0.lod
MOD3D    TXT         3,365  12-27-17  2:09p mod3d.txt
B62C_4~2 BIN         8,192  12-27-17 10:01a b62c_4f0_lod_part_1.bin  (= part_4)
B62C_4~4 BIN        72,704  12-27-17 10:02a b62c_4f0_lod_part_2_sfw_1.bin
B62C_4~3 BIN         4,632  12-27-17 10:02a b62c_4f0_lod_part_3_rom_shell0b_copy1.bin
B62C_4~5 BIN         8,192  12-27-17 10:03a b62c_4f0_lod_part_4.bin  (= part_1)
B62C_4~1 BIN        72,704  12-27-17 10:04a b62c_4f0_lod_part_5_sfw_2.bin
B62C_4~7 BIN         4,632  12-27-17 10:05a b62c_4f0_lod_part_6_rom_shell0b_copy2.bin
B62C_4~8 BIN         8,192  12-27-17 10:06a b62c_4f0_lod_part_7.bin
B62C_4~9 BIN       262,144  12-27-17  9:57a b62c_4f0_lod_part_8_rom.bin
B62C_4~6 BIN       663,592  12-27-17 10:00a b62c_4f0_lod_part_9_mod3d.bin

ROM header

Code:
Offset(h) 00       04       08       0C

00000000  7A1E0000 980F0000 00000000 D0670400
00000010  63736944 000027AF 08000004 20FFFFFF  csiD..'¯.... ÿÿÿ
00000020  16480000 15500000 0E800B00 10900F00
00000030  06000002 04109402 05209502 0330B102
00000040  0B40CD03 0058DF03 00000000 7CC10000

ID  offset  size  description
------------------------------
16     48      8  BOOTFW_DIR
15     50    B30  GENERAL_DATA
0E    B80    410  IAP
10    F90  1F070  CFW
06  20000   9410  RAP
04  29410    110  CAP
05  29520   1C10  SAP
03  2B130  11C10  SFW
0B  3CD40   1218  SHELL
00  3DF58         end of ROM
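
The table above can be reproduced from the ten 4-byte words at offsets 0x20 to 0x47 of the dump. The sketch below assumes each word is a 1-byte ID followed by a 24-bit little-endian offset, and that each size is the gap to the next entry's offset. That reading is inferred from the numbers matching the table, not from any Seagate documentation; the function names are made up.

```python
# Bytes 0x20..0x47 of the ROM header, copied from the hex dump above.
DIRECTORY_HEX = """
16480000 15500000 0E800B00 10900F00
06000002 04109402 05209502 0330B102
0B40CD03 0058DF03
"""

# Segment names as given in the table above.
NAMES = {
    0x16: "BOOTFW_DIR", 0x15: "GENERAL_DATA", 0x0E: "IAP", 0x10: "CFW",
    0x06: "RAP", 0x04: "CAP", 0x05: "SAP", 0x03: "SFW", 0x0B: "SHELL",
    0x00: "end of ROM",
}


def parse_directory(raw):
    """Split raw bytes into (id, offset) pairs, 4 bytes per entry."""
    entries = []
    for i in range(0, len(raw), 4):
        seg_id = raw[i]
        offset = int.from_bytes(raw[i + 1:i + 4], "little")
        entries.append((seg_id, offset))
    return entries


def with_sizes(entries):
    """Derive each size as the distance to the next entry's offset."""
    return [
        (seg_id, offset, next_offset - offset)
        for (seg_id, offset), (_, next_offset) in zip(entries, entries[1:])
    ]


entries = parse_directory(bytes.fromhex(DIRECTORY_HEX))
rows = with_sizes(entries)
for seg_id, offset, size in rows:
    print(f"{seg_id:02X}  {offset:5X}  {size:5X}  {NAMES.get(seg_id, '?')}")
end_id, end_offset = entries[-1]
print(f"{end_id:02X}  {end_offset:5X}         {NAMES.get(end_id, '?')}")
```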


Attachments:
ST2000DM001-1CH164_mod3D.rar [501.69 KiB]
Downloaded 473 times
ST9146852SS_B62C_components.rar [576.14 KiB]
Downloaded 478 times

_________________
A backup a day keeps DR away.