All times are UTC - 5 hours [ DST ]




 Post subject: ST2000DL003 does not make it to the SATA init
PostPosted: August 18th, 2016, 1:42 

Joined: August 15th, 2016, 6:05
Posts: 9
Location: Czech Republic
Hello everybody,

I have a failed Seagate ST2000DL003-9VT166 drive with firmware version CC32. It is not detected by the SATA controller. The failure was most likely caused by a bad power supply (cable or ATX). Before the failure the drive often beeped (the same sound as during a normal power-off, probably head parking), and one time it failed completely.

Right after the failure I captured this log (with no terminal access):
Code:
Rst 0x40M
MC Internal LPC Process
(P) SATA Reset
User Data Base  00991590
MCMainPOR: Start:
Check MCMT Version: Current
MCMainPOR: Non-Init Case
MC Seg Disc and Cache Nodes:  4011A624  40118734
Seg Write Preamble VBM start: 000010A7 end: 000010CE
  Footer - start: 000010D0 end: 000010F7
Seg Read Preamble VBM - start: 000010F9 end: 00001120
  Footer - start: 00001122 end: 00001149
Reconstruction: MCMT Reconstruction Start
  Max number of MC segments 22E0
Nonvolatile MCMT sequence number 003A2776
[RSRS] 0995
[SW] 17B3
[SW] 17B8
[SW] 17B9
[SW] 17D4
[SW] 17D5
[SW] 1802
[SW] 1838
[SW] 186B
[SW] 1899
[RSRS] 099E
[SW] 18CC
[SW] 18F9
[SW] 191C
[SW] 1951
[SW] 197E
[SW] 1985
[SW] 198E
[SW] 1999
[SW] 19A6
[RSRS] 09A7
ProcessRWError -Read-   at LBA 00054759  Sense Code=40000087
InitiateMarkPendingReallocateRequest for disc_lba: 00054759!
Reconstruction: EXCEPTION: Seg Read Fail: Status = 0000
Continuing after error
ReadFooters (Forward): Segment 09A7 StartLBA 002A3AA0 Starting Footer LBA 002A3CA8
  SeqNum 003A2789  TotalUserBlocks 01D8
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
  Footer Status = 0001
Reconstruction: EXCEPTION: Segment Overall Sequence Number Mismatch, No Valid Footer  003A2789 003A2780
Reconstruction: ProcessIncompletelyWrittenSegment
Adds  LBA/Len
  2681A686/0008  2681A836/0008  2681AA46/0008  2681AB06/0008  2681ACC6/0008  2681ADF6/0008  2681AF6E/0008  2681B1A6/0008  2681B206/0008  2681B356/0008  2681B4AE/0008  2681B5BE/0008  2681BA5E/0008  2681BF0E/0008  2681C026/0008  2681C12E/0008  2681C426/0008  2681C51E/0008  2681C87E/0008  2681C936/0008  2681CB66/0008  2681CF36/0008  2681D1A6/0008  2681D28E/0008  2681D326/0008  2681D336/0008  2681D43E/0008  26A354F6/0008  26A35546/0008  26A355B6/0008  26A366BE/0008  26A36FDE/0008  26A374AE/0008  26A37736/0008  26A378DE/0008  26A379CE/0008  26A37ABE/0008  26A37CD6/0008  26A3818E/0008  26A3864E/0008  26A38776/0008  26A38896/0008  26A38EA6/0008  26A393CE/0008  26A3945E/0008  26AE180E/0008  26AE1B5E/0008  26AE1C06/0008  26AE1FCE/0008  26AE2036/0008  26AE25EE/0008  26AE29FE/0008  26AE33F6/0008  26AE35C6/0008  26AE381E/0008  26AE382E/0008  26AE41B6/0008  26AE452E/0008  26AE469E/0008
  Add Count 0000003B
Rems  LBA/Len
  2681ACC6/0008  2681B206/0008  2681B5BE/0008  2681BA5E/0008  2681C87E/0008  2681C936/0008  2681D326/0008  26A355B6/0008  26A36FDE/0008  26A378DE/0008  26A3864E/0008  26A38776/0008  26AE1C06/0008  26AE2036/0008  26AE381E/0008  26AE469E/0008
Remove 0000 LBA 2681ACC6 End 2681ACCE
    Add 0000 LBA 2681A686 End 2681A68E
    Add 0001 LBA 2681A836 End 2681A83E
    Add 0002 LBA 2681AA46 End 2681AA4E
    Add 0003 LBA 2681AB06 End 2681AB0E
    Add 0004 LBA 2681ACC6 End 2681ACCE
Reconstruction: ProcInCmptSeg: OverlapDetected: With previous Add/Rem
Remove 0001 LBA 2681B206 End 2681B20E
    Add 0000 LBA 2681A686 End 2681A68E
    Add 0001 LBA 2681A836 End 2681A83E
    ... repeats many times ...
    Add 003A LBA 26AE469E End 26AE46A6
Reconstruction: ProcInCmptSeg: OverlapDetected: With previous Add/Rem
Reconstruction: Last Chance
Rst 0x40M
MC Internal LPC Process
LED:000000BD FAddr:00007E05
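
The "Adds" and "Rems" lists in the log above can be cross-checked offline to see which removed ranges coincide with added ones (the log reports "OverlapDetected" for them). This is only a minimal Python sketch: it assumes each entry is a hexadecimal "LBA/Len" pair and that a range ends at LBA + Len, as the "Remove ... LBA ... End ..." lines suggest. The function names are hypothetical.

```python
import re

# One hex "LBA/Len" pair, e.g. "2681ACC6/0008" (format assumed from the log).
PAIR = re.compile(r"\b([0-9A-Fa-f]{8})/([0-9A-Fa-f]{4})\b")


def parse_pairs(line):
    """Return [(start_lba, length), ...] parsed from a line of LBA/Len pairs."""
    return [(int(lba, 16), int(length, 16)) for lba, length in PAIR.findall(line)]


def overlapping(adds, rems):
    """Return the Rems ranges that intersect at least one Adds range."""
    hits = []
    for r_start, r_len in rems:
        r_end = r_start + r_len
        if any(a_start < r_end and r_start < a_start + a_len
               for a_start, a_len in adds):
            hits.append((r_start, r_len))
    return hits


# Shortened sample copied from the log above.
adds = parse_pairs("2681A686/0008  2681A836/0008  2681ACC6/0008  2681B206/0008")
rems = parse_pairs("2681ACC6/0008  2681B206/0008")
for start, length in overlapping(adds, rems):
    print(f"Rem {start:08X}/{length:04X} overlaps an Add")
```

Feeding it the full "Adds" and "Rems" lines from a captured log works the same way.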

Some time later I managed to apply the pin-shorting method, and after a few tries (too soon, too late, a brief flash of the "F3 T>" prompt followed by a drive hang) I got terminal access. From this point on I have a nearly complete log of the executed actions (I will not post it here because it is redundant and around 20 MB, but I can send it if required).
Code:
Rst 0x40M
MC Internal LPC Process
(P) SATA Reset

(P) SATA Reset

SIM Error 3005 LBA 0000000000064407 FD FC37D093
RW Error 00000080
User Data Base  00991590

No HOST FIS-ReadyStatusFlags 0002A1A5

I tried to disable the automatic repair functions with "/TFxx,yy,22" commands, but the drive is still unable to bring up the SATA interface. I also tried enabling some of the debugging flags (and combinations of them) from the "/TF" flag list, without success.

I was able to download ("/Trxx,yy") probably all available system files (about 100 MB of data). By the way, many of them are not listed by the "/Ty" command. Trying to download some of them shows:
Code:
DiagError 00000024

which I suppose means "not found". Some of them, however, hang (?) the drive, and power must be cycled. One of the system files seems to contain interesting factory logs from the drive manufacturer (I think it is "/Tr318,3"). The SIM file reports a size of 32 MB, but the drive hangs after about 17 MB (I don't know whether this is related to the drive failure or is just some buffer overflow; another file is over 30 MB, filled with 0xff, and downloads fine).

My assumption is that the drive finds some SIM file error and refuses to continue the boot sequence, and that this error may have been created by shorting the read channel. The matching file for:
Code:
SIM Error 3005 LBA 0000000000064407 FD FC37D093

is (fileID=0xea, volume=3):
Code:
File  Vol  FD        Location      Size      Cylinder  Hd  Sector
----  ---  --------  ------------  --------  --------  --  ------
00ea  003  fc37d093  000000064407  00000009  00029539  01  000248

but the "/Trea,3" command returns "DiagError 00000024" (file not found).
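
Matching a "SIM Error" line against the "/Ty" SIM table can also be done offline on a captured log. The following is a minimal Python sketch, assuming every table column is hexadecimal and that a row matches when both its FD and Location equal the values in the error line. The function and key names are hypothetical.

```python
import re

SIM_ERROR = re.compile(r"SIM Error (\w+) LBA ([0-9A-Fa-f]+) FD ([0-9A-Fa-f]{8})")
COLUMNS = ("file", "vol", "fd", "location", "size", "cylinder", "head", "sector")


def parse_sim_error(line):
    """Extract the error code, LBA and FD from a 'SIM Error' line, or None."""
    m = SIM_ERROR.search(line)
    if m is None:
        return None
    return {"code": m.group(1), "lba": int(m.group(2), 16), "fd": int(m.group(3), 16)}


def parse_sim_row(line):
    """Parse one SIM table row; header and separator lines yield None."""
    parts = line.split()
    if len(parts) != len(COLUMNS):
        return None
    try:
        values = [int(p, 16) for p in parts]  # assumption: all columns are hex
    except ValueError:
        return None
    return dict(zip(COLUMNS, values))


table = """\
File  Vol  FD        Location      Size      Cylinder  Hd  Sector
----  ---  --------  ------------  --------  --------  --  ------
00ea  003  fc37d093  000000064407  00000009  00029539  01  000248
"""
error = parse_sim_error("SIM Error 3005 LBA 0000000000064407 FD FC37D093")
rows = [r for r in map(parse_sim_row, table.splitlines()) if r]
matches = [r for r in rows if r["fd"] == error["fd"] and r["location"] == error["lba"]]
for r in matches:
    print(f"row {r['file']:04x} vol {r['vol']:03x} matches FD {r['fd']:08X}")
```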

I have not used any of the "/Ti" or "/Tm" commands. But after dumping as many SIM files as possible I ran "/1N1 - initialize SMART", which hangs the drive after:
Code:
Initial value of SectorAltRlistEvents is 00
Initial value of WedgeAltRlistEvents is 00

...and "/CU3 - modify media cache", which seems to run OK but has no other effect. Using the "/ARxxx" and "/2Bxxxx,yyyy" commands I was able to inspect the media cache area (just some random sectors). After "/CU3" it seems to have been cleared.

When I use the "User Data Base" value from the drive boot log (divided by 8), I can actually see the original MBR of the drive (= user-space LBA 0). The problem is that dumping byte by byte from the terminal with "/ARxxx" and "/2Byyy,yyy" at a UART speed of about 500 kbps would take something like half a year, so my questions are:

  • Is it possible (and if so, how) to instruct the drive to ignore some of the errors and initialize the SATA port?
  • Is my assumption that the data are mostly OK valid?
  • Does anybody have an existing 0xea SIM file?
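
The "User Data Base" arithmetic and the terminal dump time mentioned above can be sketched roughly as follows. This is only an estimate under stated assumptions: 8N1 serial framing (10 bits per character) and a hex dump that costs about three characters per data byte. Neither figure comes from the drive itself, and the function name is hypothetical.

```python
USER_DATA_BASE = 0x00991590  # "User Data Base" value from the boot log

# The value divided by 8 gives the address where user-space LBA 0 was found.
first_user_sector = USER_DATA_BASE // 8
print(f"{first_user_sector:X}")  # 1322B2


def dump_days(total_bytes, baud, chars_per_byte=3.0, bits_per_char=10):
    """Rough duration in days of dumping total_bytes over a serial terminal.

    chars_per_byte and bits_per_char are assumptions (hex text plus a
    separator, 8N1 framing), not values reported by the drive.
    """
    seconds = total_bytes * chars_per_byte * bits_per_char / baud
    return seconds / 86400


# Example: 1 GB of data at 500 kbps under the assumptions above.
print(f"{dump_days(1e9, 500_000):.2f} days")
```

The result scales linearly with the amount of data, so the total depends mostly on how much of the drive actually has to be read this way.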

P.S. The drive has a single "/TV40 - Nonresident GList" entry, 0x2990 "/TV10 - P list" entries, 22 "/TV80 - Resident Glist" entries and over 0x2900 "/TV100 - Primary DST List" entries (all taken from the console listings and the SIM file dumps).

Thanks for any help.


 
 Post subject: Re: ST2000DL003 does not make it to the SATA init
PostPosted: August 18th, 2016, 11:47 

Joined: October 5th, 2015, 18:53
Posts: 433
Location: Canada
You need to repair FC37D093. I would try to read it sector by sector, and if it is badly corrupted, I would take it from another drive. There are 2 copies of this file.


 
 Post subject: Re: ST2000DL003 does not make it to the SATA init
PostPosted: August 21st, 2016, 8:10 

Joined: August 15th, 2016, 6:05
Posts: 9
Location: Czech Republic
Ah, I see now, I was mistaken: the file ID is in the "FD" value (FC37D093) of the SIM table, not in the first column.

So the drive cannot read its configuration data? In that case, both commands:
Code:
/Tr93,3,0
/Tr93,3,1

read identical data, and:
Code:
/2A0
/2s00029539,1,22
/2r,248,1

from the SIM table "/Ty" returns the same LBA position as the Display Active Status "." command. The same LBAs (from the SIM table and the SIM Error message) are reported by "^X" too, without any sector failures.
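
The command strings used above can be generated from a SIM table row. This is a minimal Python sketch that only reproduces the patterns already shown in this thread. It assumes the file ID is the low 12 bits of the FD value (inferred from FC37D093 giving 93 and FC34A342 giving 342 in this thread), that the optional third "/Tr" argument selects the copy, and it copies "/2A0" and the trailing "22" verbatim without knowing their meaning. All function names are hypothetical.

```python
def file_id_from_fd(fd):
    """ASSUMPTION: the file ID is the low 12 bits of the FD value."""
    return fd & 0xFFF


def read_file_command(fd, volume, copy=None):
    """Build a '/Tr<file>,<vol>[,<copy>]' string like the ones in this thread."""
    cmd = f"/Tr{file_id_from_fd(fd):x},{volume:x}"
    if copy is not None:
        cmd += f",{copy:x}"  # assumption: third argument selects the copy
    return cmd


def read_sector_commands(cylinder, head, sector):
    """Build the three-command sequence used above for one physical sector."""
    return [
        "/2A0",                              # copied verbatim from the post
        f"/2s{cylinder:08x},{head:x},22",    # "22" copied verbatim, meaning unknown
        f"/2r,{sector:x},1",
    ]


# Row 00ea of the SIM table: FD fc37d093, vol 3, cylinder 00029539, hd 01, sector 000248.
print(read_file_command(0xFC37D093, 3, 0))
print(read_file_command(0xFC37D093, 3, 1))
for cmd in read_sector_commands(0x29539, 1, 0x248):
    print(cmd)
```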


 
 Post subject: Re: ST2000DL003 does not make it to the SATA init
PostPosted: August 21st, 2016, 15:37 

Joined: August 15th, 2016, 6:05
Posts: 9
Location: Czech Republic
I tried to rewrite file ID 93 and nothing changed, so I think the line:
Code:
SIM Error 3005 LBA 0000000000064407 FD FC37D093

does not necessarily mean there is an error.

I reviewed the logs and I think I know what happened. Is it possible that incorrect read channel shorting caused sectors to be incorrectly marked as bad?

After enabling more debug output I now get:
Code:
Rst 0x40M
MC Internal LPC Process
(P) SATA Reset

RAW OFF
PASS
Drive AMPS Configuration has been modified from compiled defaults.
Drive must be re-initialized to controller firmware defaults by re-downloading controller firmware
DO NOT SHIP WITHOUT FIRST RE-DOWNLOADING CONTROLLER FIRMWARE OR RESETTING TO DEFAULTS!

(P) SATA Reset

Drive AMPS Configuration has been modified from compiled defaults.
Drive must be re-initialized to controller firmware defaults by re-downloading controller firmware
DO NOT SHIP WITHOUT FIRST RE-DOWNLOADING CONTROLLER FIRMWARE OR RESETTING TO DEFAULTS!

SIM Error 3005 LBA 0000000000064407 FD FC37D093
RW Error 00000080
User Data Base  00991590

MCMainPOR: EXCEPTION: SIM aborted prior to MCMT read
MCMainPOR: EXCEPTION: POR Failed General
MCMainPOR: Feature Disabled...
No HOST FIS-ReadyStatusFlags 0002A1A5

It seems that the Media Cache Table Primary Copy system file "/Tr342,3" returns:
Code:
DiagError 0000000E

and so does the second copy.

From SIM table:
Code:
File  Vol  FD        Location      Size      Cylinder  Hd  Sector
----  ---  --------  ------------  --------  --------  --  ------
004a  003  fc34a342  000000000000  00000800  000298d0  00  000000
00b7  003  fc34a342  00000003b41f  00000800  00029493  01  000000

it seems they are located at LBA 0 and LBA 3b41f.

The problem is that these two sectors are listed in the System Slip Defect List "/TV2":
Code:
System Slip Defect List
                             log log   log     phys   phys
        LBA    span   cumm   cyl  hd  sctr zn   cyl   sctr     SFI
           0      0      0     0  0     0   0  298D0     0 FFFFFFFF            0
       3B41F   B9E1   B9E1     0  1     0   1  29493     0 FFFFFFFF        46E00
       7683E   B9E1  173C2     0  2     0   2  28F42     0 FFFFFFFF        8DC00
       B1C5D   B9E1  22DA3     0  3     0   3  27D95     0 FFFFFFFF        D4A00
       ED07C   B9E1  2E784     0  4     0   4  28701     0 FFFFFFFF       11B800
      12849B   B9E1  3A165     0  5     0   5  29996     0 FFFFFFFF       162600
      1638BA   B9E1  45B46   120  0     0   6  299F0     0 FFFFFFFF       1A9400
      19ECD9   B9E1  51527   120  1     0   7  295B3     0 FFFFFFFF       1F0200
      1DA0F8   B9E1  5CF08   120  2     0   8  29062     0 FFFFFFFF       237000
      215517   B9E1  688E9   120  3     0   9  27EB5     0 FFFFFFFF       27DE00
      250936   B9E1  742CA   120  4     0   A  28821     0 FFFFFFFF       2C4C00
      28BD55   B9E1  7FCAB   120  5     0   B  29AB6     0 FFFFFFFF       30BA00

Head 0: entries      2        slips     B9E1
Head 1: entries      2        slips    173C2
Head 2: entries      2        slips    173C2
Head 3: entries      2        slips    173C2
Head 4: entries      2        slips    173C2
Head 5: entries      2        slips    173C2
  Total Entries      C  Total Slips    7FCAB
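
Checking whether given LBAs appear in a captured slip defect listing, and whether the "cumm" column is the running sum of "span", can be done offline. This is a minimal Python sketch, assuming all columns are hexadecimal and that each data row has eleven fields as in the listing above. The function and key names are hypothetical.

```python
def parse_slip_rows(text):
    """Parse data rows of a slip defect listing; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 11:          # data rows in the listing have 11 fields
            continue
        try:
            v = [int(p, 16) for p in parts]  # assumption: every column is hex
        except ValueError:
            continue
        rows.append({"lba": v[0], "span": v[1], "cumm": v[2], "phys_cyl": v[7]})
    return rows


def cumm_is_running_sum(rows):
    """True if each 'cumm' value equals the sum of all spans so far."""
    total = 0
    for row in rows:
        total += row["span"]
        if row["cumm"] != total:
            return False
    return True


# First three data rows copied from the System Slip Defect List above.
listing = """\
        LBA    span   cumm   cyl  hd  sctr zn   cyl   sctr     SFI
           0      0      0     0  0     0   0  298D0     0 FFFFFFFF            0
       3B41F   B9E1   B9E1     0  1     0   1  29493     0 FFFFFFFF        46E00
       7683E   B9E1  173C2     0  2     0   2  28F42     0 FFFFFFFF        8DC00
"""
rows = parse_slip_rows(listing)
listed = {r["lba"] for r in rows}
for lba in (0x0, 0x3B41F):            # locations of the two fc34a342 copies
    print(f"LBA {lba:X} listed: {lba in listed}")
print("cumm consistent:", cumm_is_running_sum(rows))
```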

The first copy is listed in the User Slip Defect List "/TV1" too:
Code:
User Slip Defect List
                         log log   log     phys   phys
        LBA    span   cumm   cyl  hd  sctr zn   cyl   sctr     SFI      PBA
           0      0      0     0  0     0   0      0     0        9            0
       6681E      1      1    35  5    CA   0     35    CB    4AED0        6681F
       66970      1      2    34  5    A5   0     34    A6    4B2FB        66972
...

Accessing these sectors with:
Code:
/2A0
/2s000298d0,0,22
/2r,0,1

commands causes an R/W error:
Code:
DiagError 00005003 R/W Status 2 R/W Error 43110081
Next System LBA 000000000000 LLL CHS 000000.0.0000 PLP CHS 0298D0.0.0000
Remaining Transfer Length 00000001

Is it possible to increase the depth of the command history "^X"? It seems to log only the last 32 actions (so earlier read errors are no longer in the log).

If this is true, is it possible to somehow remove LBA 0 and 3b41f from the slip defect lists?


 
 Post subject: Re: ST2000DL003 does not make it to the SATA init
PostPosted: August 24th, 2016, 23:26 

Joined: March 19th, 2015, 15:01
Posts: 1405
Location: isreal
pc2005 wrote:
I tried to rewrite file ID 93 and nothing changed

You don't just rewrite it; you need to disable everything first.


 
 Post subject: Re: ST2000DL003 does not make it to the SATA init
PostPosted: August 25th, 2016, 12:52 

Joined: August 15th, 2016, 6:05
Posts: 9
Location: Czech Republic
I don't have PC3k, but I understand that using the "/TFxxx,yyy,22" command is equivalent (file ID 93 is something like a config file for "/TF").

I have changed DISABLE_CORRECTION, READ_SPARING_ENABLED, WRITE_SPARING_ENABLED, BGMS_ENABLE, DAR_ENABLED and MediaCacheControl (and probably a few more).

It seems that the firmware still tries to read the media cache file ID and fails on falsely bad sectors even when the media cache is disabled.


 