All times are UTC - 5 hours [ DST ]




 Post subject: ST2000DM001 and brother sudden deaths
PostPosted: January 23rd, 2019, 16:50 

Joined: January 23rd, 2019, 13:54
Posts: 3
Location: Argentina
Hi, I'm new to DR (data recovery) and hoping for some guidance to diagnose the problem and eventually attempt a recovery.

Thanks in advance to all the community members who, one way or another, have helped open up the knowledge of this intricate and never-ending world of DR.

I have four ST2000DM001 units; two of them were used frequently in a hackintosh box.
The other two are in storage, rarely used, almost new.

Around July 2018 a cute OSX notification appeared saying "Disk Not Ejected Properly / Eject "BLABLA" before disconnecting or turning it off" for one of them (neither of them was a system disk).
See for example: http://cdn.osxdaily.com/wp-content/uplo ... ly-mac.jpg

I tried shutdown/power-up -> no ID in UEFI/BIOS. No weird noises, spins up OK, no clicking.
Checked the power supply in BIOS and with a voltmeter: all ATX voltages in range, other disks in the box working (SSD + another WD, and the other ST2000DM001).

Tried in another working PC:
- after a while it was recognized in BIOS, but during boot Windows appeared to hang, so eventually I shut down the PC.
- tried SeaTools bootable -> no ID

I ran out and bought new drives (WD this time, lesson learned :D) to back up the other disks in the same box immediately.
Turned on the hackintosh and checked SMART for the surviving Seagate brother: all OK, no reallocated/pending sectors (as far as I remember).
I spent just a couple of hours online, then made a file-listing-only backup of all drives (just to update/remember/archive what was on them).
Then I was about to back up the surviving Seagate brother in the box when... "Disk Not Ejected Properly" :D (yes, the other ST2000DM001, the one next in line for backup)

I couldn't believe it.
I thought it had to be a joke, a software-related "problem", because...
1) What's the probability of two HDDs dying at "almost" the same time? (But not at exactly the SAME time, which could be attributed to an electrical fault, for example.)
2) SMART was OK immediately before death
3) Only the Seagate brothers in the box died, not the other disks

These two units were originally installed new in the box at almost the same time, so they have almost the same PowerOnHours (about 3 years).

After some reading, the internet told me Seagate is the bad guy, OK. No more Seagate from now on, I promise ;)

Note that I have about 90% of the important data on these drives backed up, but the remaining 10% is valuable to me, so I want to attempt a recovery.
I hope DIY is an option, since I have no tools/money :(

Anyway, long story short:

I have terminal access to the drives (at least to the two I'm working with for now, one of the healthy ones and one of the dead ones; I haven't tried the others yet).

First power up of "dead" drive (no sata attached):
Spins-up ok, no weird sounds, no clicking.
Throws a bunch of errors and then appears to stabilize.
Tried Ctrl+Z -> ok, T level access
Tried Ctrl+R -> ok, ASCII online mode
Tried SATA hot-plugging it after this: Windows recognized the drive in Device Manager and (since automount=on) tried to mount the partition (HFS+ partitions are handled by third-party software), but then it stopped responding. I shut down the system, which hung at "Shutting down" (so I forced a power-off).
No terminal messages logged on hot-plug besides "(H) SATA Reset"
Terminal log attached "FirstPowerUp.log"


Second power up of "dead" drive (no sata attached):
Spins-up ok, no weird sounds, no clicking.
Now throws only 10 errors of the type "ProcessRWError -Read- at LBA 000000000002BXXX Sense Code=XX" and then appears to stabilize.
Tried Ctrl+Z -> ok, T level access
Got info from non-destructive terminal commands.
Spin-down through "F3 2>Z"
Power off
Terminal log attached "SecondPowerUp.log"

Looking to recover only the important files. Need a way to diagnose the problem.
Is there a way to print verbose diagnostic messages in terminal ASCII online mode?

What do you think? Bad heads/platters? Firmware problems? Media cache?

What's the probability of this happening to two different drives almost at the same time for a non-electronic failure???

Isn't it as if there were some sort of "if powerOnHours > 3 years -> self-death" in the firmware? Those bad guys... ;)

Any help would be greatly appreciated.

BTW: programmer skills + some electronics here if needed, totally new to this DR world


Attachments:
SecondPowerUp.log [3.12 MiB]
FirstPowerUp.log [7.41 KiB]
 
 Post subject: Re: ST2000DM001 and brother sudden deaths
PostPosted: January 23rd, 2019, 20:34 

Joined: January 23rd, 2019, 13:54
Posts: 3
Location: Argentina
Oh, thanks very much for your quick and detailed answer Spildit!

I'm probably gonna scream out loud like a madman and try your suggestions ;)

By the way, before I try that, a few more questions, if you'd be so kind as to enlighten me.

Do you know what that "DOS table" is all about?

About media cache: I did some research but couldn't find useful answers either.
Supposedly it's a cache for the "last 70GB" (or so) of data accessed, stored on the platters.
Do you know how it works? Does anyone know what happens when you turn it off through Congen?
Is the data in this "big cache" accessible through normal high-level file system access?

E.g.: you have a filesystem on a drive with this media cache turned on, then you turn it off through Congen.
Is ALL the data still accessible from a file system parser/reader point of view?

Apart from that, you suggested rebuilding the MC with the "F3 C>U1/2/3" commands.
If I understand correctly, this will clear the MC (zero out its data on the platters), losing my precious data.
So, is it dangerous to attempt the data recovery with MC turned off through Congen but without rebuilding it through the "F3 C>U1/2/3" commands?

About SMART: is it a good idea to turn it off and/or clear it for data recovery purposes?

Be assured that I'LL NEVER USE SEAGATE EVER AGAIN :)

PS: the data is important to me but I'm unemployed for now, so I can't spend on third-party tools or recovery services.
If the situation were different, be sure I'd contact you for services in the first place.

Thanks again!


 
 Post subject: Re: ST2000DM001 and brother sudden deaths
PostPosted: January 26th, 2019, 20:26 

Joined: January 23rd, 2019, 13:54
Posts: 3
Location: Argentina
Well, after some reading and practicing on my healthy drive I powered on the patient drive again.

This time, after other errors, it got stuck with:
Code:
LED:00000047 FAddr:FFFFFFFE
Attachment:
Log-03.log [21.49 KiB]

No terminal access, so I turned off the drive and did more research.
MC corrupt maybe? Shorting PCB pins to gain access?
Anyway, I tried turning it on again and got lucky: no LED:47, and I had terminal access :)
Attachment:
Log-04.log [12.71 KiB]

I immediately backed up the following files through HyperTerminal using the level T "r" command (T>r):
    FILE_0_000_0
    FILE_0_32C_0
    FILE_0_32D_0
    FILE_1_000_0
    FILE_2_000_0
    FILE_3_01B_0
    FILE_3_01C_0
    FILE_3_01F_0
    FILE_3_028_0
    FILE_3_035_0
    FILE_3_036_0
    FILE_3_037_0
    FILE_3_093_0
    FILE_3_093_1
    FILE_3_133_0
    FILE_3_178_0
    FILE_3_195_0
    FILE_3_300_0
    FILE_4_054_0
    FILE_7_000_0
    FILE_9_001_0
    FILE_9_32A_0
    FILE_A_001_0
    FILE_A_32A_0

[Q] Are there any other unique/critical/important files to back up on these ST2000DM001@CC4H drives/firmware?

No errors were reported on the terminal, and there were no slowdowns while reading the files.

I printed the stock Congen using "F3 T>F":
Attachment:
Congen-stock.log [51.08 KiB]

Next I proceeded with the following Congen mod:
Code:
F"READ_SPARING_ENABLED",0
F"WRITE_SPARING_ENABLED",0
F"OFFLINE_SPARING_ENABLED",0
F"DAR_ENABLED",0
F"BGMS_DISABLE_DATA_REFRESH",1
F"ABORT_PREFETCH",1
F"READ_LOOKAHEAD_DISABLED_ON_POWER_UP",1
F"READ_CACHING_DISABLED_ON_POWER_UP",1
F"BGMS_ENABLE",0
F"MediaCacheControl",00

I printed the modded Congen using "F3 T>F" afterwards [before reset]:
Attachment:
Congen-modded-before-reset.log [51.61 KiB]

I redownloaded file FILE_3_093_0 and it was modified accordingly.
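
A minimal sketch of how the "modified accordingly" check can be done offline: a byte-wise comparison of the copy of FILE_3_093_0 saved before the Congen mod against the copy saved after it. The function names and the file names in the usage note are hypothetical; nothing here talks to the drive.

```python
from pathlib import Path

def diff_bytes(old: bytes, new: bytes):
    """Return ([(offset, old_byte, new_byte), ...], size_delta).

    Only the overlapping part of the two buffers is compared byte by byte;
    size_delta reports any difference in length separately.
    """
    changes = [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return changes, len(new) - len(old)

def compare_files(old_path, new_path):
    """Print every changed byte between two saved copies of a system file."""
    old = Path(old_path).read_bytes()
    new = Path(new_path).read_bytes()
    changes, size_delta = diff_bytes(old, new)
    for offset, a, b in changes:
        print(f"0x{offset:06X}: {a:02X} -> {b:02X}")
    print(f"{len(changes)} byte(s) changed, size delta {size_delta}")
    return changes
```

Usage would be something like compare_files("FILE_3_093_0.stock.bin", "FILE_3_093_0.modded.bin"), with whatever names the two backups were saved under. A handful of changed bytes is what one would expect after flipping a few Congen flags; a large diff would be a reason to stop and look closer.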

Next I did the DOS (Directed Off-line Scan) table clearing command you suggested:
Code:
F3 7>m100

(DOS) File Save

I redownloaded file FILE_3_195_0 and it was indeed cleared :)

I made one last full terminal log before power cycling the drive:

Code:
Head 00 Resistance 00FC
Head 01 Resistance 00EF
Head 02 Resistance 0159
Head 03 Resistance 012C

SMART:

Num  Flgs  normlzd  worst  raw
05   0033  47       33     00000000009828 (reallocated sector count)
C5   0012  4E       4E     00000000000E88 (current pending sector)
C6   0010  4E       4E     00000000000E88 (offline uncorrectable)

Attachment:
Log-05.zip [713.05 KiB]

[Q] Does this look BAD?
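
For reading the SMART numbers above: the values contain digits like "E", so they are printed in hexadecimal. A small sketch that converts them to decimal, assuming all three value columns (normalized, worst, raw) are hex as printed:

```python
# SMART rows copied from the terminal output above:
# (attribute id, normalized, worst, raw), all as printed (hex strings).
ROWS = [
    ("05", "47", "33", "00000000009828"),  # reallocated sector count
    ("C5", "4E", "4E", "00000000000E88"),  # current pending sector
    ("C6", "4E", "4E", "00000000000E88"),  # offline uncorrectable
]

def decode_smart(rows):
    """Convert the hex strings to integers: id -> (normalized, worst, raw)."""
    return {attr: (int(norm, 16), int(worst, 16), int(raw, 16))
            for attr, norm, worst, raw in rows}

for attr, (norm, worst, raw) in decode_smart(ROWS).items():
    print(f"{attr}: normalized={norm} worst={worst} raw={raw}")
# 05: normalized=71 worst=51 raw=38952
# C5: normalized=78 worst=78 raw=3720
# C6: normalized=78 worst=78 raw=3720
```

So under that assumption the drive reports 38952 reallocated sectors and 3720 pending/uncorrectable ones.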

Ok, I power cycled the drive with Ctrl+C, looks quiet:
Code:
Spinning Down

Spin Down Complete
Elapsed Time 11.543 secs
Delaying 5000 msec

Jumping to Power On Resetÿ
Boot 0x40M

Spin Up
TCC-0022[0x000065B4][0x00006A20][0x00006E8C]

Trans.


Rst 0x40M

MC Internal LPC Process

Spin Up
TCC-0022

(P) SATA Reset


MCMainPOR: Start:

Check MCMT Version: Current

MCMainPOR: Non-Init Case

MCMainPOR: MCTStateFlags 0000002A  MCStateFlags 000000C1

MCMainPOR: MC off and MCMT empty

MCMainPOR: EXCEPTION: POR Failed General

MCMainPOR: Feature Disabled...
PowerState = IDLE1
PowerState = IDLE2

I reprinted Congen just in case [after reset]:
Attachment:
Congen-modded-after-reset.log [51.13 KiB]

OK, I plugged in the SATA data cable. The drive appeared to be stable, so I started a file recovery attempt, but then I saw lots of:
Code:
Starting LBA of RW Request=X  Length=00000001

ProcessRWError -Read-   at LBA X  Sense Code=43110081

for many LBAs I tried.
Attachment:
Log-06-file-reading-attempt.log [136.33 KiB]

According to the doc, SenseCode=43110081 means: Disc Xfr - HW uncorrectable medium error
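
To get an overview of which LBAs fail, a captured terminal log can be summarized offline. This is a rough sketch that assumes the error lines look exactly like the "ProcessRWError" line quoted above, with the LBA and sense code printed in hex; the LBAs in the sample are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "ProcessRWError -Read-   at LBA 000000000002B000  Sense Code=43110081"
ERR_RE = re.compile(
    r"ProcessRWError\s+-(\w+)-\s+at LBA\s+([0-9A-Fa-f]+)\s+Sense Code=([0-9A-Fa-f]+)"
)

def parse_errors(text):
    """Return a list of (operation, lba_as_int, sense_code) for each error line."""
    return [(op, int(lba, 16), sense.upper()) for op, lba, sense in ERR_RE.findall(text)]

def to_ranges(lbas):
    """Collapse a collection of LBAs into sorted, inclusive (first, last) runs."""
    ranges = []
    for lba in sorted(set(lbas)):
        if ranges and lba == ranges[-1][1] + 1:
            ranges[-1][1] = lba          # extends the current run
        else:
            ranges.append([lba, lba])    # starts a new run
    return [tuple(r) for r in ranges]

# Made-up sample lines in the same shape as the real log.
sample = """
ProcessRWError -Read-   at LBA 000000000002B000  Sense Code=43110081
ProcessRWError -Read-   at LBA 000000000002B001  Sense Code=43110081
ProcessRWError -Read-   at LBA 000000000002B010  Sense Code=43110081
"""

errors = parse_errors(sample)
print(Counter(sense for _, _, sense in errors))
for first, last in to_ranges(lba for _, lba, _ in errors):
    print(f"bad LBAs {first:#x}..{last:#x} ({last - first + 1} sectors)")
```

Errors that cluster in a few ranges and errors spread evenly over the whole surface are very different pictures, and the per-code counts show whether anything other than 43110081 appears.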

I have fully recovered only some of the files; the rest have bad sectors.

So, more questions:

[Q] What's the diagnostic for this drive?
[Q] P(bad heads) == 1 ?? :(
[Q] Can SenseCode=43110081 be attributed to any other problems like FW state corruption? (corrupted P/G/NRG/etc lists / translator / others)
[Q] Does any free/open source imaging software similar to DDI4 exist that can handle statistical recovery attempts on bad sectors?
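
To make the last question concrete, here is a toy sketch of the general skip-and-retry idea behind bad-sector-aware imaging: read everything that reads on the first pass, remember what failed, then retry only the failures. It is not how DDI4 or any specific tool works. read_block is a hypothetical callable standing in for the real device read, and the image is kept in memory only to keep the example short.

```python
def image_blocks(read_block, total_blocks, block_size=512, retries=2):
    """Image a device block by block, skipping unreadable blocks.

    read_block(lba) must return exactly block_size bytes or raise OSError.
    Returns (image_bytes, still_bad_lbas); unread blocks stay zero-filled.
    """
    image = bytearray(total_blocks * block_size)
    pending = list(range(total_blocks))
    for _ in range(1 + retries):           # one full pass plus retry passes
        failed = []
        for lba in pending:
            try:
                data = read_block(lba)
            except OSError:
                failed.append(lba)         # skip now, retry on a later pass
                continue
            if len(data) != block_size:
                raise ValueError(f"short read at LBA {lba}")
            image[lba * block_size:(lba + 1) * block_size] = data
        pending = failed
        if not pending:
            break
    return bytes(image), pending

# Fake device: LBA 5 never reads, LBA 2 reads only on the second attempt.
attempts = {}
def fake_read(lba):
    attempts[lba] = attempts.get(lba, 0) + 1
    if lba == 5 or (lba == 2 and attempts[lba] == 1):
        raise OSError("unreadable")
    return bytes([lba]) * 4

image, bad = image_blocks(fake_read, total_blocks=8, block_size=4, retries=2)
print("still bad:", bad)
```

A real imager would write to a file, keep the list of bad blocks on disk so a run can be resumed, and limit how hard it hammers a weak head; the sketch only shows the pass structure.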

