September 12th, 2009, 21:40
Hi all, I found this forum last night and have enjoyed browsing through it. Some of the recovery stuff goes way over my head but I'm still interested in understanding the technology and inner workings of hard drives, since I maintain around 25 of them.
My question today is about strange behaviour from a Seagate ST3500320AS (500GB) drive that is about 14 months old. It dropped out of a RAID1 mirror, a self test failed, and the current pending sector and offline uncorrectable SMART values were non-zero. I performed a zero fill to try to provoke further errors, but all this seems to have done is DECREMENT the bad-sector values back to 0 (and it has NOT incremented the reallocated sector count either, which is still 0!). If it wasn't for the spin retry count being >0 (and two UNC events in the SMART error log) you would never know there was a problem with this drive.
So I put it back into the mirror to see what would happen; a few seconds into rebuilding it started clicking again and eventually dropped out with the bad sector SMART values going >0 again. One more zero fill, perfect, back into the mirror, click click click. It's reporting 600+ bad sectors but a zero fill consistently "fixes" the problem. At this point I left it for a few days.
It gets stranger... now that I've removed the drive and put it in another machine it hasn't skipped a beat. An MHDD scan looks reasonable, with only 45 sectors <150ms.
I'm confused about why this drive is not incrementing the SMART reallocated sector count. If it's remapping bad sectors then surely this should be >0? Here are some relevant bits from smartctl -a:
- Code:
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 099 097 Pre-fail Always - 948
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
40 51 00 2d eb 10 00 Error: UNC at LBA = 0x0010eb2d = 1108781
40 51 00 d8 ca 08 00 Error: UNC at LBA = 0x0008cad8 = 576216
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 10340 -
# 2 Short offline Completed: read failure 90% 10302 597072
# 3 Short offline Completed: read failure 90% 10294 22095
# 4 Short offline Completed without error 00% 10289 -
# 5 Extended offline Completed: read failure 90% 10264 3910350
# 6 Short offline Completed without error 00% 10264 -
# 7 Short offline Completed without error 00% 10249 -
# 8 Short offline Completed: read failure 90% 10247 211917
# 9 Short offline Completed without error 00% 10218 -
#10 Extended offline Completed: read failure 90% 10216 256509
#11 Short offline Completed: read failure 90% 10216 256509
#12 Extended offline Completed without error 00% 8969 -
Thanks in advance for any comments...
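For anyone reading the two error-log lines above: the seven hex bytes are the ATA registers smartctl prints after a failed command, in the order ER ST SC SN CL CH DH. A minimal sketch of how the LBA falls out of them, assuming 28-bit addressing (the function name is made up for illustration):

```python
def lba28_from_registers(reg_line: str) -> int:
    """Decode a 28-bit LBA from the register bytes shown in the
    smartctl error log (ER ST SC SN CL CH DH), e.g. '40 51 00 2d eb 10 00'."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in reg_line.split())
    # SN, CL, CH hold LBA bits 0-23; the low nibble of DH holds bits 24-27.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


for line in ("40 51 00 2d eb 10 00", "40 51 00 d8 ca 08 00"):
    lba = lba28_from_registers(line)
    print(f"{line} -> LBA 0x{lba:08x} = {lba}")
```

Both lines decode to the LBAs smartctl printed (1108781 and 576216). The 0x40 in the first byte is the UNC bit of the error register.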
September 12th, 2009, 21:42
PS: I've had bad luck with drives so every server I look after has a minimum of RAID1 protection, and I do regular backups, so I'm not going to argue with you guys about how I can recover data myself for less than $50.
September 12th, 2009, 22:09
Could be soft bads that are fixed by rewriting them.
September 12th, 2009, 22:34
drccsc wrote: Could be soft bads that are fixed by rewriting them.
Yep, I understand that much, but shouldn't that increment the reallocated sector count? This is the behaviour I see with other drives.
September 12th, 2009, 22:50
After re-reading your reply I guess you meant that it's a marginal error, and rewriting the same sector (rather than remapping) ends up restoring it? Hmmm.
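To make the rewrite-versus-remap distinction concrete, here is a rough sketch that parses the four attribute lines from my first post and compares the pending and reallocated raw values. The function names and the wording of the summaries are my own, and the interpretation in the comments is an assumption about typical firmware behaviour, not something smartctl reports:

```python
SMART_LINES = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 099 097 Pre-fail Always - 948
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
"""


def parse_attributes(text):
    """Parse smartctl -A style rows into a dict keyed by attribute name."""
    attrs = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10 or not f[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        try:
            raw = int(f[9])
        except ValueError:
            raw = None  # some drives print non-numeric raw values
        attrs[f[1]] = {"id": int(f[0]), "value": int(f[3]),
                       "worst": int(f[4]), "thresh": int(f[5]), "raw": raw}
    return attrs


def summarize(attrs):
    """Assumed reading: pending sectors that accept a fresh write are
    cleared in place; only sectors that fail the write get remapped."""
    pending = attrs["Current_Pending_Sector"]["raw"] or 0
    realloc = attrs["Reallocated_Sector_Ct"]["raw"] or 0
    if pending == 0 and realloc == 0:
        return "no pending, no reallocated"
    if pending > 0:
        return "sectors awaiting rewrite or reallocation"
    return "sectors have been remapped to spares"


attrs = parse_attributes(SMART_LINES)
print(summarize(attrs))
print("Spin_Retry_Count margin:",
      attrs["Spin_Retry_Count"]["worst"] - attrs["Spin_Retry_Count"]["thresh"])
```

With the values above, the pending and reallocated counts are both zero after the zero fill, which fits the "rewritten in place" reading, while the spin retry attribute has a worst value only 2 above its threshold.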
September 12th, 2009, 23:12
Right. Of course if it keeps happening over and over again then I would say the drive is having problems.
September 12th, 2009, 23:19
I'm going to use a simple program that I wrote which reads and writes random lengths of data on random portions of a disk. I don't really trust this drive any more, and without it [currently] playing up I can't RMA it, so I may as well give it a bit of a workout and see what happens...
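The idea is roughly the following. This is a simplified sketch, not the actual program, and it works on a scratch file rather than a raw device; the function name and parameters are illustrative only:

```python
import os
import tempfile


def exercise(path, size, rounds, max_len=64 * 1024):
    """Write random-length chunks at random offsets, read them back,
    and return the number of chunks that did not verify."""
    max_len = min(max_len, size)
    mismatches = 0
    with open(path, "r+b") as f:
        for _ in range(rounds):
            # Pick a random length, then a random offset that keeps it in range.
            length = 1 + int.from_bytes(os.urandom(4), "big") % max_len
            offset = int.from_bytes(os.urandom(8), "big") % (size - length + 1)
            data = os.urandom(length)
            f.seek(offset)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the write out of the OS buffers
            f.seek(offset)
            if f.read(length) != data:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    size = 1024 * 1024
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * size)
    try:
        print("mismatches:", exercise(tmp.name, size, rounds=20))
    finally:
        os.remove(tmp.name)
```

One caveat: on a normal filesystem the read-back is usually served from the OS cache rather than the platters, so a real workout would need to bypass or drop the cache, and pointing something like this at a raw device destroys whatever is on it.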
September 12th, 2009, 23:25
If you really want to kill it just use spinrite or something...
September 12th, 2009, 23:54
drccsc wrote: If you really want to kill it just use spinrite or something...

Yeah, I actually did a search on this forum for spinrite a day ago, it's entertaining to see some of the replies.

I used it probably 15 years ago on an old MFM drive. Even back then, terms used in the readme such as "scrubbing" sounded a little wacky and far fetched. I presume that with today's proprietary electronics presenting a digital-only interface, low level access isn't anywhere near possible.
I once wrote a program to "refresh" floppies on an old (circa 1982) microcomputer. If it came across a bad sector it would repeatedly try to read it until valid data was returned, then it wrote out the sector again. It usually worked. Sounds familiar?
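The logic was roughly this. It is a reconstruction in Python against a simulated disk, not the original code, and the FlakyDisk class is invented purely so the sketch can run:

```python
class FlakyDisk:
    """Simulated medium: some sectors fail a set number of reads first."""

    def __init__(self, sectors, marginal):
        self.sectors = list(sectors)
        # sector index -> number of reads that fail before one succeeds
        self.failures_left = dict(marginal)

    def read(self, n):
        if self.failures_left.get(n, 0) > 0:
            self.failures_left[n] -= 1
            raise IOError(f"read error at sector {n}")
        return self.sectors[n]

    def write(self, n, data):
        self.sectors[n] = data
        self.failures_left.pop(n, None)  # a fresh write restores the signal


def refresh(disk, count, max_retries=10):
    """Retry each failing sector; rewrite it once valid data comes back.
    Returns (rewritten sector numbers, sectors that never read)."""
    rewritten, lost = [], []
    for n in range(count):
        for attempt in range(max_retries + 1):
            try:
                data = disk.read(n)
            except IOError:
                continue  # try the same sector again
            if attempt > 0:  # it needed retries, so write it back out
                disk.write(n, data)
                rewritten.append(n)
            break
        else:
            lost.append(n)  # every attempt failed
    return rewritten, lost


disk = FlakyDisk([b"a", b"b", b"c", b"d"], {1: 3, 3: 50})
print(refresh(disk, 4))  # ([1], [3])
```

Sector 1 reads on the fourth attempt and gets rewritten; sector 3 never reads within the retry limit and is reported as lost.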
September 13th, 2009, 11:43
About floppies: how can you write a sector if the beginning mark, gap, preamble or guard band is messed up? Maybe reformat the entire track and then rewrite the sectors... but what if the problem is physical, i.e. magnetic layer dropout or media damage?
On modern drives it is practically impossible to do such things, except asking the drive to read a block, with or without CRC checking, and then rewrite it.
You have some control over reallocation and other mechanisms if you have enough information about the firmware... but forget direct access to the internal DSP/decoding/encoding unless you work for the manufacturer or have done extensive research on the firmware and hardware structure of THAT drive/family. As a final note, it is totally useless except for self-educational purposes...
September 13th, 2009, 14:09
BlackST wrote: About floppies: how can you write a sector if the beginning mark, gap, preamble or guard band is messed up? Maybe reformat the entire track and then rewrite the sectors... but what if the problem is physical, i.e. magnetic layer dropout or media damage?
This was just a simple program to try to recover a floppy that had degraded through normal use, formatted in one machine, half the sectors written in another machine, then some of those updated and more written out in another... like I said, it usually worked, I was a 15 year old kid, lighten up a bit.

BlackST wrote: On modern drives it is practically impossible to do such things, except asking the drive to read a block, with or without CRC checking, and then rewrite it.
You have some control over reallocation and other mechanisms if you have enough information about the firmware... but forget direct access to the internal DSP/decoding/encoding unless you work for the manufacturer or have done extensive research on the firmware and hardware structure of THAT drive/family. As a final note, it is totally useless except for self-educational purposes...
That's why Spinrite's claims seem so far fetched. I saw a screenshot on grc.com which showed some sort of head amplitude graph... through an IDE/SATA interface??