All times are UTC - 5 hours [ DST ]




 Post subject: 2 questions on "Recovering Unrecoverable Data" pap
PostPosted: March 31st, 2005, 11:05 

Joined: October 10th, 2005, 5:36
Posts: 39
Location: Portugal
Greetings,

Most of you have probably read Charles Sobey's paper "Recovering Unrecoverable Data" (the link to it was posted several months ago in this forum).

In the paper, the author states that:

"It is not widely known, but many modern drives routinely check for thermal decay of bits in the field and rewrite the sectors in which degradation is identified"

I've contacted the author and asked the following questions about this:

Q1) What does "routinely check" mean? Take an 80GB disk, for example. Scanning it sector by sector looking for thermal decay would take a long time, so I suppose that it isn't done that way. Is the check made while reading data?

Q2) Servo data is written on the same magnetic media, so it will also suffer thermal decay. If the read/write heads rewrite degraded sectors, what about servo data?

I believe this subject will be of some interest to most of you, so I asked him for permission to post his answer in this forum, and he kindly agreed. I hope you enjoy it.

******************************************************

Q1: Hard disk drives check themselves for any degradation -- regardless of the cause -- not exclusively thermal decay. The check involves reading an entire sector and checking the ECC (Error Correction Coding).

An ECC "check" has two parts. The first is a "simple" calculation to determine if errors are DETECTED. This involves calculating a "syndrome." If the syndrome is not 0, then a not-so-simple calculation must be performed to determine where the error is located and how to CORRECT it. Any source of read error will result in a non-zero syndrome.
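
To make the detect-then-correct split concrete, here is a toy sketch using a Hamming(7,4) code. It is an illustration only and an assumption on my part: real drive ECC is far more powerful and its details are not public. In this toy code the syndrome directly names the bad bit position, so the "correct" step is trivial; with stronger codes, locating the errors is the expensive part. The function names are made up for this example.

```python
# Toy Hamming(7,4) code: illustrative only, not the ECC any real drive uses.

def encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Bit positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]

def syndrome(codeword):
    """DETECT step: return the syndrome as an integer (0 = no error seen)."""
    c = codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # parity over positions 4, 5, 6, 7
    return s1 | (s2 << 1) | (s3 << 2)

def correct(codeword):
    """CORRECT step: fix a single-bit error; return (codeword, syndrome)."""
    s = syndrome(codeword)
    fixed = list(codeword)
    if s != 0:
        fixed[s - 1] ^= 1  # for this code the syndrome is the 1-based position
    return fixed, s

clean = encode([1, 0, 1, 1])
damaged = list(clean)
damaged[4] ^= 1  # simulate one decayed bit at position 5
repaired, s = correct(damaged)
print(syndrome(clean), s, repaired == clean)
```

A clean read gives syndrome 0 and needs no further work; only a non-zero syndrome triggers the correction path.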

If the syndrome is non-zero, but the data is still correctable (because of
the very powerful ECC used by hard disk drives), the corrected data can be re-written to the sector. This re-starts the thermal decay "clock." It is
good practice to verify the rewrite by checking its ECC. If the sector
repeatedly has errors, the drive's defect management routines will map out the bad sector and logically replace it with a spare sector.
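
The decision flow described above can be sketched roughly as follows. This is a minimal model, not actual firmware: the `Action` names, the `decide` function and the threshold of 3 error events are all hypothetical, since manufacturers do not publish these details.

```python
from enum import Enum

class Action(Enum):
    NONE = "none"                    # syndrome was zero, nothing to do
    REWRITE = "rewrite"              # write corrected data back, then verify
    REALLOCATE = "reallocate"        # map out the sector, use a spare
    UNRECOVERABLE = "unrecoverable"  # errors exceed what the ECC can fix

def decide(syndrome_is_zero, correctable, prior_errors, max_errors=3):
    """Pick an action for one sector read.

    prior_errors counts earlier error events on this sector;
    max_errors is an assumed threshold, not a published value.
    """
    if syndrome_is_zero:
        return Action.NONE
    if not correctable:
        return Action.UNRECOVERABLE
    if prior_errors + 1 >= max_errors:
        return Action.REALLOCATE
    return Action.REWRITE

print(decide(False, True, 0).value)  # first correctable error on a sector
print(decide(False, True, 2).value)  # third error event on the same sector
```

In a fuller model, a REWRITE would be followed by a second read and ECC check, and a failed verify would count as another error event against the sector.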

How often does the drive do this? I must use qualifying words like
"routinely" and "in general" and "typically" because drive manufacturers do not publicly define exactly what they are doing inside their drives. A large customer of a drive company (e.g., Dell) can probably get whatever
information they want, but it will only be under strict non-disclosure
agreements. Furthermore, every drive manufacturer does things a little
differently. Differences are found even in different drives from the same
manufacturer.

This check can happen on-the-fly, but there are also modes in which the
drive reads every sector (typically after an idle time of > 10 minutes) and
re-writes or re-allocates as needed. This is what I referred to in the
paper. At the Western Digital website (http://www.wdc.com), I believe that there is information about their "data lifeguard" feature that may discuss some of this behavior.

Sure the check takes a long time, but if your drive is idle (and it is not
in a mobile device) there is little to lose and much to gain by making the
check. In an always-accessing server application, this is not a good option
and other system-level data integrity steps can be taken (e.g., RAID and
mirroring).
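
For a rough sense of "a long time," here is a back-of-the-envelope estimate for the 80GB disk from the question. The sustained read rate of 40 MB/s is purely an assumed figure for illustration; actual rates vary by drive and by position on the platter, and a background scan may run much slower.

```python
def full_scan_minutes(capacity_gb, throughput_mb_s):
    """Minutes to read every sector once at a sustained rate (decimal units)."""
    seconds = (capacity_gb * 1000) / throughput_mb_s
    return seconds / 60

# 80 GB at an assumed 40 MB/s sustained read rate
print(round(full_scan_minutes(80, 40), 1))  # prints 33.3
```

Under that assumption, a full pass fits comfortably within an idle period, which is consistent with scanning only after the drive has been idle for a while.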

It is also possible to check the bits themselves without using the ECC, or
even in conjunction with the ECC, to gauge decay BEFORE an ECC-detectable error occurs. I do not know if any company uses such methods in the field, although they are used in manufacturing as a test.

The bottom line to your question is, "routinely check" means whatever the
drive manufacturer has decided for that model. If you are a big purchaser,
the drive company will probably tell you. If not, you may be able to monitor the behavior of a group of idle drives and see (that is, hear) what happens when. In some cases, the manufacturer may provide this level of information in their extensive "product manuals." These are different from their spec sheets or data sheets.


Q2: You are very observant to note that servo presents unique challenges.
All data on the disk are subject to thermal decay; however, some bits are
more susceptible than others.

For example, in addition to temperature causing bits to decay, magnetic
fields weaken bits also. These weakened bits are then more likely to be
affected by thermal decay. Ignoring stray external fields as an obvious
issue, magnetic fields come from other, in-drive, sources. Tightly spaced
transitions (high density bits in the "down-the-track" direction) weaken
each other (this causes a lot of other problems and is fundamentally linked
to the maximum capacity of a disk surface). Also, writing to a sector
results in a side-fringing field that can weaken the transitions on adjacent
tracks.

Servo sectors are typically (a qualifying word again) written at a much
lower density than data sectors. Plus, the servo portions of tracks are
aligned radially next to each other and are never written in the field.
Therefore, they are not subject to the side-erase effect due to
side-fringing fields during writing.

These two facts make servo information more stable than data -- in general. Furthermore, robust drive designs should be able to handle having a few servo sectors in a row (<=3) be corrupted -- but only at a few locations. To my knowledge, no drive re-writes servo information in the field. It is theoretically possible, but there are many practical issues.
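
The tolerance for a short run of corrupted servo sectors can be pictured with a small check like the one below. It is only a sketch under stated assumptions: the function names are hypothetical, the limit of 3 comes from the "<=3" figure above, the list is treated as linear (wrap-around at the end of the track is ignored), and the "only at a few locations" condition is not modeled.

```python
def longest_bad_run(servo_ok):
    """Length of the longest run of consecutive unreadable servo sectors."""
    longest = current = 0
    for ok in servo_ok:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest

def track_usable(servo_ok, max_consecutive_bad=3):
    """True if no run of bad servo sectors exceeds the assumed limit."""
    return longest_bad_run(servo_ok) <= max_consecutive_bad

# True = servo sector read correctly, False = corrupted
track = [True, False, False, False, True, True, False, True]
print(longest_bad_run(track), track_usable(track))
```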

One main issue is that if your lower-density, never-written-next-to servo
bits are decaying, your data bits are probably long gone!

********************************************************

Regards,

Daniel


 Post subject:
PostPosted: March 31st, 2005, 17:04 
Hi Daniel, thank you for sharing this text with us.
Regards
Jose Pinto


 Post subject:
PostPosted: April 1st, 2005, 5:57 
Thanks Daniel, great information.

Just one request. I didn't find any topic on the forum regarding that paper; I searched for the title and the author without success. Could you post a link to the topic or the paper here?

Thanks in advance !


 Post subject:
PostPosted: April 1st, 2005, 9:14 

Joined: October 10th, 2005, 5:36
Posts: 39
Location: Portugal
"Recovering Unrecoverable Data", By Charles Sobey:

http://forums.actionfront.com/attachmen ... chmentid=6

