This is related to the "Firmware modding to gain raw access to data from read head?" thread I started a couple days ago, but it's more generalized to the point that I think a separate thread is called for.
I had an IBM Deskstar go bad in January 2002, and it spawned an ongoing data recovery project. At first all I did was clone the partition (with WinHex, which logged the bad sectors), fix the cloned copy so it was accessible, and write a program to generate a list of affected files from the list of bad sector LBAs. Later I mapped the geometry of the bad sectors and worked on recovering the data from the bad sectors themselves. The project was a big success; I fully recovered many of the files most important to me, and learned a heck of a lot along the way.
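As an illustration of the "list of affected files" step (not the original program), here is a minimal Python sketch. It assumes the file system has already been parsed into a list of non-overlapping extents of the form `(start_lba, sector_count, filename)`; that tuple layout and the function name `affected_files` are hypothetical.

```python
from bisect import bisect_right

def affected_files(extents, bad_lbas):
    """Map bad sector LBAs to the files that contain them.

    extents:  list of (start_lba, sector_count, filename), assumed non-overlapping
    bad_lbas: iterable of bad sector LBAs (e.g. parsed from a cloning log)
    Returns {filename: [bad LBAs inside that file, ascending]}.
    """
    extents = sorted(extents)
    starts = [e[0] for e in extents]
    hits = {}
    for lba in sorted(set(bad_lbas)):
        # Last extent whose start is <= lba, if there is one.
        i = bisect_right(starts, lba) - 1
        if i < 0:
            continue
        start, count, name = extents[i]
        if lba < start + count:
            hits.setdefault(name, []).append(lba)
    return hits

example_extents = [(100, 8, "a.txt"), (200, 16, "b.log"), (108, 4, "a.txt")]
example_hits = affected_files(example_extents, [103, 110, 150, 215, 216])
```

Bad LBAs that fall in free space or metadata (150 and 216 in the example) simply do not appear in the result.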
Talking about this in the StorageReview forum was a tad disappointing towards the end of the thread, when I was describing the technical details of what I had figured out and done, and nobody had much to say about it. Now I've discovered the HDD Guru forums. Perhaps I have found a place where I can actually discuss these things with others?
Many of you on the forum are extremely experienced data recovery engineers, and I would like to hear what you think may have actually happened to my drive. What kind of event would create the bad sector stripe in this graph? (Also see the polar graph.) Is it possible a physical scratch was created in that path, or did the head simply write while seeking (and why would it do that)? I have a vague hypothesis, but I'm a newbie at this compared to you.
And the steps I used to fully recover some bad sectors... how many of you have done these things, or even do them regularly?
1. XOR-descrambling and inverse-RLL'ing multiple reads of a sector, and lining up the multiple reads so that intact parts can be identified, and some of the marginal parts can be recovered
2. filling in data in a sector by context; how about in compressed files, where (with the aid of a custom-written program) backreferences can be filled in one by one, each one giving more and more context to do more and more filling-in?
3. using Galois field matrix division (automated by a program, of course) to correct any chosen set of N bytes using the ECC (where the ECC is N bytes long, and 512-N or more data bytes are known to be correct, having been brought past the all-or-nothing threshold with help from the previous two steps)
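To make the last step concrete, here is a toy Python sketch of correcting bytes at known positions by solving a linear system over GF(2^8). It is not the drive's real ECC: the field polynomial 0x11d, the Reed-Solomon-style syndrome definition, and the short codeword (at most 255 symbols) are all assumptions chosen so the example is self-contained. A real drive's polynomial, symbol size and interleaving would have to be known or reverse-engineered.

```python
# GF(2^8) log/antilog tables. The primitive polynomial 0x11d is an assumption.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be non-zero

def syndromes(word, nsyn):
    """S_j = sum_i word[i] * alpha^(j*i); all zero for a valid codeword."""
    out = []
    for j in range(nsyn):
        s = 0
        for i, c in enumerate(word):
            s ^= gf_mul(c, EXP[(j * i) % 255])
        out.append(s)
    return out

def fill_erasures(word, erasures, nsyn):
    """Solve for the bytes at the given positions so the syndromes vanish."""
    erasures = list(erasures)
    k = len(erasures)
    if k > nsyn or len(word) > 255:
        raise ValueError("too many unknowns, or codeword too long for this toy code")
    unknown = set(erasures)
    rows = []
    for j in range(k):  # k equations are enough for k unknowns (Vandermonde matrix)
        coeffs = [EXP[(j * e) % 255] for e in erasures]
        rhs = 0
        for i, c in enumerate(word):
            if i not in unknown:
                rhs ^= gf_mul(c, EXP[(j * i) % 255])
        rows.append(coeffs + [rhs])
    for col in range(k):  # Gauss-Jordan elimination over GF(2^8)
        piv = next(r for r in range(col, k) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(v, inv) for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w) for v, w in zip(rows[r], rows[col])]
    out = list(word)
    for e, row in zip(erasures, rows):
        out[e] = row[k]
    return out

# Encoding is the same solve, with the parity positions as the unknowns.
data = list(b"replay log data!")
codeword = fill_erasures(data + [0] * 4, [16, 17, 18, 19], 4)
damaged = list(codeword)
for pos in (2, 7, 11, 18):
    damaged[pos] = 0
repaired = fill_erasures(damaged, [2, 7, 11, 18], 4)
```

The point of the sketch is the all-or-nothing threshold: with N parity symbols, any N positions can be solved for exactly, provided every other byte is already correct.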
For me, this was painstaking work, and doing #2 absolutely required an intimate knowledge of the nature of the data that had been corrupted. (Most of the bad sectors I recovered were pieces of game replay logs in a format I know only because I'm a developer of the game in question.) It makes me wonder, if I had sent this drive to a high-end data recovery center, would they have simply called these bad sectors a loss? Or is there some magic means of data recovery that could have worked on these same sectors without requiring intimate knowledge of the data — perhaps a hardware technique that trumps my software-only technique? Steps #1 and #3 could be automated to recover a subset of sectors (and recover the rest with zeroed gaps inside)... does anyone do that?
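A rough Python sketch of what automating the multiple-read comparison could look like. It assumes the reads have already been descrambled, RLL-decoded and aligned to the same length, which is the hard part and is not shown; the function name `merge_reads` and the `min_agree` threshold are hypothetical.

```python
from collections import Counter

def merge_reads(reads, min_agree):
    """Combine several aligned reads of one sector, byte by byte.

    A byte is accepted when at least min_agree reads agree on its value;
    otherwise it is left as zero and its position is reported as uncertain.
    Returns (merged_bytes, uncertain_positions).
    """
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must be aligned to the same length")
    merged = bytearray(length)
    uncertain = []
    for pos in range(length):
        value, votes = Counter(r[pos] for r in reads).most_common(1)[0]
        if votes >= min_agree:
            merged[pos] = value
        else:
            uncertain.append(pos)
    return bytes(merged), uncertain

example_reads = [b"HELLO WORLD", b"HELLO WXRLD", b"HELLO WORLD", b"HEQLO WYRLD"]
example_merged, example_uncertain = merge_reads(example_reads, min_agree=3)
```

The uncertain positions are the "zeroed gaps"; if there are few enough of them, they are candidates for correction via the ECC.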
About #1 and #3: I would imagine the XOR scrambling, RLL (*PRML) algorithm, and ECC polynomial and matrix would have to be reverse-engineered, as I did. Or is this information shared at all?
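For readers unfamiliar with XOR scrambling, here is a generic Python sketch of the idea: the data is XORed with a pseudo-random keystream from a shift register, so applying the same operation twice restores the original. The 16-bit register width, the seed `0xACE1` and the tap positions are purely hypothetical; they are not the values used by any particular drive, which is exactly the part that would need reverse-engineering.

```python
def lfsr_keystream(seed, taps, nbytes):
    """Generate nbytes of keystream from a 16-bit Fibonacci LFSR.

    seed and taps are hypothetical placeholders, not real drive parameters.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            bit = 0
            for t in taps:
                bit ^= (state >> t) & 1
            state = ((state << 1) | bit) & 0xFFFF
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def xor_scramble(data, seed=0xACE1, taps=(15, 13, 12, 10)):
    """Scramble or descramble: XOR with the keystream is its own inverse."""
    keystream = lfsr_keystream(seed, taps, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

plain = b"raw sector payload"
scrambled = xor_scramble(plain)
restored = xor_scramble(scrambled)
```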
About #2: what if there are "bad sector gaps" in a data file that is, for the sake of argument, a story the client wrote? Suppose the client knows how to fill in some of the gaps because he or she remembers writing it, and this extra filling-in brings a sector to the critical threshold of being ECC-correctable. Do you bring in the client, or do you call the missing data a loss?
Perhaps I should get into data recovery as a career. How many of you got started recovering your own data?