I had an IBM Deskstar go bad, and it spawned an ongoing data recovery project. At first all I did was clone the partition, fix the cloned copy so it was accessible, and write a program to generate a list of affected files from the list of bad sector LBAs. Later I mapped the geometry of the bad sectors and worked on recovering the data from the bad sectors themselves. The project was overwhelmingly successful, in that:
- I mapped the geometry of the bad sector stripe (here is a graph of it) without any access to undocumented vendor-specific commands. To get the P-List, I wrote a program to time seek operations over multiple passes and find the exact locations of track boundaries. This gave the head and track of each factory defect (by pointing out, for example, tracks 701 sectors long that should've been 702), which was enough to perfectly align an LBA-to-head/track/sector conversion algorithm. That in turn made it possible to translate the list of bad sector LBAs (logged during the drive cloning) into a G-List. Kinks in the graph even showed me where the embedded servo sectors are.
- I fully recovered some of the bad sectors holding the data most important to me. Using READ LONG to do multiple reads of each sector, I reverse-engineered the RLL scheme, implemented blind ECC correction, and combined these two techniques with filling in data from context (depending on what kind of data was in the sector). A picture of a particularly interesting kind of recovered bad sector can be seen here.
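A minimal sketch of the LBA-to-head/track/sector translation described in the first bullet above. It is only an illustration under simplifying assumptions, not the program I actually used: it pretends every track has the same number of physical sector slots (a real drive is zoned), that tracks are numbered head-first within a cylinder, and that each P-List entry is a physical slot the drive simply skips over. The names `slip_past_defects` and `lba_to_hts` are made up for this example.

```python
def slip_past_defects(lba, plist_sorted):
    """Return the physical slot index that holds the given LBA.

    plist_sorted is a sorted list of physical slot indices that were
    marked defective at the factory and are skipped (assumed model).
    """
    phys = lba
    for defect in plist_sorted:
        if defect <= phys:
            phys += 1      # every skipped slot pushes the LBA one slot further
        else:
            break
    return phys


def lba_to_hts(lba, plist, sectors_per_track=702, heads=10):
    """Translate an LBA into (head, track, sector) under the assumed layout."""
    phys = slip_past_defects(lba, sorted(plist))
    linear_track, sector = divmod(phys, sectors_per_track)
    track, head = divmod(linear_track, heads)   # assumed head-first ordering
    return head, track, sector


# One factory defect at slot 5 of the first track: that track now holds
# only 701 LBAs instead of 702, so LBA 701 lands on the next head.
print(lba_to_hts(700, [5], sectors_per_track=702, heads=2))
print(lba_to_hts(701, [5], sectors_per_track=702, heads=2))
```

Running each bad LBA from the cloning log through a function like this is what turns the log into a G-List in physical coordinates. The 702 comes from the example above; the head count and ordering are placeholders.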
I would like to take this project a step further. Even with READ LONG, a lot of unnecessary and counterproductive filtering is done on bad sectors. Ambiguity is introduced in the RLL conversion. In bit-shifted portions, parity is miscorrected. "Weak bits" are rounded to zero, so that even multiple passes yield zeroes in those portions (the bad sector stripe created long strings of weak bits, albeit shorter than the length of a sector). With multiple passes, the ADC data could probably be averaged together to yield data even within those "weak bit" portions.
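To illustrate why I expect averaging to help, here is a toy simulation. It is a sketch under loudly stated assumptions, not a model of the real read channel: each bit is a single sample at +amplitude or -amplitude plus Gaussian noise, and the drive's detector is imagined as a fixed threshold that reports 1 only for strong samples. A real PRML channel has intersymbol interference and would need Viterbi decoding after the averaging. All names and numbers are invented for the example.

```python
import random


def read_pass(bits, amplitude, sigma, rng):
    """Simulate one noisy analog read of a weak region (assumed model)."""
    return [(amplitude if b else -amplitude) + rng.gauss(0.0, sigma)
            for b in bits]


def majority_of_hard_reads(passes, threshold):
    """Threshold every pass first, then take a per-bit majority vote."""
    n = len(passes)
    hard = [[1 if s > threshold else 0 for s in p] for p in passes]
    return [1 if 2 * sum(col) > n else 0 for col in zip(*hard)]


def decide_from_average(passes):
    """Average the raw samples across passes, then decide on the sign."""
    n = len(passes)
    return [1 if sum(col) / n > 0 else 0 for col in zip(*passes)]


def demo(seed=1):
    rng = random.Random(seed)
    true_bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4
    # Weak signal: amplitude 0.2, well below the detector threshold of 0.6.
    passes = [read_pass(true_bits, 0.2, 0.3, rng) for _ in range(200)]
    voted = majority_of_hard_reads(passes, threshold=0.6)
    averaged = decide_from_average(passes)
    return true_bits, voted, averaged


if __name__ == "__main__":
    true_bits, voted, averaged = demo()
    print("vote matches:   ", voted == true_bits)
    print("average matches:", averaged == true_bits)
```

With these made-up parameters the per-pass vote should collapse to all zeroes, which is what I see with READ LONG, while the averaged samples should recover the pattern, because averaging shrinks the noise before any decision is made.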
In theory, with a firmware mod I could gain access to the ADC data and do my own Viterbi decoding. However, I have no experience accessing hard drive firmware, let alone modifying it. Is there any documentation (or free/cheap software) available for reading/writing an IBM drive's firmware? Are there disassemblers which work on the instruction sets of the CPUs used in hard drives? Are they publicly available? Has anyone on this forum done hard drive firmware modification — especially related to the sector read/write process?
P.S. Speaking of READ LONG — why is there no 48-bit LBA version of this command? Do any drives implement vendor-specific versions of it?