daltonwide wrote:Maybe the CRC calculation is different between Windows 7 and Windows 8.1?
I don't think so... In both cases it's a CRC32 of the partition entry array (the ~16K before or after the header). Plus, it seems like a pretty bad idea to change the GPT header format for no good reason between OS iterations.
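To make the point concrete, here is a minimal Python sketch of how the two CRC fields in a GPT header can be checked. It assumes the standard UEFI header layout (HeaderSize at byte 12, HeaderCRC32 at byte 16, PartitionEntryArrayCRC32 at byte 88) and the ordinary CRC-32 that `zlib.crc32` computes. The function names are hypothetical; this is not Windows' own code.

```python
import struct
import zlib

# Byte offsets inside the GPT header (little-endian fields, per the UEFI layout).
HEADER_SIZE_OFFSET = 12       # HeaderSize, normally 92
HEADER_CRC_OFFSET = 16        # HeaderCRC32
ENTRY_ARRAY_CRC_OFFSET = 88   # PartitionEntryArrayCRC32


def entry_array_crc_ok(header: bytes, entries: bytes) -> bool:
    """Compare the stored entry-array CRC with a CRC32 computed over `entries`.

    `entries` must be the whole array (NumberOfPartitionEntries *
    SizeOfPartitionEntry bytes, typically 128 * 128 = 16384), unused slots included.
    """
    (stored,) = struct.unpack_from("<I", header, ENTRY_ARRAY_CRC_OFFSET)
    return stored == zlib.crc32(entries)


def header_crc_ok(header: bytes) -> bool:
    """Check HeaderCRC32, which is computed with its own field set to zero."""
    (size,) = struct.unpack_from("<I", header, HEADER_SIZE_OFFSET)
    (stored,) = struct.unpack_from("<I", header, HEADER_CRC_OFFSET)
    tmp = bytearray(header[:size])
    tmp[HEADER_CRC_OFFSET:HEADER_CRC_OFFSET + 4] = b"\x00" * 4
    return stored == zlib.crc32(bytes(tmp))
```

Because the entry-array CRC covers the whole array, including all-zero unused entries, a checker has to feed in the full 16K, not just the in-use entries.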
daltonwide wrote:Or does the specific data size of 16476 mean it can only come from an OS instruction?
It's not so much the data size, but rather that this is the disk/partition metadata (GPT header + the location of the partitions) and therefore something that user-level programs will almost never access (see below).
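For what it's worth, 16476 decomposes neatly under the default GPT sizing (128 entries of 128 bytes and a 92-byte header, which is an assumption about these particular drives): it is exactly one entry array followed by one header, the order used by the backup GPT at the end of the disk. This is an observation, not proof of where the data came from.

```python
# Default GPT sizing (assumed for the affected drives).
ENTRY_SIZE = 128      # SizeOfPartitionEntry, bytes
NUM_ENTRIES = 128     # NumberOfPartitionEntries
HEADER_SIZE = 92      # HeaderSize, bytes
SECTOR = 512          # logical sector size, bytes

entry_array = ENTRY_SIZE * NUM_ENTRIES    # bytes in the partition entry array
blob = entry_array + HEADER_SIZE          # entry array followed by the header
print(entry_array, entry_array // SECTOR, blob)
```

This prints `16384 32 16476`: 32 full sectors of entries plus a 92-byte header, which lands at the start of a 33rd sector.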
As fzabkar mentioned, it seems like this corruption should only happen when connecting the drive or booting up the computer. The reason is that when a physical drive (let's assume a non-boot drive) is connected or powered on, the first thing Windows does is read the first three sectors of the drive. The first sector is the protective MBR (for backwards "compatibility"); the second is the GPT header, which indicates that the drive uses GUID partitioning and gives the starting sector of the partition entry array; the third contains the first few partition entries (four 128-byte entries per 512-byte sector). Each partition entry gives the partition's type as well as its starting and ending sector numbers. Using this, Windows can discover where the "basic data partition" is located. The actual file system (the NTFS metadata and all your files) begins at the starting sector of that partition: e.g., if the data partition starts at sector 0x40800, the NTFS metadata begins there, followed by your files. When Windows reads this, it starts locating all your files, folders and so on; this is when the drive appears and becomes accessible.
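The first-sector walk described above can be sketched in Python. This is a minimal parser assuming 512-byte logical sectors and the UEFI field offsets (PartitionEntryLBA, NumberOfPartitionEntries and SizeOfPartitionEntry at bytes 72/80/84 of the header; StartingLBA/EndingLBA at bytes 32/40 of each entry). `parse_gpt` is a hypothetical helper and only parses the entries contained in the bytes it is given. GUIDs are decoded with `bytes_le` because GPT stores them in mixed-endian form.

```python
import struct
import uuid

SECTOR = 512  # assumed logical sector size
# "Basic data partition" type GUID (Microsoft / UEFI).
BASIC_DATA = uuid.UUID("EBD0A0A2-B9E5-4433-87C0-68B6B72699C7")


def parse_gpt(first_sectors: bytes):
    """Return (type GUID, first LBA, last LBA) for each in-use entry.

    `first_sectors` is a raw read starting at LBA 0 (the protective MBR).
    """
    hdr = first_sectors[SECTOR:2 * SECTOR]               # LBA 1: GPT header
    if hdr[:8] != b"EFI PART":
        raise ValueError("no GPT signature at LBA 1")
    entry_lba, n_entries, entry_size = struct.unpack_from("<QII", hdr, 72)
    start = entry_lba * SECTOR                           # normally LBA 2
    parts = []
    for i in range(n_entries):
        off = start + i * entry_size
        if off + entry_size > len(first_sectors):
            break                                        # only what was read
        type_raw = first_sectors[off:off + 16]
        if type_raw == bytes(16):
            continue                                     # unused entry
        first_lba, last_lba = struct.unpack_from("<QQ", first_sectors, off + 32)
        parts.append((uuid.UUID(bytes_le=type_raw), first_lba, last_lba))
    return parts
```

For the example partition above, the NTFS boot sector would sit at byte offset 0x40800 × 512 = 0x8100000 (129 MiB) from the start of the disk.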
So basically, the only time Windows touches or reads the GPT headers is during this initial phase. Since the corrupted data in both our cases contains GPT headers with correct sector info and valid partition entries, this seems like a logical place for the corruption to happen: the GPT/partition-entry data must have been read from the drive first. This is also why I believe most user-level programs will never read, let alone modify, the GPT header: programs interact with the file system, which is confined to a single partition (i.e., between sector 0x40800 and some ending sector, say, 0x1D10CB7FF). But that all assumes the partition sector numbers are kept in memory: if for some reason the GPT header is rescanned, there's another opportunity for corruption.
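A toy classifier makes the separation explicit: under a standard GPT layout the metadata lives in LBA 0 to 33 and in the last 33 sectors of the disk, so file-system I/O confined to a partition never lands there. This sketch assumes 512-byte sectors and a 32-sector entry array; `classify_lba` is hypothetical, and the total sector count used below is made up for illustration.

```python
def classify_lba(lba: int, total_sectors: int, part_first: int, part_last: int,
                 entry_sectors: int = 32) -> str:
    """Name the on-disk region a sector belongs to (standard GPT layout assumed)."""
    last = total_sectors - 1
    if lba == 0:
        return "protective MBR"
    if lba == 1:
        return "primary GPT header"
    if 2 <= lba < 2 + entry_sectors:
        return "primary partition entries"
    if lba == last:
        return "backup GPT header"
    if last - entry_sectors <= lba < last:
        return "backup partition entries"
    if part_first <= lba <= part_last:
        return "partition (file system)"
    return "other"  # another partition or unallocated space


# Hypothetical disk: the data partition from the post plus some trailing space.
TOTAL = 0x1D10CB7FF + 4096
print(classify_lba(0x40800, TOTAL, 0x40800, 0x1D10CB7FF))
print(classify_lba(TOTAL - 1, TOTAL, 0x40800, 0x1D10CB7FF))
```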
daltonwide wrote:Maybe we see partition data just because the controller copies data from the last sectors (or an unerased buffer) by mistake, instead of reading the correct source?
This might be possible: the controller has the partition entries and GPT header cached (for whatever reason), is told to do a read... but instead selectively dumps its cache somewhere. It depends on how large such a cache is, though: possible if it's several MB, unlikely if it's only a few KB. Also, the rest of the corrupt sector isn't zeroed out, so the controller may be doing a read followed by a write, which suggests something more complex.
Additionally, I was wrong to say my "partition entries are not written": they're right before the header in the same file. That's what the "Microsoft reserved partition" and "Basic data partition" strings are (the entries' names). My affected drives have exactly those two partitions. In your case, having only a "Basic data partition" entry indicates that that physical drive doesn't have a reserved partition; two entries would mean two partitions. So our symptoms are actually slightly different: you have a non-zero CRC for the partition entries (I haven't checked whether it's actually correct, though), whereas the CRC in my header is flat-out wrong: the entries exist (non-zero), but the CRC is garbage (all 0s).
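The comparison between the two dumps can be sketched as a small check over the 16476-byte blob, assuming (as in the dumps discussed here) that the entry array comes first and the 92-byte header follows, with 128 entries of 128 bytes. `inspect_blob` is hypothetical; names are read from the 72-byte UTF-16LE PartitionName field at offset 56 of each entry, and the CRC status distinguishes "all zeros" from "present and correct" and "present but wrong".

```python
import struct
import zlib

ENTRY_SIZE = 128
NUM_ENTRIES = 128
ARRAY_SIZE = ENTRY_SIZE * NUM_ENTRIES   # 16384 bytes
HEADER_SIZE = 92


def inspect_blob(blob: bytes):
    """Split an (entry array, header) blob; return (entry names, CRC status)."""
    entries = blob[:ARRAY_SIZE]
    header = blob[ARRAY_SIZE:ARRAY_SIZE + HEADER_SIZE]
    if header[:8] != b"EFI PART":
        raise ValueError("header signature not found after entry array")
    names = []
    for off in range(0, ARRAY_SIZE, ENTRY_SIZE):
        if entries[off:off + 16] == bytes(16):
            continue                                  # unused (zero type GUID)
        raw_name = entries[off + 56:off + 128]        # 36 UTF-16LE code units
        names.append(raw_name.decode("utf-16-le").rstrip("\x00"))
    (stored,) = struct.unpack_from("<I", header, 88)  # PartitionEntryArrayCRC32
    if stored == 0:
        status = "zero"
    elif stored == zlib.crc32(entries):
        status = "match"
    else:
        status = "mismatch"
    return names, status
```

In these terms, a blob like mine would come back with two names and status `"zero"`, while yours would show one name and a non-zero CRC still to be verified.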
fzabkar wrote:This would suggest that the relevant sectors are first read, then amended, and then written back to their original locations. The trailing bytes in the "EFI PART" sector could be "don't care" bytes. If these trailing bytes really do belong to the corrupted file instead of being introduced from somewhere else, then that would confirm that the affected sector is being edited rather than simply overwritten. Moreover, if the corrupt data were originating from a host side buffer, then we would expect to see junk data in the trailing bytes rather than the original file data.
The sources you linked say the rest of that sector must be zeroed. Interesting point you raise, though: are writes to drives always issued in 512-byte (sector-sized) chunks? If so, then the sector is being read in, corrupted without the trailing bytes being zeroed out, and then written back.
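Block devices are written in whole logical sectors, so if only the first 92 bytes of a sector were meant to change, the other 420 bytes have to come from somewhere. The two hypotheses in this exchange can be contrasted with a toy model; both functions are hypothetical and simply return the sector that would be written, assuming 512-byte sectors.

```python
SECTOR = 512
HEADER_SIZE = 92


def overwrite_from_zeroed_buffer(header: bytes) -> bytes:
    """Well-behaved writer: header, then zero padding out to a full sector."""
    return header + bytes(SECTOR - len(header))


def read_modify_write(old_sector: bytes, header: bytes) -> bytes:
    """Hypothetical faulty path: read the sector, patch the front, write it back.

    The trailing bytes keep whatever the sector held before (e.g. file data).
    """
    buf = bytearray(old_sector)       # "read"
    buf[:len(header)] = header        # "modify"
    return bytes(buf)                 # "write" (returned instead of sent to disk)
```

Only the second path leaves the original file's bytes after the "EFI PART" header, which matches what the dumps show; a host-side buffer holding stale junk would produce different trailing bytes again.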
Guess it's difficult to fully diagnose it without repeatability and a closer inspection of the full stack.