|
A computer system was corrupting a user's data (apparently due to a defective motherboard chipset). The corruption seemed random and intermittent, and each occurrence was small (for example, a bit in a file being inverted).
A backup of the data existed; however, the backup was connected to the corrupting system, so it suffered similar corruption.
Now there are multiple data sets of what should be the same data, but because the corruption was not identical across the sets, the sets differ. The data sets consist of the following media file types: JPEG, PDF, PPT (PowerPoint), MPEG, and some BMP (bitmap) files.
Between the two data sets, there are files that should be the same but are different (at least one copy being corrupt). For most of the corrupt media files, the corruption is not noticeable when opening or playing them in their usual applications (Adobe Reader, PowerPoint, Windows Media Player, etc.). Picture files that should be identical in both sets, but whose binaries differ slightly, display without visible problems and appear identical.
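Finding which pairs of files differ is the easy part. Below is a minimal sketch of that step, assuming both data sets are directory trees in which matching files share the same relative path. The function names `file_digest` and `find_mismatches` are hypothetical, chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(set_a, set_b):
    """List relative paths present in both trees whose contents differ.

    Assumption: a file and its counterpart have the same relative path
    under each root. Files that exist in only one tree are skipped.
    """
    set_a, set_b = Path(set_a), Path(set_b)
    mismatched = []
    for file_a in sorted(set_a.rglob("*")):
        if not file_a.is_file():
            continue
        rel = file_a.relative_to(set_a)
        file_b = set_b / rel
        if file_b.is_file() and file_digest(file_a) != file_digest(file_b):
            mismatched.append(rel.as_posix())
    return mismatched
```

This only flags the pairs that differ. It cannot say which copy in a pair is the corrupt one, and it would miss a pair in which both copies happened to be corrupted identically.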
I want to purge only the corrupted files from the data sets. To do this, it seems necessary to first determine which files in the sets are the corrupt ones. How can this be done?
|