jdituro wrote:
If I run typical read and write tests on the HDD, how can I determine whether the failure is local to the controller on the HDD itself or to the controller on the motherboard?
That's why I said an HDD used for testing should be known-good* - then the cause of any errors is highly unlikely to be the HDD. Also the specific type of failure will sometimes make the suspect component (more) obvious (though not in this case).
jdituro wrote:
I recently had a situation where I found a delayed write error during an HDD evaluation and assumed it was the HDD. After replacing the HDD, OS etc., I got the same errors with the new HDD. Rather than wasting time guessing, can you recommend a method that is more precise?
As well as the point above, that's also why I said I wouldn't be using Windows for the testing, as you are affected by its limited error messages, and hidden driver behaviour (e.g. retries could hide marginal issues).
In your example, you should be looking in the Windows system event log, to see which specific driver(s) logged errors, and what those errors were, which then led to the displayed message about the delayed write error. Also, if you had seen hardware errors logged before that delayed write error, then you could have started with the hypothesis that the OS was unlikely to cause hardware error messages, and started considering other causes first, rather than reinstalling the OS... (Personally I've never seen a software cause for those specific Windows errors, but I rarely use Windows these days, so I don't claim it can never have a software cause - just that I've never seen that.)
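As a starting point for that log check, here is a minimal Python sketch that shells out to Windows' built-in wevtutil tool to list the most recent Critical, Error and Warning entries from the System log. The function names and the default of 50 events are just my choices for illustration, and which event sources matter (disk, storage controller driver etc.) will depend on the system, so treat it as a rough aid rather than a diagnostic.

```python
import subprocess


def build_wevtutil_command(max_events=50):
    """Build a wevtutil command that lists recent System log events.

    Level 1 = Critical, 2 = Error, 3 = Warning.
    """
    query = "*[System[(Level=1 or Level=2 or Level=3)]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{query}",       # XPath filter on event level
        f"/c:{max_events}",  # maximum number of events to return
        "/rd:true",          # newest events first
        "/f:text",           # human-readable output
    ]


def recent_system_events(max_events=50):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_wevtutil_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(recent_system_events())
```

Read the output from the time of the delayed write error backwards, and note which source logged the first hardware-related error, not just the last one.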
I'm usually running *nix, where I can easily see nice, verbose kernel driver messages, which typically give a better clue than what I have seen in Windows error messages, in my experience. However it really can be difficult to identify the true cause of an interface problem from any error messages alone. Properly interpreting error messages from any tool, and knowing what cannot be assumed or believed, can be an art.
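To give a flavour of what I mean by reading kernel messages, here is a small Python sketch that picks out Linux libata lines from dmesg output and attaches a rough hint to each. The patterns and hint wording are my own heuristics, not a definitive mapping: an ICRC flag usually points at the interface (cable, connector, controller port), and UNC usually points at the disk's media, but neither proves anything on its own.

```python
import re
import sys

# Heuristic hints only: each pattern is a common libata message fragment,
# and each hint is a first hypothesis to test, not a diagnosis.
HINTS = [
    (re.compile(r"\bICRC\b"),
     "interface CRC error: suspect cable, connector or controller port"),
    (re.compile(r"\bUNC\b"),
     "uncorrectable media error: suspect the disk itself"),
    (re.compile(r"hard resetting link|link is slow to respond"),
     "link-level trouble: suspect cabling, power or controller"),
]

# Matches libata prefixes such as "ata3:" or "ata3.00:".
ATA_LINE = re.compile(r"\bata\d+(\.\d+)?:")


def classify(lines):
    """Return (line, hint) pairs for ATA lines matching a known pattern."""
    findings = []
    for line in lines:
        if not ATA_LINE.search(line):
            continue
        for pattern, hint in HINTS:
            if pattern.search(line):
                findings.append((line.strip(), hint))
                break
    return findings


if __name__ == "__main__":
    # Usage: dmesg | python3 this_script.py
    for line, hint in classify(sys.stdin):
        print(f"{line}\n    -> {hint}")
```

Anything it flags still needs confirming by swapping parts, since a marginal power supply or a bad port can produce the same messages as a bad cable.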
There are also DOS-based disk I/O tools that I mentioned before (e.g. MHDD, HDAT2 etc.), although you may need to temporarily switch the SATA controller into IDE / legacy / compatibility mode (whatever it is called in each BIOS) to use that type of tool. You will likely get less ambiguous error messages from such tools (no drivers involved), for some types of problem.
If you had used a known-good* disk for testing, you wouldn't have started by guessing the disk was faulty.

Or you could have moved the customer's disk into a test system which you run in your workshop, to see if the problem moves with the disk - in your case it would not have done that, and so again, you would be narrowing down the cause.
(*Of course known-good disks can become faulty, but that's why you use a lab system to run tests on items like those, to confirm that there are no detectable problems, before then using them on customer systems.)
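The swap logic above can be written down as a tiny decision table. This Python sketch is only my way of summarising it; the function name, argument names and result wording are made up for illustration, and real faults (intermittent ones especially) will not always fit so neatly.

```python
def likely_culprit(suspect_fails_in_lab, known_good_fails_in_original):
    """Narrow down a fault from two swap tests.

    suspect_fails_in_lab: the customer's disk also fails in the lab system.
    known_good_fails_in_original: a known-good disk fails in the
        customer's system.
    """
    if suspect_fails_in_lab and not known_good_fails_in_original:
        # The problem moved with the disk.
        return "disk"
    if known_good_fails_in_original and not suspect_fails_in_lab:
        # The problem stayed with the system.
        return "host side (controller, cable, power, driver)"
    if suspect_fails_in_lab and known_good_fails_in_original:
        return "inconclusive: possibly more than one fault, recheck the test setup"
    return "not reproduced: possibly intermittent, test for longer"


if __name__ == "__main__":
    # The case from this thread: a replacement disk showed the same errors
    # in the original system, and the original disk would presumably have
    # passed in a lab system.
    print(likely_culprit(suspect_fails_in_lab=False,
                         known_good_fails_in_original=True))
```

Two tests like these only narrow things down to "disk" or "everything else"; splitting the host side further means swapping cables, ports and power one at a time.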
There are limitations & caveats to these types of techniques, and without lots more time to explain things, or having the faulty hardware here to find out exactly what was wrong with it (additional test equipment might have helped, but is costly), I can only give some suggested approaches for you to consider.
