First, I'd like to say hello. I just stumbled across this site, and it looks like a place I will like! Now to the issue:
I work for a test engineering company which, among other things, manufactures hard drive test equipment.
We have been looking into an issue with SATA hard drives for over a year now, and despite our best efforts we cannot get Dell, HP or the HDD OEMs to investigate it further or to provide us with more information. We have seen one firmware fix from Seagate (link below), for a single model, but that's it. More and more drives seem to be affected as time goes on, so I thought I would ask the folks here to see if anyone can help.
Here is what happens:
About a year ago we encountered a drive that would not go READY on our test system. Eventually we discovered that the solution was to unplug the SATA signal cable — wait "n" seconds and plug the cable back in. Under this scenario the drive would go READY every time.
For this failure the drive presented 7F at the test system. 7F is the same failure mode you see if the SATA cable is not connected. The internal registers for the SATA PHY also indicated that there was no media connected. Of course this was a false failure: the media was indeed connected. The remedy was to disconnect the SATA cable, apply power to the drive, wait n seconds, and plug in the cable.
Our test system fix was to disable the SATA Bridge (for example, by holding the SATA reset line low) when power was applied, wait n seconds, and then enable the SATA Bridge. This virtually eliminated the problem for most of the drives in that family.
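To make the sequence concrete, here is a minimal Python sketch of the delayed-enable fix together with a simulated fixture that models the behaviour described above. All class and method names are hypothetical and are not our real test code. The value 0x50 for a ready drive is an assumption (the usual ATA "DRDY plus seek complete" status), and the latching of 7F until the next power cycle is modelled on what we observed, not on any spec.

```python
NO_DEVICE = 0x7F   # status we see when the link reports no drive
READY = 0x50       # assumed status for a drive that went READY


class SimulatedFixture:
    """Hypothetical stand-in for the test fixture, using a simulated clock."""

    def __init__(self, drive_ready_s):
        self.drive_ready_s = drive_ready_s  # time the drive needs after power on
        self.now = 0.0
        self.power_on_at = None
        self.bridge_enabled = False
        self.latched_failure = False

    def wait(self, seconds):
        # A real fixture would call time.sleep(seconds) here.
        self.now += seconds

    def hold_bridge_reset(self):
        self.bridge_enabled = False

    def apply_power(self):
        self.power_on_at = self.now
        self.latched_failure = False  # only a power cycle clears the failure

    def release_bridge_reset(self):
        self.bridge_enabled = True
        # Enabling the bridge before the drive is ready latches the failure.
        if self.now - self.power_on_at < self.drive_ready_s:
            self.latched_failure = True

    def read_status(self):
        if not self.bridge_enabled or self.latched_failure:
            return NO_DEVICE
        return READY


def power_on_with_delayed_enable(fixture, wait_s):
    """Hold the bridge in reset, power the drive, wait, then enable the bridge."""
    fixture.hold_bridge_reset()
    fixture.apply_power()
    fixture.wait(wait_s)
    fixture.release_bridge_reset()
    return fixture.read_status()


if __name__ == "__main__":
    for ready_s in (7.0, 12.0):
        status = power_on_with_delayed_enable(SimulatedFixture(ready_s), wait_s=10.0)
        print(f"drive ready after {ready_s}s -> status {status:#04x}")
```

In this model a drive that needs 12 seconds still fails with a 10 second wait, and waiting longer afterwards does not help, which matches the hard failure we see on real drives.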
We assumed it was a bug in the Seagate microcode or the drive SATA Bridge HDL. Since it was only one drive family (at the time) we felt the problem wasn't significant. The wait time we chose was 10 seconds. This was based on the fact that most drives go READY in 7 seconds, so the penalty for test system efficiency was minimal. We did not want to extend the wait significantly beyond that point for several reasons. One was that extending the READY time for all SATA drives would be a significant test throughput penalty just to accommodate one family of drives. The others are outlined below.
The important thing to understand is the nature of this problem. If the SATA Bridge is enabled "early" the drive will not go READY no matter how long you wait. So this is not an issue of an extended wait time — but a hard "false" failure mode. If the host computer or the test system enables the SATA Bridge before the drive is READY the drive will not go ready unless the drive power is cycled.
In our testing we tried to see if there were other ways to fix the problem once we saw a 7F after enabling the SATA Bridge. As a check, when we saw the 7F we disabled the SATA Bridge a second time, waited n seconds, and re-enabled it. This was not effective. No matter how long we waited or how many times we cycled the SATA Bridge, the drive would not go READY. Of course we are averse to the idea of cycling the power on a drive that fails to go READY until it does go READY. That is akin to "ignore all failures, just MAKE the drives pass the test". We don't want that; we only want to pass truly good drives.
Over the intervening months we've discovered that the 10 second wait time may not be working for all drives. We assume that some drives occasionally take longer than 10 seconds to go READY. (See comments below).
Last week we encountered another variation of this failure mode. There is a family of Fujitsu laptop drives that has a similar failure mode. We discovered that the G1 version of this drive has a normal power cycle failure rate of less than 1%, but the G2 version of the family has a 25% power cycle failure rate. In investigating this high failure rate we discovered it is a variation of the failure mode described above.
With this drive the failure mode presents the status 80 (drive busy) instead of 7F. As mentioned, the failure rate is as high as 25% on the power cycle test. This time we found that we could disable the SATA Bridge, wait a few seconds, re-enable the SATA Bridge, and the drive would go READY. This might sound like a reasonable fix, but we have some major concerns. Even so, this "fix" will be in the next release of the test code, because otherwise we will have a very high incidence of false failures with this drive family.
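As a rough illustration of how this second "fix" could be kept from silently passing marginal drives, here is a small Python sketch. It retries the bridge cycle only when the status shows busy (7F is not retried, since cycling the bridge never helped there), caps the number of retries, and records how many were needed. The function names, the verdict strings and the callback interface are all hypothetical; the only borrowed facts are the standard ATA status bits BSY (0x80) and DRDY (0x40).

```python
from dataclasses import dataclass

BSY = 0x80        # ATA status bit 7: drive busy
DRDY = 0x40       # ATA status bit 6: drive ready
NO_DEVICE = 0x7F  # status seen when the link reports no drive


def classify(status):
    """Map a raw status byte to one of four coarse states."""
    if status == NO_DEVICE:
        return "no-device"
    if status & BSY:
        return "busy"
    if status & DRDY:
        return "ready"
    return "not-ready"


@dataclass
class BringUpResult:
    verdict: str        # "pass", "pass-after-retry" or "fail"
    bridge_cycles: int  # extra bridge cycles that were needed
    last_status: int


def bring_up(read_status, cycle_bridge, max_extra_cycles=1):
    """Retry a bounded number of bridge cycles while the drive reports busy."""
    status = read_status()
    cycles = 0
    while classify(status) == "busy" and cycles < max_extra_cycles:
        cycle_bridge()  # disable the bridge, wait, re-enable (fixture specific)
        cycles += 1
        status = read_status()
    if classify(status) != "ready":
        verdict = "fail"
    elif cycles == 0:
        verdict = "pass"
    else:
        verdict = "pass-after-retry"
    return BringUpResult(verdict, cycles, status)


if __name__ == "__main__":
    # Canned status sequence standing in for a real drive: busy, then ready.
    statuses = iter([0x80, 0x50])
    result = bring_up(lambda: next(statuses), lambda: None)
    print(result.verdict, result.bridge_cycles)
```

Keeping "pass-after-retry" separate from "pass" means the retry rate per drive family can be tracked, instead of the test system quietly making the drive go READY.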
We're beginning to wonder if there is a spec for when the computer BIOS will enable the SATA Bridge after the system is powered on. If the BIOS enables it too soon, these drives will exhibit a false READY failure. From what we have learned, most computers must not have the SATA Bridge enabled at power on and must enable it at some later point. We suspect this "time" is a function of the BIOS overhead at power on and has a lot of variability.
Our concern is that we're now extending that time well beyond the 10 seconds, and that time was very arbitrary when we chose it. At this point we feel like the test system is beginning to "make the drive go READY at any cost", which could present quality issues in the field.
As another observation, SATA drives are designed for Hot Plug environments. In the application we are using that is not the case — but why do a few drive families act this way? They certainly wouldn't work in a Hot Plug environment.
Second, in the absence of a spec for the time between power on and the SATA Bridge being enabled, there is a dangerous point: 25% of the time we could be faced with a false power-on failure.
Third, even if new drives typically go READY in under 10 seconds, there are a number of things that might extend that time (temperature, voltage, vibration, shock and normal drive aging). To my knowledge computer systems cannot fix this problem, because they cannot cycle the drive power. With most of these drives, once the drive is powered on you must power it off to recover.
At this point we would like to raise this as a potential reliability issue. We would like to get some guidance from the folks here. Should we "make the drives go ready"? Or "always fail"? Or something in between?
What have others seen?
Specifically, how long should we wait before we enable the SATA Bridge? What should we do about drives that require that we cycle the SATA Bridge a second time to make them go READY?
All thoughts are appreciated!
Seagate Firmware:
http://h20000.www2.hp.com/bizsupport/Te ... =0&mode=4&"