WD2002FAEX failure

I have two Western Digital 2 Terabyte WD2002FAEXs in a RAID0. They're 8+ months old.

About two weeks ago there was an error with the RAID that chkdsk fixed with no noticeable loss of data. After that, the drives/RAID worked fine for a little more than a week with no issues. Everything worked fine until about 2 AM last night, when I shut it down as normal and went to sleep.

This morning the RAID controller reports an error with one of the drives during boot-up, and IRST reports one of the drives as failed. I've unplugged, swapped, and cleaned all the cables. There was quite a bit of dust in the fan in front of the drives, some on the SATA ports on the mobo, and a little dust on the drives themselves.

Is it usual/normal for drives to fail between power-ups like this? Would blasting the drive/mobo with compressed air possibly help? Any other suggestions?

The mobo is an Asus Sabertooth P67 with their 'thermal armor'. Maybe there is even more dust around the SATA ports under the armor?

You can try it, but this is not a PC forum and this is not really a good question to ask here. But yes, dust can kill your system and cause other problems in it.

As for WD, they are not good drives, and this is another issue with this one.

Sounds like an ongoing bad sector issue that finally crossed the SMART threshold. Have you checked SMART values on your drives?

Hi, if you decide to do more thorough cleaning, please be careful of static. lcoughey is a respected member in Canada who can also help if you find the drive still has problems and you therefore need professional help. A hard drive can fail even within a short time of use, hence it is good to have backups of backups, on different media as well.

drc wrote:Sounds like an ongoing bad sector issue that finally crossed the SMART threshold. Have you checked SMART values on your drives?

I am unable to get SMART values from my RAID controller.

I put the drive in another computer. It fails the SMART test with a read element failure. It now appears to be performing an extended scan normally. You're probably right, and it will come back with some bad sectors.

As it is performing an extended test, is it not possible to get the drive working in the RAID again, at least temporarily, so I can get some of the data off the drive?

I guess I never performed a proper test when I first got the drive, and it is now failing because it is finally attempting to use these bad sectors, so only the data recently written to those bad sectors should be damaged?

I would not recommend letting anything run an extended test/scan of any kind. Simply checking the current SMART values using MHDD or similar utility should be sufficient to give you an overview of what is going on. That said, it sounds like there are definitely bad sectors, so at this point the drive should be cloned/imaged before anything else.
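For anyone following along: reading the SMART attribute table does not start a self-test. With smartmontools, `smartctl -A /dev/sdX` prints the table (on a plain SATA port; here the Intel RAID controller would not pass the values through). A minimal sketch that picks out the raw counters most relevant to bad sectors and cabling is below. It assumes the usual smartctl column layout, the device path is a placeholder, and the sample rows are made up rather than taken from the drive in this thread. It also shows why a drive can look "good": its normalized VALUE stays above THRESH while the raw pending-sector count is already non-zero.

```python
import re

# Standard ATA attribute IDs that most directly reflect media or cable trouble.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

# One attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\S+)"
)

def parse_attributes(text):
    """Map attribute ID -> (name, value, worst, thresh, first raw token)."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)),
                                      int(m.group(4)), int(m.group(5)), m.group(6))
    return attrs

def raw_warnings(attrs):
    """List watched attributes whose raw count is non-zero, even if VALUE > THRESH."""
    found = []
    for attr_id, label in WATCHED.items():
        if attr_id in attrs:
            raw = attrs[attr_id][4]
            if raw.isdigit() and int(raw) > 0:
                found.append(f"{attr_id} {label}: raw={int(raw)}")
    return found

# Made-up sample rows in smartctl's layout, not the poster's drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""

for w in raw_warnings(parse_attributes(SAMPLE)):
    print(w)
```

In practice you would feed it the captured output of `smartctl -A` instead of `SAMPLE`.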

@openair,

I agree with the comments from drc (and SquaL).

openair wrote:I put the drive in another computer. It fails the SMART test with a read element failure.

No-one suggested that you run a SMART test. Instead you were asked to provide the SMART (attribute) values, which does not require that you run any test.

openair wrote:It now appears to be performing an extended scan normally.

As drc said, I also wouldn't do that, if I was in your situation.

openair wrote:I guess I never performed a proper test when I first got the drive, and it is now failing because it is finally attempting to use these bad sectors, so only the data recently written to those bad sectors should be damaged?

You don't know which sectors are now unreadable, nor when that problem started - although you got some clues that a problem had already developed, a couple of weeks ago. FYI, even if you had tested the drive when you got it, there may have been no problem at that time.

drc wrote:Simply checking the current SMART values using MHDD or similar utility should be sufficient to give you an overview of what is going on. That said, it sounds like there are definitely bad sectors, so at this point the drive should be cloned/imaged before anything else.

The SMART values from the dying drive and the drive I just bought to clone it to are mostly similar. Both show as 'good.' The CRC error rate and write error rate are slightly higher on the dying drive.

I have cloned the drive, with 1 CRC error during the first 5 seconds of the clone. The RAID controller does not recognize the cloned drive as part of the RAID, marking it either offline or unknown depending on the SATA port used. IRST shows it as a separate disc from the RAID. Is that because the serial numbers are different? How do I fool the RAID controller into thinking this is the same disc?

If I cannot get the controller to accept it as the same drive, was cloning a member of a RAID0 a waste of time?

I found change-seaget-serial-software-t22395.html and used it to edit the serial. The drives appear to be working as a RAID0 again; I'm just waiting for chkdsk to finish recovering some 5000 files. (EDIT: Finished, looks like 95%+ of the data is intact.)

Thanks for your help, drc, and someone should link that editor on the software page.

Nice work.

I'm just wondering whether the RAID controller records the serial number information in its own nonvolatile memory, or in a special area on the HDDs. If the latter, then it may have been easier and less risky to modify the serial number on the drive itself rather than in the firmware.

fzabkar wrote:Nice work.

I'm just wondering whether the RAID controller records the serial number information in its own nonvolatile memory, or in a special area on the HDDs. If the latter, then it may have been easier and less risky to modify the serial number on the drive itself rather than in the firmware.

Yes, either way it would be preferable to edit the RAID's expected serial rather than the manufacturer's drive serial, as I'm now going to have to restore the serial on the cloned drive at some point, but I was unable to find any information about how to go about doing that. Any suggestions?

Intel has no suggestions on how to resolve even a similar problem, where the expected RAID serials become corrupt (extra characters at the beginning or end) after certain updates.

fzabkar wrote:Nice work.

I'm just wondering whether the RAID controller records the serial number information in its own nonvolatile memory, or in a special area on the HDDs. If the latter, then it may have been easier and less risky to modify the serial number on the drive itself rather than in the firmware.

Franc,
All RAID controllers have their own systems and methods of working. Nothing is universal as such.

openair wrote:Yes, either way it would be preferable to edit the RAID's expected serial rather than the manufacturer's drive serial, as I'm now going to have to restore the serial on the cloned drive at some point, but I was unable to find any information about how to go about doing that. Any suggestions?

At the risk of the blind leading the blind, I, as a complete novice, would approach the problem from first principles.

AISI, the RAID metadata could be stored in any of the following locations:

1/ CMOS RAM on motherboard
2/ motherboard BIOS
3/ RAID controller's NVRAM, especially in the case of add-on RAID cards
4/ first few LBAs of drive
5/ end LBAs of drive
6/ Host Protected Area (HPA)

In case (1) there are freeware utilities that can backup and restore the CMOS data.

In case (2) you could use the motherboard maker's BIOS flashing utility to make a BIOS backup, or you could use a universal flasher such as UniFlash.

Case (3) - ?

Cases (4) and (5) could be investigated by means of a hex editor. However, you would need to connect the drive in JBOD mode so that its full user area would be exposed to your software. AIUI, when the drive is behind the RAID controller, its metadata areas may be hidden from view.
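To make cases (4) and (5) concrete, here is a minimal sketch that pulls the first and last N sectors out of a drive (or, preferably, an image of the clone) and prints them hex-editor style. It assumes 512-byte logical sectors and a binary file object; on Linux a raw disk would be something like `open("/dev/sdX", "rb")` with root rights, where `/dev/sdX` is a placeholder, and the drive must be attached as a plain disk rather than behind the RAID BIOS. The demo uses a synthetic in-memory "disk" with a fake marker in its last sector.

```python
import io
import os

SECTOR = 512  # assumption: 512-byte logical sectors

def total_lbas(f, sector=SECTOR):
    """Number of whole sectors in an open binary file, image or raw device."""
    f.seek(0, os.SEEK_END)
    return f.tell() // sector

def read_sectors(f, first_lba, count, sector=SECTOR):
    f.seek(first_lba * sector)
    return f.read(count * sector)

def head_and_tail(f, n=64, sector=SECTOR):
    """Return (first n sectors, last n sectors, LBA where the tail starts)."""
    total = total_lbas(f, sector)
    n = min(n, total)
    tail_start = total - n
    return read_sectors(f, 0, n, sector), read_sectors(f, tail_start, n, sector), tail_start

def hexdump(data, base=0, width=16):
    """Offset, hex bytes and printable ASCII per line, like a hex editor."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + off:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines

# Synthetic 10-sector "disk" with a fake marker at the start of the last sector.
disk = io.BytesIO(bytes(9 * SECTOR) + b"META" + bytes(SECTOR - 4))
head, tail, tail_lba = head_and_tail(disk, n=2)
print(tail_lba)
print(hexdump(tail[SECTOR:SECTOR + 16], base=(tail_lba + 1) * SECTOR)[0])
```

Saving `head` and `tail` to files for each drive also gives you something to compare later.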

In case (6) you could use a utility such as HDAT2 or MHDD or the HDD Capacity Restore Tool to uncut the drive and expose the metadata in the HPA.

If the metadata are not intelligible, then you could experiment with another set of drives to determine the format. For example, you could select different stripe sizes and observe the effect on the metadata. You could also change the order of the drives in a RAID 0 array. This should interchange the drives' parameters within the metadata. If the two drives are otherwise identical, then this should find the locations of their serial numbers, assuming that the data are not stored in plain text.
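The comparison experiment above could be scripted roughly as follows: dump the suspected metadata sectors before and after swapping the drive order, list the byte ranges that changed, and also search directly for a known serial number. One detail worth knowing is that ATA IDENTIFY DEVICE returns strings with the two bytes of each 16-bit word swapped, so a serial copied verbatim from IDENTIFY data would not appear as plain text. Whether Intel's metadata stores it plain or swapped is an assumption, so the sketch tries both (and both byte alignments). The serial used in the usage example is hypothetical.

```python
def diff_runs(a, b):
    """(start, end) byte ranges where two equal-length dumps differ."""
    if len(a) != len(b):
        raise ValueError("dumps must be the same length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i
        elif x == y and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(a)))
    return runs

def swap_pairs(data):
    """Swap the two bytes of every 16-bit word; a trailing odd byte is kept."""
    out = bytearray(data)
    n = len(out) - len(out) % 2
    out[0:n:2], out[1:n:2] = data[1:n:2], data[0:n:2]
    return bytes(out)

def _find_all(haystack, needle):
    pos = haystack.find(needle)
    while pos != -1:
        yield pos
        pos = haystack.find(needle, pos + 1)

def find_serial(dump, serial):
    """Offsets of `serial` stored as plain ASCII or in ATA word-swapped order."""
    needle = serial.encode("ascii")
    hits = [("plain", p) for p in _find_all(dump, needle)]
    for shift in (0, 1):  # the string may start on an even or an odd byte
        for p in _find_all(swap_pairs(dump[shift:]), needle):
            hits.append(("word-swapped", shift + p))  # offset in the swapped view
    return hits
```

For example, `find_serial(tail_sectors, "WCAZA1234567")` (a made-up serial) would report any place in a saved dump where that serial appears, and `diff_runs(before, after)` would narrow down which bytes moved when the drive order was swapped.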

In the case of 4, 5, and 6, would that information not have been cloned along with the rest of the sectors on the drive?

In the case of 1 and 2, backing up and restoring alone would be rather useless, as I need to edit the expected serial to match the new drive. The Asus utility or UniFlash won't allow me to edit it, so I'd still be stuck with a hex editor? Or can you suggest a utility to edit these for an Asus P67 Sabertooth? I'm having no luck finding anything but backups, restores, and clears on Google. Or can these backups be edited with just a text editor?

If I am stuck with a hex editor, which I have very little experience with, it seems that editing the drive's firmware with the utility I linked above is actually the easier and safer route (although probably more time consuming in the long run)? Also, as chkdsk made some corrections after the clone, the 'only' difference is probably no longer just the serials, which would make this just as time consuming (another clone)?

I would think that at the outset you should not be too concerned about editing and restoring the metadata. Instead I would first concentrate on locating it and determining its structure. Only then will you know which tool, if any, would be suitable for the job.

That said, I expect that the capacity of CMOS RAM would be too small to store the RAID metadata, but I still included this possibility for completeness.

The BIOS code is mostly compressed, except for the boot block at the very end, and one or more data structures such as the ESCD. The code modules can be extracted and modified using tools such as AMIBCP (AMIBIOS) and CBROM (Award). The ESCD contains a table of devices and resources which have been discovered by PnP. Both the BIOS and the OS are able to update this table. I'm only hypothesising, but it could be that the RAID controller has a similar area in BIOS that is set aside for its metadata. If so, then you could determine the location of these metadata by comparing the BIOS backups before and after modifying some aspect of the array, such as switching the order of the drives, or changing the stripe size. The backup would be in the form of a file which you should be able to edit using a regular hex editor. You may need to recompute a checksum, though.
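On the checksum point: if the edited region happens to use the simple scheme where all bytes must sum to 0 modulo 256 (as legacy PCI option ROM images do, for instance), fixing it after an edit is a one-byte adjustment, sketched below. Whether the Sabertooth P67's firmware protects any RAID data this way, and where the checksum byte would sit, is entirely an assumption; the 8-byte block in the demo is hypothetical.

```python
def byte_sum(data):
    """8-bit sum of all bytes; 0 means the region checks out under this scheme."""
    return sum(data) & 0xFF

def fix_checksum(region, checksum_offset):
    """Rewrite one byte of `region` (a bytearray) so its 8-bit sum becomes 0."""
    region[checksum_offset] = 0
    region[checksum_offset] = (-sum(region)) & 0xFF
    return region[checksum_offset]

# Hypothetical 8-byte structure whose last byte is its checksum.
block = bytearray(b"RAID\x01\x02\x03\x00")
fix_checksum(block, 7)
block[4] = 0x05                 # simulate an edit
print(byte_sum(block) == 0)     # False: the checksum is now stale
fix_checksum(block, 7)
print(byte_sum(block) == 0)     # True
```

Other firmware formats use CRCs or vendor-specific checksums, in which case this would not apply.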

As for cases 4, 5, and 6, yes, those metadata would have been cloned. However, your problem is that those metadata would include the serial numbers of the original drives, not your replacements. Therefore, if the RAID controller is comparing the original serial number stored in the metadata against the serial number reported by the new drive, then clearly it will fail. In this case you will need to edit those metadata to match the serial number of your new drive. Instead you have chosen to go the other way, ie you have edited the drive's firmware to match the serial number in the original metadata.

One quick way to determine whether the RAID controller is reserving any sectors for its metadata is to compare the total LBAs reported by the RAID controller against the total LBAs on the labels of your drives. If there is a difference, then you would reconfigure the drives as JBOD and then add up the total number of LBAs reported by each drive. If the numbers match the labels, then the metadata would be located in the visible user area, either at the beginning of the drive or at the end. If the MBR is located in sector 0, then the metadata would most likely be at the end of the drive. Otherwise, if the MBR is located in sector x, then sectors 0 to x-1 would contain the metadata.
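The label figure for a drive can be cross-checked with the IDEMA LBA count formula (512-byte sectors): LBAs = 97,696,368 + 1,953,504 × (GB − 50), which gives 3,907,029,168 for a 2000 GB drive. A minimal sketch of the comparison for a two-drive RAID 0 follows; the controller-reported total is a hypothetical number, not one from this thread. Note that a controller may also round each member down to a multiple of the stripe size, so not every missing sector necessarily holds metadata.

```python
def idema_lbas(gb):
    """Advertised LBA count for a drive of `gb` decimal gigabytes (512-byte sectors)."""
    return 97_696_368 + 1_953_504 * (gb - 50)

def reserved_per_member(label_lbas, raid0_total, members):
    """Sectors per drive that a RAID 0 volume of `members` drives does not expose."""
    return label_lbas - raid0_total // members

label = idema_lbas(2000)        # 3,907,029,168 LBAs for a 2 TB drive
raid_total = 7_814_037_504      # hypothetical total reported by the RAID controller
print(label, reserved_per_member(label, raid_total, 2))  # 3907029168 10416
```

A non-zero result is the cue to switch to JBOD and check each drive individually, as described above.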

Alternatively, if the total LBAs in JBOD mode do not match the labels, then the metadata must reside in an HPA. In this case you would use one of the previously mentioned tools to restore the full native capacity of the drive, and then use a disc editor to view and save the end sectors.
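The elimination steps in the last two paragraphs can be written down as a small decision function. The MBR is located by scanning the first sectors for the 0x55AA boot signature at bytes 510-511, which is a heuristic: a partition boot sector also ends in 0x55AA, and in a RAID 0 set only the first member would carry the start of the volume, so the second member may show no MBR at all. The 2048-sector scan limit and the message strings are illustrative choices, not anything the controller defines.

```python
import io

SECTOR = 512  # assumption: 512-byte logical sectors

def find_mbr_lba(f, max_lba=2048, sector=SECTOR):
    """First LBA among the first `max_lba` whose sector ends in 0x55AA, else None."""
    f.seek(0)
    for lba in range(max_lba):
        block = f.read(sector)
        if len(block) < sector:
            break
        if block[510:512] == b"\x55\xaa":
            return lba
    return None

def locate_metadata(label_lbas, jbod_lbas, mbr_lba):
    """Apply the elimination steps: HPA first, then start or end of the user area."""
    if jbod_lbas < label_lbas:
        return f"HPA: LBAs {jbod_lbas}..{label_lbas - 1} are hidden; restore native capacity"
    if mbr_lba is None:
        return "no MBR found; inspect the start and end of the drive by hand"
    if mbr_lba == 0:
        return "visible user area, most likely at the end of the drive"
    return f"visible user area, sectors 0..{mbr_lba - 1}"

# Synthetic drive image whose MBR sits in sector 3.
disk = io.BytesIO(bytes(3 * SECTOR) + bytes(510) + b"\x55\xaa")
mbr = find_mbr_lba(disk)
print(mbr)
print(locate_metadata(3_907_029_168, 3_907_029_168, mbr))
```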

In retrospect, the above procedures appear to be a lot more complicated than editing the drive's firmware, but once you know exactly where the metadata are stored, you could use any HDD, not just those models which are supported by a particular utility. Moreover, any erroneous changes that are made to the user area of a drive are much more easily undone than erroneous changes made to the firmware.

Of course, all the above is just speculation, but that's how I would approach the problem if I had to do it from scratch.
