
All times are UTC - 5 hours [ DST ]




Post new topic Reply to topic  [ 6 posts ] 
 Post subject: WD corrupts the customer data? no ecc by default?
Posted: December 19th, 2008, 1:28

Joined: March 28th, 2008, 7:52
Posts: 1466
Location: Europe, Hungary
Hi friends,

I have a WD3200AAKS-00B3A0 (FW: 01.03A01) in my hands, which came to me because the customer couldn't install XP on it.
I have tested the drive with MHDD, with the SMART self-tests and with WD Data Lifeguard Diagnostics, and it passes all of them, including the zero fill.
But XP still refuses to format the drive. :shock:

The surface scan runs smoothly in MHDD; only 4 sectors take longer than 150 ms.

The SMART table looks like this:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 153   151   021    Pre-fail Always  -           3333
  4 Start_Stop_Count        0x0032 100   100   000    Old_age  Always  -           138
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 100   100   000    Old_age  Always  -           274
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           106
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           37
193 Load_Cycle_Count        0x0032 200   200   000    Old_age  Always  -           138
194 Temperature_Celsius     0x0022 109   099   000    Old_age  Always  -           34
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

After I tested the drive by filling it with 0xFF, the issue became even stranger! :shock:
The drive writes the 0xFF pattern smoothly and reads back all the sectors without reporting any errors, but some sectors differ from 0xFF!
If I repeat the compare, different sectors are corrupted, but the two bad lists also have some addresses in common.
(The first list has 25000 mismatches, the second has 12000, and some addresses appear in both.)
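The two-pass comparison above can be sketched in Python. This is a minimal illustration, assuming each pass has already been read into memory as a dict mapping LBA to a 512-byte sector buffer; `mismatching_lbas` and `compare_passes` are hypothetical helpers, not functions from MHDD or any other tool mentioned in this thread.

```python
from typing import Dict, Set

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def mismatching_lbas(sectors: Dict[int, bytes], fill: int) -> Set[int]:
    """Return the LBAs whose read-back data differs from the fill pattern."""
    expected = bytes([fill]) * SECTOR_SIZE
    return {lba for lba, data in sectors.items() if data != expected}


def compare_passes(first: Set[int], second: Set[int]) -> Dict[str, Set[int]]:
    """Split two bad lists into shared and pass-specific addresses."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


if __name__ == "__main__":
    good = b"\xff" * SECTOR_SIZE
    bad = b"\xfd" + b"\xff" * (SECTOR_SIZE - 1)  # one byte lost bit 1
    pass1 = mismatching_lbas({0: good, 1: bad, 2: bad}, 0xFF)
    pass2 = mismatching_lbas({0: bad, 1: bad, 2: good}, 0xFF)
    print(compare_passes(pass1, pass2))
```

Addresses that land in `both` are the natural candidates to read out and inspect byte by byte. If the `only_*` sets keep changing from pass to pass, that suggests the errors are not tied purely to fixed locations on the platter.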
I read out one sector that is in both lists, and this is what I see: :shock:
00000000 FD FF FF FF | DF FB FF FF | FF FF FF FF | FD BF FF FF
00000010 FD FB FF FF | FF BB FF FF | DF FB FF FF | FD BF FF FF
00000020 FF FF FF FF | DF BF FF FF | FF FF FF FF | DD BF FF FF
00000030 FF FF FF FF | DD BF FF FF | FF FF FF FF | FF BB FF FF
00000040 DF FB FF FF | FF FF FF FF | FD FB FF FF | DF BF FF FF
00000050 FF FB FF FF | FF FF FF FF | FF FF FF FF | FF BB FF FF
00000060 FF FB FF FF | DF BB FF FF | FF FF FF FF | FF FF FF FF
00000070 FD FF FF FF | FF BF FF FF | DF FB FF FF | FD FF FF FF
00000080 FF FF FF FF | FF BB FF FF | FF FB FF FF | FF BF FF FF
00000090 FD FF FF FF | FD BF FF FF | FD FF FF FF | FF BB FF FF
000000A0 FD BF FF FF | FF FF FF FF | FF FB FF FF | FF BF FF FF
000000B0 FF FB FF FF | FD BF FF FF | FD FF FF FF | FF BB FF FF
000000C0 FD FB FF FF | FF BF FF FF | FF FB FF FF | FF BF FF FF
000000D0 FF FB FF FF | DF FF FF FF | FF FF FF FF | FD FF FF FF
000000E0 FF FF FF FF | DF FB FF FF | FF BF FF FF | DD FF FF FF
000000F0 FD FF FF FF | DF FF FF FF | FD FF FF FF | FF BF FF FF
00000100 FD BB FF FF | FF BF FF FF | FF BB FF FF | DF FF FF FF
00000110 FD FF FF FF | FF FF FF FF | FD FF FF FF | FF FB FF FF
00000120 FD FF FF FF | DF FB FF FF | FF FF FF FF | FF BF FF FF
00000130 DD FB FF FF | DD FF FF FF | FF FF FF FF | DD FF FF FF
00000140 FF BF FF FF | DF BF FF FF | FF FF FF FF | DF FF FF FF
00000150 FF BB FF FF | DF FF FF FF | FD BF FF FF | DF BF FF FF

I can only imagine this happening if ECC and CRC checking are switched off in the drive.
The PCB has no external ROM or RAM, so RAM corruption is a more complicated question. (It could still be the cause, but the two bad lists point to something else.)
If the SATA PHY were doing this, it should be logged in the UDMA_CRC_Error_Count SMART attribute, if I understand correctly...
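Checking the relevant counters can be scripted instead of read by eye. A minimal Python sketch, assuming the input is an attribute table in the smartctl-style layout quoted earlier (numeric ID first, RAW_VALUE as the tenth column); `parse_smart_raw`, `nonzero_watched` and the `WATCHED` selection are hypothetical choices for illustration, not part of smartctl.

```python
from typing import Dict

# Counters that would normally move if the errors came from the media (5, 197, 198)
# or from the SATA link (199). This selection is an assumption for illustration.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def parse_smart_raw(table: str) -> Dict[int, int]:
    """Map attribute ID -> RAW_VALUE for rows whose raw value is a plain integer."""
    raw: Dict[int, int] = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows: numeric ID first, at least 10 columns, RAW_VALUE in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            raw[int(fields[0])] = int(fields[9])
    return raw


def nonzero_watched(raw: Dict[int, int]) -> Dict[str, int]:
    """Return the watched counters whose raw value is above zero."""
    return {name: raw[attr] for attr, name in WATCHED.items() if raw.get(attr, 0) > 0}


if __name__ == "__main__":
    sample = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always - 0
194 Temperature_Celsius     0x0022 109 099 000 Old_age  Always - 34
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always - 0
"""
    print(nonzero_watched(parse_smart_raw(sample)))
```

With the table in the first post all four watched counters are zero, which is consistent with the puzzle here: the drive reports neither media nor link errors while the data still comes back wrong.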

Has anybody seen this issue before?

Regards,
Janos


 Post subject: Re: WD corrupts the customer data? no ecc by default?
Posted: December 19th, 2008, 1:36

Joined: March 28th, 2008, 7:52
Posts: 1466
Location: Europe, Hungary
After posting the first message, I noticed that this can't be ECC/CRC, because only the first WORD of each 32-bit DWORD is corrupted; every DWORD looks like this: xx xx FF FF, xx xx FF FF ....
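The per-DWORD observation can be checked mechanically by counting, for each byte position within a 4-byte group, how many bytes differ from the fill value. A minimal Python sketch, assuming rows copied from a hex dump like the one above (offset first, then 16 hex bytes, separators ignored); `parse_dump_line` and `lane_mismatches` are hypothetical helpers.

```python
from string import hexdigits
from typing import List


def parse_dump_line(line: str) -> bytes:
    """Turn one hex-dump row into bytes, skipping the offset and any separators."""
    tokens = line.split()[1:]  # first token is the offset column
    return bytes(
        int(tok, 16) for tok in tokens
        if len(tok) == 2 and all(c in hexdigits for c in tok)
    )


def lane_mismatches(data: bytes, fill: int, lanes: int = 4) -> List[int]:
    """Count bytes != fill at each position (offset % lanes) inside every DWORD."""
    counts = [0] * lanes
    for offset, value in enumerate(data):
        if value != fill:
            counts[offset % lanes] += 1
    return counts


if __name__ == "__main__":
    rows = [
        "00000000 FD FF FF FF | DF FB FF FF | FF FF FF FF | FD BF FF FF",
        "00000010 FD FB FF FF | FF BB FF FF | DF FB FF FF | FD BF FF FF",
    ]
    data = b"".join(parse_dump_line(r) for r in rows)
    print(lane_mismatches(data, 0xFF))  # prints [6, 6, 0, 0]
```

For these two rows the last two positions never differ, which is the "xx xx FF FF" shape; the same holds for every row shown in the dump.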

Does anybody have an idea?


 Post subject: Re: WD corrupts the customer data? no ecc by default?
Posted: December 19th, 2008, 4:52

Joined: July 18th, 2006, 3:05
Posts: 7476
Location: ITALY
Try making a bad sector and relocating it. If that doesn't work, it may be a translator issue. Sometimes a drive with a bad translator works... until you try to format it. P.S. You have an HPA set: this makes me more suspicious about the config and/or the translator.


 Post subject: Re: WD corrupts the customer data? no ecc by default?
Posted: December 19th, 2008, 5:30

Joined: March 28th, 2008, 7:52
Posts: 1466
Location: Europe, Hungary
BlackST wrote:
Try making a bad sector and relocating it. If that doesn't work, it may be a translator issue. Sometimes a drive with a bad translator works... until you try to format it. P.S. You have an HPA set: this makes me more suspicious about the config and/or the translator.


I have removed the HPA, you know, and the problem still exists.
Because the corruption is organized, I don't think this is a surface or ECC issue.
Still, because the bad sectors move between passes while some addresses match, it looks like some check is missing.
Much more likely this is a PCB issue, but as I said, the board has only the minimal parts, so in this case it must be the MCU (CPU), I think.

Anyway, it is interesting!

Additionally, I have now done the read+compare test with 0x00, and the drive passed. :shock:
So the bits can only fall to 0; they never rise to 1.
This points to another kind of electrical problem....
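The direction of the errors can be quantified by comparing each written byte with what was read back, counting 1-to-0 and 0-to-1 bit flips separately and recording which bits were cleared at each position within a DWORD. A minimal Python sketch, assuming both buffers are already in memory; `flip_stats` is a hypothetical helper.

```python
from typing import Dict, List, Union


def flip_stats(written: bytes, read: bytes, lanes: int = 4) -> Dict[str, Union[int, List[int]]]:
    """Count 1->0 and 0->1 bit flips and OR together the cleared bits per DWORD lane."""
    if len(written) != len(read):
        raise ValueError("written and read buffers must be the same length")
    fell = rose = 0
    cleared_masks = [0] * lanes
    for offset, (w, r) in enumerate(zip(written, read)):
        cleared = w & ~r & 0xFF  # bits written as 1 but read back as 0
        raised = r & ~w & 0xFF   # bits written as 0 but read back as 1
        fell += bin(cleared).count("1")
        rose += bin(raised).count("1")
        cleared_masks[offset % lanes] |= cleared
    return {"fell": fell, "rose": rose, "cleared_masks": cleared_masks}


if __name__ == "__main__":
    # First row of the 0xFF read-out quoted in this thread.
    row = bytes.fromhex("FD FF FF FF DF FB FF FF FF FF FF FF FD BF FF FF")
    stats = flip_stats(b"\xff" * len(row), row)
    print(stats["fell"], stats["rose"], [hex(m) for m in stats["cleared_masks"]])
    # prints 5 0 ['0x22', '0x44', '0x0', '0x0']
```

For that row the cleared bits are 0x22 (bits 1 and 5) in the first byte of each DWORD and 0x44 (bits 2 and 6) in the second, with nothing rising; every corrupted byte in the quoted dump fits those two masks. A 0x00 pattern cannot expose bits that only ever fall, which is consistent with the 0x00 compare passing.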

(still investigating....)

Janos


 Post subject: Re: WD corrupts the customer data? no ecc by default?
Posted: December 20th, 2008, 10:04

Joined: August 9th, 2007, 8:40
Posts: 791
Location: United Kingdom
This looks like a dodgy cable or a stuck bit in one of the registers.

If you have a bus analyser, put that in the circuit so you can see exactly what is happening.
Check whether the IDX bit is set in the status register when writing or reading.

Have you tried another drive, to rule out the cable? Or try cleaning the HSA connector pads on the board to see if that helps.
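Decoding the status register byte makes it easy to see whether IDX is set in a captured value. A minimal Python sketch using the legacy ATA status register layout (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR from bit 7 down to bit 0); how the byte is captured (bus analyser, diagnostic tool) is left open, and `decode_status` and `idx_is_set` are hypothetical helpers. Later ATA revisions declare IDX and CORR obsolete, so a newer drive is not required to report them.

```python
from typing import List

# Legacy ATA status register layout, bit 7 down to bit 0.
STATUS_BITS = [
    (0x80, "BSY"),   # busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # device seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete in later ATA revisions)
    (0x02, "IDX"),   # index (obsolete in later ATA revisions)
    (0x01, "ERR"),   # error, details in the error register
]


def decode_status(status: int) -> List[str]:
    """Return the names of the status bits set in a captured status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


def idx_is_set(status: int) -> bool:
    """True if the IDX bit (bit 1) is set."""
    return bool(status & 0x02)


if __name__ == "__main__":
    for value in (0x50, 0x52, 0x51):
        print(hex(value), decode_status(value), idx_is_set(value))
```

The sample values are made up for illustration, not captured from this drive; 0x50 is the usual idle "ready, seek complete" state, so a stray IDX or ERR bit stands out against it.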

<itch>

_________________
If you can keep your head when all about you are losing theirs, you probably don't fully understand the situation. ... Mr Kipling

https://www.mjm.co.uk/


 Post subject: Re: WD corrupts the customer data? no ecc by default?
Posted: December 20th, 2008, 18:39

Joined: July 18th, 2006, 3:05
Posts: 7476
Location: ITALY
What about the HPA being set? I think it is more likely a firmware issue than a logic board problem... but it could be anything.

