Hard Drive Failure Detected, HDDScan reports?

Posted: March 29th, 2010, 16:54
by satyrwilder
Hi everyone, I have an extremely serious issue and I'm here hoping for assistance or guidance. My computer is about two breakdowns short of a crime against humanity. I just passed the 28-hour mark fighting to save my data and hopefully salvage some decent quality hardware. Losing my server hard drive would almost be the end of the world for me. Normally I'm all about doing the research, reading the manual etc but I just don't have time. I'm going to keep combing this forum and other resources but I'm hoping this will be faster. I pasted the most recent tests at the end.

Also, thank you to any and all out there for sharing critical expertise.

Yesterday my hard drive threw a "hard disk failure detected" warning. The drive has continued to function consistently; although responsiveness and performance have had periods of degradation, functionality is presently normal or near-normal. I have been working on preserving my data regardless and attempting to determine what's going on, but hardware is not my thing.

The affected hard drive is a Western Digital 1 TB Green HD. No history of malfunction.

OS is dual boot, GRUB bootloader, Vista / Ubuntu Server 8.04 but *nix is flatlined except for Samba. A friend installed a Broadcom 4306 wireless card as a surprise favor and it killed Hardy's networking. I didn't have time / space to deal with repairing / reinstalling *nix on a Vista dual boot given the circumstances, so I just moved my data to Windows (Samba file server still serves shares / files between Vista / *nix without issue) and have been making do ever since.

I'm a web developer, the affected machine runs XAMPP and other dev tools (Netbeans, FTP clients, PuTTY, etc.) for extended periods of time. Security is Windows Security Essentials.

I have had major hardware, networking, OS, and security issues that might be relevant. I'll try to recap issues that all happened within the same time frame that may be related to my HD problem?? I have no idea, these are my educated guesses.

Problems first started to manifest --

After several days of increasingly degraded performance, I ran a speed test: 14.4K down, upload negligible. Checked my network connections; apparently my router was reset, since my network was suddenly broadcasting totally unsecured. (Possibly somebody in my household, or potentially a malicious attack by a neighbor to make it available. Network monitoring software shows about ten or fifteen neighborhood piggybackers, five to seven regular interlopers.)

Possible severe virus infection / malware exploiting web server, etc?

This is around the time the first hard drive failure dialogue appeared. Took corrective steps to secure network and began working on preserving data and diagnosing hd issue. Numerous security vulnerabilities on my dev server (web servers, db servers, mx, FTP, shell, etc.) various open ports could have been leveraged maliciously; so far, security software detected and removed 5 viruses, I identified and removed 2 more. Possible compromise may have introduced a virus that is causing Vista hd fail detection.

Security compromise / possible active malicious attack interfering with behaviors -

While correcting network security settings, I noticed my samba shares, some shared / public locations, etc on my network had stopped behaving normally. I set a packet sniffer to run last night; mind you, I'm no network admin but it looks to me as though one of the piggybackers is running spoofs / intercept attempts. (Since all attempts appear consistently failed and aren't aimed at hijack targets, I will worry about that as soon as my hard drive isn't a fire hazard!)

Recent evidence of damaged system files -

I had been running IIS until about a month ago for an ASP project, but after the project was completed, IIS was damaged when I tried to customize PHP on IIS for Cake. Ended up just disabling it and couldn't enable it any longer, so I simply switched back to XAMPP and ran windows system file repair to restore functionality I needed (.NET, etc). Windows system file repair successfully restored the functionality I was concerned with - didn't verify if IIS had also been repaired, not really my highest priority...

Failed device impacted many other components' performance -

The BCM wireless card also malfunctioned and effectively failed last night. I'd disabled it for malfunctioning and switched to LAN; after several days of increasingly degraded network performance, I ran a speed test: 14.4K down, upload negligible. Checked network - my machine had 3 connections to my network, all valid IPs, ping response, etc. I ensured the wireless was still disabled; attempted some testing and it did not respond with any normal behavior - erratic responses to enable / disable, couldn't detect any wireless networks (router is >20ft, other machines were connected wirelessly).

Opened box early this morning - observed the wireless card had suffered a moderate short, and removed it. The ethernet card is undamaged. Straight LAN connection now, performance drastically improved, connection behaves normally.

Contaminant?

Since it was open, I did a quick clean of the computer interior, and realized that the interior of my computer and all components were covered in a very fine, even layer of diatomaceous earth (http://en.wikipedia.org/wiki/Diatomaceous_earth). (I live in Texas - no cold winter, DE is a common treatment against insects, eg. fire ants. >_< ) I will of course do another thorough cleaning of my case and components, but if DE is the cause of the HD failure, it really is a fire hazard, and I doubt even a specialist could save it. Or am I wrong?

SATA / Motherboard problem?

I also installed a cheapo refurbished DVD burner I picked up yesterday expressly so I could burn my OS for repair. DVD burner functioned normally once and has not worked correctly since - failing to detect media. I read in one of the posts on this forum that one of the errors from my HDD SMART report usually indicates bad cables - the HD cables were just a little loose in their seats, so I seated them completely, but since closing up my box, the errors have happened much more frequently, although none for several hours - is that consistent with contaminant? IE, when trying to clean the computer I spread more contaminant, or something... yes, I am kinda guessing here! Granted I have not heard a change in spin up or down, no odd smells, etc.

Cables / other failing device?

After closing up my case, I attempted to burn the OS repair dvd (MagicISO.) The dvd burner only recognized media loaded the first time - after that, all media, CDs, DVDs blank or not, were not detected. Both the DVD burner and the HD are SATA - could failing devices be related? Wouldn't that point to the motherboard (sata)? Note, the same cable bundle connects the CD burner, but the CD burner is IDE and works properly.

Identity Info:

HDDScan Identity Report
Model: WDC WD10EACS-00ZJB0
Firmware: 01.01B01
Serial: WD-WCASJ1075786
LBA: 1953525168

Report By: HDDScan for Windows version 3.2
Report Date: 3/29/2010 7:46:21 AM

Main Information
Name Value
LBA Support Yes
LBA28 268435455
LBA48 1953525168
ATA Version 8
Logical Sector Size 512 bytes
Physical Sector Size 512 bytes
Cache size 16384 KB
ECC bytes 50
Nominal Form factor Not Reported
RPM Not Reported
Interface SATA
Connected through IDE-onboard controller


DMA Support
Name Value
DMA Support Yes
Multiword DMA 0 Supported
Multiword DMA 1 Supported
Multiword DMA 2 Supported
UDMA 0 Supported
UDMA 1 Supported
UDMA 2 Supported
UDMA 3 Supported
UDMA 4 Supported
UDMA 5 Supported
UDMA 6 Selected


PIO Support
Name Value
PIO Support Yes
PIO 0 Supported
PIO 1 Supported
PIO 2 Supported
PIO 3 Supported
PIO 4 Supported


Features Support
Name Value
SATA Gen2 3.0 Gb/s Supported
SATA Gen1 1.5 Gb/s Supported
Software Settings Preservation Enabled
Commands queue Supported
Queue depth 32
NCQ Supported
TCQ Not Supported
Host Protected Area (HPA) Supported
Automatic Acoustic Management (AAM) Enabled
Advanced Power Management (APM) Not Supported
Power Management Supported
Read look-ahead Enabled
Write cache Enabled
Password Protection Supported
SMART Enabled
Device Configuration Overlay (DCO) Supported
General Purpose Logging (GPL) Supported
Streaming feature Not Supported
SMART self-test Supported
SMART error log Supported
SCT Command Transport Supported
SCT Long Sector Access Supported
SCT Write Same Supported
SCT Error Recovery Control Supported
SCT Features Control Supported
SCT Data Tables Supported
Extended Status Reporting Not Supported
Free-fall Control Not Supported

A SMART offline short test errored out at 10%. I'm not sure other read, butterfly read, etc tests are even running, or which tests I should run.

What do these errors indicate?

Num Attribute Name Value Worst Raw(hex) Threshold
001 Raw Read Error Rate 200 200 00000000-0000 051
003 Spin Up Time 232 179 00000000-14FF 021
004 Start/Stop Count 100 100 00000000-0066 000
005 Reallocation Sector Count 129 129 00000000-0236 140
007 Seek Error Rate 200 199 00000000-0008 051
009 PowerOn Hours Count 086 086 00000000-2880 000
010 Spin Retry Count 100 100 00000000-0000 051
011 Recalibration Retries 100 253 00000000-0000 051
012 Device Power Cycle Count 100 100 00000000-004B 000
192 Emergency Retract Count 179 179 00000000-3ED8 000
193 Load/unload Cycle Count 001 001 00000009-C54D 000
194 HDA Temperature 112 103 40 C 000
196 Reallocation Event Count 023 023 00000000-00B1 000
197 Current Pending Sector Count 200 200 00000000-0000 000
198 Uncorrectable Sector Count 200 200 00000000-0000 000
199 UltraDMA CRC Errors 200 200 00000000-0001 000
200 Write Error Rate 200 200 00000000-0000 051


Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.

C:\Users\satyrwilder>chkdsk
The type of the file system is NTFS.

WARNING! F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
682176 file records processed.
File verification completed.
2068 large file records processed.
0 bad file records processed.
2 EA records processed.
76 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
785806 index entries processed.
Index verification completed.
0 unindexed files processed.
CHKDSK is verifying security descriptors (stage 3 of 3)...
682176 security descriptors processed.
Security descriptor verification completed.
51816 data files processed.
CHKDSK is verifying Usn Journal...
36754424 USN bytes processed.
Usn Journal verification completed.
Windows has checked the file system and found no problems.

487178128 KB total disk space.
330491788 KB in 534854 files.
252468 KB in 51817 indexes.
0 KB in bad sectors.
826528 KB in use by the system.
65536 KB occupied by the log file.
155607344 KB available on disk.

4096 bytes in each allocation unit.
121794532 total allocation units on disk.
38901836 allocation units available on disk.

C:\Users\satyrwilder>cd c:\

c:\>chkdsk /f
The type of the file system is NTFS.
Cannot lock current drive.

Chkdsk cannot run because the volume is in use by another
process. Would you like to schedule this volume to be
checked the next time the system restarts? (Y/N) y

This volume will be checked the next time the system restarts.

c:\>

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 29th, 2010, 17:24
by drc
The drive is failing: it has a bunch of reallocated sectors. At this point chkdsk is dangerous.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 29th, 2010, 18:30
by satyrwilder
Oh, crap... I've got it set to run chk on next restart. >_<

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 29th, 2010, 19:36
by satyrwilder
Are you able to wager a guess as to why it's failing? Ran the serial past Western Digital, turns out it's still under warranty and it'd be nice to know in case they ask what happened to it.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 29th, 2010, 21:40
by drc
bad sectors is bad sectors, they just happen. can be caused by external means but not necessarily.

In my experience you just put "bad sectors" on the reason for RMA and they are happy to take care of it for you. I doubt they even check, I would think it would be way too time-consuming for them.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 3:32
by satyrwilder
Yeah, spose so... Thank you for helping me out, spose I have a warranty report to make!! Peace out.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 6:51
by disktech
If the drive is in warranty, send it in for warranty replacement. If you want to fix it, reset SMART in PC-3000 or WD Doctor (SalvationDATA), or move the G-list to the P-list, regenerate the translator, then do a zero-fill. The drive will be fine.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 9:30
by drc
I disagree. Additionally, this sounds like an end-user, not someone with a pc3000. And he already stated that it is in warranty.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 12:10
by satyrwilder
Ha ha ^_^ I'm a she.

End-user, well - in all fairness, I'm a developer, but obviously I'm spoiled from having a sys admin on the team for these rough maint jobs... This has DEFINITELY been a learning experience!! Hardware issues make my eyes glaze over anyway, and I have never experienced this level of total-system-failure before. I'll be the first one to admit I did this to myself, all the issues I'm having are just the compounded product of me being a stereotypical "lazy programmer" slacking on maintenance... but hey, at least I learned my lesson!

It's definitely been a learning experience... I was able to determine a LOT more about what triggered the abrupt degradation. The drive was already failing, so it wasn't successfully eliminating the files I was migrating out of Samba's shares. That resulted in ~60K files just piling up in Ubuntu's partition, parked in the trash.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 12:29
by drc
Well, I meant end-user as in not-trying-to-repair-the-drive, not derogatorily. At any rate, good job noticing the problem while you still are able to retrieve your data, instead of continuing to use the bad drive for six more months and then wondering why it doesn't work at all anymore.

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 30th, 2010, 20:39
by satyrwilder
I'd actually rather try to save the drive, but I don't know if that is possible?? If it would require a specialist, I'll just request the replacement but I'd like to avoid waiting on shipping if this disk can be saved... I know it's dicey, let alone for a n00b, but I'll try anything that doesn't void my warranty!

The drive is two years old, it's wearing out. Is there a way for me to determine if it's too worn out to bother with?

I heeded the warning in your first reply - warning that chkdsk was / is dangerous. I removed the chkdsk + repair I'd scheduled and haven't run it since. Also, I inferred your meaning included chkdsk-type utils, admin level, command line etc disk utilities as posing the same general danger to a disk as sick as mine. (Not a moment too soon, by the way, I would have tried gpart after chkdsk!) I have only used light-touch utilities like hddscan on Vista, and Smartmontools and Disk Usage Analyzer etc on *nix.

Sorry if this is a stupid question, but - is gpart bad juju as well? What if the disk health improves and stabilizes (ie, as I fix problems manually?)

This might be an excess of optimism but smartctl -a /dev/sda4 - the first tests reported the drive situation as critically bad, much worse than hddscan was finding - scary bad reports like the one below, almost everything failing or pre-fail and Disk Usage Util's graphic representation of the disk and the partitions amounts to a giant adhesive ball of garbage (93% trash/// deleted shares piling up in the trash.)

Vista's file structure blind spot - hddscan SMART can't see past it, and Vista is blind to the mountain of bad files Ubuntu's auto indexer is choking on. Or that's what I'm pretty sure is happening, gonna find out!! Anyway, testing smartctl -a /dev/sda4 during cleaning showed immediate, dramatic improvement of the overall disk health. Once the bad data was gone, testing the disk and individual partitions showed overall pass, all fields had stabilized with pass / old age, except of course reallocated sectors.

.....Or not. I just ran a test and it's doing it again - critical degradation. I HAVE to cut the legs off this thing, so I'm going to drop Samba right now and then indexing for Ubuntu.

Could I use gpart to just drop the Samba partition completely? If resizing is inadvisable, the space could remain unallocated - one less dead zone for my confused old disk to hide zombie data and then try to make it searchable. Assuming the disk could reasonably handle being repartitioned I mean.

Here's the brand new, very very bad test. Bad data being populated in bad locations, perhaps causing the metadata to return unknowns?

satyrwilder@Jolly-Roger:~$ sudo smartctl -a /dev/sda4
sudo: unable to resolve host Jolly-Roger
[sudo] password for satyrwilder:
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model: WDC WD10EACS-00ZJB0
Serial Number: WD-WCASJ1075786
Firmware Version: 01.01B01
User Capacity: 1,000,204,886,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Tue Mar 30 19:27:55 2010 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x00) Offline data collection not supported.
SMART capabilities: (0x0000) Automatic saving of SMART data is not implemented.
Error logging capability: (0x00) Error logging supported.
General Purpose Logging supported.

SMART Attributes Data Structure revision number: 17018
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
255 Unknown_Attribute 0x373f 200 016 063 Pre-fail Always In_the_past 69269232549888
45 Unknown_Attribute 0x4344 087 083 068 Old_age Offline - 60680225567041
54 Unknown_Attribute 0x0038 000 000 056 Old_age Offline FAILING_NOW 52983538659968
66 Unknown_Attribute 0x3131 048 068 049 Pre-fail Offline FAILING_NOW 53151365537879
65 Unknown_Attribute 0x5345 067 048 069 Pre-fail Offline FAILING_NOW 35503310133805
32 Unknown_Attribute 0x2020 032 032 032 Old_age Offline FAILING_NOW 35322350018592
32 Unknown_Attribute 0x2020 032 032 032 Old_age Offline FAILING_NOW 550026354720
16 Unknown_Attribute 0x3f00 000 016 000 Old_age Offline FAILING_NOW 280379776891900
255 Unknown_Attribute 0x000f 000 007 015 Pre-fail Always FAILING_NOW 131943408599808
120 Unknown_Attribute 0x7800 000 000 000 Old_age Offline FAILING_NOW 0
64 Unknown_Attribute 0xfe00 001 000 000 Old_age Offline In_the_past 39030002838272
104 Unknown_Attribute 0x4174 190 035 116 Old_age Offline In_the_past 147336810495809
226 Load-in_Time 0x034e 001 061 078 Old_age Always FAILING_NOW 150
150 Unknown_Attribute 0x0016 000 000 022 Old_age Always FAILING_NOW 0

SMART Error Log Version: 122
Invalid Error Log index = 0x42 (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)

SMART Self-test log structure revision number 17018
Warning: ATA Specification requires self-test log structure revision number = 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging


Again, thanks to everyone for your input and advice!

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: March 31st, 2010, 6:34
by fzabkar
satyrwilder wrote:What do these errors indicate?

004 Start/Stop Count 100 100 00000000-0066 000
005 Reallocation Sector Count 129 129 00000000-0236 140
009 PowerOn Hours Count 086 086 00000000-2880 000
012 Device Power Cycle Count 100 100 00000000-004B 000
192 Emergency Retract Count 179 179 00000000-3ED8 000
193 Load/unload Cycle Count 001 001 00000009-C54D 000
196 Reallocation Event Count 023 023 00000000-00B1 000

Your normalised Reallocated Sector Count (129) has dropped below the threshold (140). You should be entitled to a warranty replacement on that basis. The raw value is indicating 566 bad sectors (= 0x236). In fact your drive has failed SMART.
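
As a rough cross-check of that reading, here is a minimal Python sketch that compares each normalised value in the quoted rows against its threshold. It assumes the HDDScan columns are number, name, value, worst, raw (hex) and threshold, and that an attribute counts as failed when value <= threshold. The names ROWS, ROW_RE, parse_row and failed are made up for illustration.

```python
import re

# Rows copied from the HDDScan report quoted above.
ROWS = """\
004 Start/Stop Count 100 100 00000000-0066 000
005 Reallocation Sector Count 129 129 00000000-0236 140
009 PowerOn Hours Count 086 086 00000000-2880 000
012 Device Power Cycle Count 100 100 00000000-004B 000
192 Emergency Retract Count 179 179 00000000-3ED8 000
193 Load/unload Cycle Count 001 001 00000009-C54D 000
196 Reallocation Event Count 023 023 00000000-00B1 000
"""

# Assumed column order: id, name, value, worst, raw (hex), threshold.
ROW_RE = re.compile(
    r"(\d{3}) (.+?) (\d{3}) (\d{3}) ([0-9A-F]{8}-[0-9A-F]{4}) (\d{3})"
)


def parse_row(line):
    """Split one report row into its fields; raise if the layout differs."""
    match = ROW_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unexpected row layout: {line!r}")
    attr_id, name, value, worst, raw, threshold = match.groups()
    return {
        "id": int(attr_id),
        "name": name,
        "value": int(value),
        "worst": int(worst),
        "raw": raw,
        "threshold": int(threshold),
    }


attributes = [parse_row(line) for line in ROWS.splitlines()]

# Assumed failure rule: normalised value at or below the threshold.
failed = [a["name"] for a in attributes if a["value"] <= a["threshold"]]
print("At or below threshold:", failed)
```

Under that strict rule only attribute 005 is flagged. Attribute 193 sits at 001 against a threshold of 000, so it is near the bottom of the scale without being counted as a SMART failure.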

Your Load/Unload Cycle Count has also hit the threshold. It is presently at 640,333 (= 0x9C54D). I believe a drive is rated for 600,000 L/U cycles. The PowerOn Hours Count is 10,368 (= 0x2880). This means that your drive is undergoing roughly one Load/Unload Cycle every minute (640,333 / 10,368 ≈ 62 per hour). I wonder if that has contributed to its early demise.
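
The hex conversions above can be checked with a short Python sketch. It assumes the dash in HDDScan's raw column is only a visual separator, so the two groups are joined and read as one hexadecimal number. The helper raw_to_int and the dictionary names are illustrative.

```python
# Raw values copied from the HDDScan report in this thread.
RAW = {
    "Reallocation Sector Count": "00000000-0236",
    "PowerOn Hours Count": "00000000-2880",
    "Load/unload Cycle Count": "00000009-C54D",
}


def raw_to_int(raw):
    """Join the two hex groups (assumed to be one number) and parse as base 16."""
    return int(raw.replace("-", ""), 16)


decoded = {name: raw_to_int(raw) for name, raw in RAW.items()}

# Average head load/unload cycles per powered-on hour.
cycles_per_hour = (
    decoded["Load/unload Cycle Count"] / decoded["PowerOn Hours Count"]
)

for name, number in decoded.items():
    print(f"{name}: {number:,}")
print(f"Load/unload cycles per power-on hour: {cycles_per_hour:.1f}")
```

The result is an average over the drive's whole powered-on life, so it says nothing about whether the cycling was steady or came in bursts.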

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: April 2nd, 2010, 1:46
by fzabkar
The SMART report returned by smartctl appears to be gibberish. I believe that's a bug in smartmontools. Perhaps you can modify the behaviour of smartctl with a command line switch.

Try reposting your SMART report here:

http://sourceforge.net/mail/?group_id=64297
http://sourceforge.net/mailarchive/foru ... ls-support
http://sourceforge.net/search/?group_id ... rch=mlists

Re: Hard Drive Failure Detected, HDDScan reports?

Posted: April 2nd, 2010, 9:12
by drc
You have 500+ bad sectors already, which is enough for the manufacturer to consider the disk failed and accept a return. Sure, you could keep using it and maybe nothing more would go bad, but why take the risk???