HDD GURU FORUMS http://forum.hddguru.com/

Multiple drives exhibiting same symptoms -- hardware?? http://forum.hddguru.com/viewtopic.php?f=7&t=22787
Author: tranceaddict84 [ April 27th, 2012, 10:33 ]
Post subject: Multiple drives exhibiting same symptoms -- hardware??
Hey guys, bit of a weird one. I have a total of five Samsung HDDs, as follows:
1TB - single drive config
2x1.5TB - RAID0
2x2TB - RAID0
plus a Crucial M4 128GB SSD for my OS.

Other potentially relevant system specs:
Asus P5Q (P45 / ICH10R)
Windows 7 x64
Latest Intel storage software / drivers (I also tried a clean install using Microsoft's AHCI drivers)

Recently I noticed that copying files to either of my stripes was intermittently slow. When copying large numbers of large files, after a while the speed would drop from the usual ~150MB/s to a fairly consistent 30MB/s, sometimes speeding up again later. So I fired up resmon to take a look. The first weird thing I noticed was that the per-disk activity graphs that are usually on the right side of the Disk tab in Windows 7 resmon were missing; only the top 'Overall' disk I/O graph was shown. I figured it was a driver issue / OS screw-up and was probably related to my slow RAID I/O speed, so I reinstalled Windows 7. This fixed the resmon issue, but the performance issue was still there. In resmon it showed up as a maximum queue length and >98% busy time on the affected drive(s).

Next I figured that one or more of my HDDs had errors, so I began scanning them with MHDD. I used to work for a system builder where we used MHDD on an industrial scale to check for faulty drives, so I know what a failing drive looks like as well as what a healthy drive should look like. In my opinion my results are kinda strange: all of the drives that have been in RAID show approximately the same spread of read times: a very large number of <10ms and <50ms blocks (far more than a new / healthy drive should have), quite a few <150ms, and a small number of <500ms (but not as many as I'd expect to see from a mechanically failing drive). If I'd seen these results for one of the drives, or even one from each RAID stripe, I could accept that I had an impending faulty drive or two on my hands and replace them.
However, all four RAID drives are like this, while the 1TB drive, which has not been in RAID, tests almost like a brand new drive: almost entirely <3ms, with just a very small number of <10ms and <50ms. Considering this is by far the oldest of the five drives, I find this very strange. I think the odds against all four drives failing pretty much simultaneously are astronomical, so I'm wondering whether something the RAID controller has done could have messed up the drives at block level, causing slow access to certain areas of each drive.

I've bought an additional 2TB drive and backed up everything from the 2x1.5TB stripe. Is there anything you guys can suggest that I could run on those drives to zero every sector and restore them to an unused state (a low-level format?)? Bearing in mind that the alternative is to replace another three HDDs (expensive), and that I have a sure-fire way of gauging success (run MHDD again), I might as well try anything that might work. I've got a bootable USB with Hiren's Boot CD, so I have a wide range of tools at my disposal. Fire away with suggestions. Cheers!
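MHDD classifies each block it reads by access time, and the comparison above rests on how those counts differ between drives. A minimal sketch of that bookkeeping, assuming you have per-block read times in milliseconds from some scan log; the bucket edges are the ones quoted in the post, and `bucket_read_times` / `slow_fraction` are hypothetical helpers, not part of MHDD.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges (ms) of the MHDD-style buckets quoted in the post.
EDGES_MS = [3, 10, 50, 150, 500]
LABELS = ["<3ms", "<10ms", "<50ms", "<150ms", "<500ms", ">=500ms"]


def bucket_read_times(times_ms):
    """Count per-block read times into latency buckets.

    bisect_right gives strict "<" semantics: a read of exactly 3ms
    lands in "<10ms", not "<3ms".
    """
    counts = Counter({label: 0 for label in LABELS})
    for t in times_ms:
        counts[LABELS[bisect_right(EDGES_MS, t)]] += 1
    return counts


def slow_fraction(counts, slow_from="<10ms"):
    """Share of reads in slow_from or any slower bucket (default: 3ms or slower)."""
    start = LABELS.index(slow_from)
    slow = sum(counts[label] for label in LABELS[start:])
    total = sum(counts.values())
    return slow / total if total else 0.0
```

Comparing `slow_fraction` for the 1TB drive against each ex-RAID drive turns "far more than a healthy drive should have" into a single number per drive.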
Author: labtech [ April 27th, 2012, 20:51 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
What are the raw SMART values for each drive? Are they consistent or similar across all the "faulty" drives?
Author: tranceaddict84 [ May 1st, 2012, 0:24 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
Thanks for the reply. Here are the SMART values for the two 1.5TB drives. Both are reported as OK, but some of the values don't look too healthy, particularly 5, 196 and 200 on the first one.

First drive:
Code:
SMART ATTRIBUTES:
ID  Description                      Status Value Worst Threshold Raw Value  TEC
--------------------------------------------------------------------------------
1   Raw Read Error Rate              OK     100   100   51        0          N.A.
3   Spin Up Time                     OK     72    72    11        9320       N.A.
4   Start/Stop Count                 OK     96    96    0         3860       N.A.
5   Reallocated Sector Count         OK     100   100   10        4          N.A.
7   Seek Error Rate                  OK     253   253   51        0          N.A.
8   Seek Time Performance            OK     97    97    15        16174      N.A.
9   Power On Time                    OK     99    99    0         5059       N.A.
10  Spin Retry Count                 OK     100   100   51        0          N.A.
11  Calibration Retry Count          OK     100   100   0         0          N.A.
12  Power Cycle Count                OK     99    99    0         1293       N.A.
13  Soft Read Error Rate             OK     100   100   0         0          N.A.
183 SATA Downshift Error Count       OK     100   100   0         0          N.A.
184 End-to-End error                 OK     100   100   0         0          N.A.
187 Reported Uncorrectable Errors    OK     100   100   0         0          N.A.
188 Command Timeout                  OK     100   100   0         0          N.A.
190 Temperature Difference from 100  OK     79    62    0         353697813  N.A.
194 Temperature                      OK     78    61    0         22 C       N.A.
195 Hardware ECC Recovered           OK     100   100   0         142592     N.A.
196 Reallocation Event Count         OK     100   100   0         4          N.A.
197 Current Pending Sector Count     OK     100   100   0         0          N.A.
198 Uncorrectable Sector Count       OK     100   100   0         0          N.A.
199 UltraDMA CRC Error Count         OK     99    99    0         7          N.A.
200 Write Error Count                OK     100   100   0         45         N.A.
201 Off Track Errors                 OK     253   253   0         0          N.A.

Second drive:
Code:
SMART ATTRIBUTES:
ID  Description                      Status Value Worst Threshold Raw Value  TEC
--------------------------------------------------------------------------------
1   Raw Read Error Rate              OK     100   100   51        0          N.A.
3   Spin Up Time                     OK     73    73    11        8880       N.A.
4   Start/Stop Count                 OK     96    96    0         4143       N.A.
5   Reallocated Sector Count         OK     100   100   10        0          N.A.
7   Seek Error Rate                  OK     253   253   51        0          N.A.
8   Seek Time Performance            OK     100   100   15        0          N.A.
9   Power On Time                    OK     99    99    0         3954       N.A.
10  Spin Retry Count                 OK     100   100   51        0          N.A.
11  Calibration Retry Count          OK     100   100   0         1          N.A.
12  Power Cycle Count                OK     99    99    0         1418       N.A.
13  Soft Read Error Rate             OK     100   100   0         0          N.A.
183 SATA Downshift Error Count       OK     100   100   0         0          N.A.
184 End-to-End error                 OK     100   100   0         0          N.A.
187 Reported Uncorrectable Errors    OK     100   100   0         0          N.A.
188 Command Timeout                  OK     100   100   0         0          N.A.
190 Temperature Difference from 100  OK     80    72    0         336855060  N.A.
194 Temperature                      OK     79    68    0         21 C       N.A.
195 Hardware ECC Recovered           OK     100   100   0         59720      N.A.
196 Reallocation Event Count         OK     100   100   0         0          N.A.
197 Current Pending Sector Count     OK     100   100   0         0          N.A.
198 Uncorrectable Sector Count       OK     100   100   0         0          N.A.
199 UltraDMA CRC Error Count         OK     99    99    0         7          N.A.
200 Write Error Count                OK     100   100   0         0          N.A.
201 Off Track Errors                 OK     253   253   0         0          N.A.

I haven't yet figured out the best way to back up the stuff on the 4TB stripe, so I don't want to break the stripe to run DiskCheckup. I can disable RAID and run one of the Hiren's Boot CD tools from DOS, but I'll have to take a photo of my screen and type out the results; I'll get back to you on that. I figure I'll first see whether the two 1.5TB drives are salvageable by any means, or whether I'm going to have to bite the bullet and buy at least one more drive...
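labtech's question (are the raw values consistent across the "faulty" drives?) is easy to answer mechanically once the tables are in text form. A minimal sketch, assuming rows laid out like the two tables above (ID, description, status, value, worst, threshold, raw value with an optional `C` unit, TEC); the regex and helper names are hypothetical, and the watch list of IDs 5, 196, 197, 198, 199 and 200 is a common rule of thumb rather than a Samsung specification.

```python
import re

# One attribute row: ID, description, status, value, worst, threshold,
# raw value (optionally followed by a "C" unit, as on the Temperature row), TEC.
ROW_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+C)?\s+(\S+)$"
)

# IDs this sketch flags when their raw value is non-zero (assumed watch list).
WATCH_IDS = {5, 196, 197, 198, 199, 200}


def parse_smart(text):
    """Return {attribute_id: (description, raw_value)}; header/separator lines are skipped."""
    attrs = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(7)))
    return attrs


def raw_differences(a, b):
    """List (id, description, raw_a, raw_b) for shared attributes whose raw values differ."""
    return [
        (aid, a[aid][0], a[aid][1], b[aid][1])
        for aid in sorted(a.keys() & b.keys())
        if a[aid][1] != b[aid][1]
    ]


def watch_list_hits(attrs):
    """Sorted IDs from WATCH_IDS whose raw value is non-zero."""
    return sorted(aid for aid in WATCH_IDS if aid in attrs and attrs[aid][1] != 0)
```

Raw values for 190 and 194 carry temperatures, so a difference between drives there is not a fault on its own; the watch-list hits are the more telling comparison.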
Author: ReclaiMe [ May 1st, 2012, 6:12 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
See if the SMART attributes change after one more slowdown. Also, I'd suspect the power supply. If you are doing all your tests on the same machine, what is installed (full list), what is the power supply's rated output, and how old is it?
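Re-reading SMART after another slowdown boils down to diffing two snapshots of the same drive. A minimal sketch, assuming each snapshot is a dict of attribute ID to raw value (typed up from whichever SMART tool you use); the ignore list and the media/interface grouping are illustrative assumptions, though a rising 199 (UltraDMA CRC Error Count) is widely read as a cable, port or controller problem rather than a platter one.

```python
# Values expected to move during normal use (start/stop count, power-on hours,
# power cycles, temperatures). Ignoring them is an illustrative assumption.
ROUTINE_IDS = frozenset({4, 9, 12, 190, 194})

# Rough grouping used by this sketch: sector-level vs. link-level trouble.
MEDIA_IDS = frozenset({5, 196, 197, 198})
INTERFACE_IDS = frozenset({199})


def increased_attributes(before, after, ignore=ROUTINE_IDS):
    """Return {id: (old_raw, new_raw)} for attributes whose raw value went up."""
    return {
        aid: (before[aid], after[aid])
        for aid in sorted(before.keys() & after.keys())
        if aid not in ignore and after[aid] > before[aid]
    }


def likely_areas(changed):
    """Map changed attribute IDs to the broad areas they usually point at."""
    areas = []
    if changed.keys() & MEDIA_IDS:
        areas.append("media: reallocated or pending sectors")
    if changed.keys() & INTERFACE_IDS:
        areas.append("interface: CRC errors (cable, port or controller)")
    return areas
```

An increase in 200 (Write Error Count) is still reported by `increased_attributes`, even though this sketch does not assign it to either group.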
Author: tranceaddict84 [ May 1st, 2012, 6:34 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
All testing was carried out on the same machine. The PSU is a CoolerMaster SilentPro 600W. It's an 80+ Gold-rated, US$150+ PSU and is less than a year old. I think if it were PSU-related I'd be seeing other issues, such as random hard reboots / power-offs. Full list of components:
Asus P5Q
Core 2 Quad Q9550 @ 3.4GHz
8GB Corsair XMS2 DDR2 800MHz
AMD Radeon HD6950
5x mechanical HDDs plus Crucial SSD
Well within spec for a decent 600W PSU, I'd say.

Also, since breaking the 3TB stripe and trying to copy stuff to the two 1.5TB drives individually, I've noticed pretty much the same behaviour. I can copy from my new 2TB drive to either of them and it starts off fast (~100MB/s), then slows down to as little as 2-3MB/s. If I cancel the copy and restart it, it'll often go fast again; maybe the OS / controller is writing to a different area of the drive? So I do think there's something wrong with both of these drives; I'm just not sure it's mechanical.
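The copy that starts at ~100MB/s and falls to 2-3MB/s can be reproduced without Explorer in the loop by writing a large file in fixed-size chunks and timing each one. A minimal sketch, assuming a scratch file path on the drive under test (it writes real data, so pick somewhere with free space); `write_throughput`, `first_slow_chunk` and the 30% threshold are hypothetical choices for illustration. Each chunk is followed by `os.fsync`, so the OS write cache should not flatter the figures, although the drive's own cache may still absorb part of each chunk.

```python
import os
import time
from statistics import median


def write_throughput(path, total_mb=1024, chunk_mb=64):
    """Write total_mb of zeros to path in chunk_mb pieces; return MB/s per chunk.

    Each chunk is fsync'd before its timer stops, so the figure reflects data
    handed to the drive rather than data parked in the OS write cache.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    rates = []
    try:
        for _ in range(total_mb // chunk_mb):
            start = time.perf_counter()
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                view = view[os.write(fd, view):]
            os.fsync(fd)
            elapsed = time.perf_counter() - start
            rates.append(chunk_mb / max(elapsed, 1e-9))
    finally:
        os.close(fd)
    return rates


def first_slow_chunk(rates, ratio=0.3, baseline_chunks=3):
    """Index of the first chunk slower than ratio x the median of the first
    baseline_chunks rates, or None if the speed never drops that far."""
    if len(rates) <= baseline_chunks:
        return None
    baseline = median(rates[:baseline_chunks])
    for i, rate in enumerate(rates[baseline_chunks:], start=baseline_chunks):
        if rate < ratio * baseline:
            return i
    return None
```

For example, `first_slow_chunk(write_throughput(r"E:\scratch.bin"))` (a hypothetical path) returns the index of the first chunk that fell well below the starting speed. Running it several times and comparing where the drop appears is one way to probe the different-area-of-the-drive theory, though a file's physical position on the platters is not guaranteed from user space.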
Author: ReclaiMe [ May 1st, 2012, 7:41 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
Cables? Several SATA cables bundled together, causing crosstalk? Or all the cables coming from the same batch with the same manufacturing defect?
Author: DavidPierson [ May 5th, 2012, 21:09 ]
Post subject: Re: Multiple drives exhibiting same symptoms -- hardware??
Well, the initial fast speed could just be due to caching. Could you change the drive setting from 'Optimise for performance' to 'Optimise for quick removal' (or whatever it's called in Win 7)? Does that have any effect on the initial speed?
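A related check, using an OS-level flush instead of the Windows removal-policy setting: write one block of data and time it twice, once when `write()` returns and once after `fsync()` returns. If the first figure is far higher than the second, much of the "fast start" is probably the OS cache absorbing the data. A minimal sketch; the function name and the 256MB default are assumptions.

```python
import os
import time


def cached_vs_flushed_rate(path, size_mb=256):
    """Write size_mb of zeros once; return (MB/s when write() returned,
    MB/s once fsync() returned).

    The first figure can be inflated by the OS write cache; the second also
    includes the time taken to hand the data to the drive.
    """
    data = b"\0" * (size_mb * 1024 * 1024)
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        view = memoryview(data)
        while view:  # handle short writes
            view = view[os.write(fd, view):]
        after_write = time.perf_counter()
        os.fsync(fd)
        after_flush = time.perf_counter()
    finally:
        os.close(fd)
    cached = size_mb / max(after_write - start, 1e-9)
    flushed = size_mb / max(after_flush - start, 1e-9)
    return cached, flushed
```

Because the flushed timing always includes the cached one, the second figure can never exceed the first; the interesting part is how large the gap is on the suspect drives compared with the healthy 1TB one.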
All times are UTC - 5 hours [ DST ]