Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

All times are UTC - 5 hours [ DST ]

hbee

Post subject: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: January 17th, 2015, 15:34

Joined: January 17th, 2015, 11:46
Posts: 3
Location: Finland

Hello all,

I've had a rather unexpected and, to be honest, somewhat strange encounter with my file server. Apparently I've got at least 3 totally faulty and maybe even 5 faulty disks (of which 4 are ST2000DM001's). My server runs ZFS and the storage pool consists of two striped raidz2 sets (raid6). The whole pool is now offline because one of the raidz2 sets has three failed devices. A Supermicro AOC-USAS2-L8I (IT firmware) and an HP SAS expander are also in use.

I first noticed a problem when my network drives went down due to a high number of failed writes on all disks (monitoring, anyone?):

Code:

        NAME                       STATE     READ WRITE CKSUM
        data                       UNAVAIL      8   164     0
          raidz2-0                 UNAVAIL      6    30     0
            c0t5000C5005365ECBAd0  UNAVAIL      0    21     0
            c0t5000C500507AA260d0  UNAVAIL      0    21     0
            c0t5000C500507ACBE0d0  UNAVAIL      0    22     0
            c0t5000C5005070B6DFd0  UNAVAIL      0     4     0
            c0t5000C5005070B21Dd0  UNAVAIL      0    22     0
            c0t5000C5005071BF4Fd0  UNAVAIL      0    22     0
            c0t5000C5005071F13Ed0  UNAVAIL      0    22     0
            c0t5000C5005071F40Ed0  UNAVAIL      0    22     0
            c0t5000C50050575E4Fd0  UNAVAIL      2    22     0
            c0t5000C50050780BDEd0  UNAVAIL      0    22     0
          raidz2-1                 UNAVAIL     10   134     0
            c0t50024E92060B16AFd0  UNAVAIL      0    95     0
            c0t50024E92060B16BBd0  UNAVAIL      0    97     0
            c0t50024E92060B1607d0  UNAVAIL      0    95     0
            c0t50024E92060B1693d0  UNAVAIL      0    98     0
            c0t5000C5002A7FB551d0  UNAVAIL      0    98     0
            c0t5000C5002A87CBFFd0  UNAVAIL      0    94     0
            c0t5000C50053495485d0  UNAVAIL      0    96     0
            c0t5000C5004007FE66d0  UNAVAIL      0    96     0
            c0t5000C5005CDA7FA7d0  UNAVAIL      0    15     0
            c0t5000C5005CDA7D59d0  UNAVAIL      2    96     1
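
On the "monitoring, anyone?" point: a minimal sketch of how a printout like the one above could be checked automatically. It assumes the config table has exactly the five columns shown here (NAME, STATE, READ, WRITE, CKSUM) with plain integer counters; `parse_zpool_status` and `unhealthy` are hypothetical helper names, not part of any ZFS tooling, and abbreviated counters would not be matched by this simple parser.

```python
def parse_zpool_status(text):
    """Parse the five-column config table of a zpool status printout.

    Assumption: every data row is NAME STATE READ WRITE CKSUM with plain
    integer counters, as in the printout quoted in this thread.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # The header row also has five fields, but its counters are not digits.
        if len(parts) == 5 and all(p.isdigit() for p in parts[2:]):
            name, state, read, write, cksum = parts
            rows.append({
                "name": name,
                "state": state,
                "read": int(read),
                "write": int(write),
                "cksum": int(cksum),
            })
    return rows


def unhealthy(rows):
    """Return the rows that are not ONLINE or that have any error counter set."""
    return [
        r for r in rows
        if r["state"] != "ONLINE" or r["read"] or r["write"] or r["cksum"]
    ]


if __name__ == "__main__":
    # Short excerpt in the same layout as the printout above.
    excerpt = """
        NAME                       STATE     READ WRITE CKSUM
        data                       UNAVAIL      8   164     0
            c0t5000C50050575E4Fd0  UNAVAIL      2    22     0
    """
    for row in unhealthy(parse_zpool_status(excerpt)):
        print(row["name"], row["state"], row["read"], row["write"], row["cksum"])
```

Run periodically against the pool's status text and wired to a mail or alert hook, something like this would flag the first write errors well before the network shares go down.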

This obviously led me to suspect that either my SAS expander or my hard disk controller was faulty. Apparently the disks were not completely broken at this point, as all of them came up after I rebooted the server (but then started giving errors and went offline one by one). After this I powered off the virtual machine, but the disks were left powered on for a couple of weeks.

Now, after replacing the hard disk controller (the SAS expander is still on its way, as the eBay replacement was nonfunctional...), the printout is as follows.

Code:

        NAME                       STATE     READ WRITE CKSUM
        data                       UNAVAIL      0     0     0
          raidz2-0                 UNAVAIL      0     0     0
            c0t5000C5005365ECBAd0  ONLINE       0     0     0
            c0t5000C500507AA260d0  UNAVAIL      0     0     0
            c0t5000C500507ACBE0d0  ONLINE       0     0     0
            c0t5000C5005070B6DFd0  UNAVAIL      0     0     0
            c0t5000C5005070B21Dd0  ONLINE       0     0     0
            c0t5000C5005071BF4Fd0  UNAVAIL      0     0     0
            c0t5000C5005071F13Ed0  ONLINE       0     0     0
            c0t5000C5005071F40Ed0  ONLINE       0     0     0
            c0t5000C50050575E4Fd0  ONLINE       0     0     0
            c0t5000C50050780BDEd0  ONLINE       0     0     0
          raidz2-1                 DEGRADED     0     0     0
            c0t50024E92060B16AFd0  ONLINE       0     0     0
            c0t50024E92060B16BBd0  ONLINE       0     0     0
            c0t50024E92060B1607d0  ONLINE       0     0     0
            c0t50024E92060B1693d0  UNAVAIL      0     0     0
            c0t5000C5002A7FB551d0  ONLINE       0     0     0
            c0t5000C5002A87CBFFd0  ONLINE       0     0     0
            c0t5000C50053495485d0  ONLINE       0     0     0
            c0t5000C5004007FE66d0  ONLINE       0     0     0
            c0t5000C5005CDA7FA7d0  UNAVAIL      0     0     0
            c0t5000C5005CDA7D59d0  ONLINE       0     0     0
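
Why the pool stays UNAVAIL in this printout can be worked out from the raidz2 rule that each set tolerates at most two missing members. The sketch below is only an illustration of that counting rule using the states shown above; `vdev_state` and `pool_state` are hypothetical names, and real ZFS tracks more states than the three modelled here.

```python
RAIDZ2_PARITY = 2  # a raidz2 set survives the loss of up to two members


def vdev_state(device_states, parity=RAIDZ2_PARITY):
    """Derive a raidz set's state from the states of its member disks."""
    missing = sum(1 for state in device_states if state != "ONLINE")
    if missing == 0:
        return "ONLINE"
    if missing <= parity:
        return "DEGRADED"
    return "UNAVAIL"


def pool_state(vdevs):
    """A striped pool is only as available as its worst set."""
    states = [vdev_state(v) for v in vdevs]
    if "UNAVAIL" in states:
        return "UNAVAIL"
    if "DEGRADED" in states:
        return "DEGRADED"
    return "ONLINE"


if __name__ == "__main__":
    O, U = "ONLINE", "UNAVAIL"
    # Member states copied in order from the second printout.
    raidz2_0 = [O, U, O, U, O, U, O, O, O, O]
    raidz2_1 = [O, O, O, U, O, O, O, O, U, O]
    print(vdev_state(raidz2_0))              # UNAVAIL (three members missing)
    print(vdev_state(raidz2_1))              # DEGRADED (two members missing)
    print(pool_state([raidz2_0, raidz2_1]))  # UNAVAIL

    # What-if: a single raidz2-0 member comes back online.
    raidz2_0[5] = O
    print(pool_state([raidz2_0, raidz2_1]))  # DEGRADED
```

In other words, getting just one of the three missing raidz2-0 members back would, by this rule, be enough to make the pool importable again, though with no redundancy left in either set.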

The following disks are 'dead'. They spin up normally and do not make any 'additional' noises (from what I can understand) but are not detected by my SATA<->USB adapter.
c0t5000C5005CDA7FA7d0
c0t5000C500507AA260d0
c0t5000C5005070B6DFd0

These disks are detected and SMART shows no faults. I wonder why Solaris decided to offline them? Also, these two disks are on the same SAS expander port. Coincidence?
c0t50024E92060B1693d0
c0t5000C5005071BF4Fd0

Also, another strange thing to note. When doing

Code:

sudo dd if=/dev/dsk/c0t5000C5004007FE66d0 of=/dev/null bs=10240

the LED for either c0t5000C500507AA260d0 or c0t5000C5005070B6DFd0 was lit (another disk once again on the same SAS expander port). Could there be something strange going on with my SAS expander? Can a faulty expander brick drives? Can the Norco 4224 backplanes be at fault?

I've been searching the forums and Google in general, but the information I've been able to find is rather limited. Do we have any experts here who could share their insight on the matter? How probable is it that a professional data recovery company could restore one disk fully or two disks partially, so that the pool can be brought back up, allowing me to back up the data I need? I would think it highly probable, as there appears to be no physical damage apart from (maybe) slightly excessive heat?

Also, are there any cheap premade serial adapters which would allow me to access the disks via terminal to monitor what's happening?

My plan of action is
1) When I've received a new SAS expander I'll test the faulty drive slots with my old 500GB drives and see if everything is fine (trying to isolate the faulty backplanes).
2) If everything seems to be fine, I'll try to bring the two offlined but, according to SMART, healthy drives back up in Solaris and see if I can start resilvering/replacing the dead drives with new ones.
3) If I cannot bring the pool up, I'll be willing to invest in professional data recovery services (assuming the price remains under ~2k). Maybe I'll try to take a look with console access before proceeding with professional services as I understand just connecting the console and monitoring the output would reveal quite a bit?

Thanks, and remember guys: RAID is not a backup! (Most of my irreplaceable data was backed up, *phew*.) Also, I'm willing to give a small tip in bitcoin for any worthy replies.

lcoughey

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: January 17th, 2015, 17:40

Joined: February 9th, 2009, 16:13
Posts: 2585
Location: Ontario, Canada

If pro data recovery is to be your second option, you had better make clones of the drives and work only with those.
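
To illustrate what "cloning" means for a drive with unreadable areas, here is a rough sketch of the idea only: copy block by block, zero-fill whatever cannot be read, and keep a list of the bad ranges. `clone_with_skips` is a hypothetical function written for this example; it is not how any particular imaging tool works, and a failing drive is better handled by dedicated imaging tools or a DR lab, since retrying reads on weak heads can make things worse.

```python
import io


def clone_with_skips(src, dst, total_size, block_size=4096):
    """Copy src to dst block by block.

    Unreadable blocks are zero-filled in dst and returned as a list of
    (offset, length) tuples so they can be revisited or reported later.
    src and dst are seekable binary file objects.
    """
    bad_ranges = []
    offset = 0
    while offset < total_size:
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                raise OSError("short read")
        except OSError:
            # Keep offsets aligned in the clone by writing zeros instead.
            data = b"\x00" * length
            bad_ranges.append((offset, length))
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_ranges


if __name__ == "__main__":
    # In-memory stand-ins for a source disk and a clone image.
    source = io.BytesIO(b"\xab" * 10000)
    image = io.BytesIO()
    print(clone_with_skips(source, image, 10000))
    print(len(image.getvalue()))
```

The list of bad ranges matters as much as the image itself: it tells you which parts of the clone are real data and which are filler.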

_________________
Luke
Recovery Force Donor Inventory: 5,000+ Fully Cataloged HDDs & Growing

data-medics

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: January 17th, 2015, 22:23

Joined: April 3rd, 2011, 0:19
Posts: 2003
Location: Providence, RI

Given that they are "DM" series drives, it doesn't surprise me that so many failed. While firmware corruption is possible with these, I'd say it's more likely that a power interruption (possibly caused by the bad expander or backplane) caused the drives to experience simultaneous head crashes.

Unless you really need the data back and are willing to pay a pretty penny, I'd say scrap it and move on. These drives are a nightmare to work on.

_________________
Data Medics - Hard Drive, SSD, and RAID Data Recovery Service Company

Dmitri

Post subject: Four faulty ST2000DM001 in ZFS-based RAID 6

Posted: January 18th, 2015, 7:37

Joined: February 8th, 2014, 8:08
Posts: 456
Location: Eastern Europe /recovering worldwide/

hbee wrote:

I've got at least 3 totally faulty and maybe even 5 faulty disks (of which 4 are ST2000DM001's). My server is running on a ZFS and the storage pool consists of two striped raidz2's (raid6)....

Unfortunately, you're on rather thin ice with ZFS-based RAID. The choice of recovery tools is quite limited, as is the number of DR companies capable of handling a serious ZFS failure.
Also, you are very unlikely to stay within your budget should you bring them the whole RAID for recovery.

For DIY I would definitely follow lcoughey's advice regarding the images, and would also check the other ST2000DM001 drives as well as any drives you bought/installed at the same time as the failed ones. Be sure to label each drive with its disk number first.

P.S.
If the platter surface is scratched, then there's indeed not much reason to proceed, considering the drives are from a RAID array. But if the heads are the only issue, then recovering the images could be a feasible option.

Should you decide to go the data recovery company route and can't find a suitable domestic one, we'll be happy to help. We are not far from Finland, and clients from Scandinavia are not uncommon for us.

_________________
• Remote RAID, NAS, SAN, VMware, DVR (CCTV), flash and tape recovery. Data recovery support.

hbee

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: January 18th, 2015, 10:34

Joined: January 17th, 2015, 11:46
Posts: 3
Location: Finland

Thanks for your replies.

I'm willing to try professional DR as a second choice, as two of the offlined drives are fine according to SMART. Shouldn't good SMART data rule out pretty much all faults on the disk? My reasoning is that these two disks were offlined by Solaris due to some transient errors caused by the other, actually failed drives, but I'm pretty much just guessing. And yes, I believe a total of 8 ST2000DM001 drives were installed at the same time. Thus there is still a risk of these failing as well, but around 24 hours would be more than enough for backing up the newest data to a separate system.

Also, one of the three 'dead' drives actually makes abnormal noises: spinup, two clicks and spindown. The other two disks make only the normal spinup sound but are not detected. Is it possible to 'hear' crashed heads? I've understood that clicking could imply this, but can a head crash also be something you do not necessarily hear? Should the disk spin down if something catastrophic has happened (like a head crash)?

I live in Europe, but what kind of price points and success rates should I expect if I proceed with professional recovery? If the heads have indeed crashed, it would obviously be exponentially harder than pretty much every other scenario. I understand ZFS can correct the data, assuming there are enough disks available, so a 100% salvage would not be necessary. Maybe even one damaged platter due to a head crash could be acceptable for resilvering the ZFS pool?

Also, some additional speculation: what would be the odds that three disks go dead within a couple of weeks of each other, assuming there were no faulty components outside the disks? Or is it reasonable to assume that the disks were damaged by some other faulty component(s) (backplane, expander, PSU etc.)?
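
For a rough feel of those odds, here is a back-of-the-envelope sketch using a binomial model. Everything numeric in it is an assumption made for illustration: the 5% annual failure rate is a made-up figure, not a measured one for these drives, and the model treats failures as independent, which same-batch drives sharing one chassis, PSU and expander almost certainly are not.

```python
from math import comb


def window_failure_prob(annual_failure_rate, days):
    """Chance that one disk fails within a window of the given length,
    assuming a constant failure rate over the year."""
    return 1.0 - (1.0 - annual_failure_rate) ** (days / 365.0)


def prob_at_least(k, n, p):
    """Chance that at least k of n independent disks fail, each with chance p."""
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    ASSUMED_AFR = 0.05  # hypothetical annual failure rate, for illustration only
    DISKS = 20          # disks in the pool (two raidz2 sets of ten)
    p = window_failure_prob(ASSUMED_AFR, days=14)
    print(f"per-disk chance in 14 days: {p:.5f}")
    print(f"chance of 3+ failures among {DISKS} disks: {prob_at_least(3, DISKS, p):.2e}")
```

Under these assumptions three independent failures in any given two-week window come out as a very rare event, which is an argument for a shared cause (batch, heat, power, expander or backplane) rather than plain bad luck.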

Dmitri

Post subject: Affordable data recovery in Finland / Scandinavia

Posted: January 18th, 2015, 11:42

Joined: February 8th, 2014, 8:08
Posts: 456
Location: Eastern Europe /recovering worldwide/

hbee wrote:

Also, actually one of the three 'dead' drives makes abnormal noises: spinup, two clicks and spindown ... Should the disk spin down if something catastrophic has happened (like a head crash)?

That's either dead heads or even worse. If a Seagate drive stops its motor after clicking, the situation is bad.
We have seen such jobs where only the heads were bad, but more frequently it means a concentric scratch on the surface.
If a drive keeps spinning, there's a chance the job will be easier.

hbee wrote:

If the heads are indeed crashed it would obviously be exponentially harder than pretty much every other scenario.

If by a "head crash" you mean a head touching the platter surface and leaving a concentric scratch, then I'd suggest not recovering such a drive at all.
In Europe you'll most likely spend at least half of your budget on it, while the recovered data is likely to be scarce and useless for your situation.

hbee wrote:

I understand ZFS can correct the data, assuming there are enough disks available so a 100% salvage would not be necessary.

I'd recommend doing that on clones only. Should something go wrong in this scenario, it's very unlikely that anyone would agree to work with the outcome, unless perhaps you offer them a sum with four zeros in it.

hbee wrote:

Also, for some additional speculation. What would be the odds that three disks go dead within a couple weeks of each other assuming there were no faulty components outside the disks?

If I understood your question correctly, that's possible.
Many people buy drives of the same model for their RAID / NAS, etc., and at the same time (so they're frequently from the same batch). It's not uncommon for us to see such drives fail one right after another.
That's why I suggested above that you check the other drives.

hbee wrote:

I live in Europe but what kind of price points and success rates should I expect if I proceed with the professional recovery?

Contact the local companies to find out. Just don't let any of them open the drive(s), as afterwards nearly every company will charge more for a previously opened drive.

Our prices should definitely be lower than the European average and diagnostics are free; that's why I suggested considering us as an option.

P.S.
Also, if during your search you get a price for those ST2000DM001's that seems surprisingly low, I'd recommend staying away from that place. Quite possibly they either don't know what they're talking about or are accepting the job on a "why not try?" basis.

_________________
• Remote RAID, NAS, SAN, VMware, DVR (CCTV), flash and tape recovery. Data recovery support.

hbee

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: February 15th, 2015, 10:48

Joined: January 17th, 2015, 11:46
Posts: 3
Location: Finland

A small update regarding this.

The disks could be cloned quite successfully via the SATA connector by a local DR company using some specialized software. One disk was cloned to about 70% before one of the heads gave up. The second disk was cloned to '99%'. The 3rd disk was 'dead' according to the DR company and the 4th does not even spin up.

The total cost for those two clones was approximately 450 euros. Apparently the root cause of the problems was excessive temperature, which caused some software issues and thus rendered the disks unusable.

mr_spokk

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: February 15th, 2015, 11:16

Joined: May 21st, 2007, 16:10
Posts: 1592
Location: Gothenburg/ Sweden

Hi, sorry to say, but the biggest problem was the drives themselves (DM series) rather than heat.
The fourth drive probably has a PCB problem, so a compatible PCB + ROM transfer might fix that one. You already have a good PCB on the other good drives (so you don't need to find one), and the last will need at least one head replacement (maybe more, depending on how bad the platter is).

Bosse

_________________
Rescue IT Datarecovery service Sweden
Rescue IT Dataräddning Göteborg AB
http://www.rescue-it.se

pcimage

Post subject: Re: Four (!) faulty ST2000DM001 (7200.14) and a SAS expander

Posted: February 15th, 2015, 13:04

Joined: November 29th, 2006, 10:08
Posts: 7865
Location: UK

mr_spokk wrote:

Agreed, these DM series drives seem to be of very poor quality.

Seem to be particularly susceptible to head crashes and almost always suffer from media damage, often rendering them unrecoverable :-(

_________________
PC Image Data Recovery
http://www.pcimage.co.uk

New!! HDD-PCB.COM for all your PCB and donor HDD requirements!
