
All times are UTC - 5 hours [ DST ]




 Post subject: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 8:21

Joined: September 9th, 2011, 5:02
Posts: 5
Location: Birmingham, UK
Hi,

Like so many others here, I'm also experiencing some issues with one of my hard drives & have a few questions.
Firstly, a little history:

After I rebooted my 4-disk RAID 5 (because I couldn't connect to the network shares), the NAS (Thecus n4400plus) reported that there was no RAID array, but when I checked, it reported that all the drives were in a healthy state. I didn't make any attempt to recreate the array, as I didn't want to overwrite the drives. I raised a support ticket with Thecus & was told that the array had lost 2 disks, which is why it prompted me to create a new array. They asked that I back up the disks before they attempt to rebuild / correct the array & this is where the troubles begin.

The drives in the array are three Seagate ST341000340AS (firmware SD1A) & one Seagate ST341000333AS. The ST341000333AS was a replacement for one of the ST341000340AS drives that was hit by the infamous SD15 firmware issue. (Wish I had found this forum back then!)

So far I have successfully cloned 1 drive (one of the ST341000340AS) to a brand-new 1TB drive using ddrescue on the Knoppix live CD. (I ran 'lshw -C disk' & 'fdisk -l' to make sure I had the source & destination devices right.) So far, so good.
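
For anyone following the same route, here is a minimal sketch of a cautious first ddrescue pass with a mapfile, wrapped in a helper that refuses to clone a disk onto itself. The function name `clone_disk`, the `DRY_RUN` switch and the device names are all hypothetical placeholders; only `ddrescue -f -n SRC DST MAPFILE` is real ddrescue usage. It does nothing to help a drive that keeps dropping off the bus like the 2nd drive described below.

```shell
#!/usr/bin/env bash
# clone_disk SRC DST MAPFILE  (hypothetical helper, not part of ddrescue)
# SRC/DST are placeholders: confirm them yourself with 'lshw -C disk' and
# 'fdisk -l' before running for real, because DST is overwritten.
clone_disk() {
    local src=${1:-} dst=${2:-} map=${3:-}
    if [ -z "$src" ] || [ -z "$dst" ] || [ -z "$map" ]; then
        echo "usage: clone_disk SRC DST MAPFILE" >&2
        return 2
    fi
    # Refuse the most destructive mistake: cloning a disk onto itself.
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and destination are both $src" >&2
        return 1
    fi
    # -f allows ddrescue to write to a block device; -n skips the slow
    # splitting/scraping phase on this first pass (the long option name
    # differs between ddrescue versions). The mapfile records which areas
    # were read, so an interrupted run can resume without re-reading them.
    local cmd=(ddrescue -f -n "$src" "$dst" "$map")
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"    # dry run: only print the command
    else
        "${cmd[@]}"
    fi
}
```

Running `DRY_RUN=1 clone_disk /dev/sdb /dev/sdc rescue.map` just prints the command, which is a cheap way to double-check the device order before committing.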

The 2nd (ST341000340AS) drive started copying, but after a few hours ddrescue was reporting errors on every single block. I cancelled ddrescue & re-ran 'lshw -C disk', but this time it only listed 1 disk, the new 1TB drive. I thought the old drive might have died. I changed SATA cables & even tried using a jumper to limit it to the slower SATA speed. The drive appeared again when I rebooted, so I restarted the cloning. When I checked its progress after about 10 minutes, I saw the same input/output error. I left it running, assuming there must be a number of bad blocks, but a few hours later there was still no change, & the drive was cold to the touch. I have now switched it off & removed it, & am concentrating on cloning the other disks.

Does anyone have any ideas why the drive appears to turn itself off after a period of activity? There's no noise or clicking, so I'm guessing the heads etc. are still OK. I have yet to check the underside of the logic board, but could there be a blown component?

3 of the drives from the array are identical (they were all bought at the same time from the same supplier & have the same firmware, model, part & revision numbers). If I manage to clone the others successfully (I have already done 1), would it be worth the risk of swapping the logic boards? Would the memory chip (EEPROM?) need to be swapped too? (If it does, I would rather someone else did it, as I'm not confident in my soldering skills.)

Should I attempt to remove the logic board & clean the contacts with a pencil eraser?

Should I just take a good-quality picture of the board & post it here?

Is it worth trying to clone this drive again with a fan keeping it cool?

I'm guessing that if the logic board is fine, then the problem is internal (motor, bearings etc.) & will need to be looked at by a specialist. I also presume that the data (or at least the majority of it) should still be OK.

Would a platter swap, performed by a professional, be an option if the board turns out to be OK?

If I can get at least 3 of the drives cloned correctly, then the array can be rebuilt. Obviously it would be nice to have all 4, though.

Sorry for all the questions, but I just wanted to get some opinions & see what options I have for restoring. I do have backups, but typically some of the files hadn't been backed up yet & others were not backed up at all, as I decided I could risk losing them (I didn't have enough backup storage space). I have now bought a new backup drive: now that I'm facing potentially losing some data, I've realised I'd prefer to save everything.

Thanks in advance.


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 9:08

Joined: May 6th, 2008, 22:53
Posts: 2138
Location: England
craigey wrote:
They asked that I backup the disks before they attempt to rebuild / correct the array & this is where the troubles begin.

Actually, I expect the problems began earlier, when one or both of the drives that Thecus say were detected as having failed actually did fail (it's unlikely they both failed at the same time). In your 4-drive RAID 5, were all drives data drives, or was it a 3-drive array with a hot spare?

craigey wrote:
The 2nd (ST341000340AS) drive started off copying, but then ddrescue just continually reported errors on every single block after a few hours or so

This is a common failure mode of some Seagate drives, including the 7200.11, due to internal problems (a look at the raw SMART data from such a drive will typically show a common signature). The drive has to be power-cycled before it will respond again, but doing that also puts extra strain on the drive, and the cycle then repeats when you start reading from it again.

Data recovery (DR) companies using specialised cloning hardware with extra control over the drive (e.g. the DDI, DeepSpar Disk Imager) have been able to clone drives behaving in this way, where users running software-only tools such as ddrescue have not.

craigey wrote:
The drive appeared when I booted up again

As you see, the drive started to respond again after it was power-cycled. :)

IMHO, to get out of this situation you need the services of a reputable DR company with DDI cloning equipment, and with the skill & experience to use that equipment on drives showing this behaviour. If you want recommendations for reputable companies in the UK, just ask the folks here (as a start, you could contact forum member pcimage, who is in Peterborough).

While the drive is still partly readable, the cost from a DR company will probably be lower than if you kill the drive completely through repeated retries and power-cycles, which would then require cleanroom work. Therefore I suggest not continuing recovery attempts yourself, as you are very unlikely to succeed, unless you decide that the data is not worth anything and that you don't mind making the situation unrecoverable (or at the very least, much more expensive).

Good luck!


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 9:42

Joined: September 9th, 2011, 5:02
Posts: 5
Location: Birmingham, UK
Thanks for your advice. I will leave the drive alone as I don't want to risk the data.

In the 4 disk RAID5 all the disks were data/parity drives. There wasn't a hot spare.

Whilst I understand it's perfectly possible that 2 drives failed over a longer period of time than I believe, the Thecus support team gave exactly the same reason when I lost one of my drives to the Seagate SD15 firmware issue. Only that 1 drive was replaced with a new one; the others just had the firmware update & the array came back. I'm quite sure that only 1 drive is acting up, but I may of course be proved wrong.

Thanks again for the info. I will contact pcimage about getting this drive cloned, but I will try to clone the other drives first. I would rather send just 1 parcel if it does turn out that 2 drives are bad.

Will keep the post updated in case it helps anyone else out.


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 10:19

Joined: May 6th, 2008, 22:53
Posts: 2138
Location: England
craigey wrote:
Thanks for your advice. I will leave the drive alone as I don't want to risk the data.

OK :)

craigey wrote:
In the 4 disk RAID5 all the disks were data/parity drives. There wasn't a hot spare.

Thanks for the info. At least this avoids any uncertainty about whether the array had already been rebuilt onto a hot spare...

craigey wrote:
I'm quite sure that only 1 drive is acting up, but I may of course be proved wrong.

Which is not what Thecus told you (according to your earlier post). I don't know (or care :) ) who is right, but if you've only tried to clone 2 of the 4 drives (and had a problem with 1 of those 2), then the status of the other 2 is just "unknown" at the moment, as I understand your posts.

The only reason I mention this is that Thecus claim that 2 of the drives were detected as having "failed" (bearing in mind that there are many shades of grey in what a RAID controller decides is a faulty drive). That would fit with the RAID 5 array no longer being able to operate. You are now saying you believe Thecus to be wrong, and that only 1 drive has "failed", but that would not explain why a RAID 5 array was unable to continue operating, of course.

So either two drives did have problems, as you report Thecus tech support told you (although that doesn't necessarily mean that such drives can't be cloned, when using the right equipment); or else, if you're right and only one drive was affected but the RAID 5 was still unusable, then you have a problem with the RAID 5 functionality on that array. See what I mean?
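
The reason a RAID 5 survives exactly one missing member is that each stripe carries an XOR parity block. A toy sketch in shell arithmetic (small integers stand in for whole blocks; the function names are hypothetical, and real arrays also rotate the parity block across members and work on whole chunks, which this ignores):

```shell
#!/usr/bin/env bash
# Toy model of one stripe of a 4-drive RAID 5: three data blocks plus one
# parity block, each shrunk to a small integer for illustration.

# raid5_parity BLOCK...  -> XOR of all blocks given (hypothetical helper)
raid5_parity() {
    local p=0 b
    for b in "$@"; do
        p=$(( p ^ b ))
    done
    echo "$p"
}

# raid5_rebuild_one SURVIVOR1 SURVIVOR2 SURVIVOR3
# Any ONE missing block (data or parity) equals the XOR of the other three.
# With two blocks missing there is one parity equation but two unknowns,
# so the stripe cannot be rebuilt, which is why the array stops.
raid5_rebuild_one() {
    if [ "$#" -ne 3 ]; then
        echo "need exactly 3 surviving blocks, got $#" >&2
        return 1
    fi
    raid5_parity "$@"
}
```

For example, data blocks 5, 9 and 3 give parity `raid5_parity 5 9 3` = 15; if the drive holding 9 is lost, `raid5_rebuild_one 5 3 15` recovers 9. With only two survivors the rebuild is refused.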

My guess is that there were problems (not necessarily at the same time or to the same severity) on two of the drives. If that guess is correct, you'll need to be careful about which 3 of the 4 drives (or their clones) you (or Thecus tech support) attempt to use the data from, if you're going to take the risks of DIY reconstruction yourself. Don't assume that all 4 drives hold synchronised, up-to-date data.
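
One way to check which members hold the most recent data, assuming the array is Linux md (as the later posts suggest), is to compare the event counter and update time that `mdadm --examine` prints for each member or clone: a member that dropped out earlier normally shows a lower Events value and an older Update Time. A small sketch; `examine_field` is a hypothetical helper, the device names are placeholders, and the exact output layout varies between mdadm versions and superblock formats.

```shell
#!/usr/bin/env bash
# examine_field NAME  (hypothetical helper)
# Reads `mdadm --examine` output on stdin and prints the value of the
# "NAME : value" line, e.g. NAME=Events or NAME='Update Time'. The value is
# returned as text because its exact format varies between mdadm versions.
examine_field() {
    local name=${1:-}
    if [ -z "$name" ]; then
        echo "usage: examine_field NAME" >&2
        return 2
    fi
    sed -n "s/^[[:space:]]*${name}[[:space:]]*:[[:space:]]*//p"
}

# Example use on real members (device names are placeholders):
#   for d in /dev/sdX /dev/sdY /dev/sdZ; do
#       printf '%s events=%s updated=%s\n' "$d" \
#           "$(mdadm --examine "$d" | examine_field Events)" \
#           "$(mdadm --examine "$d" | examine_field 'Update Time')"
#   done
```

`mdadm --examine` only reads the superblock, so it is a cheap check compared with a full clone.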

You may want / need to keep multiple copies of the data from each disk, depending on how you intend to try to reconstruct it - in case that process changes the data (or changes the RAID controller metadata) on the disks (as it typically does).

craigey wrote:
Thanks again for the info, I will contact pcimage to see about getting this drive cloned, but I will attempt to clone the other drives first. Would rather just send 1 parcel if it does turn out that 2 drives are bad.

Understood. You may also want to discuss asking them to do the whole reconstruction job, to reduce the risk of errors in recovering the original RAID - what if you only ask for one drive to be cloned, for example, and then you (or Thecus tech support) make a mistake doing the reconstruction and don't manage to recover your data?

As long as you have enough readable clones of your data, you should be OK reverting to the pre-change data to try something else. However, in my job (which is not DR), I've taken clones of disks as backups before RAID controller firmware testing, for example, and later found, when I needed to read them, that one of those clones had itself developed unreadable sectors (though I managed to read it in the end). My point is that a clone of a disk is not guaranteed to be readable when you need it to be; been there, done that. :) Depending on the importance of the data, and your assessment of luck and risk, you may want to consider your options.
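
Since a clone can itself degrade, one hedge is to check now and then that a second copy of a clone still reads back completely and matches. A minimal sketch using `cmp`; `verify_copy` is a hypothetical helper. It re-reads every byte of both inputs, so point it at clones or image files, not at a drive that is already failing.

```shell
#!/usr/bin/env bash
# verify_copy ORIGINAL COPY  (hypothetical helper)
# Reads both files (or devices) end to end. cmp -s exits 0 only when both
# are identical; it exits 1 if they differ and 2 on trouble such as a read
# error, so any non-zero status means "do not trust this copy".
verify_copy() {
    local a=${1:-} b=${2:-}
    if cmp -s "$a" "$b"; then
        echo "OK: $b matches $a"
    else
        echo "MISMATCH or read error: $a vs $b" >&2
        return 1
    fi
}
```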

As I said before, good luck!


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 14:41

Joined: September 9th, 2011, 5:02
Posts: 5
Location: Birmingham, UK
Cheers, I understand what you mean regarding the 2 types of failures. I was thinking more in terms of drives no longer responding, but what you say makes much more sense.

Anyway, it looks like I was proved wrong!
The 3rd drive cloned almost successfully; ddrescue showed 16 errors. I'm not sure how much this would affect the array. There wouldn't have been a huge amount of activity on the array: most of the writes happen during torrent downloads, & if those files are corrupted, the torrent client can re-download the missing parts.
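
An error count says little about how much data is actually missing. If ddrescue was run with a mapfile (the post doesn't say), the total size of the areas it marked bad is a better measure. A sketch, assuming the GNU ddrescue mapfile layout described in its manual (hex `pos size status` lines, `-` marking bad sectors); `bad_bytes` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# bad_bytes < MAPFILE  (hypothetical helper)
# Sums the sizes of areas marked '-' (bad sectors) in a GNU ddrescue mapfile.
# Assumed layout: '#' comment lines, then one current-position/status line,
# then data lines "pos size status" with pos and size in hex (0x...).
# Areas still marked '?', '*' or '/' were not fully read either; they are
# not counted here.
bad_bytes() {
    local pos size status rest seen_status=0 total=0
    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;       # skip blank and comment lines
        esac
        if [ "$seen_status" -eq 0 ]; then
            seen_status=1              # first non-comment line: current pos/status
            continue
        fi
        if [ "$status" = "-" ]; then
            total=$(( total + size ))  # bash arithmetic accepts 0x hex
        fi
    done
    echo "$total"
}
```

Usage: `bad_bytes < rescue.map` prints the unreadable byte count; dividing by 512 gives the number of sectors on a drive with 512-byte sectors.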

With 2 out of 4 drives reporting I/O errors during cloning, I think I'll just speak to the guys at PCImage & see what they can do.


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 17:47

Joined: September 9th, 2011, 5:02
Posts: 5
Location: Birmingham, UK
I removed the logic board from the drive that wouldn't stay running & I can't find any damaged / burnt areas, so I doubt that swapping the board would do anything. I think this drive will definitely need some professional (or at least not amateur) recovery. As the drives are part of the RAID, I will be asking the professionals about rebuilding the array too.


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 9th, 2011, 18:17

Joined: May 6th, 2008, 22:53
Posts: 2138
Location: England
craigey wrote:
I removed the logic board on the drive that wouldn't stay running & I can't seem to find any damaged / burnt areas

That's expected for the specific symptoms you describe. In my experience such a drive will usually stay running, if you don't try to read from it. :) It's not a PCB problem which causes those specific symptoms.

Now that you've found 2 drives with problems (although apparently with different symptoms), it's easier to understand how the RAID controller gave up on that RAID 5 volume. It's typical for RAID controllers to have conditions where they decide a drive has "failed", yet the drive is not completely dead. That may fit with the "3rd drive" that you mentioned. It wouldn't surprise me if that "3rd drive" is in the early stages of degrading into the state where it behaves like the "2nd drive".

craigey wrote:
I think this drive will definitely need some professional (or at least not amateur) recovery. As the drives are part of the RAID I will be speaking to the professionals about them rebuilding the array.

That sounds like a plan. :) Good luck :)


 Post subject: Re: Seagate Barracuda 7200.11 (ST341000340AS)
Posted: September 11th, 2011, 14:32

Joined: September 9th, 2011, 5:02
Posts: 5
Location: Birmingham, UK
OK, well I spoke to pcimage & provided some information about the issue. I was told that for them to reconstruct the RAID & recover the data, even if I supplied my own media for the clones & backup, it would cost about £800+.
I understand that there could potentially be a lot of work involved, but there was no way I could afford that, so I decided to see what I could do with the 3 cloned drives, even though 1 of them had a few bad blocks (there was always the recovery option for the original drives).

My thinking was that if the 1 drive I couldn't read from had died well before the others, then I must have unintentionally been using the array in a degraded state, which also meant that the data I hadn't backed up wouldn't be on that disk. If I could assemble the array from the other 3 drives in a degraded state, I had a 50/50 chance that nearly all the data would be OK.

I used dd_rhelp to try to recover even more data from the bad areas of the disk that had 16 errors, & managed to reduce the number of errors to 3. I did a lot of googling about the RAID used in the Thecus setup & found that it was Linux software RAID (md) using LVM, so it should be possible to mount it just by using mdadm to scan & assemble the array.

I tried it, but the 3rd disk couldn't be added to the array because of the bad sectors / different superblock. After some more googling, I found that I should be able to force the array to assemble using 3 disks.

Code:
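# --force: assemble even if some members' superblocks look out of date
# --run:   start the array even though it is degraded (3 of its 4 members)
mdadm --assemble --run --force /dev/md1 /dev/sda /dev/sdb /dev/sdc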


I then mounted it & found I could see all my files, including all the data up to when it died at the beginning of September. I quickly plugged in the 3TB USB drive & started copying the data. It's still got a few hours to go, but so far everything looks fine.

Once I have the backup, I will put the 3 cloned drives into the Thecus box & get Thecus support to force-mount the array if they can, before adding a new drive to restore redundancy.

Will also be ensuring I have a better backup procedure in place, rather than the ad-hoc method I was using previously.

