
"Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS drive

December 7th, 2011, 18:43

I have a Dell PowerEdge SC1435 server with 2 Seagate 146GB SAS hard disks attached to a SAS5/IR PCI-E card. The server had been running fine for 11 months since its last reboot. My hosting provider needed to move the cabinet it was housed in, so I powered it down remotely. It's a LAMP server with full specs below. My provider confirmed it had powered off, made the move and powered it back up. Since then, I've been getting "Device Not Available at HBA0" and "Hard Disk Error" every time I boot. The boot stops on the hard disk error.

My LAMP server uses software RAID, so in the SAS adapter BIOS I don't have any RAID arrays set up -- just 2 independent SAS disks.

I've done some diagnostics and trial-and-error testing with these results:

- SAS adapter BIOS utility (Ctrl-C) saw both disks. A test on the first disk failed immediately. A test on the second disk passed fine over the course of about 25 minutes.

- Dell diagnostics on the first disk failed on the Confidence Test & Drive Self Test with Error Code 4400:011A Target Not Ready. The same tests passed fine on the second disk.

- I have removed the SAS5/IR card and reseated/reconnected it. No change in error messages.

- I have removed both hard disks and re-installed/reconnected them. No change in error messages.

- I have disconnected the cable from the first drive but left the second drive connected. Don't get the "Device Not Available at HBA0" message, but *still* get the "Hard Disk Error" and no boot.

- I have switched the cables between the hard drives. Still get the "Hard Disk Error" and no boot.

- I have updated the Dell firmware for the Seagate drives with a bootable CD. The process failed with Error Code: 4400:281A Target Not Ready on the first drive, but passed on the second drive. No change in error messages.

- I have booted from CD/DVD with Fedora 10's install CD and with a Knoppix CD. Neither one sees any hard drives.

It appears the first drive failed and I can accept that. I'm less inclined to believe both drives failed on the same boot. What really stumps me, however, is that no diagnostics have indicated a failure on the 2nd drive, yet the server won't boot from it or boot into an OS that will recognize it -- even if I completely disconnect/remove the first drive.

If anyone has any suggestions on troubleshooting or root cause, I would love to try something before I purchase 2 replacement drives (minimum $70 apiece for a used drive, up to hundreds of dollars for new ones).

Thanks in advance for any help,

Chris

Server Specs:

1U Dell PowerEdge SC1435 running Fedora 10 Linux OS
(2) AMD Opteron 2212 2.0 GHz Dual-Core
24 GB RAM ECC DDR2 667
4 x 2GB and 4 x 4GB modules PC2-5300P DDR2-667 5-5-5 240-pin Registered DIMM w/ cmd/addr Parity, 1.8V
(2) 146GB Seagate Cheetah T10 Serial Attached SCSI (SAS) 10K RPM (ST3146755SS T107)
SAS5/IR PCI-E Controller Card
(2) Gig Ethernet
ATI RN50 PCI video controller with 16MB RAM
CD-RW/DVD-ROM
600-W Power supply

Re: "Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS d

December 7th, 2011, 19:45

What version of the OS is installed? I assume this is a Linux software RAID setup? Did you run any of the mdadm utilities to check the RAID status?

Re: "Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS d

December 7th, 2011, 20:00

@bmandotcom: PM sent (although I don't think you can reply via PM).

Re: "Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS d

December 8th, 2011, 1:13

I have to give Vulcan a shout out for sharing his ideas that pointed me in the direction of the boot partition on the 2nd drive.

I removed the presumably bad first drive and booted from the Fedora install CD with just the good drive connected. That gave me access to a shell where I could see the hard drive partitions. Then it was a matter of reassembling the logical RAID device for the boot partition. I was able to do that with a few mdadm commands, mount the /boot partition and then reinstall the boot loader (GRUB).

I restarted the system and sure enough Fedora booted. All the partitions looked good.
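For anyone following along: with one member removed, a RAID1 array runs degraded until the replacement disk is added. The exact commands used here weren't posted, so the following is only a rough sketch of one way to confirm which arrays are degraded by reading /proc/mdstat. The function name, the sample text, and the device names (md0, md1, sda1 and so on) are made up for illustration, not taken from this server.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for md arrays with a missing member.

    Expects text in the format of /proc/mdstat on Linux.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like "md0 : active raid1 sda1[1]".
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with a member map such as [UU] or [_U];
        # an underscore marks a missing or failed member.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status:
            if "_" in status.group(1):
                degraded[current] = status.group(1)
            current = None
    return degraded


# Hypothetical sample: md0 has lost one of its two members, md1 is healthy.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[1]
      104320 blocks [2/1] [_U]

md1 : active raid1 sda2[1] sdb2[0]
      143267584 blocks [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system the input would come from reading /proc/mdstat instead of the sample string. An empty result means every array reports all of its members as up.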

So indeed the 2nd drive was fine after all. I apparently just missed a step when I initially set up the RAID1 for the /boot partition: installing the boot loader on the 2nd drive as well, so that it would be bootable if the 1st drive failed.

I'm going to contact Dell to get a replacement for the 1st drive, as I believe it's still under warranty (only 4 years old). That should get me back up and running at full capacity.

Thanks again, Vulcan!

Re: "Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS d

December 8th, 2011, 2:41

You can try our product SRT For SEAGATE SAS
Free demo download:
http://www.hydata.com/download/SRT-SEAG ... SAS-en.zip

Re: "Hard Disk Error" boot msg - Dell SC1435 / Seagate SAS d

December 8th, 2011, 13:20

@bmandotcom:

Thanks for the follow-up and for the "shout out". :) As I mentioned, you'd done all the work by that initial troubleshooting and presenting your results so clearly, meaning that I could spot the clues about the remaining problem.

Glad you're back up and running with that second disk :D