March 17th, 2012, 12:14
Have a newly purchased used Dell 2850 (with DRAC disabled). Set up the two 146 GB drives as RAID 1, built a test Win 2003, and then
used Symantec System Recovery 2011 to restore the Exchange from a white box server that is failing. Things looked good. The RAID
controller has been flashed to the last Dell version BIOS H435 Apr 23, 2008. The only goofy thing is a "rebooted from a bug check" entry in the
event viewer, but I seriously doubt this is causing anything at the RAID level.
With the power off I pulled out physical drive #1 and attempted a boot. Get a message that SCSI ID not responding Channel-0: 1. The
server then hangs at the RAID controller and wants me to use <CTRL><M> to run the controller configuration utility. I'm mostly an HP
server guy, and I know that on an HP I'd just get an error message and could continue.
It's Saturday and I'd like to install tomorrow but I can't do it unless I get this resolved. HELP...
March 17th, 2012, 12:35
The RAID controller most likely wants you to mark the drive as failed. I assume drive 0 is still in the same slot? Not sure why you pulled the drive in the first place.
March 17th, 2012, 12:48
I'll PM you with my info and I'll help by phone...
March 19th, 2012, 10:09
Interesting results.
Found that at seemingly random intervals I can reboot through CTRL-ALT-DEL, get a message "1 Logical Drive(s) Degraded", and then the system boots into Windows 2003. After another reboot and <CTRL><M>, the controller then shows the drive as "FAIL". I checked the patrol read and one server was set for 4:00 AM this morning, but even after it ran the drive was still online (the server had been running at the <CTRL><M> prompt for over 6 hours). If the server hangs at the <CTRL><M> prompt, the drive shows as "ONLIN".
One other interesting thing - so far this only "randomly" seems to work if the drive that is pulled is #1, never with #0 (so far).
It looks like the controller does not handle well the situation where the drive is removed, which is equivalent to a complete failure (realizing that most real drive failures are actually platters, motors, start circuits, etc., and not the entire drive).
Anyone got any guidance on this?
March 19th, 2012, 16:17
Maybe you're not giving the server time to resync the drives between pulls? I have always added another line to the boot.ini for the 2nd drive, since Windows reads it and assumes the drive is in 0, and if drive 0 is out it won't automatically look in 1 to boot. The added line allows you to choose the 2nd drive as a boot option. I used it for years in IBM RAID 1 setups.
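For anyone who wants to try the extra boot.ini line, a rough sketch of what it could look like is below. This is only an illustration: the partition number, the \WINDOWS folder, the menu descriptions and the timeout are assumptions, not values taken from this server. It also only helps where Windows itself sees two separate disks (a software mirror, for example). A hardware RAID 1 normally presents the pair to Windows as a single logical disk, so on a PERC the second entry may have nothing to point at.

```ini
; Hypothetical boot.ini for Windows Server 2003 with a second boot entry.
; rdisk(0) is the first disk the BIOS reports, rdisk(1) the second.
; Partition number, \WINDOWS folder and descriptions are assumed values.
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003 (drive 0)" /fastdetect
multi(0)disk(0)rdisk(1)partition(1)\WINDOWS="Windows Server 2003 (drive 1)" /fastdetect
```

With both entries present, the boot menu offers a choice, so the second drive can be picked by hand if the first one is out.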
March 20th, 2012, 8:10
I've waited even overnight and have the same results, although I'm aware of the resync problems, especially with multiple marginal drives in a RAID 5. I'm supposing now that the PERC won't normally mark a drive "FAIL" unless it can see the electronics.
March 22nd, 2012, 13:40
It's apparently a difference in philosophy from what I'm accustomed to. The PERC works as expected if the drive is pulled while the power is on. I'm concluding that Dell's assumption is that a drive found missing or failed when the server boots should not be marked as bad in the controller, and of course the vast majority of real failures happen while the server is running. Likewise, most boot-time failures are caused by the user playing with cables, trays, etc. Probably really handy when someone forgets to recable your large RAID 5 array.