
All times are UTC - 5 hours [ DST ]


 Post subject: Bad disks or not??? I'm confused!
Posted: August 6th, 2009, 1:56
Joined: August 6th, 2009, 1:44
Posts: 2
Location: Ontario
I've been experimenting with software RAID (RAID 5) for about two weeks. I started with ZFS on FreeNAS and have now moved to Linux software RAID with mdadm. With ZFS I kept getting errors and couldn't tell what was causing them, so I switched to Linux software RAID to troubleshoot. The errors are gone, but I'm still seeing odd behaviour.

Here is example smartctl output for one of my drives:
nas01:/# smartctl -a /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Warning! Drive Identity Structure error: invalid SMART checksum.
=== START OF INFORMATION SECTION ===
Device Model: ST3750528AS
Serial Number: 5VP0CM59
Firmware Version: CC34
User Capacity: 750,156,374,016 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Aug 6 01:49:00 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 600) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 144) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always   -           85044048
  3 Spin_Up_Time            0x0003   096   095   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           22
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always   -           2410597
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always   -           301
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           22
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline  -           0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always   -           3
190 Airflow_Temperature_Cel 0x0022   073   067   045    Old_age   Always   -           27 (Lifetime Min/Max 26/29)
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always   -           27 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   036   016   000    Old_age   Always   -           85044048
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           246741576188240
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           4147851726
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           1190436351

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


----
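One way to read the attribute table above: according to the smartctl man page, an attribute counts as failed when its normalized VALUE is less than or equal to THRESH. RAW_VALUE is vendor-specific and is not part of that comparison. A minimal Python sketch of the rule, using a few rows copied by hand from the output above (the `Attr` type and `status` function are illustrative names, not part of smartmontools):

```python
from typing import NamedTuple


class Attr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # WORST column
    thresh: int  # THRESH column


def status(a: Attr) -> str:
    """Apply the man-page rule: failed when normalized value <= threshold."""
    if a.value <= a.thresh:
        return "FAILING_NOW"
    if a.worst <= a.thresh:
        return "In_the_past"
    return "-"


# Rows copied by hand from the smartctl output for /dev/sdb.
ROWS = [
    Attr(1, "Raw_Read_Error_Rate", 115, 99, 6),
    Attr(5, "Reallocated_Sector_Ct", 100, 100, 36),
    Attr(7, "Seek_Error_Rate", 63, 60, 30),
    Attr(195, "Hardware_ECC_Recovered", 36, 16, 0),
]

for a in ROWS:
    print(f"{a.attr_id:>3} {a.name:<24} {status(a)}")
```

All four rows come out as `-`, matching the WHEN_FAILED column in the posted output, even though two of them carry large raw values.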

Googling turned up nothing (except this forum). Here is my hdparm output:

nas01:/# hdparm -iI /dev/sdb

/dev/sdb:

Model=ST3750528AS , FwRev=CC34 , SerialNo= 5VP0CM59
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1465149168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5 udma6
AdvancedPM=no WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-4,5,6,7

* signifies the current active mode


ATA device, with non-removable media
Model Number: ST3750528AS
Serial Number: 5VP0CM59
Firmware Revision: CC34
Transport: Serial
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1465149168
device size with M = 1024*1024: 715404 MBytes
device size with M = 1000*1000: 750156 MBytes (750 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Recommended acoustic management value: 254, current value: 0
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
SET_FEATURES required to spinup after power up
SET_MAX security extension
* Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* SATA-I signaling speed (1.5Gb/s)
* SATA-II signaling speed (3.0Gb/s)
* Native Command Queueing (NCQ)
* Phy event counters
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Long Sector Access (AC1)
* SCT LBA Segment Access (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
124min for SECURITY ERASE UNIT. 124min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50014414df2
NAA : 5
IEEE OUI : c50
Unique ID : 014414df2
Checksum: correct
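
As a cross-check, the sector counts hdparm reports can be compared with the capacity smartctl printed for the same drive. The sketch below assumes 512-byte logical sectors, which the output does not state explicitly; the variable names are illustrative.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

lba48_sectors = 1465149168  # "LBA48 user addressable sectors" from hdparm
lba28_sectors = 268435455   # "LBA user addressable sectors" from hdparm

capacity = lba48_sectors * SECTOR_BYTES
mib = capacity // (1024 * 1024)  # hdparm's "M = 1024*1024" figure
mb = capacity // (1000 * 1000)   # hdparm's "M = 1000*1000" figure

print(capacity)
print(mib)
print(mb)
# The 28-bit LBA count is the largest value a 28-bit field can hold.
print(lba28_sectors == 2**28 - 1)
```

The results (750156374016 bytes, 715404 and 750156 MBytes) agree with the User Capacity line from smartctl and with both device sizes from hdparm, so the identify data for this drive is at least self-consistent.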


However, for some of my drives hdparm reports a bad checksum:

Security:
Master password revision code = 65502
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
142min for SECURITY ERASE UNIT. 142min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c50013e8f1aa
NAA : 5
IEEE OUI : c50
Unique ID : 013e8f1aa
Checksum: incorrect (0xe0), expected 0x20

I have four hard drives in total: three give that checksum error in smartctl and two show a bad checksum in hdparm. Are there any other tools I can use to check or diagnose them? Is such a high Raw_Read_Error_Rate normal? Could I have bad hard drives, or could it be the RAID/SATA controller? (I'm not using the controller for RAID, just to connect the hard drives.)
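
For background on what both tools are checking: in the ATA specification, the 512-byte IDENTIFY DEVICE block ends with an optional integrity word. If byte 510 holds the signature 0xA5, byte 511 is a checksum chosen so that all 512 bytes sum to zero modulo 256. The two numbers hdparm prints above, 0xe0 and 0x20, add up to 0x100, which is consistent with that kind of 8-bit check. The Python sketch below illustrates the rule on a synthetic block; it does not read from a real drive, and the function names are made up for this example.

```python
from typing import Optional

SIGNATURE = 0xA5  # value of byte 510 when a checksum is present


def expected_checksum(block: bytes) -> int:
    """Two's complement of the sum of bytes 0..510, truncated to 8 bits."""
    return (-sum(block[:511])) & 0xFF


def checksum_ok(block: bytes) -> Optional[bool]:
    """True/False when the block carries a checksum, None when it does not."""
    if len(block) != 512:
        raise ValueError("IDENTIFY DEVICE data must be exactly 512 bytes")
    if block[510] != SIGNATURE:
        return None
    return (sum(block) & 0xFF) == 0


# Synthetic block: arbitrary payload, then signature, then checksum.
block = bytearray(512)
block[0:4] = b"\x12\x34\x56\x78"
block[510] = SIGNATURE
block[511] = expected_checksum(bytes(block))
print(checksum_ok(bytes(block)))  # block as built

block[20] ^= 0x01                 # flip a single bit in the payload
print(checksum_ok(bytes(block)))  # same block after the bit flip
```

A failed check only says that the 512 bytes the tool received do not sum to zero. On its own it does not say whether the drive firmware, the controller, or the cabling is responsible.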

Computer info:

nas01:/# cat proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) processor
stepping : 2
cpu MHz : 996.330
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 1995.14
clflush size : 32
power management:

nas01:/# free -m
             total       used       free     shared    buffers     cached
Mem:           789        781          8          0          2        732
-/+ buffers/cache:         46        743
Swap:          976          0        976


nas01:/# uname -a
Linux nas01 2.6.26-2-486 #1 Sun Jul 26 20:43:17 UTC 2009 i686 GNU/Linux


I'm running Debian lenny.

Thanks!


 Post subject: Re: Bad disks or not??? I'm confused!
Posted: August 6th, 2009, 4:22

Joined: March 28th, 2008, 7:52
Posts: 1466
Location: Europe, Hungary
zeny,

If your kernel is 2.6.18 or newer, you can trust the kernel's RAID 5. ;)
It is the best RAID I know of.
Use a write-intent bitmap for better flexibility (mdadm -G --bitmap=internal /dev/mdX).
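
To illustrate why the bitmap helps: `--bitmap=internal` enables md's write-intent bitmap, which records the regions that have writes in flight. After an unclean shutdown or a briefly missing member, only those regions need resyncing rather than the whole device. The toy model below is a conceptual sketch only; the class, its methods, and the chunk size are invented for illustration and do not reflect md's actual on-disk format or code.

```python
class WriteIntentBitmap:
    """Toy model: one dirty flag per fixed-size chunk of the array."""

    def __init__(self, device_bytes: int, chunk_bytes: int) -> None:
        self.chunk_bytes = chunk_bytes
        n_chunks = -(-device_bytes // chunk_bytes)  # ceiling division
        self.dirty = [False] * n_chunks

    def _chunks(self, offset: int, length: int) -> range:
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        return range(first, last + 1)

    def begin_write(self, offset: int, length: int) -> None:
        # Mark the affected chunks before any member disk is written.
        for i in self._chunks(offset, length):
            self.dirty[i] = True

    def end_write(self, offset: int, length: int) -> None:
        # Clear the flags once every member has the data.
        # Simplification: overlapping in-flight writes are not tracked.
        for i in self._chunks(offset, length):
            self.dirty[i] = False

    def chunks_to_resync(self) -> list:
        return [i for i, d in enumerate(self.dirty) if d]


bm = WriteIntentBitmap(device_bytes=1000, chunk_bytes=100)
bm.begin_write(offset=150, length=100)  # touches chunks 1 and 2
print(bm.chunks_to_resync())            # what a crash right now would leave
bm.end_write(offset=150, length=100)
print(bm.chunks_to_resync())            # nothing left to resync
```

In this model a crash in the middle of the write leaves two of ten chunks to resync, which is the whole point of paying for the extra bookkeeping.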

To check the HDDs, use only smartmontools, and look only at attributes 5 (Reallocated_Sector_Ct) and 197 (Current_Pending_Sector).
smartmontools will also show the drive's internal error logs if any problem occurs.
If you are still unsure, test the HDDs with MHDD (outside Linux).
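
A minimal Python sketch of that check: it parses attribute rows from saved `smartctl -A` (or `-a`) text and reports the raw values of the watched attributes. It assumes the advice refers to the reallocated and pending sector counts (IDs 5 and 197 in the output earlier in the thread) and that rows have the ten-column layout shown there; `WATCHED` and `watched_raw_values` are illustrative names, and treating any nonzero raw value as worth a closer look is a rule of thumb rather than a vendor statement.

```python
import re

# Assumption: these are the two attributes the advice refers to.
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

# Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def watched_raw_values(text: str) -> dict:
    """Return {attribute id: raw value} for the watched attributes only."""
    found = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED:
            found[int(m.group(1))] = int(m.group(3))
    return found


# A few rows copied from the smartctl output for /dev/sdb.
SAMPLE = """\
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 22
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 027 040 000 Old_age Always - 27 (0 21 0 0)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
"""

for attr_id, raw in sorted(watched_raw_values(SAMPLE).items()):
    verdict = "ok" if raw == 0 else "needs attention"
    print(f"{attr_id:>3} {WATCHED[attr_id]:<24} raw={raw} {verdict}")
```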

And don't forget to check the firmware version of your other HDDs: SD15 is dangerous!
(sdb has CC34, which is OK.)
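
A small Python sketch for checking the other drives: it pulls the model and firmware fields out of saved `smartctl -i` text and compares the firmware against a watchlist. The watchlist here contains only the one revision named in this reply; `RISKY_FIRMWARE`, `identity`, and `firmware_warning` are hypothetical names, and a real list would have to come from the vendor's advisories.

```python
import re

# Assumption: only the firmware revision named in this thread.
RISKY_FIRMWARE = {"SD15"}

FIELD = re.compile(r"^(Device Model|Firmware Version):\s*(.+?)\s*$")


def identity(text: str) -> dict:
    """Extract 'Device Model' and 'Firmware Version' from smartctl -i text."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields


def firmware_warning(text: str) -> bool:
    """True when the reported firmware revision is on the watchlist."""
    return identity(text).get("Firmware Version") in RISKY_FIRMWARE


# Identity lines copied from the smartctl output for /dev/sdb.
SDB = """\
Device Model: ST3750528AS
Serial Number: 5VP0CM59
Firmware Version: CC34
"""

print(identity(SDB))
print(firmware_warning(SDB))
```

Running the same function over the `smartctl -i` output of each remaining drive gives a quick yes/no per disk.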

Janos


 Post subject: Re: Bad disks or not??? I'm confused!
Posted: August 6th, 2009, 6:36
Joined: August 6th, 2009, 1:44
Posts: 2
Location: Ontario
Interesting, thanks for the answer :D

I do have one drive on that firmware version:

nas01:/# smartctl -a /dev/sdc
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST3750330AS
Serial Number: 5QK0EZCS
Firmware Version: SD15
User Capacity: 750,156,374,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Aug 6 06:35:16 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


Is upgrading it a problem?


 Post subject: Re: Bad disks or not??? I'm confused!
Posted: August 6th, 2009, 14:32

Joined: March 28th, 2008, 7:52
Posts: 1466
Location: Europe, Hungary
This is a self-bricking drive, so yes, you need to update its firmware.

Janos

