hello pclab,
thanks for your concern and your reply, and on a sunday too!
i will answer your questions one by one.
do i need the data? absolutely not. this is more or less a drive where i run a lot of VMs, and i mostly use it to build firmware. the main git server is elsewhere on the lan, so i push and pull very little from git on this HDD.
moving to a new HDD? aah!
here is an update: i replugged the device, and while my post 2 is still pending approval, everything i described there is somehow getting automagically fixed.
viz. smartctl now shows me this:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen,
http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.6
Device Model: ST9500325AS
Serial Number: XXXXXXXX
LU WWN Device Id: X XXXXXX XXXXXXXXXX
Firmware Version: 0021LVM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Jan 25 21:23:31 2015 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 142) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 094 094 034 Pre-fail Always - 150679552
3 Spin_Up_Time 0x0002 099 097 000 Old_age Always - 0
4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 483
5 Reallocated_Sector_Ct 0x0033 001 001 036 Pre-fail Always FAILING_NOW 2047
7 Seek_Error_Rate 0x000f 063 060 030 Pre-fail Always - 2015290
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 253007933472972
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 300
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 098 098 099 Pre-fail Always FAILING_NOW 2
187 Reported_Uncorrect 0x0032 100 098 000 Old_age Always - 8590065668
188 Command_Timeout 0x0032 086 086 000 Old_age Always - 12885098510
189 High_Fly_Writes 0x003a 070 057 000 Old_age Always - 505217054
190 Airflow_Temperature_Cel 0x0022 001 001 045 Old_age Always FAILING_NOW 36 (Min/Max 34/36)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 7
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 139
193 Load_Cycle_Count 0x0032 030 099 000 Old_age Always - 30
194 Temperature_Celsius 0x0022 054 053 000 Old_age Always - 12288 (0 18 8 251)
195 Hardware_ECC_Recovered 0x001a 056 044 000 Old_age Always - 63299340572277
196 Reallocated_Event_Count 0x0033 100 001 030 Pre-fail Always In_the_past 0
197 Current_Pending_Sector 0x0012 100 001 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 200 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 --- Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 48939 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 48939 occurred at disk power-on lifetime: 202 hours (8 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 71 04 91 00 32 e0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
a1 00 00 00 00 00 a0 00 00:36:39.461 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:39.460 IDENTIFY DEVICE
00 00 00 00 00 00 00 ff 00:36:39.148 NOP [Abort queued commands]
a1 00 00 00 00 00 a0 00 00:36:34.141 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:34.140 IDENTIFY DEVICE
Error 48938 occurred at disk power-on lifetime: 202 hours (8 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 71 04 91 00 32 e0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 a0 00 00:36:39.460 IDENTIFY DEVICE
00 00 00 00 00 00 00 ff 00:36:39.148 NOP [Abort queued commands]
a1 00 00 00 00 00 a0 00 00:36:34.141 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:34.140 IDENTIFY DEVICE
00 00 00 00 00 00 00 ff 00:36:33.828 NOP [Abort queued commands]
Error 48937 occurred at disk power-on lifetime: 202 hours (8 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 71 04 91 00 32 e0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
a1 00 00 00 00 00 a0 00 00:36:34.141 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:34.140 IDENTIFY DEVICE
00 00 00 00 00 00 00 ff 00:36:33.828 NOP [Abort queued commands]
a1 00 00 00 00 00 a0 00 00:36:33.789 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:33.746 IDENTIFY DEVICE
Error 48936 occurred at disk power-on lifetime: 202 hours (8 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 71 04 91 00 32 e0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 a0 00 00:36:34.140 IDENTIFY DEVICE
00 00 00 00 00 00 00 ff 00:36:33.828 NOP [Abort queued commands]
a1 00 00 00 00 00 a0 00 00:36:33.789 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:33.746 IDENTIFY DEVICE
2f 00 01 10 00 00 a0 00 00:36:33.030 READ LOG EXT
Error 48935 occurred at disk power-on lifetime: 202 hours (8 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 71 04 91 00 32 e0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
a1 00 00 00 00 00 a0 00 00:36:33.789 IDENTIFY PACKET DEVICE
ec 00 00 00 00 00 a0 00 00:36:33.746 IDENTIFY DEVICE
2f 00 01 10 00 00 a0 00 00:36:33.030 READ LOG EXT
61 00 00 ff ff ff 4f 00 00:36:33.029 WRITE FPDMA QUEUED
61 00 00 ff ff ff 4f 00 00:36:29.868 WRITE FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short captive Completed without error 00% 78 -
# 2 Short captive Completed without error 00% 78 -
# 3 Short captive Completed without error 00% 77 -
# 4 Short captive Completed without error 00% 77 -
# 5 Short captive Completed without error 00% 74 -
# 6 Extended offline Interrupted (host reset) 00% 74 -
# 7 Short offline Completed without error 00% 74 -
# 8 Short offline Completed without error 00% 2 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root@development-box] [Sun Jan 25, 21:23:32] [20]
[/home/live-user ]
#
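a small sketch of how the WHEN_FAILED column in the attribute table above seems to be derived. the rule used here (failing now when VALUE <= THRESH, failed in the past when only WORST <= THRESH, never failing when THRESH is 0) is my understanding of what smartctl does, not something taken from its source, and the `classify` helper is hypothetical. the rows are copied from the table above.

```python
# (id, name, value, worst, thresh) copied from the smartctl table in this post.
ROWS = [
    (1, "Raw_Read_Error_Rate", 94, 94, 34),
    (5, "Reallocated_Sector_Ct", 1, 1, 36),
    (184, "End-to-End_Error", 98, 98, 99),
    (190, "Airflow_Temperature_Cel", 1, 1, 45),
    (196, "Reallocated_Event_Count", 100, 1, 30),
    (197, "Current_Pending_Sector", 100, 1, 0),
]


def classify(value, worst, thresh):
    """Assumed rule: compare the normalized value and worst against the threshold."""
    if thresh == 0:
        return "-"            # assumption: a zero threshold can never trip
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


failing_now = [name for _, name, v, w, t in ROWS
               if classify(v, w, t) == "FAILING_NOW"]

for attr_id, name, v, w, t in ROWS:
    print(f"{attr_id:>3} {name:<24} {classify(v, w, t)}")
```

with this rule the three FAILING_NOW rows and the one In_the_past row come out the same as in the smartctl output, which is why the overall health check reports FAILED.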
whoa, i am not seeing the lba errors! the sector errors are gone. i guess it's recreating the g-list, hmmmmmmm, maybe. i am not a pro in this domain; you guys are, so you probably understand this situation in much more detail than i do.
in the previous post i mentioned 12 beeps in total: 1 shortest and 11 short beeps. now that's gone and i don't hear any beeps. on top of that, even mkfs.ext4 is now way faster, w/o spitting an error every second in dmesg and thus ddos-ing my syslog!
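some of the huge RAW_VALUE numbers in the table above look less scary once split into 16-bit words. a minimal sketch, assuming the raw field is 48 bits wide and that the low word is the "real" counter; that interpretation is vendor-specific and is my assumption, and `raw_words` is a hypothetical helper.

```python
def raw_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high to low."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


# raw values copied from the smartctl output in this post
power_on_hours = raw_words(253007933472972)    # attribute 9
reported_uncorrect = raw_words(8590065668)     # attribute 187
command_timeout = raw_words(12885098510)       # attribute 188

print(power_on_hours)      # (58908, 0, 204)
print(reported_uncorrect)  # (2, 2, 4)
print(command_timeout)     # (3, 3, 14)
```

the low word of Power_On_Hours comes out as 204, which sits close to the "202 hours" power-on lifetime quoted in the error log, so the low-word reading looks plausible for this drive at least.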
# time { mkfs.ext4 -v /dev/sdc1 ;}
mke2fs 1.42.5 (29-Jul-2012)
fs_types for mke2fs.conf resolution: 'ext4'
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
7815168 inodes, 31250000 blocks
1562500 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
954 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
real 4m47.616s
user 0m0.760s
sys 0m2.184s
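the figures mke2fs printed above can be cross-checked with a little arithmetic. this is only a sanity-check sketch using the numbers from that output; the variable names are mine.

```python
BLOCK_SIZE = 4096          # "Block size=4096"
BLOCKS = 31250000          # "31250000 blocks"
BLOCKS_PER_GROUP = 32768   # "32768 blocks per group"
INODES_PER_GROUP = 8192    # "8192 inodes per group"

# the last group is only partly filled, so round up
groups = -(-BLOCKS // BLOCKS_PER_GROUP)
inodes = groups * INODES_PER_GROUP
reserved = BLOCKS * 5 // 100        # 5.00% reserved for the super user
size_bytes = BLOCKS * BLOCK_SIZE

print(groups)      # 954
print(inodes)      # 7815168
print(reserved)    # 1562500
print(size_bytes)  # 128000000000
```

all four values match what mke2fs reported, so the filesystem it tried to lay out is about 128 GB (decimal).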
# fdisk -l -G /dev/sdc
GNU Fdisk 1.2.4
Copyright (C) 1998 - 2006 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Disk /dev/sdc: 500 GB, 500105249280 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 1 15562 125001733 83 Linux
/dev/sdc2 15562 31124 125001765 83 Linux
/dev/sdc3 31124 46686 125001765 83 Linux
/dev/sdc4 46686 60801 113378737 83 Linux
[root@development-box] [Sun Jan 25, 21:40:51] [23]
[/home/live-user ]
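the size fdisk prints is just the CHS geometry multiplied out, which is why it differs slightly from the capacity smartctl reported. a quick sketch with the numbers from both outputs; the variable names are mine.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
CYLINDERS = 60801
SECTOR_SIZE = 512

SMART_CAPACITY = 500107862016   # "User Capacity" from smartctl, in bytes

sectors_per_cylinder = HEADS * SECTORS_PER_TRACK
bytes_per_cylinder = sectors_per_cylinder * SECTOR_SIZE
chs_bytes = CYLINDERS * bytes_per_cylinder

# sectors that do not fill a whole cylinder are left out by fdisk
leftover_sectors = (SMART_CAPACITY - chs_bytes) // SECTOR_SIZE

print(sectors_per_cylinder)  # 16065
print(bytes_per_cylinder)    # 8225280
print(chs_bytes)             # 500105249280
print(leftover_sectors)      # 5103
```

the 5103 leftover sectors are less than one cylinder (16065 sectors), so the two sizes agree once the rounding to whole cylinders is taken into account.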
and when i ran mkfs.ext4 again, on all four partitions this time, it aborted on the first one as if it had been killed.
# time { mkfs.ext4 -v /dev/sdc1 && mkfs.ext4 -v /dev/sdc2 && mkfs.ext4 -v /dev/sdc3 && mkfs.ext4 -v /dev/sdc4; }
mke2fs 1.42.5 (29-Jul-2012)
fs_types for mke2fs.conf resolution: 'ext4'
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
7815168 inodes, 31250000 blocks
1562500 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
954 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
ext2fs_mkdir: Attempt to read block from filesystem resulted in short read while creating root dir
real 2m27.563s
user 0m0.304s
sys 0m1.620s
and the dmesg log again lights up with block errors:
[23900.107422] lost page write due to I/O error on sdc1
[23900.107436] Buffer I/O error on device sdc1, logical block 30937652
[23900.107439] lost page write due to I/O error on sdc1
[23900.107448] Buffer I/O error on device sdc1, logical block 30937653
[23900.107451] lost page write due to I/O error on sdc1
[23900.107458] Buffer I/O error on device sdc1, logical block 30937654
[23900.107461] lost page write due to I/O error on sdc1
[23900.107468] Buffer I/O error on device sdc1, logical block 30937655
[23900.107471] lost page write due to I/O error on sdc1
[23900.114074] sd 13:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23900.114082] sd 13:0:0:0: [sdc] Sense Key : Illegal Request [current]
[23900.114088] Info fld=0x0
[23900.114091] sd 13:0:0:0: [sdc] Add. Sense: Logical block address out of range
[23900.114097] sd 13:0:0:0: [sdc] CDB: Write(10): 2a 00 0e c0 92 9f 00 00 f0 00
[23900.114109] end_request: I/O error, dev sdc, sector 247501471
[23900.120829] sd 13:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23900.120836] sd 13:0:0:0: [sdc] Sense Key : Illegal Request [current]
[23900.120842] Info fld=0x0
[23900.120845] sd 13:0:0:0: [sdc] Add. Sense: Logical block address out of range
[23900.120852] sd 13:0:0:0: [sdc] CDB: Write(10): 2a 00 0e c0 93 8f 00 00 f0 00
[23900.120863] end_request: I/O error, dev sdc, sector 247501711
[23900.127949] sd 13:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[23900.127956] sd 13:0:0:0: [sdc] Sense Key : Illegal Request [current]
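the CDB bytes in the dmesg lines above can be decoded by hand to see which sectors the kernel was trying to write. a minimal sketch, assuming the standard SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8); `decode_write10` is a hypothetical helper.

```python
def decode_write10(cdb_hex):
    """Return (lba, block_count) from a WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    count = int.from_bytes(cdb[7:9], "big")    # number of blocks to write
    return lba, count


TOTAL_SECTORS = 500107862016 // 512   # from smartctl's User Capacity

first = decode_write10("2a 00 0e c0 92 9f 00 00 f0 00")
second = decode_write10("2a 00 0e c0 93 8f 00 00 f0 00")

print(first)    # (247501471, 240)
print(second)   # (247501711, 240)
print(first[0] + first[1] <= TOTAL_SECTORS)   # True
```

the decoded LBAs match the "sector 247501471" and "sector 247501711" lines, and both writes sit well inside the 976773168 sectors the drive advertises. so the "Logical block address out of range" answer does not look like the host writing past the end of the disk; that is only my reading of the numbers, not a diagnosis.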
and why not get rid of it? well, FOSS mentality: nothing is waste, and let's squeeze the last drop out of the device till it's fully non-functional.
looks like it's healing and then failing? what could be the reason?
did i test the drive when i got it in seconds from the store? yes i did. i ran all the tests and the smart test too, and it passed with flying colors. the thing is, i swap this drive between many laptops, either as a sata or a usb drive. and many times, when i am doing openwrt and coreboot development work and happen to screw up, this is the drive used on those machines at that time.
so this is like a caretaker drive, used mostly for testing and R&D. so i don't wish to throw it away, but to fix it and use it till it chokes and finally RIP's.
what i am failing to understand is: what exactly happened? why did it start to fail? why did smart trip? because of R&D in the laptops, with maybe varying signals across the different test laptops? i mean voltage, power and signals?
and how come a wrong set of commands issued in a completely out-of-sync fashion got the drive almost dead? and why did the drive stop making the 1 shortest and 11 shorter beeps, and then recreate the tables? and then, when i ran mkfs.ext4, it started to fail again. what am i missing or failing to understand?
maybe it's too early to dump this drive. i will test more and use it a little more, until the drive has reached a point of no return.
but yes, if someone wishes to give me the original firmware, preferably in .lod format, and also some help and tips on how to use it with hdparm, that would be nice.
some testing is always nice. and if things work out nicely i will write an article, so that other users suffer and panic less.
thanks!
-paul