Hi,
I have a Western Digital Elements / My Passport (USB, AF) 2.5'' external USB drive I use for backups.
It was working fine before. I finished my backup, unmounted the disk, unplugged the USB cable, put it in my backpack, and hit the road. When I got home and went to use the disk again, I noticed the performance issues.
The drive did not suffer any bump or drop. I'm wondering if when I unplugged the cable the arm/heads weren't parked yet and the tiniest of vibrations during travel were enough to damage the drive.
I've been recovering my data using ddrescue at the file level. Just to be clear, I'm copying individual files, not the whole partition or disk. I'm not sure whether this is a good idea, but I have some reasons:
- I don't need to access the whole disk, just about 50% of it. So less wear and tear?
- The disk is software encrypted. So I'm afraid if I end up with missing or bad blocks in my output image I won't be able to determine which files are corrupt.
- I don't have enough disk space to create an image. Although I've now bought another drive and it's on its way.
Recovery speed is excruciatingly slow. On average files are being copied at 100KB/s.
I've noticed that within a single file the speed can burst to the normal range of ~50MB/s, but then at some point it grinds to a halt. This seems to happen for all big files (think >300 MB).
Is this behavior to be expected for all files on the disk? Or maybe just a subset of them?
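To pin down where in a file the speed drops, something like the following could help. It is only a minimal Python sketch and not a replacement for ddrescue: it copies a file chunk by chunk and records the offsets where the read throughput falls below a limit. The function name `copy_with_timing`, the 1 MiB chunk size and the 1 MB/s limit are all assumptions picked for illustration.

```python
import time

def copy_with_timing(src_path, dst_path, chunk_size=1024 * 1024,
                     slow_mb_s=1.0, clock=time.monotonic):
    """Copy src to dst chunk by chunk.

    Returns a list of (offset, MB/s) for every chunk whose read was
    slower than slow_mb_s. Names and defaults are illustrative only.
    """
    slow = []
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            start = clock()
            chunk = src.read(chunk_size)   # only the read is timed
            elapsed = clock() - start
            if not chunk:                  # end of file
                break
            dst.write(chunk)
            rate = len(chunk) / 1e6 / elapsed if elapsed > 0 else float("inf")
            if rate < slow_mb_s:
                slow.append((offset, rate))
            offset += len(chunk)
    return slow
```

Reads served from the OS page cache will look fast regardless of the drive, so the numbers only mean something the first time a file is read.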
I'm wondering about the root cause and its implications. For a period of time I could hear some clicking at a rate of about 1 click per second. Some hours later it got louder, but at about half that rate. Right now there is no more clicking at all.
I ran smartmontools when I first noticed the disk speed issues:
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   134   134   051    -    2175
  3 Spin_Up_Time            POS--K   189   185   021    -    5508
  4 Start_Stop_Count        -O--CK   100   100   000    -    109
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    179
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    70
192 Power-Off_Retract_Count -O--CK   200   200   000    -    64
193 Load_Cycle_Count        -O--CK   200   200   000    -    2287
194 Temperature_Celsius     -O---K   129   098   000    -    23
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
And now
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   001   001   051    NOW  65535
  3 Spin_Up_Time            POS--K   192   185   021    -    5375
  4 Start_Stop_Count        -O--CK   100   100   000    -    110
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    213
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    70
192 Power-Off_Retract_Count -O--CK   200   200   000    -    64
193 Load_Cycle_Count        -O--CK   200   200   000    -    2290
194 Temperature_Celsius     -O---K   111   098   000    -    41
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    5
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   100   253   000    -    0
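To make the differences between the two dumps easier to spot, here is a minimal Python sketch that parses a subset of the rows copied from the tables above, lists the attributes whose raw value changed, and shows which ones are flagged in the FAIL column. The helper names `parse_smart`, `changed_raw` and `failing` are made up for illustration; nothing here talks to the drive.

```python
# A subset of rows copied from the two SMART dumps in this post.
BEFORE = """
  1 Raw_Read_Error_Rate POSR-K 134 134 051 - 2175
  3 Spin_Up_Time POS--K 189 185 021 - 5508
  5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
  9 Power_On_Hours -O--CK 100 100 000 - 179
193 Load_Cycle_Count -O--CK 200 200 000 - 2287
194 Temperature_Celsius -O---K 129 098 000 - 23
197 Current_Pending_Sector -O--CK 200 200 000 - 0
"""

AFTER = """
  1 Raw_Read_Error_Rate POSR-K 001 001 051 NOW 65535
  3 Spin_Up_Time POS--K 192 185 021 - 5375
  5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
  9 Power_On_Hours -O--CK 100 100 000 - 213
193 Load_Cycle_Count -O--CK 200 200 000 - 2290
194 Temperature_Celsius -O---K 111 098 000 - 41
197 Current_Pending_Sector -O--CK 200 200 000 - 5
"""

def parse_smart(text):
    """Map attribute name -> normalized value, FAIL column and raw value."""
    attrs = {}
    for line in text.strip().splitlines():
        parts = line.split()
        # Row layout: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(parts) >= 8 and parts[0].isdigit():
            attrs[parts[1]] = {
                "value": int(parts[3]),
                "fail": parts[6],
                "raw": int(parts[7]),
            }
    return attrs

def changed_raw(before, after):
    """Return (name, raw_before, raw_after) for attributes whose raw value differs."""
    return [
        (name, before[name]["raw"], after[name]["raw"])
        for name in before
        if name in after and before[name]["raw"] != after[name]["raw"]
    ]

def failing(attrs):
    """Return the names of attributes whose FAIL column is not '-'."""
    return [name for name, a in attrs.items() if a["fail"] != "-"]

before, after = parse_smart(BEFORE), parse_smart(AFTER)
for name, old, new in changed_raw(before, after):
    print(f"{name}: {old} -> {new}")
print("failing:", failing(after))
```

For these rows, the only attribute flagged as failing is Raw_Read_Error_Rate, while Current_Pending_Sector went from 0 to 5 and Reallocated_Sector_Ct stayed at 0.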
From Wikipedia:
Quote:
Raw_Read_Error_Rate - Stores data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. For some drives, this number may increase during normal operation without necessarily signifying errors.
Current_Pending_Sector - Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased.
From the userspace side things have been fine.
ddrescue hasn't reported any errors, and the recovered files seem fine too. But going by SMART's Current_Pending_Sector, it looks like some of the files I've recovered are no longer intact!? The disk has been in use for more than 24h now trying to recover the data, and its temperature has almost doubled (from 23 to 41 °C), though it is still within operational levels. Should I power it off and let it cool down?
I've already copied the most important files.
But at this speed it will take more than a month running 24 hours a day, which is just not feasible. I'm guessing the drive will die before then, too.
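The arithmetic behind that estimate can be sketched in a few lines of Python. This assumes decimal units (1 KB = 1000 bytes) and a constant 100 KB/s; the function names and the 500 GB figure are hypothetical and only there to show the calculation.

```python
SECONDS_PER_DAY = 86400
KB = 1000
GB = 1000 ** 3

def bytes_copied(rate_bytes_per_s, days):
    """Bytes transferred at a constant rate over a number of full days."""
    return rate_bytes_per_s * days * SECONDS_PER_DAY

def days_to_copy(size_bytes, rate_bytes_per_s):
    """Days needed to copy size_bytes at a constant rate, running 24h a day."""
    return size_bytes / rate_bytes_per_s / SECONDS_PER_DAY

# 30 days of nonstop copying at 100 KB/s
print(bytes_copied(100 * KB, 30) / GB)      # 259.2 (GB)

# Hypothetical 500 GB of data at the same rate
print(round(days_to_copy(500 * GB, 100 * KB), 1))
```

So a month at 100 KB/s moves only about 259 GB, which is why copying files in order of importance matters more than finishing everything.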
I'm ok with the drive dying. Very annoying but nothing critical.
I'll be rescuing files based on their importance level.
Is there any Linux software I can use to check the health of the disk heads? I looked into hdparm but couldn't figure out a way.
Or something that lets me understand the root cause and get an idea of how much time the drive has left? I mean, how much data should I expect to be able to recover?
Any other comments or suggestions?
Thank you for reading.