SSD drives: are they worth using in everyday life?

CPUs tend to be the most reliable chips. Memory, AD & DA, mixed signal & RF, and power conversion components are much less reliable, with memory being the least reliable of the whole gang.

fzabkar wrote:
Keatah wrote:Saving $0.05 USD on a NAND chip can mean millions in extra profit.

What percentage of SSD faults are the result of NAND flash failures?

AFAICT from various storage forums, including SalvationData's, the most common failure mode appears to involve corruption of the Flash Translation Layer. In such cases there is no actual failure in any physical component. Can anyone enlighten me if I'm wrong?

I believe it is a combination of the two.

I have read strong rumours that silicon wafers at the Micron plant that are not 100% are sent to Spectek, a sort of budget manufacturing arm of the company. I am not sure it is as bad as it sounds, but I have heard of such things for years: Celerons were tested and rated at a lower GHz/MHz based on the results, and storage tech of all types is cut down in size to chop out bad blocks, such as HDDs labelled with a geometry to suit 750GB when the actual hardware is 1TB, etc.

So that's the quality of the physical side. The other side of the coin is the quality of the software (FTL etc).

If you read the change logs in the datasheets, bad or immature coding can be a factor too.

eg:

In Table 1 on page 10, added “Connect this pin with a 4.7uF capacitor to ground.” to
VREG “External Capacitor Pin”.

This was 2 years after the initial release. Does it mean that for 2 years this cap was not known to be required, possibly causing issues with devices? Maybe.

Re-order pin location due to floor plane issue

Was this caught before or after a few thousand got into the wild?

To avoid crystal power noise affecting PHY, separating the power domain by pin 16 and 17

SM3257ENAAISP fix MAC compatibility issue

3257EN ISP fix NTFS format will mis-compare fail issue.

- a very popular and widespread controller.

SM3257ENAAISP fixed the system block have ECC fail and initial fail.

SM3257ENAAISP fixed 5V normal power cycling FAT block serial number error issue

SM3257ENAAISP fixed double mark bad block issue

The other thing is the speed at which new devices appear. If you look at datasheets, not a lot get past the initial release or v1.0. By the time a defect in the code is found, the device has a replacement. Sure, the manufacturers are able to make better "rev 1.0" products due to experience, but it is like writing "hello world" in a brand new language for every single program you write.

Top it off with more factors:
"acceptable failure rates"
the sheer bad situations drives get into: flash disks broken, taught to swim, plugged and unplugged many times, at the mercy of mainboards of varying quality, hard drives dropped, external drives given the wrong power, well-meaning DIY

etc etc.

The sheer number of different parts to try and learn also is an issue, along with expensive tools, tools that don't work well, bad advice, etc etc

Before anyone says most of that is off topic: how many times have you heard that this drive doesn't work anymore, therefore it must be a crap drive (and it turns out to be something related to what I've listed)?

You DR guys are not going to be without work anytime soon. Sleep, now that's a whole different thing. It is overrated anyway, I've heard you can die in your sleep!

fzabkar wrote:AFAICT from various storage forums, including SalvationData's, the most common failure mode appears to involve corruption of the Flash Translation Layer. In such cases there is no actual failure in any physical component. Can anyone enlighten me if I'm wrong?

And what would be the cause of "Flash Translation Layer" corruption? Translation tables live on the very same NAND chips
The issue usually happens when you have writing problems of any kind

There are a few resources that mention some reasons why bits fail, so ISTM that where there are bits failing, there can also be bytes failing. It seems "it is just the way it is".

I liked this paper: http://cyclicdesign.com/whitepapers/Cyclic_Design_NAND_ECC.pdf

and there is

http://www.t13.org/Documents/UploadedDocuments/docs2007/e07185r0-NAND_Error_Sources_and_Options_for_Reliable_Management.pdf

Reliable NAND management is increasingly complicated
– Bit error probabilities increase with shrinking geometries and MLC
– Bit “disturbs” are inherent to the NAND architecture

Cells not being programmed received elevated voltage stress.

Charge collects on the floating gate causing cell to appear weakly programmed

and

http://users.ece.cmu.edu/~omutlu/pub/flash-error-patterns_date12.pdf

This paper examines the complex flash errors that occur at 30-40nm flash technologies. We demonstrate distinct error patterns, such as cycle-dependency, location dependency and value-dependency, for various types of flash operations. We analyse the discovered error patterns and explain why they exist from a circuit and device standpoint.

and

http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4558857&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4558857

Bit error rate in NAND Flash memories

Published in: Reliability Physics Symposium, 2008. IRPS 2008. IEEE International

Date of Conference: April 27 2008-May 1 2008

..NAND flash memories have bit errors that are corrected by error-correction codes (ECC). We present raw error data from multi-level-cell devices from four manufacturers, identify the root-cause mechanisms, and estimate the resulting uncorrectable bit error rates (UBER). Write, retention, and read-disturb errors all contribute...

Seems mfgs are going to greater and greater lengths to have the controller cover up internal errors, which are occurring at smaller and smaller geometries. When is the point of no return reached?

Keatah wrote:Seems mfgs are going to greater and greater lengths to have the controller cover up internal errors, which are occurring at smaller and smaller geometries. When is the point of no return reached?

I think only when the word has spread so much that people stop buying it, which unfortunately is unlikely to actually happen.

We have seen some of these trends in HDDs, and people still buy them.

Doomer wrote:
fzabkar wrote:AFAICT from various storage forums, including SalvationData's, the most common failure mode appears to involve corruption of the Flash Translation Layer. In such cases there is no actual failure in any physical component. Can anyone enlighten me if I'm wrong?

And what would be the cause of "Flash Translation Layer" corruption? Translation tables live on the very same NAND chips
The issue usually happens when you have writing problems of any kind

I suspect that much of this corruption occurs as a consequence of unexpected power loss. Many people in various forums are reporting that SSDs fail after shutdown rather than in-flight. In fact I saw one Intel user intentionally provoke failures in his SSD simply by power cycling it. I'm wondering if this is the reason that several manufacturers are now incorporating backup capacitors. Moreover, I wonder if these capacitors do in fact reduce the failure rates of SSDs.

I have proposed elsewhere that OCZ's high failure rates, compared with similar designs from their competitors, might be due to a lower Vio supply voltage (+2.8V), but I haven't been able to determine whether this reduced voltage was by design, or whether it was a genuine fault in one specific case. If true, then one consequence of a reduced Vio would be much less time for emergency "housekeeping" after a power loss event, which in turn would make the FTL much more vulnerable to corruption.
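To make the power-loss argument concrete, here is a minimal toy sketch of an FTL in Python. It is not how any real controller works; the class `ToyFTL` and its methods are hypothetical names made up for illustration, and real firmware uses journaling, checkpoints and spare-area metadata that are ignored here. It only shows how a mapping table that is cached in RAM and flushed late can go stale after a power cut, even though every NAND page is physically fine.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only).

    Writes go out-of-place to the next free page; the logical-to-physical
    map lives in controller RAM and is only persisted by flush_map().
    """

    def __init__(self, num_pages):
        self.nand = [None] * num_pages  # physical pages, None = erased
        self.ram_map = {}               # logical block -> physical page (volatile)
        self.saved_map = {}             # last copy of the map stored on NAND
        self.next_free = 0

    def write(self, lba, data):
        page = self.next_free           # never overwrite in place
        self.next_free += 1
        self.nand[page] = data
        self.ram_map[lba] = page        # only the RAM copy is updated

    def flush_map(self):
        self.saved_map = dict(self.ram_map)

    def power_loss(self):
        # RAM contents vanish; on restart only the saved map is available
        self.ram_map = dict(self.saved_map)

    def read(self, lba):
        page = self.ram_map.get(lba)
        return None if page is None else self.nand[page]


ftl = ToyFTL(num_pages=8)
ftl.write(0, "old")
ftl.flush_map()
ftl.write(0, "new")   # map update not yet persisted
ftl.power_loss()      # power is cut before the next flush
print(ftl.read(0))    # stale mapping is used
print(ftl.nand[1])    # the new data is still physically present
```

In this toy model the drive comes back pointing at the old page, while the newer data sits intact but unreachable. A backup capacitor would, in effect, buy time for one more `flush_map()` before the RAM copy is lost.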

HaQue wrote:There are a few resources that mention some reasons why bits fail, so ISTM that where there is bits failing, there can also be bytes failing.. It seems "it is just the way it is"

I would think that bit failures would be unlikely to cause an SSD to die completely, unless they impacted on the FTL, but even then I would expect that there would be some redundancy that would allow the drive to recover in such cases.

fzabkar wrote:I suspect that much of this corruption occurs as a consequence of unexpected power loss.

That seems like a plausible cause but I wouldn't rule out NAND degradation as well

On the original question of suitability for everyday use, IMHO I would think that SSDs (or CF cards, etc) would be the only logical choice for storage in mobile applications, yet obviously manufacturers still use HDDs in laptops, automotive applications, and iThings, among others. Obviously it's a cost/capacity issue, so it must be that mobile HDDs are sufficiently reliable so that the increased risk of physical damage is not a major consideration.

Doomer wrote:
fzabkar wrote:(128 gigabytes x 3000 cycles) / (10 years) = 105 gigabytes per day

FYI 105GB/day is not much

Most users in other storage forums seem to be utilising SSDs as OS drives in combination with spinning discs for data storage. Assuming that the spinner is used for paging, then ISTM that 100GB/day is huge.
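For anyone who wants to redo the arithmetic with other numbers, here is a small Python sketch of the same endurance estimate. The function name and the `write_amplification` parameter are my own additions for illustration; the quoted formula implicitly assumes a write amplification of 1, i.e. every host write costs exactly one NAND write, which is optimistic.

```python
def daily_write_budget_gb(capacity_gb, pe_cycles, years, write_amplification=1.0):
    """Host writes per day (in GB) that would use up the rated P/E cycles
    in the given number of years, assuming perfect wear leveling."""
    total_nand_writes_gb = capacity_gb * pe_cycles
    days = years * 365
    return total_nand_writes_gb / write_amplification / days


# The figures from the quoted post: 128 GB, 3000 cycles, 10 years
print(f"{daily_write_budget_gb(128, 3000, 10):.0f} GB/day")

# Same drive with an assumed (hypothetical) write amplification of 3
print(f"{daily_write_budget_gb(128, 3000, 10, write_amplification=3.0):.0f} GB/day")
```

The first call reproduces the 105 GB/day figure. The second shows how quickly the host-side budget shrinks once internal writes are taken into account, which is the point under dispute in the replies.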

That said, and if I have correctly interpreted Samsung's Wear Leveling Count attribute, then the following thread suggests that the OP has racked up 69GB/day over a period of 3 years:
http://www.tomshardware.com/forum/id-16 ... tatus.html

I don't know why it seems huge to you.
It doesn't look huge to me.
The 100GB is not only user data written to the SSD.
When you read from SSD often it causes charges in NAND cells to dissipate, that means if you only read data you still need to re-copy it after 5-10 reads from the same NAND cell to another NAND cell
So booting Windows will cause internal write operations regardless of any writes from the host.
Also, I'd think Internet cache and background indexing would add a lot of writes to the SSD on modern computers, again regardless of the wishes of the end user.

Doomer wrote:When you read from SSD often it causes charges in NAND cells to dissipate, that means if you only read data you still need to re-copy it after 5-10 reads from the same NAND cell to another NAND cell

This statement sounds absurd. In fact I find absolutely no reference to any such constraints in any NAND flash datasheets.

Are you suggesting that those millions of devices that are built around SoCs and embedded flash memory need to be "refreshed" after every 5-10 power-on events? What about flash "EEPROMs" such as those used in my 10-year-old PC?

fzabkar wrote:This statement sounds absurd. In fact I find absolutely no reference to any such constraints in any NAND flash datasheets.

I believe that Doomer is referring to the phenomenon of "read disturb". Research that and you'll find it really does exist and yes, it's not in the typical datasheets - but it's real as various technical papers on issues with NAND flash technology will confirm. Hope that helps...

Some of the papers in a previous post of mine discuss it a little, though most sources barely mention it, let alone explain it in any detail. I think that is because you will never be able to realistically predict the life of any cell.

This is from a Toshiba datasheet "TC58DVG02A5TAI0_datasheet_en_20100713.pdf":

Reliability Guidance

This reliability guidance is intended to notify some guidance related to using NAND flash with
1 bit ECC for each 512 bytes. For detailed reliability data, please refer to TOSHIBA’s reliability note.
Although random bit errors may occur during use, it does not necessarily mean that a block is bad.
Generally, a block should be marked as bad when a program status failure or erase status failure is detected.
The other failure modes may be recovered by a block erase.
ECC treatment for read data is mandatory due to the following Data Retention and Read Disturb failures.

Write/Erase Endurance
--------------------------
Write/Erase endurance failures may occur in a cell, page, or block, and are detected by doing a status read
after either an auto program or auto block erase operation. The cumulative bad block count will increase
along with the number of write/erase cycles.

Data Retention
-----------------
The data in memory may change after a certain amount of storage time. This is due to charge loss or charge
gain. After block erasure and reprogramming, the block may become usable again.

Read Disturb
---------------
A read operation may disturb the data in memory. The data may change due to charge gain. Usually, bit
errors occur on other pages in the block, not the page being read. After a large number of read cycles
(between block erases), a tiny charge may build up and can cause a cell to be soft programmed to another
state. After block erasure and reprogramming, the block may become usable again.

Of note is the sheer number of "may do this", "might do that" type statements. It also seems to imply that the implementer must deal with it themselves in their application.

This would make sense as you could put either bare minimum data integrity implementation for a "don't really care" device like a really cheap mp3 player, or a robust scheme for something important like a medical device or flight control application.
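As a rough idea of what "dealing with it in the application" could look like, here is a minimal Python sketch of a per-block read counter that triggers a refresh. This is only my illustration, not anything from the Toshiba datasheet: the class name `BlockReadTracker` is hypothetical, and the threshold value is an arbitrary placeholder, since the datasheet only says "a large number of read cycles" and gives no figure.

```python
class BlockReadTracker:
    """Counts reads per block and schedules a refresh (copy out, erase,
    reprogram) once a block has been read too many times since its last
    erase. Hypothetical sketch; the threshold is an assumption."""

    def __init__(self, num_blocks, refresh_threshold):
        self.reads = [0] * num_blocks
        self.refresh_threshold = refresh_threshold
        self.refreshed = []  # log of refreshed block numbers, in order

    def on_read(self, block):
        self.reads[block] += 1
        if self.reads[block] >= self.refresh_threshold:
            self.refresh(block)

    def refresh(self, block):
        # A real controller would copy valid pages elsewhere and erase the
        # block here; this sketch only resets the counter and logs the event.
        self.reads[block] = 0
        self.refreshed.append(block)


tracker = BlockReadTracker(num_blocks=4, refresh_threshold=1000)
for _ in range(2500):
    tracker.on_read(2)       # hammer a single block with reads

print(tracker.refreshed)     # blocks refreshed so far
print(tracker.reads[2])      # reads since the last refresh
```

A "don't really care" device could skip this entirely and rely on ECC alone, while a more critical design might combine a counter like this with periodic ECC scrubbing. Either way, a refresh is itself an erase/program cycle, so it adds writes the host never asked for.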

Datasheets are very vague IMHO

fzabkar wrote:What about flash "EEPROMs" such as those used in my 10-year-old PC?

The geometry and size of the cells in these kinds of chips are huge compared to SSD NAND. These BIOS flash EEPROMs have cells that hold tens of thousands of electrons, as opposed to the 15-20 in TLC or small geometry MLC.

I believe the industry in general doesn't want to address the reliability issues that these ultra-high density cells have. The latest TLC devices rely on the presence or absence of just 10-20 electrons to represent data.

Go for SSD! We have used them everywhere for the past 3 years: laptops, desktops, servers.

Vulcan wrote:I believe that Doomer is referring to the phenomenon of "read disturb". Research that and you'll find it really does exist and yes, it's not in the typical datasheets - but it's real as various technical papers on issues with NAND flash technology will confirm.

Thanks. I confess that I have never encountered this phenomenon. I am familiar with crosstalk in DRAMs, though. (In the old days our memory diagnostics used to write zeros to each cell in turn while filling its neighbours with ones, and vice versa.)

That said, Doomer's statement still strikes me as ridiculous. If it were true, then where are the gazillions of consumer gadgets with embedded NAND flash that should be failing in droves? Granted, I picked a bad example (as Keatah pointed out), but devices based around SoCs often use NAND flash memory for their firmware. I have disassembled many of these gadgets and I don't recall seeing any with a sophisticated flash controller. I believe these embedded flash chips may have their own internal ECC logic, but even so, I would think that if 10 reads were all that were required to "disturb" the data in any cell, then it wouldn't take very long for at least one bit to be flipped in the firmware, in which case the gadget would be dead.

BTW, the following paper reports that, when Micron 2Gb SLC NAND flash devices were cycled 1 million times, no program disturb or read disturb failures were detected. That said, the paper also states that "the devices showed inconsistent bad block performance and retention behavior" and recommended that all devices be "individually screened and characterized before being accepted for use" (ie the samples were not to be trusted).

Disturb Testing in Flash Memories:
http://trs-new.jpl.nasa.gov/dspace/bits ... /08-07.pdf

fzabkar wrote:
Doomer wrote:When you read from SSD often it causes charges in NAND cells to dissipate, that means if you only read data you still need to re-copy it after 5-10 reads from the same NAND cell to another NAND cell

This statement sounds absurd.

Nevertheless, this mechanism is present in modern SSDs.

fzabkar wrote:Are you suggesting that those millions of devices that are built around SoCs and embedded flash memory need to be "refreshed" after every 5-10 power-on events? What about flash "EEPROMs" such as those used in my 10-year-old PC?

If you are suggesting that all SoCs and 10-year-old EEPROMs use NAND technology, then perhaps you need to check your absurdometer again.

I don't really want to waste my time trying to prove you wrong. I know that modern SSDs do that, because I RE them and I've seen it, and that's enough for me. You can continue to theorize if you want to, using as examples SLC chips that nobody uses in modern SSDs anymore. The enterprise reality is eMLC; consumers will have to stick with TLC. Your knowledge is as outdated as a dinosaur in the modern world.
