All times are UTC - 5 hours [ DST ]




 Post subject: How much faith do you put in SMART?
Posted: May 23rd, 2021, 13:55

Joined: August 6th, 2020, 17:13
Posts: 5
Location: Canada
Let's play a game.

inb4 You should never use a questionable drive in order to avoid DR :|

You are tasked with refurbishing a large quantity of used, valuable (high capacity) HDDs and SSDs of various makes and models. Power-on time will all be 4 years or above.

Each drive will be placed into a TSA3000 (Technician Scrutiny and Accountability 3000™). This contraption will benchmark the drives (write random data + verify on all sectors, then loop) to check whether they survive at least 1 year of power-on time without failing (data corruption, click of death, etc.). The TSA3000 is well cooled and equipped with state-of-the-art vibration damping (humidity and other environmental factors are guaranteed to be controlled).

    * You will classify these drives as either "REFURBISHED" or "DISPOSAL".
    * You will be rewarded $1 per drive classified as REFURBISHED.
    * You will be neither rewarded nor fined for drives classified as DISPOSAL.
    * You will be fined $0.50 for each REFURBISHED drive that does not survive 1 year of TSA3000.
    * You will be fined $1 for each DISPOSAL drive that survives 1 year of TSA3000.
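
As a rough sanity check of these rules, the expected payoff per drive can be worked out from a survival probability. The sketch below assumes a hypothetical estimate `p_survive` (nothing in the rules provides it) and reads the rules literally: the $1 reward is paid for every drive labelled REFURBISHED, whether or not it later fails. The function names are made up for illustration.

```python
def expected_payoff_v1(p_survive: float) -> dict:
    """Expected dollars per drive for each label, given an assumed
    probability that the drive survives 1 year in the TSA3000."""
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be between 0 and 1")
    p_fail = 1.0 - p_survive
    return {
        # $1 reward for the label, minus a $0.50 fine if the drive fails.
        "REFURBISHED": 1.0 - 0.50 * p_fail,
        # No reward; $1 fine if the drive turns out to survive.
        "DISPOSAL": -1.0 * p_survive,
    }


def best_label_v1(p_survive: float) -> str:
    """Pick the label with the higher expected payoff."""
    payoffs = expected_payoff_v1(p_survive)
    return max(payoffs, key=payoffs.get)
```

Under this literal reading, REFURBISHED never pays less than $0.50 in expectation and DISPOSAL never pays more than $0, so labelling everything REFURBISHED is always at least as good, regardless of SMART data.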

***

What is your strategy to maximize your profit?

    * Which SMART attributes would you be more relaxed about, and which would you be stricter with? Would you bother with vendor-specific attributes, or do they add too much complexity?
    * What tests would you perform and which tools would you use to help improve your predictions?
    * Of course the make, model and many other factors are what make this game challenging. All you need to do is play based on your own experience, or just guess!
    * I would love answers that are unique to specific technologies.
    * Remember you do not know or care about the data of these drives or their use case.


 Post subject: Re: How much faith do you put in SMART?
Posted: May 24th, 2021, 11:12

Joined: August 6th, 2020, 17:13
Posts: 5
Location: Canada
Here is version 2 since I did not put enough thought into my first post.

Let's play a game.

Scenario:

    You are tasked with refurbishing a large quantity of used, valuable (high capacity) HDDs and SSDs of various makes and models.

    Power-on time will all be 4 years or above.

    The drives ALL have pre-existing SMART problems; however, nothing is critical and they were all functioning when last powered on.

    Your objective is to predict, as accurately as possible, which drives would and would not survive a 1 year long stress test.

RULES:

    You will classify these drives as either "REFURBISHED" or "DISPOSAL".

    You will be rewarded 1 Bitcoin for each drive classified as REFURBISHED that survives 1 year of the stress test.

    You will be fined 1 Bitcoin for each REFURBISHED drive that does not survive 1 year of the stress test.

    You will be rewarded 1 Bitcoin for each drive classified as DISPOSAL that does not survive 1 year of the stress test.

    You will be fined 1 Bitcoin for each DISPOSAL drive that survives 1 year of the stress test.

    You will be rewarded or fined after the 1 year stress test is completed.
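
The symmetric rules above reduce to a simple threshold, which can be sketched as follows. The survival probability `p_survive` is an assumed input (estimating it is the hard part of the game), and the function names are hypothetical.

```python
def expected_btc_v2(p_survive: float) -> dict:
    """Expected BTC per drive for each label under the version 2 rules."""
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be between 0 and 1")
    p_fail = 1.0 - p_survive
    return {
        # +1 BTC if the drive survives, -1 BTC if it fails.
        "REFURBISHED": p_survive - p_fail,
        # +1 BTC if the drive fails, -1 BTC if it survives.
        "DISPOSAL": p_fail - p_survive,
    }


def classify_v2(p_survive: float) -> str:
    """Choose the label with the higher expected BTC.
    Ties (p_survive == 0.5) go to REFURBISHED, an arbitrary choice."""
    payoffs = expected_btc_v2(p_survive)
    if payoffs["REFURBISHED"] >= payoffs["DISPOSAL"]:
        return "REFURBISHED"
    return "DISPOSAL"
```

For example, a drive with an estimated 75% chance of surviving has an expected payoff of +0.5 BTC as REFURBISHED and -0.5 BTC as DISPOSAL, so the break-even point is a 50% survival estimate.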

The stress test:

Each drive will be placed into a "TSA3000" (Technician Scrutiny and Accountability 3000™). This contraption will benchmark the drives (write random data + verify on all sectors, then loop) to check whether they survive at least 1 year of power-on time without failing (data corruption, click of death, etc.). The TSA3000 is well cooled and equipped with state-of-the-art vibration damping (humidity and other environmental factors are guaranteed to be controlled).

Objective:

What is your strategy to avoid fines and make as much BTC as possible?

Which SMART attributes would you be more relaxed about, and which would you be stricter with? Would you bother with vendor-specific attributes, or do they add too much complexity?

What tests would you perform and which tools would you use to help improve your predictions?

Of course the make, model and many other factors are what make this game challenging. All you need to do is play based on your own experience or notions.

EXTRA

The objective of this game is to simulate a situation where it is beneficial to keep storage devices running as long as possible. Feel free to participate or provide feedback. Answers do not need to be based on rules of thumb or hard facts; personal experience and opinion are welcome.

The purpose of doing this is to:

    Obtain a better understanding of SMART and HDD/SSD failure modes.

    Best predict drive failure using SMART or other tools based on user experience.

Why?

    SMART is not always reliable.

    The idea of using questionable storage is taboo and not often discussed in detail.

    A potential use case for low value data storage is being explored. This would be for data that is worth less than the device it is stored on.

    Redundancy and accurate failure prediction are difficult. In our testing, many storage devices predicted to fail have greatly exceeded their life expectancy, while others have failed without warning.


 Post subject: Re: How much faith do you put in SMART?
Posted: May 26th, 2021, 18:55

Joined: August 13th, 2016, 17:10
Posts: 192
Location: Vienna, Austria
* Which SMART attributes would you be more relaxed about, and which would you be stricter with? Would you bother with vendor-specific attributes, or do they add too much complexity?

SMART depends completely on the vendor, the model and the firmware, because the meaning of the values is poorly standardised. I have reverse engineered the SMART calculation algorithms and found very strange code, which, for example, duplicates the same raw value into different SMART values. I would expect the vendor-specific attributes to be necessary.

* What tests would you perform and which tools would you use to help improve your predictions?

I would reverse engineer the SMART functions and analyze the behaviour of each model. For a large-scale operation, if possible, I would start collecting data from an environment of at least datacenter scale and train some machine learning algorithms to estimate remaining lifetime. I would most likely train one predictor per manufacturer + model/firmware combination.
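
As a much simpler stand-in for the per-model machine learning predictor described above, the sketch below only groups historical outcomes by (vendor, model, firmware) and computes an empirical survival rate per group. The record layout, the field names and the sample rows are all hypothetical and invented for illustration; a real predictor would also use the SMART attributes themselves.

```python
from collections import defaultdict


def survival_rate_by_model(records):
    """Return {(vendor, model, firmware): fraction of drives that survived}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [survived, total]
    for rec in records:
        key = (rec["vendor"], rec["model"], rec["firmware"])
        counts[key][1] += 1
        if rec["survived"]:
            counts[key][0] += 1
    return {key: survived / total for key, (survived, total) in counts.items()}


# Made-up example rows, not real measurements.
history = [
    {"vendor": "VendorA", "model": "X1", "firmware": "1.0", "survived": True},
    {"vendor": "VendorA", "model": "X1", "firmware": "1.0", "survived": False},
    {"vendor": "VendorB", "model": "Y2", "firmware": "2.1", "survived": True},
]
rates = survival_rate_by_model(history)
```

Keeping the firmware in the grouping key reflects the point that the same model can report SMART values differently across firmware versions; with few drives per group, such rates are of course very noisy.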

* Of course the make, model and many other factors are what make this game challenging. All you need to do is play based on your own experience, or just guess!

Yes, for that purpose one could build an economic model, which should help decide whether to play your game at all :-)

* I would love answers that are unique to specific technologies.

* Remember you do not know or care about the data of these drives or their use case.

The problem I see is that by the time you know how reliable the drives are, the models are most likely already uninteresting or outdated. (Predictions are hard, especially about the future :-)

