* What SMART attributes would you relax your concerns and which would you be more rigid with? Would you bother with vendor specific attributes or does it add too much complexity?
SMART depends entirely on the vendor, the model, and the firmware, because the meaning of the values is poorly standardised. I have reverse engineered SMART calculation algorithms and found some very strange code, e.g. code that duplicates the same raw value into several different SMART values. So I would expect the vendor-specific attributes to be necessary.
* What tests would you perform and which tools would you use to help improve your predictions?
I would reverse engineer the SMART functions and analyze the behaviour of each model. For a large-scale operation, I would, if possible, start by collecting data from an environment of at least datacenter scale and train machine learning algorithms on it to estimate remaining lifetime. I would most likely train one predictor per manufacturer + model/firmware combination.
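To make the "one predictor per manufacturer + model/firmware" idea concrete, here is a minimal sketch. It assumes a hypothetical record layout (`vendor`, `model`, `firmware`, one `raw_value` of some SMART attribute, and the observed `days_to_failure`), and it uses a plain least-squares line as a stand-in for whatever machine learning algorithm you would actually train. None of the names come from a real tool.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All x identical: fall back to predicting the mean.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def train_per_model(records):
    """Fit one predictor per (vendor, model, firmware) group.

    `records` is a list of dicts with the hypothetical keys
    vendor, model, firmware, raw_value, days_to_failure.
    """
    groups = defaultdict(list)
    for r in records:
        key = (r["vendor"], r["model"], r["firmware"])
        groups[key].append((r["raw_value"], r["days_to_failure"]))
    predictors = {}
    for key, pairs in groups.items():
        xs, ys = zip(*pairs)
        predictors[key] = fit_line(xs, ys)
    return predictors


def predict(predictors, vendor, model, firmware, raw_value):
    """Estimate remaining days using the predictor for this exact drive type."""
    a, b = predictors[(vendor, model, firmware)]
    return a * raw_value + b
```

Keeping the groups fully separate means a raw value is only ever interpreted in the context of the firmware that produced it. The price is that every new model or firmware revision starts with no training data.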
* Of course the make, model and many other factors are what make this game challenging. All you need to do is play based on your own experience, or just guess!
Yes. For that purpose one could build an economic model, which should help decide whether to play your game at all.
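A very rough sketch of such an economic model is an expected-value calculation. Everything here is a hypothetical simplification: `p_correct` is the chance that your guess is right, `gain` and `loss` are what a right or wrong guess is worth, and `cost` is what it takes to gather the data behind the guess.

```python
def expected_value(p_correct, gain, loss, cost):
    """Expected payoff of one guess, net of the cost of making it."""
    return p_correct * gain - (1 - p_correct) * loss - cost


def worth_playing(p_correct, gain, loss, cost):
    """Play only if the expected payoff is positive."""
    return expected_value(p_correct, gain, loss, cost) > 0
```

With these made-up inputs, the game stops being worth playing as soon as the cost of collecting data eats up the edge you have over a coin flip.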
* I would love answers that are unique to specific technologies.
* Remember you do not know or care about the data of these drives or their use case.
The problem I see is that by the time you know how reliable the drives are, the models are most likely uninteresting or outdated already. (Predictions are hard, especially about the future.)