Last update: 2016-11-30
Introduction
Sorry about the outrageous title of this article. It really should be “Consumer-grade Solid State Drives (SSD) are less reliable than Hard Disk Drives (HDD) under typical computer usage”. (I can already foresee people attacking this article or accusing me of trolling, but I hope the information here helps others with their own research, and perhaps prompts more consideration before deciding to have an SSD as the only drive in a new high-performance PC. I know that many people will disagree with me, and for the record I do know that there are people who have been using SSD for several years without a single problem.)
When SSD was first introduced to the market a few years ago, I thought this type of product would be really great: fast, and very reliable since it has no moving parts like HDD does. But when I researched how to choose the best SSD in terms of reliability, performance and price, I came across too many posts about SSD dying or not being detected. In one extreme case, someone bought 8 SSD of various brands and they all died within 2 years. So I decided to write this article to share what I’ve found. I know people will disregard the information presented here as outdated and claim that nowadays SSD are a lot more reliable (which is true to a certain extent), but I still see many reports of SSD issues in various forums. Others may point out that HDD suffer the same fate, as many new HDD turn bad within months of purchase as well – although this is true, the point of this article is that consumer grade SSD is not really more reliable than HDD. To be fair, I can say that both HDD and SSD are unreliable. 😉
In late 2015 the cheapest SSD already matched the cheapest HDD in device price (not cost per GB). Since SSD solves the single most important performance bottleneck in most PCs, it can be considered a standard item in a PC.
The point of this article is not to advocate avoiding SSD at all costs, but to understand the higher risks associated with it. The following precautions are suggested:
- Use SSD for boot drive, and HDD as data drive
- Backup (if the data size is small, prepare copies in both the SSD and HDD)
- Keep installable OS media handy in case your SSD boot drive suddenly fails
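The second precaution above can be as simple as a script that mirrors a small data set onto both drives. A minimal sketch (all paths are hypothetical examples; point them at your own folders):

```python
# A sketch of the "keep copies on both the SSD and HDD" precaution.
import os
import shutil

def mirror(src, *dests):
    """Copy the directory tree at src into each destination drive."""
    for dest in dests:
        target = os.path.join(dest, os.path.basename(src))
        shutil.rmtree(target, ignore_errors=True)   # drop the stale copy
        shutil.copytree(src, target)                # write a fresh copy

# Example (hypothetical drive letters):
# mirror(r"C:\Users\me\Documents", r"C:\backup", r"D:\backup")
```

A real backup scheme should of course keep versioned copies rather than overwriting the previous one, but even this naive mirror would have saved the data in many of the sudden-death reports below.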
In 2016, Schroeder and Google published the results of a large-scale field study. They concluded that “comparing with traditional hard disk drives, flash drives have a significantly lower replacement rate in the field, however, they have a higher rate of uncorrectable errors.” In other words, SSD users have a higher chance of data loss. Even though some positive results can be drawn from this study, keep in mind that the drives in it are not the cheapest consumer grade drives people typically buy for home use. In particular, with the industry-wide adoption of TLC (which is not in the study), things will be very different for consumer grade SSD.
Ultra Cheap SSD
In 2016, two trends have emerged. The first is that all vendors are switching to TLC for their consumer products. The second is more worrisome – the rise of ultra cheap SSD manufactured with the lowest cost in mind, making use of inferior-grade NAND chips and the cheapest components, sometimes even eliminating the DRAM cache. To understand why this is such a great threat, one needs to understand that in NAND manufacturing it is simply not possible for all output to be perfect, so the good chips are marked with the brand of the manufacturer, and the bad-but-not-unusable chips are sold at a lower price to someone else, probably under a different brand. (In Chinese these are sometimes described as 白片, literally “white chips”.) The use of inferior NAND chips seals the fate of the ultra cheap SSD – they come with a significantly higher probability of causing trouble (such as data loss) than a quality SSD from a reputable manufacturer.
The DRAM cache is a critical component that reduces writes to NAND, thereby increasing the lifetime, and also improves performance. Eliminating the cache hurts performance and increases the writes to NAND – coupled with inferior grade NAND, the threat becomes even greater. (Some models eliminate the DRAM cache but rely on an SLC cache – I don’t think this is ideal.) Since typical consumers often buy the lowest-priced component they can find, I foresee many people having SSD troubles due to these low quality drives.
Some of the DRAM-less SSD include:
- WD Green SSD
- SanDisk SSD Plus / Z400s / Z410
- Transcend SSD360S
- OCZ TL100
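To see why the write cache matters for endurance, here is a toy model – purely illustrative, not how a real controller works – counting NAND page-program operations with and without a small coalescing cache:

```python
# Toy model: count NAND page-program operations for a host write
# stream. cache_size=0 models a DRAM-less drive that programs NAND
# on every write; a nonzero cache absorbs rewrites of hot pages.

def nand_programs(writes, cache_size=0):
    """writes: sequence of logical page numbers being written."""
    programs = 0
    cache = set()
    for page in writes:
        if cache_size == 0:
            programs += 1            # every host write hits NAND
        elif page in cache:
            pass                     # rewrite absorbed by the cache
        else:
            if len(cache) >= cache_size:
                cache.pop()          # evict: flush one page to NAND
                programs += 1
            cache.add(page)
    programs += len(cache)           # final flush of dirty pages
    return programs

# A workload that rewrites a few hot pages (e.g. filesystem metadata):
workload = [0, 1, 0, 1, 0, 1, 2, 3, 0, 1] * 100
print(nand_programs(workload))                # 1000 programs, no cache
print(nand_programs(workload, cache_size=4))  # far fewer with a cache
```

Real firmware is vastly more complex (wear leveling, mapping tables, SLC buffers), but the direction is the same: removing the cache multiplies NAND wear exactly on the cheap drives whose NAND can least afford it.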
SSD Endurance and Write Speed Slowdown / Degradation
Computer enthusiasts already know that flash and SSD have a limited lifetime in the form of write cycles, so naturally there are worries about SSD endurance. However, for most people, the variety of other SSD issues described in this article will show up long before the endurance of drives using good MLC (or even SLC) NAND chips is used up. For real-world test results, read this Xtreme Systems thread and this Tech Report SSD Endurance Experiment. TLC has even lower endurance, unfortunately.
A related problem is that some manufacturers opt to put the SSD into a degraded mode that makes it perform much slower once a write limit is reached. To make it worse, users generally have no way to reset it – not even a secure erase or letting the drive idle for days will recover the lost performance. Although not restricted to TLC SSD, it is probably common for TLC SSD to employ such a design due to the inherently low endurance of TLC.
- Kingston SSDNow V300 SSD: It renders itself useless after writing a certain amount due to a “feature” known as Drive Life Protection.
- OCZ Trion 150: Write speed is limited after reaching the rated amount of write, e.g. 30TB for 120GB drive.
- Plextor M7V: 128GB model write speed is limited after writing about 60TB.
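Rated write limits like the 30TB figure above can be turned into a rough lifetime estimate. A back-of-envelope sketch (the daily write figure is an assumption – substitute your own, e.g. from the SMART total-writes attribute):

```python
# Rough SSD lifetime estimate from the rated endurance (TBW).

def years_of_life(rated_writes_tb, daily_writes_gb):
    """Years until the rated write endurance is exhausted."""
    total_gb = rated_writes_tb * 1000        # decimal TB -> GB
    return total_gb / daily_writes_gb / 365

# e.g. the 120GB drive above, rated for 30TB, writing 20GB/day:
print(round(years_of_life(30, 20), 1))       # -> 4.1 years
```

This is why the throttling designs above are so irritating: 30TB is reachable well within the warranty period under a heavy but not unreasonable workload.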
TLC SSD Read Speed Slowdown / Degradation
It is common knowledge that SSD slow down over time, so many reviews attempt to test steady state performance after stressing the drives. However, with TLC there is an additional, much more serious and non-obvious slowdown. In particular, the Samsung 840 EVO is notorious for slowing down 10x, and firmware updates could not solve it completely. The way Samsung deals with the 840 EVO slowdown is to have the SSD rewrite the old data – which means the already low TLC endurance is further stressed by invisible background rewriting.
This type of slowdown occurs with old data – not newly written data. Since benchmark tests typically write new data to an SSD, this problem is not detectable by the usual benchmarks and therefore not discussed in SSD review articles. The proper benchmark for detecting the TLC old-data read speed slowdown is SSD Read Speed Tester.
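The idea behind such a tool is simple enough to sketch: read existing files, time each read, and bucket throughput by file age. On an affected TLC drive, the old buckets read much slower than the fresh ones. A minimal sketch (not the actual tool):

```python
# Sketch of an old-data read-speed test: group read throughput by
# file age (mtime). Affected TLC drives show slow old-age buckets.
import os
import time

def read_speed_by_age(paths):
    """Return {age_in_days: MB/s averaged over files of that age}."""
    buckets = {}
    now = time.time()
    for path in paths:
        size = os.path.getsize(path)
        if size == 0:
            continue
        age_days = int((now - os.path.getmtime(path)) / 86400)
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(1 << 20):      # read in 1 MiB chunks
                pass
        elapsed = time.perf_counter() - start
        mbps = size / (1 << 20) / elapsed
        buckets.setdefault(age_days, []).append(mbps)
    return {age: sum(v) / len(v) for age, v in buckets.items()}
```

Note that a meaningful measurement must bypass the OS file cache (reboot first, or read files larger than RAM), which this sketch deliberately omits.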
Unfortunately, this problem is not limited to the 840 EVO. Some people believe that the Samsung 850 EVO does not suffer from the old-data TLC read slowdown, but there is a report indicating the 850 EVO 1TB model does slow down just as the 840 EVO did.
It was found that similar slowdowns occur with Crucial BX200, ADATA SP550, and in general “SM2256 paired with 16nm TLC from either Micron or SK Hynix”. Note: Intel 540s uses SM2258 with SK Hynix 16nm TLC NAND.
With the switch to TLC by all vendors, the performance advantage of SSD is negated by slow TLC in some scenarios even if we ignore the old-data case – especially when the cache fills up. While 3D TLC performs reasonably well, the cheap planar TLC found in low cost SSD does not. Some of these drives have a cache, but once it is filled, the SSD performs very slowly, comparable to HDD. If the reason for buying an SSD is speed, it does not make much sense to pay a price premium for a slow-performing SSD (the cheapest SSD is still more expensive than HDD on a price per GB basis).
Buy an MLC SSD instead of a planar TLC SSD – if you can still find one.
SSD Firmware Bugs
[Since 2015 there has been far fewer critical SSD firmware issues that I know of.]
There have been many critical bugs with various brands of SSD. At a time when the SandForce controller was the best performing consumer-grade SSD controller, many brands of SSD used it. Naturally, SandForce controller bugs were inherited by all the drives that used it. It took a while for SandForce to fix its infamous Blue Screen of Death (BSOD) bug. (If any reader would like to argue that having firmware bugs does not imply SSD is unreliable, my response is: can you tell your boss, who is experiencing daily BSOD with the PC you built for him, that the PC is actually reliable?) Even some models of Intel SSD were affected by the SandForce controller.
There are other BSOD bugs, not just with SandForce. As another example, the Crucial m4, which uses a Marvell controller, had a BSOD bug after being used for 5184 hours. Although this is specific to one series of one brand, it really mattered because, at the time, this particular series was widely regarded as the best buy for SSD because of its high performance at a reasonable price.
BSOD is bad enough, but there were even more serious bugs. For example, a certain firmware upgrade for Intel X25-M may actually brick it. There are other similar incidents of firmware upgrade bricking SSD as well.
Apple issued a recall for some MacBook Air 2012 models with 64GB or 128GB SSD. Earlier Corsair recalled Force 3 drives. Intel and Kingston also recalled early models of their SSD drives.
Although it was reported that the Samsung-specific SSD-killing Secure Erase bug was fixed, there are still a few reports of issues with it. A firmware update also bricked some users’ Samsung 850 Pro.
While I like manufacturers releasing regular firmware updates, one should always exercise caution (i.e. backup) when doing upgrades. There is a report of a Transcend 370S (using the SM2246EN controller) user losing the drive’s data after a firmware upgrade.
SSD Freeze Problems
Some SSD are prone to freezing the system for several seconds (e.g. the Plextor SSD freeze problem, in Chinese). This is a particularly hard issue to resolve, since different people using the same firmware on the same SSD get different results. If you happen to be experiencing this problem, see if a firmware that fixes it is available. Also read why some of these SSD freeze problems are related to Link Power Management and how to deal with it for Intel chipset users (in Chinese), for AMD chipset users (in Chinese), or in English.
SSD Sleep Problems
For a variety of technical reasons, sleep/hibernate is bad for SSD. In the worst case, sleep/wake operations may cause BSOD, especially with SandForce controllers (again). Even without a SandForce controller, the Crucial m4 needed firmware update 070H to resolve a hang issue that “would typically occur during power-up or resume from Sleep or Hibernate.” So it’s usually a good idea to set the SSD to Never Sleep, disable Windows hybrid sleep, disable hibernate, and set the BIOS to use S1 instead of S3.
(By the way, even CPU sleep states affect SSD performance. On some (but not all) system configurations, disabling C1E, C-States and/or EIST in BIOS yield higher benchmarks for SSD.)
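On Windows, the sleep-related settings above can be applied from an elevated command prompt. This is a sketch using the standard powercfg aliases (verify them on your system with `powercfg /aliases`):

```shell
:: Disable hibernation entirely (also deletes hiberfil.sys)
powercfg /hibernate off
:: Turn off hybrid sleep on the active plan, for AC and battery
powercfg /setacvalueindex SCHEME_CURRENT SUB_SLEEP HYBRIDSLEEP 0
powercfg /setdcvalueindex SCHEME_CURRENT SUB_SLEEP HYBRIDSLEEP 0
:: "Turn off hard disk after" = 0 means never put the disk to sleep
powercfg /setacvalueindex SCHEME_CURRENT SUB_DISK DISKIDLE 0
:: Re-apply the current scheme so the changes take effect
powercfg /setactive SCHEME_CURRENT
```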
SSD Not Detected Problems
For unknown reasons, there are many reports of the system failing to detect the SSD for pretty much all brands, but some Crucial models, such as the MX100, are especially prone to this symptom. Google “MX100 disappear”, or “镁光 掉盘” (“Micron drive drop-out”) if you can read Chinese. If you’re suffering from this problem, try these:
- Turn on SATA Hotplug in BIOS
- Change Windows Power Plan to High Performance
- Set SSD to Never Sleep, and disable Windows hibernate and hybrid sleep
- Disable Link Power Management if you use Intel RST
- Enable HIPM+DIPM for SSD power option
- Upgrade SSD firmware
- Uninstall Intel RST altogether and use MSAHCI driver instead
- Try a different SATA cable
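Some of the software-side items above can likewise be scripted on Windows (BIOS items such as SATA Hotplug still have to be changed in firmware setup). A sketch; the LPM GUID below is the commonly cited one for the hidden HIPM/DIPM option, so verify it on your Windows version:

```shell
:: Switch to the High Performance power plan (SCHEME_MIN alias)
powercfg /setactive SCHEME_MIN
:: Unhide the hidden "AHCI Link Power Management - HIPM/DIPM"
:: setting so it appears under Power Options > Hard disk
powercfg /attributes SUB_DISK 0b2d69d7-a2a1-449c-9680-f91c70521c60 -ATTRIB_HIDE
```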
This problem is so prevalent that there are both an official procedure and an unofficial procedure to recover from this problem for Crucial SSD. Crucial explains this as being caused by a sudden power loss, but some people simply shut down their computer normally and then got this problem when powering up the computer next time.
[Note: It looks like, beginning with the MX200 and BX100, Crucial has managed to decrease the probability of this issue. The company I work for has 6 or 7 MX200 and has had no problems so far.]
Some people advocate that when you finish using the PC, log off instead of shutdown to let the SSD perform its garbage collection (GC). Since the SSD Not Detected problems occur at boot time, I figured that in order to eliminate this chance, perhaps SSD users should forget about saving the environment and just run their PC 24×7, never let it sleep or shut it down, and equip it with a UPS – Just kidding 😉
SSD Power Fault Problems
There is a 2013 paper called Understanding the Robustness of SSDs under Power Fault, which finds that SSD can suffer data loss or even become bricked after a power fault.
This is not just an academic study. It is very real: Intel SSD 320 series had an infamous “8MB bug” – the drive suddenly becomes 8MB in capacity after a power loss. The method of fixing it destroys all data, unfortunately.
Even if one uses an uninterruptible power supply (UPS), in the event of a computer hang, there are times when one needs to force power down the system.
SSD Sudden Death Problems
SSD tend to die abruptly, usually with no warning from SMART, and certainly with no click-of-death warning. Although HDD can also die abruptly, they have several other common failure modes as well, so even for the same total number of SSD and HDD failures, SSD must have a much higher proportion of abrupt deaths. In the past there were articles describing SSD as becoming read-only when they die, so your data would be safe. Reading real-life reports quickly disproves this statement. (It might still be true at the flash chip level – I don’t know – but it certainly is not true from the user’s point of view.)
Dying abruptly is a problem, but if very few users experienced it we would not need to be concerned. However, from user feedback it seems to me that the number of users suffering sudden SSD deaths within 2 years of purchase is not “very few”.
SSD Data Loss if Left Without Power
This sounds unbelievable, but it is a fact that an SSD gradually loses data if left without power. The problem worsens as temperature increases.
Dell Q&A: “Q: I have unplugged my SSD drive and put it into storage. How long can I expect the drive to retain my data without needing to plug the drive back in?
Answer: It depends on the how much the flash has been used (P/E cycle used), type of flash, and storage temperature. In MLC and SLC, this can be as low as 3 months and best case can be more than 10 years. The retention is highly dependent on temperature and workload.” See http://www.dell.com/downloads/global/products/pvaul/en/solid-state-drive-faq-us.pdf#page6 (credit: @timleavy)
SSD in RAID
There is a blog post discussing the risk of using SSD in RAID, primarily the danger of a rebuild when all the SSD in a RAID group are about to die at once. Although this seems theoretical, and the real situation may not be this bad because variance among flash chips should make them die at different times rather than all at once, one of the comments indicates it really happened.
Contradictory best practices for configuring Windows for SSD
There are many guides that teach people how to optimize their Windows for SSD. However, not everyone agrees with some of those recommendations. In particular, while many people suggest moving the pagefile to HDD or simply disabling it (advice I consider as outrageous as the title of this article), this contradicts Microsoft’s Q&A for SSD drives. Although I generally trust Microsoft technical publications, this guide found the Q&A claim about automatic disabling of Superfetch to be untrue. These discrepancies make things confusing.
Good Power Supply of 5V Required by SSD
The importance of a good power supply unit (PSU) is well known to PC users who use graphics cards. It is not immediately obvious that a good PSU is also critical for SSD, because their specs usually claim only a few watts – nothing compared to any decent graphics card, or the CPU itself. However, one thing makes SSD power consumption totally different from CPU or GPU: CPU and GPU use the 12V output from the PSU, but 2.5″ SSD use the 5V output. A decent PSU may be well designed to handle the high current draw of CPU and GPU, with less attention paid to 5V. Besides, peak current may reach 1A at 5V despite the really low power consumption in normal operation. So a system that can handle a stressed, overclocked CPU plus GPU is not automatically capable of supporting an SSD equally well. This is discussed in Jim Handy’s article, and may have been the cause of one user’s 100% SSD failure rate.
SSD requirement for the PSU: a minimum of 1A of clean output at 5V per SSD.
Bait and Switch
Of course bait and switch is not limited to SSD – it’s mentioned here just for completeness.
OCZ Vertex 2 – changed from 34nm NAND to slower-performing 25nm NAND
Kingston SSDNow V300 – changed to much slower-performing NAND. It also renders itself useless after writing a certain amount due to a “feature” known as Drive Life Protection.
PNY Optima – retail drives use different controller than the ones sent to media for review.
Silicon Power SP60 – it appears that this model continually changes its controller, such that you don’t really know what controller you’re buying until you actually buy one and disassemble it
ADATA SP920 – in the past the SP920 (silver version) was a good alternative to the M550, because the SP920 is the Micron OEM M510 and uses Micron firmware, especially when it was slightly cheaper and the M550 was no longer available. However, the new black version downgrades the controller and uses worse NAND. Stay away from it.
SanDisk SSD Plus – changed from MLC (SM2246XT controller) [Z400s equivalent model] to TLC (SM2256S controller) [Z410 equivalent model]. Strangely western media did not report it or care about it – perhaps it’s meant to be the slowest drive in the line up (behind Extreme Pro and Ultra II) and this positioning has not changed.
Linux Kernel TRIM compatibility
An investigation into a potential TRIM bug in Samsung SSD led to the discovery of a potential TRIM bug in Linux kernel.
Some SSD are not completely compatible with Linux, so Linux has a list of drives or firmwares that need special workaround:
/* devices that don't properly handle queued TRIM commands */
{ "Micron_M500_*",      NULL,   ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Crucial_CT*M500*",   NULL,   ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Micron_M5[15]0_*",   "MU01", ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Crucial_CT*M550*",   "MU01", ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Crucial_CT*MX100*",  "MU01", ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "Samsung SSD 8*",     NULL,   ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },
{ "FCCT*M500*",         NULL,   ATA_HORKAGE_NO_NCQ_TRIM | ATA_HORKAGE_ZERO_AFTER_TRIM, },

/* devices that don't properly handle TRIM commands */
{ "SuperSSpeed S238*",  NULL,   ATA_HORKAGE_NOTRIM, },
Which SSD to buy
Most importantly, stay away from the ultra cheap SSD. Stay away from non-leading brands. Stay away from models that change their controllers and NAND without notice.
Given a choice between MLC and TLC: always choose MLC (assuming the best SLC is out of budget), e.g. SanDisk Extreme Pro and Crucial MX200. Between Crucial MLC MX200 and TLC MX300, MX200 is distinctly better, as shown in this AnandTech MX300 benchmark (likewise, MLC BX100 is distinctly better than TLC BX200):
This benchmark is noteworthy as it also shows that planar TLC drives are placed at the bottom, i.e. slowest. (Some may argue that I chose a benchmark that unfairly shows the weakness of TLC – no, I chose it because for most people, random 4K access is the single most important benchmark that affects general computer responsiveness. In fact planar TLC performs poorly in many benchmarks, not just in this one.) Unfortunately, with MX200 being discontinued, there are not too many MLC drives remaining.
Some people like NVMe drives for the fastest performance, but to boot Windows from an NVMe SSD you need a 9-series or newer Intel chipset motherboard, a corresponding BIOS that includes NVMe support, and Windows 8.1 or above. The Intel SSD 750 is slow to boot because it takes a long time to enumerate; if one does not mind the high price and the slow boot time, there is nothing else to complain about. Samsung NVMe drives are really fast, but they are affected by thermal throttling, more so than other brands.
No matter which you choose, remember to take a look at user reviews at online retailer websites and see how many users (with respect to total number of comments) have their SSD suddenly die within months.
Freemon Sandlewould said:
Really sounds like more trouble than it is worth for me. I suppose there may be some range of operation where it is merited in spite of its flakiness.
anthony martin said:
you are so very very wrong. ive been using Kingston ssd’s since 2009 and I have had 0 failures. It’s now 2013. I have had several in many machines I own or have sold. infact the only failure ive had have been a hdd. I suggest you actually try one out before you bash them. every product has defective units. someone winds up with them and complains. its just what happens when a product is mass produced.
Eugene said:
Not so long ago I purchased and started to use a brand New Kingston SSD Now 300, that died after 45 days.
Guest said:
You’re kidding, right? ONE data point, and you think that proves a case?
sickboy said:
So, you got a properly working SSD and you’ve thus disproved the generalization that “all the SSD are bad”. Congratulations. Unfortunately for you, no such generalization has been made in the article. Go play with your SSD while you can.
Steve W said:
I just built a new desktop system and I chose to have three 1TB traditional hard drives in it rather than SSDs. I know that SSDs are much faster, but for me anyway, that “WOW” factor will go away after the first boot-up. Then I’ll start worrying about the C: drive…
For the time being, I’m using an internal SSD as an external storage backup, thanks to the hot-swap drive bay on my new case. This will be my “off site” backup which I’ll keep in my car. See how it works out.
Maybe SSD technology is still too new and some of the bugs and manufacturing processes will be worked out over time.
So HDD it is for me. At the moment.
Abraham Jose said:
What you claim is exaggerated, although we cannot discount that SSD failures are common. Samsung and Intel makes the best SSD’s. still reliability is not even near ancient HDD’s which are still giving a fight back with helium tech and kinetic(Seagate). IMO, another tech must be found, that will resolve the reliability issue of both HDD and SSD. HDDs and the 3-4 year cycle of replacements/RMAs..SSD’s with lot of complicated installation features for optimization and yet failing.
Guest said:
I have an SSD and I like it. Works fine. But it worries me, and I know I’ll end up replacing it with an HDD. Reliability is all-important.
Francis Jacob Christian said:
Add me to the list of dissatisfied SSD users. Mine flaked out in 6 months. I’m back to normal hard drives. At least platter technology has had over 30 years to perfect. I’m sticking with it. I might consider a hybrid in the future but that’s it!
Fergus said:
SSD’s have proven very unreliable to me.
Crucial C100 dirty page issue and BSODs after 1 month, complete failure after 3.
2ND crucial C100 lasted 3 months.
After several years I just tried a Mushkin PCIe (2Gb/s read) in June 2015 and it lasted 1 month
John said:
Have used lower end consumer SSD from PNY and SanDisk and a system that came with an Intel SSD. Out of 4 systems – 3 of the drives have failed, one suddenly with zero recovery options and the other 2 suffering multiple BSOD issues until drives no longer recognized. All drives failed within 3 months to 1 year.
Love the performance but haven’t seen this with HDD over the past 15 years. Can only recall a few drives ever failing and always had time to move the data. My experience may very well be related to an issue with a common controller or memory used but I don’t have the time to explore the reasoning. Will wait a few more years and keep an eye out for recommendations, complaints before buying again.
moonlightknighthk said:
Thanks for the feedback. If you need to use SSD again and if those failed systems use the same power supply unit, consider getting it replaced as well. SSD requires very good +5V output.
Brian said:
I came searching for articles related to this topic based on a stack of failed SSDs that have accumulated on my desk over the past year. I wanted a sanity check on whether I was having bad luck with SSDs or if they are just not as reliable as HDD and your article was informative. After using different brand SSDs for about 6 years and HDD for more than 3x as long (in just casual home enthusiast usage), I’ve noticed a far greater proportion of failure among my much smaller collection of SSDs (they stop booting, unable to turn on, they mount but are inaccessible for analysis or reformatting etc.), The SSDs that have failed have done so in under 3 yrs use, malfunction without warning and I’ve never been able to recover any data from the failed units no matter what utility I’ve tried. For my most recent failure (1 yr of use so still under warranty) the manufacturer will replace the unit and they offered 10% off professional data recovery services, which I’m actually considering (a drive where the user didn’t do backups,which I acknowledge is foolhardy).