Data Corruption: what it is and strategies to avoid it
Faced with a damaged file, even a system file, you may not despair. However, when it comes to personal photos, valuable data or work files, the reaction changes drastically. Let’s discover the world of data corruption and take a few measures against it.
What is Data Corruption
The term indicates that a user or a program can no longer read data correctly because the data has been altered or damaged. Worse, it is more common than you might think. Corruption comes in many forms and has many different causes, and there are both recoverable and unrecoverable cases. Let’s take a look at the most common ones:
- Power outage: an unexpected power outage may damage data if it happens while the operating system is writing to disk.
- Improper shutdown: just like a power outage, an improper shutdown can cause data corruption.
- Defective firmware: although uncommon, there have been reports of disks losing data because of bugs in the firmware.
- Bad sectors: this is by far the most common data corruption example. It happens when a sector becomes unreadable due to a physical problem.
- Storage failure: whatever the type of storage, it will most likely fail eventually, and worse, in some cases it degrades slowly over time.
- Defective software: unstable or poorly tested software, especially software operating at a low level, can cause data corruption.
- Malware: as software grows more capable, so does malware. An increasingly common example of this kind of data corruption is the family of malware called ransomware.
- Bad RAM: this is probably the most concerning cause, since memory errors generally go undetected during normal operation unless you use ECC RAM.
There are several other causes that don’t really fit the term and are less destructive than these.
How to prevent it
Acknowledging the problem is the first step towards preventing it. Let’s go through each cause and its solution.
- Power outage: using a UPS (Uninterruptible Power Supply) can usually solve the problem; just be sure to shut down the computer before the battery runs out. For a server, you should use a UPS that communicates with the computer and set it to shut down automatically. This way your files will be safe.
- Improper shutdown: basic computer literacy is enough to prevent this. Keeping the power supply switched on and not easily accessible is another good starting point. Disabling the power/reset buttons is another measure. For those wondering, I have seen this kind of thing happen with otherwise skilled people in work environments.
- Defective firmware: this isn’t an easy one. You can’t notice a firmware bug until it’s too late, so the best way to avoid it is to buy tested drives from known vendors. The best solution remains backups. Also, keep reading for a more robust solution.
- Bad sectors: these usually occur on hard disks, and the solution is already built into most modern devices, which remap failing sectors to spare ones. Keeping backups is once again the most effective solution.
- Storage failure: be it an optical disk, a hard disk or a solid state drive, a physical medium is subject to failure, and when that happens you lose everything on it. Again, backups are the omnipresent solution, but what if you need that backup so often that it might fail too? The solution to the problem is called RAID, an enterprise-class solution supported by almost any common operating system. Be careful, however: RAID5 has a known “write hole” that can cause data corruption if a power outage happens in the middle of a write. Also, let me be clear: RAID is by no means a replacement for backups.
- Defective software: this is one of the most daunting. They say there’s no software without bugs. That means even machines can get things wrong, and when that happens you don’t want your data to be at risk. In this case the best prevention is using well-tested software and having a good backup strategy, since RAID won’t protect you.
- Malware: no discussion here: antivirus for prevention, backups just in case.
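To make the RAID5 “write hole” mentioned above more concrete, here is a minimal Python sketch. It is only a toy model, not how any real RAID implementation is written: the three "disks" are plain byte strings, the block contents are made up, and the helper name xor_blocks is hypothetical. It shows how parity rebuilds a lost block, and how a stale parity block silently rebuilds the wrong data.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (the parity operation in this toy model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three hypothetical disks: two data blocks plus parity.
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# Normal case: the disk holding d1 dies, and its block is rebuilt
# from the surviving data block and the parity block.
rebuilt_d1 = xor_blocks(d0, parity)

# Write hole: d0 is rewritten, but power is lost before parity is updated,
# so the parity block still describes the old stripe.
d0 = b"CCCC"

# If the disk holding d1 fails now, the rebuild produces wrong data
# and nothing in the parity scheme itself flags it.
bad_d1 = xor_blocks(d0, parity)

print(rebuilt_d1 == d1)  # rebuild with consistent parity
print(bad_d1 == d1)      # rebuild with stale parity
```

This is why a UPS, or a file system that checksums its data, matters even when RAID is already in place.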
Did you notice that bad RAM was missing? That’s because there’s something worse than data corruption, and that’s silent data corruption.
Silent Data Corruption
Also known as the latest and most daunting plague on the list. Data corruption can sometimes be detected and corrected, but silent data corruption is different. It mostly happens because of errors inside RAM or disks caused by an external factor; to learn more about this kind of error, read the relevant paragraph in my article about ECC RAM. Loud noises, cosmic rays, or vibrations may cause errors that the operating system cannot detect, creating data corruption that spreads silently. But what can we do against something that can’t be detected by an operating system?
The answer is ZFS (which I will cover in another article). ZFS, the Zettabyte File System as it was called in the early days, is a file system initially developed by Sun Microsystems and later by Oracle. Its primary purpose is to address the phenomenon of silent data corruption, since the most widespread file systems are not able to provide enough protection. ZFS is, as a matter of fact, the state-of-the-art file system for data integrity. ZFS, however, isn’t only a file system: it also provides the features of a Logical Volume Manager and software RAID. By combining all these features and adding checksumming, ZFS can detect corrupted data and, provided you use its software RAID capability, repair it from a good copy. As a downside, ZFS needs a lot of RAM (a common rule of thumb is roughly 1 GB of RAM per 1 TB of storage), and using non-ECC RAM might cause entire pools of data to become corrupted.
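The idea of combining checksums with redundancy can be illustrated with a small Python sketch. This is a toy model under my own assumptions, not ZFS’s actual on-disk format or code: the class name MirroredBlock, the use of SHA-256 and the two in-memory "disks" are all hypothetical choices made for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum used to decide whether a copy can be trusted."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Toy model: one block kept on two mirrored 'disks', plus a checksum stored separately."""

    def __init__(self, data: bytes) -> None:
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        # Look for a copy whose checksum still matches the stored one.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all copies fail checksum verification")
        # Overwrite any copy that differs from the verified one.
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"family photo bytes")
block.copies[0][3] ^= 0x01   # simulate a silent bit flip on the first disk
data = block.read()          # the flip is detected and the bad copy repaired
print(data)
```

Without the second copy the checksum could still tell you that the data is wrong, but there would be nothing to repair it from, which is why the redundancy matters.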
ZFS and alternatives
ZFS is available as a native file system on Solaris and has been ported to many *nix operating systems, including FreeBSD, NetBSD and Linux. It is also the foundation of the popular FreeNAS operating system. Aside from ZFS there are other solutions, but they are not quite as advanced as ZFS itself. One native solution for Linux is BTRFS, which is gaining more and more attention nowadays. It is, however, still not mature enough to compete with ZFS and is still not production ready (though it is becoming more and more stable). There are also other alternatives, but in my opinion they do not match these two file systems. A good comparison table can be found here.
Good old backups, even manual ones, are still good and will be around for a while. Data corruption is more common than we thought. ECC RAM + UPS + ZFS is the de facto best protection you can get.
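If you rely on manual backups, it helps to check now and then that the copy still matches the original. The following Python sketch is one possible way to do that, assuming a backup that is a plain directory copy; the function names file_digest and verify_backup and the example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill the RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return the relative paths that are missing from the backup or differ from the source."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems


# Example call with made-up paths:
# print(verify_backup(Path("/home/me/photos"), Path("/mnt/backup/photos")))
```

Note that a mismatch only says the two copies differ, not which one is the damaged one, so it is worth running the check soon after each backup.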
Image courtesy of William Warby.