RAID explained (easy and basic...)

azasadny · May 6, 2013

RAID Controllers
And now on to our guest editorial by Mike Pepe...
Choosing the right way to protect your data can be a daunting task. Many system administrators may simply opt to add more drives to a system, implement mirroring, and consider the task done. However, there are many options available to you, and understanding the implications of one data protection scheme over another will help you make the best choice.
The basic forms of RAID
RAID (Redundant Array of Independent Disks) has been around for almost as long as there have been hard drives. The most commonly encountered RAID types are defined by their number: 0, 1, 5 and (most recently) 6. Let's quickly review these RAID levels and what they really mean:

RAID Level 0 (that's a zero) is sometimes called "non-RAID" or simply "striping". In this scheme, data is read and written across some number (n) of disks simultaneously. This improves read and write performance up to n times that of a single drive, and also gives you n times the capacity of a single drive. The big downside to RAID 0 is that the failure rate of such an array is roughly n times that of a single drive. When a drive does fail, the array will become unavailable and the data irrevocably lost. In some circumstances, however, when raw performance or capacity is the only concern, RAID 0 may be a good choice.
RAID Level 1 is also often called "mirroring". A volume using RAID-1 will contain (at least) two drives, and the data will be read and written to all drives simultaneously and in the same order, often at the sector level. Read performance may be improved up to n times, as there are now multiple copies of the data to potentially read from. Write performance may suffer, as it may take up to n times as long to commit a write to all the copies when compared to a single standalone drive. Capacity remains fixed to the size of one drive, no matter how many copies you make.
RAID Level 5 may also be called "parity" by some folks. A RAID-5 array must consist of at least three drives. In this type of array, data is written to n-1 drives, and a "parity" unit is calculated and written to the remaining drive. The drive holding the parity chunk per stripe is rotated through the physical drives, distributing it evenly across the drives in the array. The primary advantage of using a RAID-5 array is that the failure of any single drive does not produce any data loss; the missing data can be reconstructed from the parity and the array can continue to operate, albeit with some degradation in performance. Capacity of the array is n-1 drives, and read performance can be improved by up to n-1 times that of a single drive. Write speeds can suffer in RAID-5 arrays since there must be a parity computation before the stripe can be committed to disk.
RAID Level 6 improves upon the ideas of RAID-5 by providing another, different parity calculation and distributing it across the available drives. You need at least 4 drives for RAID-6, and the capacity of the array will be n-2. The chief advantage is that a RAID-6 array can sustain two disk failures without loss of data, again at the penalty of having to computationally reconstruct the missing data. Writes similarly suffer as with RAID-5 due to the need to compute two different parity chunks for every write to the array.
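The parity idea behind RAID-5 can be sketched in a few lines. This is a minimal illustration, assuming the parity unit is a byte-wise XOR of the data chunks in a stripe (the usual choice for single parity, although the article does not name the function). The helper xor_blocks and the sample chunks are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data chunks in one stripe (the n-1 data drives of a four-drive array).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second drive, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, the same function both computes the parity and reconstructs a missing chunk. RAID-6's second parity needs a different calculation and is not covered by this sketch.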

RAID-10 and compound RAID levels
RAID Level 10 is a compound RAID level. More precisely, it's RAID 1+0 (or sometimes, RAID 0+1), and it combines both striping and mirroring. A RAID-10 must consist of at least 4 drives (a two-drive stripe held in two mirrored copies is the minimum) but can consist of any larger number of drives: the stripe width (n) multiplied by the number of mirrored copies (m).
RAID-10 arrays combine excellent performance characteristics with good data integrity. There are potentially a great number of drives to pull data from, meaning there is a theoretical read performance of n * m times that of a single drive. Their biggest downfalls are capacity, which is only that of n drives; cost, since you must purchase n times m drives; and write performance, which depends on how well your data stripes across the drives, something we will explain later.
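The capacity and cost rules stated for each level so far can be collected into one small function. This is only a sketch of the article's arithmetic: the function name capacity_and_cost is hypothetical, capacities are counted in units of one drive, and the minimum drive counts are the ones given in the text (RAID 0 has no stated minimum, so none is enforced here).

```python
def capacity_and_cost(level, n, m=2):
    """Return (usable capacity, drives purchased), both in units of one drive.

    For levels 0, 1, 5 and 6, n is the number of drives in the array.
    For level 10, n is the stripe width and m the number of mirrored copies.
    """
    minimum = {0: 1, 1: 2, 5: 3, 6: 4, 10: 2}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if n < minimum[level] or (level == 10 and m < 2):
        raise ValueError(f"too few drives for RAID {level}")
    if level == 0:
        return n, n
    if level == 1:
        return 1, n          # every drive holds the same copy
    if level == 5:
        return n - 1, n      # one drive's worth of parity
    if level == 6:
        return n - 2, n      # two drives' worth of parity
    return n, n * m          # RAID 10: n usable, n * m purchased

for level, n in [(0, 4), (1, 2), (5, 5), (6, 6), (10, 2)]:
    print(level, capacity_and_cost(level, n))
```

For example, five drives in RAID-5 give four drives of usable space, while a RAID-10 with stripe width 2 and two copies uses four drives to provide two.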
Other "compound" RAID levels are possible, for instance, striping across multiple RAID-5 arrays (RAID 50) or mirroring two RAID-5s (RAID 51), although not every controller supports these more complex scenarios. These compound RAIDs are not officially defined and therefore may not be portable across systems or controllers.
Different types of RAID controllers
Now that we've reviewed the different ways in which we can use multiple disk drives for varying degrees of performance, capacity and reliability, you may have already decided on the best scheme for your application. But RAID type is only one part of the equation. How you control these disks is also important. We can bundle RAID controllers into three distinct categories.
Hardware RAID controllers offer the best performance since they are, in effect, self-contained computers dedicated to running RAID arrays. The controller manages all the aspects of the RAID, and the host system is free to do other tasks while the RAID controller manages everything behind the scenes. Hardware RAID controllers often have their own cache to improve performance, and often have an option for a battery back-up to prevent data loss if the contents of a write cache were not written to disk. All this power has a price, however, and in this case literally: the best high-end RAID controllers can be very expensive. There are other potential pitfalls as well, which we will discuss a little later.
Software-based RAID uses your host operating system to virtualize your storage into RAID (or RAID-like) groups. For instance, creating a mirror (RAID-1) of your boot disk in Windows Disk Administrator is a simple example of a software RAID. On the other end of the complexity spectrum, Windows Server 2012 introduces a storage management system called Storage Spaces. Using Spaces, you can make a pool out of your storage and apply different protection schemes to your data on a folder-by-folder basis rather than at the partition or disk level. Software RAID has the advantage of being the least expensive option in most cases, since the functionality is part of the operating system and requires no additional hardware beyond, at most, relatively low-cost host bus adapters to connect disks to your system if you need more ports. Software RAID also has the potential to be the most flexible. For instance, it is possible in Windows to create a RAID-1 mirror using half the capacity of two disks, and then create a RAID-0 volume out of the remaining storage. You'd then have a volume for data that needs protection and one for data that's not critical, all on the same two disks. The main disadvantage of a software RAID setup is that your operating system must manage it, so performance may suffer as your CPU time is used for disk I/O rather than for your application. We'll also examine the real-world implications of this later.
Somewhere in the middle are "hybrid" RAID controllers. These sorts of controllers are marginally more expensive than (or in some cases, the same price as) non-RAID host bus adapters. They generally have firmware that the host CPU actually runs to provide the RAID controller functionality, and OS drivers that do the same. In that sense, they are not much different from a software RAID. However, these devices may have some form of caching or dedicated hardware to help speed up operation of a RAID array: for instance, a hardware parity calculator for RAID-5 and 6 arrays. So these devices sit somewhere in between the functionality provided by software and hardware, and therefore the pros and cons of both may apply.
Choosing the right RAID controller
So which one is best? Most people would assume that a high-end hardware RAID controller is obviously the best choice, but that's not always the case. At the entry-level end of the server spectrum, a high-end RAID controller can be more costly than an entire server! Some of them have their own out-of-band network configuration and can be rather complex devices for the non-techie to get working. Interchangeability is also a potential issue for the hardware and hybrid RAID controllers: if your controller fails, you'd likely need at least something from the same product family with similar firmware installed to ensure you can read and recover the disks. Good luck to the system administrator who has to try and track down a specific version of a RAID controller that hasn't been made in half a decade!
Contrast this to a software RAID, where there's a very good chance that any machine running the same operating system can have transplanted disks from a failed server back up and running very quickly. Recoverability in the event of a crisis may be better here, unless you keep a spare RAID controller card handy. The software/hybrid solutions do utilize your system's resources to a much greater degree than the hardware solutions, but except in the most demanding and critical systems, the few percentage points of processor utilization are hardly likely to be noticed.
Price versus performance is a second key decision point, but let's talk more about recoverability. We touched on this earlier with a key advantage of a software-based RAID: the RAID volume should be readable in any machine running the same operating system, whereas with a hardware RAID controller there's a good chance that your RAID volume would not be readable with another brand or type of controller. However, there is one exception to this: a RAID-1 that consumes an entire disk. Often these volumes are simple block-by-block copies of what would normally be written to a single disk. In many cases it is indeed possible to take one of the copies of a failed mirror volume, put it into any random machine, and read it.
Recovery time and reliability
Recovery time and reliability are another point of consideration. As of today, a 4TB drive is the largest available capacity. Average transfer rate on a drive of this size is somewhere around 180 megabytes per second, which means it would take, on average, a little over six hours to completely fill this drive up. (In the real world, the time to rebuild an active RAID-5 using drives of this size would be two or more times that!)
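The fill-time estimate is a single division and is easy to check. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a transfer rate that holds steady across the whole drive. The function name fill_hours is made up for the example.

```python
def fill_hours(capacity_tb, rate_mb_per_s):
    """Hours needed to write a whole drive at a constant transfer rate."""
    capacity_bytes = capacity_tb * 10**12
    seconds = capacity_bytes / (rate_mb_per_s * 10**6)
    return seconds / 3600

print(round(fill_hours(4, 180), 2))  # 6.17
```

At a steady 180 MB/s the pure transfer time for 4TB comes to a little over six hours. A lower sustained rate, or a rebuild competing with live traffic, stretches that figure considerably.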
Why is this important? Let's consider a RAID-5 built with five 4TB drives. One drive fails and is replaced, and the rebuild process begins. Since hard drives are electro-mechanical devices, there is an engineered-in error rate. In this case our drives have a 1 in 10^14 chance of an uncorrectable error for each bit read. In order to reconstruct the RAID-5, we must read a total of 16TB of data from the four surviving drives, which is 1.28 x 10^14 bits! There's a very real chance that during the rebuild, we'll encounter an uncorrectable error on one of the remaining drives: if the controller deems that drive bad, we'll have a RAID-5 array with two dead drives, the entire array will fail, and our data disappears.
RAID-6 will help here, since it will continue operating even if two drives fail. However, given the high likelihood of an error, even RAID-6 starts to look less and less attractive.
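To put a number on "a very real chance", here is a rough estimate for the five-drive example. It assumes every bit read fails independently at the quoted rate of 1 in 10^14, which is a simplification: the quoted rate is a specification limit, and real errors tend to cluster. The function name is hypothetical.

```python
import math

def p_at_least_one_error(bits_read, error_rate_per_bit):
    """Probability of one or more unrecoverable errors over bits_read bits.

    Computes 1 - (1 - p)**bits using log1p/expm1, which stays accurate
    when p is far too small for naive floating-point arithmetic.
    """
    return -math.expm1(bits_read * math.log1p(-error_rate_per_bit))

# Four surviving 4 TB drives must be read in full: 16 TB, 8 bits per byte.
bits = 4 * 4 * 10**12 * 8
print(bits)                                         # 128000000000000
print(round(p_at_least_one_error(bits, 1e-14), 2))  # 0.72
```

Under these assumptions the rebuild hits at least one unreadable sector roughly 72% of the time. Whether that kills the array depends on how the controller reacts to the error, as described above.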
The value of triple redundancy
Given that there is a statistical chance of catastrophic failure of a parity-based RAID group, you should always remember a few things. First and foremost: RAID is not a replacement for a sound backup (and recovery) strategy. Make sure you have backups in place, and test them periodically to make sure that they are recoverable. Secondly, consider triple-redundant options using RAID-10 striping and mirroring.
It's probably safe to say that many people have encountered random silent corruption in their daily lives. It's that picture that won't display anymore, or the video that's broken at some point in playback. Sure, these things can happen with single drives and single copies of data, but they do appear even when disk mirroring is in place. Why would that be? Consider the following scenario: a server running a RAID-1 array with two drives crashes or loses power. A random spurious write corrupts a random sector on one of the hard drives. When the machine comes back up, the controller detects a dirty shutdown and re-mirrors the drive, and encounters a data difference. Which block is the correct one? It's entirely possible the RAID controller doesn't know, and there's potentially a 50% chance that it'll guess wrong, permanently corrupting the file.
What if there were three copies instead of two? Well, in that case, the RAID can take a vote; if two of the blocks agree, it's probably the "right" data. Add a checksumming filesystem, such as Windows Server 2012's ReFS and Storage Spaces on top of that with triple mirroring, and the chances of silent corruption in your data drop dramatically.
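The "take a vote" idea can be sketched as follows. This is purely illustrative and is not how Storage Spaces, ReFS or any particular controller implements it. The function vote and the sample blocks are invented for the example.

```python
from collections import Counter

def vote(copies):
    """Return the block held by a strict majority of the copies, or None."""
    block, count = Counter(copies).most_common(1)[0]
    return block if count > len(copies) / 2 else None

# Three-way mirror: two copies agree, so the odd one out is overruled.
print(vote([b"good", b"good", b"bad!"]))  # b'good'

# Two-way mirror: the copies disagree and there is no majority to trust.
print(vote([b"good", b"bad!"]))           # None
```

With only two copies a mismatch leaves nothing to break the tie, which is the 50% guess described earlier. A checksum stored alongside the data lets the system pick the right copy even then.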
Stripe size and RAID performance
Also consider the performance of your stripes. RAID types that stripe data across disks have what is known as a "stripe size". A common stripe size is 64k, meaning that data is written to each drive in 64k chunks. As an example, a 4 drive RAID-5 would then commit data to disks in full stripes of 256k (4 drives, 64k each, one of those chunks being parity). There is nothing wrong with this, as long as your files are generally larger than 256k. If they are not, updating a smaller file within this stripe can require a read of all 256k, a modification to the data, recalculation of parity, and then a 256k write back to all the drives! If you have a lot of very small files, the performance penalty to write or modify them can be enormous.
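The small-file penalty can be estimated with a few lines of arithmetic. The sketch below follows the simplified model in the paragraph above, where any update reads and then rewrites every full stripe it touches. Real controllers often do better, for example by reading only the old data chunk and the old parity. The function name and defaults are assumptions for illustration.

```python
def small_write_cost(file_kb, chunk_kb=64, drives=4):
    """Return (stripe size, total I/O, amplification) for updating one file.

    Sizes are in kilobytes. Amplification is total disk I/O divided by
    the amount of data the application actually wanted to change.
    """
    stripe_kb = chunk_kb * drives
    stripes_touched = -(-file_kb // stripe_kb)   # ceiling division
    io_kb = 2 * stripes_touched * stripe_kb      # read everything, write everything
    return stripe_kb, io_kb, io_kb / file_kb

print(small_write_cost(4))     # (256, 512, 128.0)
print(small_write_cost(1024))  # (256, 2048, 2.0)
```

Under this model a 4k update moves 512k across the disks, 128 times the useful data, while a 1MB file that fills whole stripes costs only twice its own size.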
A few guidelines concerning RAID
Armed with these basics, the data protection scheme you choose is a balance between needed capacity, performance, and the ever-present constraints of budget. However, here are some guidelines based on real-world experience:

Mirrors of single, whole drives are simplest. The ability to take one drive out of an array and read it elsewhere can be a real timesaver over restoring from backup.
If your application demands utmost performance, consider investing in a hardware-based RAID controller and RAID-10. Otherwise, a hybrid or software-based RAID-10 solution may be sufficient.
If capacity needs are high and performance and budget requirements are low, a parity-based solution may be a good fit. Consider using RAID-6, particularly if the array will have large numbers of high capacity drives.
If data integrity is of the utmost importance, consider a three-copy mirror and ReFS using Storage Spaces. Background data scrubbing and majority-vote-wins concepts will significantly reduce the chance of spurious data corruption.
And most importantly: Make sure you have a good backup strategy, and you know you can restore!

About Mike Pepe
Mike Pepe joined Microsoft in 2006 after working in the IT field for ten years providing clustering, backup, and storage solutions for the telecommunications industry. He is currently a Service Engineer for the Bing Information Platform at Microsoft, working on datacenter-scale automation and service design.
Send us feedback
Got any comments or stories concerning RAID solutions and controllers? Let us know at wsn@mtit.com

JLyman · May 6, 2013

Mike is a pretty cool dude and definitely knows what he's talking about.

louwin · Jun 1, 2014

I don't know if this thread is closed or whether I can reply to it so I apologize if I'm breaking any rules by posting to this old thread.

I have been using hardware RAID 1 for about 3 years. Apart from having to replace 2 dead drives I have been happy with things.

I bought a new desktop and created my 3 mirrors with 4TB drives. All good.

I needed another mirror but my motherboard only supports 6 RAIDable ports so I created a software/Windows RAID mirror. On copying stuff to the new mirror I noticed, in Disk Management, that the new mirror was "Re-synchronising".

I know that I CAN cleanly shutdown while my hardware mirrors are "verifying and rebuilding" etc and the process will continue when I switch on again.

My question is - can I do the same with Windows RAID? Can I cleanly shutdown while my Windows mirror is "Re-synchronising" and expect it to continue the next time I switch on? :shock: :think:

I don't like leaving my system switched on while I am not home so I am going to do the next best thing, I'm going to put it to sleep. Hopefully I can "wake" it when I get home? :geek:

veldthui · Jun 2, 2014

Also note that just because you are using RAID does not mean your data is safe, and you still need backups. Found this out the hard way when the directory structure of my RAID5 NAS drive got corrupted. The drives were all fine but the data was not recoverable without a huge expense.

LMiller7 · Jun 2, 2014

It is very important to understand what RAID is and what it is not. This is often misunderstood, to the cost of those who use it.

The purpose of RAID 0 is performance. But while the benchmarks are impressive, performance under real world conditions is more modest. Except for some specific types of data, not typical usage, RAID 0 is often more trouble than it is worth. One problem is that if one drive fails you lose everything in the RAID array. Don't even consider RAID 0 without a backup of everything on the drive. You may well need it. Very important.

The purpose of other forms of RAID is to maintain access to your data in the event of a drive failure. Drive replacement can then be deferred to a more convenient time. With appropriate hardware this can be done with no downtime at all. This is a big deal on a busy server where even a short downtime is very disruptive to the normal activity of the organization. That can be very expensive. But on a desktop the extra cost is often hard to justify. With RAID 1 that means 2 drives to provide the storage capacity of 1. RAID 5 is more efficient in terms of storage capacity but requires a minimum of 3 physical drives.

But this really must be understood. If you forget everything else, remember this: The purpose of RAID in any form is NOT to protect your data. That is what backups are for. No form of RAID ever devised is a valid backup solution. All files of any importance require at least 1 backup copy, 2 or more backup copies for files of particular importance. Neglect this to the peril of your data.

RAID provides protection only from drive failure, and even that cannot be relied on. For all other causes of data loss it offers no protection at all.

Use RAID if you consider it justified in your situation. But be very sure you understand the implications and are using it for the right reasons.

louwin · Jun 2, 2014

Everyone stresses how RAID is not a backup option. I keep 2 copies of ALL my data.... Everything....

I take your point but basically I have 3 choices....

1 mirroring
2 Syncing 2 drives on line
3 Syncing 2 drives, one online one offline after the sync

I do 1, option 2 is a bit safer but more inconvenient and 3 is totally inconvenient.

I'm not talking of backing up a few files. I am talking terabytes..... I, in my infinite wisdom, have decided I am happy with mirroring.

Thank you for your opinions but I just want to know if the system will continue a "Re-sync" on software RAID on later switch on after a clean shutdown or do I have to allow the system to complete the "Re-sync" before I switch off?

louwin · Jun 17, 2014

No responses?

Anyway, I did shut down while "Re-synchronising" and it continued correctly the next time I switched on, so that answers MY original question.

Anybody else using Software RAID on this forum?

My 4TB SW RAID spends MOST of the time "Re-synchronising".

For instance, I noticed it was "Re-synchronising" on Sunday evening. Left it running all night and on Monday morning the mirror was "Healthy". Carried on using the system for a couple of hours. Didn't write to the mirror. Shutdown.

In the evening I switched back on and it started "Re-synchronising" again.

My understanding is that a "write" happens to both physical discs at the same time and that "Re-synchronising" should happen "once in a blue moon"?

My HW RAID mirrors have been running for a couple of months (on my new system) with NO "Verifying" or "Rebuilding". On my old system they "Verified and rebuilt" half a dozen times in 3 years.

Anybody else using SW RAID got any comments?
