HDFS Erasure Coding and RAID basics

HDFS Replication is expensive – the default 3x replication scheme in HDFS has 200% overhead in storage space and other resources (e.g., network bandwidth). However, for warm and cold datasets with relatively low I/O activities, additional block replicas are rarely accessed during normal operations, but still consume the same amount of resources as the first replica.

Therefore, a natural improvement is to use Erasure Coding (EC) in place of replication, which provides the same level of fault tolerance with much less storage space. In typical EC setups, the storage overhead is no more than 50%. The replication factor of an EC file is meaningless: it is always 1 and cannot be changed with the -setrep command.

In storage systems, the most notable usage of EC is Redundant Array of Inexpensive Disks (RAID). RAID implements EC through striping, which divides logically sequential data (such as a file) into smaller units (such as bit, byte, or block) and stores consecutive units on different disks.
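To make striping concrete, here is a minimal sketch in Python. The function names (stripe, unstripe) and the parameters (num_disks, unit_size) are made up for illustration; the "disks" are just in-memory lists, not real devices, and no parity is computed here.

```python
def stripe(data: bytes, num_disks: int, unit_size: int) -> list:
    """Split data into fixed-size units and deal them round-robin onto disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), unit_size):
        unit = data[offset:offset + unit_size]
        unit_index = offset // unit_size
        # Consecutive units land on consecutive disks.
        disks[unit_index % num_disks].append(unit)
    return disks


def unstripe(disks: list) -> bytes:
    """Reassemble the original data by reading units back in round-robin order."""
    units = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                units.append(disk[row])
    return b"".join(units)


layout = stripe(b"ABCDEFGHIJ", num_disks=3, unit_size=2)
print(layout)
print(unstripe(layout))
```

Because consecutive units sit on different disks, a large sequential read can pull from all disks in parallel.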

Integrating EC with HDFS can improve storage efficiency while still providing data durability similar to that of traditional replication-based HDFS deployments. As an example, a 3x replicated file with 6 blocks consumes 6*3 = 18 blocks of disk space. With an EC (6 data, 3 parity) deployment, the same file consumes only 6+3 = 9 blocks of disk space.
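The arithmetic above can be checked with a short sketch. The helper names are hypothetical, and the EC calculation is a simplification: it assumes every group of up to 6 data blocks receives 3 full parity blocks.

```python
import math


def replication_blocks(data_blocks: int, replicas: int = 3) -> int:
    """Total blocks stored when every data block is copied `replicas` times."""
    return data_blocks * replicas


def ec_blocks(data_blocks: int, data_units: int = 6, parity_units: int = 3) -> int:
    """Total blocks stored under a (data_units, parity_units) EC scheme.

    Simplifying assumption: each full or partial group of `data_units`
    data blocks gets `parity_units` parity blocks.
    """
    groups = math.ceil(data_blocks / data_units)
    return data_blocks + groups * parity_units


def overhead_pct(total_blocks: int, data_blocks: int) -> float:
    """Extra storage as a percentage of the raw data size."""
    return 100.0 * (total_blocks - data_blocks) / data_blocks


data = 6
print(replication_blocks(data), overhead_pct(replication_blocks(data), data))
print(ec_blocks(data), overhead_pct(ec_blocks(data), data))
```

For the 6-block file this gives 18 blocks (200% overhead) with 3x replication and 9 blocks (50% overhead) with EC, matching the figures in the text.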

Parity computations are used in RAID drive arrays for fault tolerance: the data on two drives are combined and the result is stored on a third. The parity is computed by XOR'ing a bit from drive 1 with the corresponding bit from drive 2 and storing the result on drive 3. After a failed drive is replaced, the RAID controller rebuilds the lost data from the other two drives. RAID systems often have a "hot" spare drive ready and waiting to replace a drive that fails.

An exclusive OR (XOR) is true if exactly one of its two inputs is true; it is false if both inputs are true or both are false.
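The parity-and-rebuild idea can be sketched in a few lines of Python. This is only an illustration of the XOR property, not how any particular RAID controller is implemented; the helper name xor_bytes and the sample drive contents are invented for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical contents of two data drives.
drive1 = b"hello"
drive2 = b"world"

# Parity stored on the third drive.
parity = xor_bytes(drive1, drive2)

# Simulate losing drive 1 and rebuilding it from the two survivors.
rebuilt_drive1 = xor_bytes(drive2, parity)
print(rebuilt_drive1 == drive1)
```

The rebuild works because XOR is its own inverse: (a XOR b) XOR b gives back a, so any one of the three drives can be recomputed from the other two.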

RAID is a disk or solid state drive (SSD) subsystem that increases performance or provides fault tolerance or both. RAID can also be implemented in software only, which in the past was much slower than a dedicated hardware controller. In the late 1980s, the "I" in RAID stood for "inexpensive" but was later changed to "independent."

RAID 0 - Striping for Performance (Popular)

Widely used for gaming, striping interleaves data across multiple drives for performance. However, there are no safeguards against failure.

The more drives in a RAID 0 array, the higher the probability of array failure.
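A rough way to see this: a RAID 0 array is lost as soon as any single drive fails. The sketch below assumes, purely for illustration, that drives fail independently and with the same probability over some period; the function name and the 5% figure are made up.

```python
def raid0_failure_probability(p_drive: float, num_drives: int) -> float:
    """Probability that at least one of `num_drives` independent drives fails."""
    # The array survives only if every drive survives.
    return 1.0 - (1.0 - p_drive) ** num_drives


# Assumed per-drive failure probability of 5% (illustrative only).
for n in (1, 2, 4, 8):
    print(n, round(raid0_failure_probability(0.05, n), 4))
```

Under these assumptions the failure probability grows with every drive added to the array.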

RAID 1 - Mirroring for Fault Tolerance (Popular)

Widely used, RAID 1 writes the same data to two drives at the same time. It provides very high reliability but doubles the number of drives needed.

RAID 10 combines RAID 1 mirroring with RAID 0 striping for both safety and performance.

The more drives in a RAID 1 array, the lower the probability of failure.
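The reasoning is that a mirrored array loses data only if every copy is lost. The sketch below uses the simplifying assumption that drives fail independently with the same probability and ignores rebuild time; the function name and the 5% figure are hypothetical.

```python
def raid1_failure_probability(p_drive: float, num_drives: int) -> float:
    """Probability that all `num_drives` independent mirrors fail."""
    # Data is lost only when every mirror is gone.
    return p_drive ** num_drives


# Assumed per-drive failure probability of 5% (illustrative only).
for n in (1, 2, 3):
    print(n, raid1_failure_probability(0.05, n))
```

Under these assumptions each additional mirror multiplies the chance of data loss by the per-drive failure probability.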

RAID 3 - Speed and Fault Tolerance

Data are striped across three or more drives for performance, and parity is computed for safety. RAID 3 achieves a very high data transfer rate for large sequential transfers because all drives operate in parallel. It uses byte-level striping, and the parity is stored on a separate, dedicated drive.

RAID 5 - Speed and Fault Tolerance (Popular)

Data are striped across three or more drives for performance, and parity is computed for safety. RAID 5 is similar to RAID 3, except that the parity is distributed across all drives instead of being kept on a dedicated drive.
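The difference between dedicated and distributed parity can be pictured with a small layout sketch. The rotation order used here is just one possible convention chosen for illustration (real RAID 5 implementations use several different rotation schemes), and the function names are hypothetical.

```python
def parity_drive(stripe_index: int, num_drives: int, distributed: bool) -> int:
    """Return the index of the drive that holds parity for a given stripe."""
    if distributed:
        # RAID 5 style: parity moves to a different drive on each stripe.
        return (num_drives - 1 - stripe_index) % num_drives
    # RAID 3 style: parity always lives on the last drive.
    return num_drives - 1


def layout(num_stripes: int, num_drives: int, distributed: bool) -> list:
    """Build a grid of 'D' (data) and 'P' (parity) markers, one row per stripe."""
    rows = []
    for s in range(num_stripes):
        p = parity_drive(s, num_drives, distributed)
        rows.append(["P" if d == p else "D" for d in range(num_drives)])
    return rows


print("dedicated parity:")
for row in layout(3, 3, distributed=False):
    print(" ".join(row))

print("distributed parity:")
for row in layout(3, 3, distributed=True):
    print(" ".join(row))
```

With dedicated parity every write touches the same parity drive, while rotating the parity spreads that load across the whole array.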
