Hadoop is an open-source software framework for distributed storage and processing of large data sets on clusters of commodity hardware.
Origins of Hadoop:
Doug Cutting and Mike Cafarella set out to improve Nutch (an Apache open-source web search project). What they needed, as a foundation of the system, was a distributed storage layer that satisfied the following requirements (see the sketch after this list):
Schemaless: no predefined structure, i.e. no rigid schema with tables, columns, column types, or sizes.
Durable: once data is written, it should never be lost.
Fault-tolerant: able to handle component failure (e.g. CPU, disk, memory, network, power supply, motherboard) without human intervention.
Automatically rebalanced: disk space consumption is evened out across the cluster.
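These requirements are what the Hadoop Distributed File System (HDFS) was built to meet. Below is a minimal sketch of how a client writes and reads raw bytes through the standard org.apache.hadoop.fs.FileSystem API; the NameNode address and file path are placeholders, and a hadoop-client dependency on the classpath is assumed.

```java
// Minimal sketch: writing and reading a file through the HDFS API.
// Assumes a reachable cluster at hdfs://namenode:9000 (placeholder address)
// and the hadoop-client dependency on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder NameNode address

        FileSystem fs = FileSystem.get(conf);

        // Write: HDFS stores raw bytes -- no schema, tables, or column types.
        Path file = new Path("/demo/greeting.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello hadoop");
        }

        // Read the bytes back.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}
```

Note that the client only ever sees bytes and paths: there is no schema to declare, and durability (block replication, three copies by default) and rebalancing happen inside the cluster without human intervention.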
Work started in 2002, first at Yahoo! and then at Apache; the stable Hadoop 1 version (Hadoop MapReduce) was released in April 2009, and the Hadoop 2 version (Hadoop YARN) was released in May 2013.
Let us look at the parameters on which Hadoop is taking over the world of data storage and processing:
| Parameter | Database | Hadoop |
| --- | --- | --- |
| Volume | Limited data storage | Virtually unlimited data storage |
| Variety | Structured data only | Structured + unstructured data |
| Velocity | Low speed | High speed |
| Architecture | Designed on a shared-disk architecture | Designed on a shared-nothing architecture |
| Rule | Satisfies RDBMS rules + ACID properties | Built around the CAP theorem (Consistency, Availability, Partition tolerance) |
| Processing | Serial processing | Massively parallel processing |
| Schema | Designed around a fixed schema | Schema-free approach |
| Cost | Software is costly; integration services are licensed | Software is free; integration services are free |
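To make the "massively parallel processing" row concrete, here is the canonical MapReduce word-count job, written against the standard org.apache.hadoop.mapreduce API. The input and output paths are placeholders passed on the command line; the same code runs unchanged whether the cluster has one node or a thousand, because the framework splits the input and schedules map tasks in parallel.

```java
// Canonical MapReduce word count using the org.apache.hadoop.mapreduce API.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs in parallel, one task per input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1)
            }
        }
    }

    // Reduce phase: sums the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be submitted with `hadoop jar wordcount.jar WordCount /demo/input /demo/output`, where the paths are example HDFS directories.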