Thursday, August 27, 2015

What is Hadoop anyway?

Hadoop will change the way businesses think about the storage, processing, and value of ‘big’ data.

Apache Hadoop is an open source project governed by the Apache Software Foundation (ASF). Hadoop enables the user to extract valuable business insight from massive amounts of structured and unstructured data quickly and cost-effectively through three main functions:

Processing – MapReduce. Computation in Hadoop is based on the MapReduce paradigm that distributes tasks across a cluster of coordinated “nodes.”
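
To make this concrete, here is the classic word-count job, closely following the example in the Apache Hadoop MapReduce tutorial. The map phase emits a (word, 1) pair for every word in its input split; the framework groups the pairs by word, and the reduce phase sums the counts. The input and output paths are supplied on the command line and refer to HDFS directories.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // pre-aggregate on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a JAR, the job is launched with hadoop jar wordcount.jar WordCount <input dir> <output dir>, and Hadoop schedules one map task per input split across the cluster's nodes.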

Storage – HDFS. Storage is accomplished with the Hadoop Distributed File System (HDFS) – a reliable file system that allows large volumes of data to be stored and accessed across large clusters of commodity servers.
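
Applications reach HDFS through the same Java FileSystem API that MapReduce uses under the hood. The sketch below copies a local file into the cluster and reads it back; the file paths are hypothetical, and the NameNode address is assumed to come from a core-site.xml on the classpath.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopy {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (e.g. hdfs://namenode:8020) from core-site.xml.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS; the file is split into large blocks,
    // each replicated across several DataNodes for fault tolerance.
    fs.copyFromLocalFile(new Path("/tmp/sales.csv"),   // hypothetical local file
                         new Path("/data/sales.csv")); // hypothetical HDFS path

    // Read it back; the client streams blocks directly from the
    // DataNodes holding the replicas.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/data/sales.csv"))))) {
      System.out.println(reader.readLine());
    }

    fs.close();
  }
}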

Resource Management – YARN. Introduced in Hadoop 2.0, YARN takes over cluster resource management, further increasing efficiency, and extends Hadoop beyond MapReduce to workloads such as graph, streaming, in-memory, and MPI processing.
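
A full YARN application (with its own ApplicationMaster) is too long to show here, but YARN's client API gives a feel for the resource-management role. The sketch below, assuming a yarn-site.xml on the classpath that points at a running ResourceManager, asks the ResourceManager what each node in the cluster is offering and using.

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterReport {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration()); // reads yarn-site.xml
    yarnClient.start();

    // Ask the ResourceManager for the state of every NodeManager.
    for (NodeReport node : yarnClient.getNodeReports()) {
      System.out.printf("%s: %d containers, %s used of %s%n",
          node.getNodeId(),
          node.getNumContainers(),
          node.getUsed(),        // memory/vcores currently allocated
          node.getCapability()); // total memory/vcores on the node
    }

    yarnClient.stop();
  }
}

Non-MapReduce frameworks negotiate containers through this same ResourceManager, which is what lets them share a cluster with MapReduce jobs.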

Hadoop is designed to scale up or down without system interruption and runs on commodity hardware, making the capture and processing of big data economically viable for the enterprise.

“By 2017, I believe that 50% of the world’s data will be stored and analyzed by Apache Hadoop.” 
