Thursday, July 11, 2013

Apache Hadoop: Solution for Bigdata

Nowadays “Bigdata” is the most talked-about word in the business world; people are not just talking about bigdata but building business out of it. What exactly is bigdata? The simplest definition is data that arrives at high velocity, in many varieties, and in huge volumes. The purpose of this paper is not just to talk about bigdata but to show how to integrate bigdata into our current solutions, how to find more business insights in it, and how to uncover the hidden bigdata dimensions around your business.
Apache Hadoop is the open source framework provided by the Apache Software Foundation to deal with bigdata. Its strength is that it gives businesses a cost-efficient and effective solution, so they can focus on what matters: extracting business value from bigdata. In this paper we address the technical details of the Hadoop ecosystem architecture and its integration with real-time applications to process and analyze bigdata and to find its various hidden dimensions, which helps our business grow.

Apache Hadoop as a Team:
Consider a regular scenario: you have a project team with one project manager and ten resources under him.
If a client comes to your project manager and asks him to sort ten files, each containing 100 pages of records, what is the best approach for your project manager to follow?
Exactly! What you are thinking is right: the project manager will distribute the ten files among the ten resources and keep only the tracking record with him. This approach reduces the work load per person to about 1/10th, which ultimately increases speed and efficiency.
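
To make the team analogy concrete, here is a minimal Python sketch of a manager handing one file to each worker and keeping only the record of which result belongs to which file. The names sort_file and manager_sort and the generated sample files are hypothetical, made up for illustration. The thread pool only stands in for the team; it is not how Hadoop itself distributes work.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_file(records):
    """One team member sorts the records of a single file."""
    return sorted(records)


def manager_sort(files, team_size=10):
    """The manager hands one file to each worker and keeps track of
    which sorted result belongs to which file name."""
    with ThreadPoolExecutor(max_workers=team_size) as team:
        # map() returns results in input order, so they line up with the names
        sorted_files = list(team.map(sort_file, files.values()))
    return dict(zip(files.keys(), sorted_files))


# Ten hypothetical files, each holding 100 unsorted records
files = {
    f"file_{i}": [(j * 37 + i) % 100 for j in range(100)]
    for i in range(10)
}
result = manager_sort(files)
```

Each worker touches only its own file, so no worker has to wait for another one; the manager never sorts anything himself.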

Hadoop Team Structure:
This is what Hadoop is: a data storage and processing team. Hadoop has data storage components and data processing components, and it follows a master-slave architecture.

The physical structure of a Hadoop cluster is the same as the project team above. We have a manager called the namenode and team members called datanodes. Data storage is the responsibility of the datanodes (slaves), controlled by the namenode at the master level. Data processing is the responsibility of the task trackers (slaves), and the controller over the task trackers is the job tracker at the master level.
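
The division of roles can be sketched as a toy model in Python. This is not the Hadoop API: the classes and the store function below are hypothetical stand-ins that only mirror the roles described above. The sketch assumes each block lives on exactly one datanode, that blocks are placed round-robin, and that a task tracker runs on every datanode machine; a real cluster replicates blocks and uses its own placement policy.

```python
from collections import defaultdict


class NameNode:
    """Storage master: records which datanode holds each block (metadata only)."""

    def __init__(self):
        self.block_locations = {}

    def register(self, block_id, datanode):
        self.block_locations[block_id] = datanode


class DataNode:
    """Storage slave: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class JobTracker:
    """Processing master: plans each task on the node that stores its block."""

    def __init__(self, namenode):
        self.namenode = namenode

    def schedule(self, block_ids):
        plan = defaultdict(list)
        for block_id in block_ids:
            node = self.namenode.block_locations[block_id]
            plan[node.name].append(block_id)
        return dict(plan)


def store(namenode, datanodes, blocks):
    """Spread blocks over the datanodes round-robin (a simplification)
    and record every location on the namenode."""
    for i, (block_id, data) in enumerate(blocks.items()):
        node = datanodes[i % len(datanodes)]
        node.blocks[block_id] = data
        namenode.register(block_id, node)


namenode = NameNode()
datanodes = [DataNode(f"datanode-{i}") for i in range(3)]
store(namenode, datanodes, {f"block-{i}": f"data {i}" for i in range(6)})
plan = JobTracker(namenode).schedule([f"block-{i}" for i in range(6)])
```

In this sketch the namenode never stores data and the job tracker never processes it; both masters only keep track, just like the project manager.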

You can see this in the diagram and map it to the project team you already have, and see how interesting it is. Try to map everything to real-world things; you can find many possible ways and solutions out of it.