Friday, May 24, 2013

NoSQL brings Hadoop Live

Hadoop is designed to process large volumes of data of many varieties at high throughput, but it is not truly real time: there is latency between a real-time application's request and Hadoop's response. Integrating Hadoop with a real-time application is therefore tedious and complex, and it is also the most important job. If we can store and process large amounts of data but cannot access the results in real time, that capability is of little use.
Previously, Hoop and Apache HttpFS were used for this integration; over time, the RESTful WebHDFS service became available for accessing HDFS (Hadoop Distributed File System) over HTTP.
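To make the HTTP access concrete, here is a minimal sketch of how a WebHDFS request URL is put together. The `/webhdfs/v1` prefix and the `op` query parameter follow the WebHDFS REST convention; the host name, the port 50070, the file path and the helper name `webhdfs_url` are assumptions chosen only for illustration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # 'op' selects the file system operation, e.g. OPEN or LISTSTATUS.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode host, port and file path.
print(webhdfs_url("namenode.example.com", 50070, "/logs/app.log", "OPEN"))
```

An HTTP GET on such a URL returns the file contents (after a redirect to a DataNode), which is workable for occasional reads but still not a low-latency path for a UI.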

This is the era of NoSQL databases, which are replacing traditional RDBMS systems in many applications because of their advantages. NoSQL databases are designed to deal with huge amounts of data, in short "big data": the data can be in any form, does not require a relational model, and may or may not be structured. NoSQL is a good fit, however, only when storing and retrieving the data matters more than the relationships between its elements.

Now think about what happens when these two big data giants come together, and how powerful they are in combination. We can use Hadoop with a NoSQL database to serve real-time applications.

Hadoop-NoSQL Integration with Realtime Application

In the architecture diagram above, the frontend application communicates with the NoSQL database (since we are replacing the RDBMS with a NoSQL DB), and Hadoop integrates with that same database. Hadoop takes its input data from the NoSQL database, does the processing, and stores the output back into the NoSQL database, so the frontend application can easily show the processed data in the UI. It is as simple as that. The main complex part is getting the NoSQL data into the Hadoop jobs.
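The read, process and write-back cycle described here can be sketched in a few lines. This is only an in-memory illustration, not a real Hadoop job: plain Python lists stand in for the NoSQL collections, and the collection names, document fields and function names are all hypothetical.

```python
from collections import defaultdict

# Stand-in for a NoSQL input collection: a list of schemaless documents.
page_views = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/pricing"},
]


def map_phase(documents):
    """Emit (key, 1) pairs, as a mapper would for each input record."""
    for doc in documents:
        yield doc["page"], 1


def reduce_phase(pairs):
    """Sum the values for each key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(input_collection, output_collection):
    """Read from the input store, process, and write results to the output store."""
    results = reduce_phase(map_phase(input_collection))
    for page, count in sorted(results.items()):
        output_collection.append({"_id": page, "views": count})


page_view_counts = []  # stand-in for the output collection the frontend reads
run_job(page_views, page_view_counts)
print(page_view_counts)
```

The frontend never talks to Hadoop directly in this layout; it only queries the output collection, which is why the batch latency of the job does not block the UI.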

Nowadays many NoSQL databases provide connectors for Hadoop (e.g. the MongoDB-Hadoop Connector), so Hadoop jobs can easily read data from and store data into the NoSQL database.
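As a rough sketch of what wiring up such a connector looks like, the helper below assembles a `hadoop jar` command line that passes the input and output collection URIs through the `mongo.input.uri` and `mongo.output.uri` properties used by the MongoDB-Hadoop Connector. The jar name, the job class, the database and collection names, and the helper itself are hypothetical, and the sketch assumes the job parses `-D` generic options (for example via `ToolRunner`); check the property names against the connector version you use.

```python
def mongo_hadoop_args(jar, main_class, input_uri, output_uri):
    """Build a `hadoop jar` command line that sets the connector's URI properties."""
    return [
        "hadoop", "jar", jar, main_class,
        # Collection the job reads its input documents from.
        "-D", f"mongo.input.uri={input_uri}",
        # Collection the job writes its results to.
        "-D", f"mongo.output.uri={output_uri}",
    ]


# Hypothetical jar, job class and collections.
args = mongo_hadoop_args(
    "page-views-job.jar",
    "com.example.PageViewJob",
    "mongodb://localhost:27017/app.page_views",
    "mongodb://localhost:27017/app.page_view_counts",
)
print(" ".join(args))
```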

We can even generate BI reports from big data. For example, ETL jobs can import database tables (structured) with Sqoop and application logs (unstructured) with Flume into HDFS as Hive/HBase tables. BI connectors are then available to integrate with HDFS/Hive/HBase, so we can generate business reports from big data.