Friday, June 28, 2013

Apache Hadoop YARN : Next Generation MapReduce

MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have MapReduce v2, or YARN.

The main idea behind MapReduce v2 (YARN) is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. MapReduce v2 has a global Resource-Manager (RM) and one Application-Master (AM) per application, where an application is either a single client job or a workflow of jobs.

The Resource-Manager (RM) has authority over the Node-Managers (NM), the per-node slaves, and arbitrates resources among all the applications in the system. The Application-Master (AM) is framework specific; it is responsible for negotiating resources from the Resource-Manager and for working with the Node-Managers to execute and monitor the tasks.

The Resource-Manager (RM) in turn has two core components: the Scheduler and the Applications-Manager.

The Scheduler is responsible for allocating execution time slots and resources to the various running applications as per their requirements and the configuration. It is a pure scheduler: it does not monitor or track the status of the applications. The Scheduler performs its scheduling function based on the resource requirements of the applications, expressed through the notion of a resource Container, which covers elements such as memory, CPU, disk, network, etc.
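To make the idea of a "pure" Scheduler a bit more concrete, here is a minimal toy sketch in Python. It is not YARN's API or its real scheduling policy: the names Resource, Node, Container and ToyScheduler are hypothetical, only memory and CPU are modelled, and the first-fit policy is an assumption chosen for simplicity. It only shows that the Scheduler hands out Containers against free capacity and does no monitoring.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical resource vector; real Containers may also cover disk, network, etc."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass
class Node:
    name: str
    free: Resource  # capacity still available on this node


@dataclass
class Container:
    app_id: str
    node: str
    resource: Resource


class ToyScheduler:
    """First-fit allocator (an assumed policy). It only allocates; it never tracks status."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, app_id, request):
        for node in self.nodes:
            if request.fits_in(node.free):
                # Reserve the requested capacity on the first node that has room.
                node.free = Resource(node.free.memory_mb - request.memory_mb,
                                     node.free.vcores - request.vcores)
                return Container(app_id, node.name, request)
        return None  # no node can currently satisfy the request


if __name__ == "__main__":
    scheduler = ToyScheduler([Node("node-1", Resource(4096, 4)),
                              Node("node-2", Resource(8192, 8))])
    print(scheduler.allocate("app-1", Resource(3072, 2)))   # placed on node-1
    print(scheduler.allocate("app-2", Resource(2048, 2)))   # node-1 is too full, placed on node-2
    print(scheduler.allocate("app-3", Resource(16384, 1)))  # None: request exceeds every node
```

Restarting failed work is deliberately left out of this sketch, since that is not the Scheduler's job.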

The Applications-Manager is responsible for accepting job submissions, negotiating the first Container for executing the application-specific Application-Master, and restarting the Application-Master Container on application failure or hardware failure. The Node-Manager is the per-machine slave agent responsible for Containers: it monitors their resource usage and reports it to the Resource-Manager. The per-application Application-Master is responsible for negotiating appropriate resource Containers from the Scheduler, tracking their status and monitoring their progress.

MapReduce v2 maintains API compatibility with the previous stable release, which means that existing MapReduce jobs should still run on MapReduce v2; they just need to be recompiled.


Monday, June 24, 2013

Fraud Detection and Risk Prediction in the Era of Bigdata

Fraud detection and risk prediction is a multi-million dollar business, and it is growing every year. As mentioned on Wikipedia, the PwC Global Economic Crime Survey of 2009 suggests that close to 30% of companies worldwide reported being victims of fraud in the past year.

Traditional methods of data analysis and mining have long been used to detect fraud. They require complex architectures and time-consuming computations spanning domains such as finance, economics and business practices, and still the results they produce are not very accurate. Fraud often consists of many instances or incidents involving repeated offences using the same method. Fraud instances can be similar in content and appearance but are usually not identical.

How exactly does Bigdata help to detect fraud or to predict the most likely risk factors?
There are thousands of data sources, too large in volume and variety, that traditional fraud analysis techniques and methods ignore. These sources, collectively termed Bigdata, include social media, transaction logs, application logs, weblogs, geographical data, etc.

For example, take a person who has taken a bank loan of, say, 1,00,000, repayable in monthly installments of 10,000. He pays the first four installments on time as per policy, and after that he stops paying the remaining installments, citing unavailability of funds. Yet he keeps posting pictures of his new car, new home or foreign trip on Twitter or Facebook. Since he is already a defaulter in the bank's records because of "unavailability of funds", bank officials can act on this immediately without waiting for the fraud to happen.

A second example: a person who lives in India keeps withdrawing, or trying to withdraw, money in Delhi, New York, London and Paris every day. We can look up his geolocation history using Google Maps and compare it with the transaction locations, which allows immediate action.
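The location comparison in the second example can be sketched roughly as an "impossible travel" check: if two consecutive transactions are further apart than anyone could travel in the time between them, flag the pair. This is only an illustrative sketch. The function names, the 900 km/h speed limit and the sample coordinates are my assumptions, and the coordinates are taken as given rather than fetched from any mapping service.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumption: roughly the cruise speed of an airliner


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
    """prev and curr are (timestamp, latitude, longitude) tuples for consecutive transactions."""
    hours = (curr[0] - prev[0]).total_seconds() / 3600.0
    distance = haversine_km(prev[1], prev[2], curr[1], curr[2])
    if hours <= 0:
        # Same instant (or out-of-order records): any real distance is suspicious.
        return distance > 0
    return distance / hours > max_speed_kmh


if __name__ == "__main__":
    delhi = (datetime(2013, 6, 24, 9, 0), 28.6139, 77.2090)
    london = (datetime(2013, 6, 24, 11, 0), 51.5074, -0.1278)
    mumbai = (datetime(2013, 6, 24, 12, 0), 19.0760, 72.8777)
    print(is_impossible_travel(delhi, london))  # True: far too distant for two hours
    print(is_impossible_travel(delhi, mumbai))  # False: reachable by air in three hours
```

In a real system such a flag would be one signal among many, combined with the customer's known travel and card usage patterns before any action is taken.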

There are many more Bigdata use cases for fraud detection and risk analysis. The most important advantage of Bigdata over traditional systems is the higher accuracy of its results and predictions: the larger the data and the more sources it comes from, the more accurate the results and the more reliable the predictions.

Nowadays we have technology that can run Bigdata analytics in near real time, without losing much time in computation, so action can be taken before the fraud happens. High performance analytics is not just a technology fad: with new distributed computing options like Hadoop and in-memory processing on commodity hardware, insurers can have access to a flexible and scalable real-time big data analytics solution at a reasonable cost.

Saturday, June 15, 2013

What do people really think about Bigdata?

How aware do you think people are of the Bigdata world and its advantages and disadvantages? Or are they merely aware of it, without knowing how to use it? Is Bigdata analytics really hell? Is Bigdata playing the hero or the villain in our day-to-day life?

Yes, these are some of the headlines I found on the internet while I was studying Bigdata analytics. Is Bigdata analysis really such a difficult job? In my experience, I did not find it that hard while going through it. "If you know how to create Bigdata, then you should know how to bring business value out of it": this is the simple line I follow.

Just think from the end user's perspective and you will discover many more dimensions and directions in which to analyse Bigdata. Do it, and be successful in the Bigdata era.

Being a simple end user is not that difficult a task, I think :)

Tuesday, June 4, 2013

Bigdata and Business Verticals

We are an active part of the Bigdata ecosystem: our day-to-day lifestyle and activities generate data, and the systems around us collect it, analyse it and use it for their business, which in turn helps our lifestyle. Thanks to the internet and mobile devices, the world is now more interconnected than it has ever been. Each day we create about 2.5 quintillion (2.5×10^18) bytes of data. This huge amount is created by different verticals in the industry, and these verticals use this massive amount of information to rise above the competition. But before using such a huge amount of information, the industry must be aware of the real-time business scenarios, in short the 'use cases', for which to implement a Bigdata analysis solution.

We'll focus on some key industry verticals/domains that are using, or are most likely to use, Bigdata analysis. Below are some of the Bigdata value creation opportunities.

Financial Services:
-Fraud detection
-Model and manage risk
-Improve debt recovery rates
-Personalized banking and insurance products
-Recommendation of banking products

Retail and Consumer Packaged Goods Industry:
-Customer Care Call Centers
-Customer Sentiment Analysis
-Campaign management and customer loyalty programs
-Supply Chain Management and Logistics
-Window Shoppers
-Location based Marketing
-Predicting Purchases and Recommendations

Manufacturing Industry:
-Design to value
-Consumer Sentiment Analysis
-Supply Chain Management and Logistics
-Preventive Maintenance and Repairs
-Digital factory for lean manufacturing
-Improve service via product sensor data

Healthcare Industry:
-Optimal treatment pathways
-Remote patient monitoring
-Predictive modeling for new drugs
-Personalized medicine
-Patient behavior and sentiment data
-Pharmaceutical R&D data

Web/Social/Mobile Industry:
-Location based marketing
-Social segmentation
-Sentiment analysis
-Price comparison services
-Recommendation engines
-Advertisements/promotions and Web Campaigns

Government and Public Sector:
-Reduce fraud
-Segment population, customize action
-Support open data initiatives
-Automate decision making
-Election Campaigns

Data growth in every section of every vertical is viral, and the speed of data generation is tremendous, so a Bigdata capability is needed to address such business problems. Get ready soon and make your business capable of taking on the big elephant of information.

Monday, June 3, 2013

Bigdata : Impact on day to day life

Does Bigdata really impact our day-to-day life? If you had asked this question 10 years ago, the answer might have been no. But nowadays, if you go shopping at a mall, Google Maps tracks you, your home and your route to the mall, and suggests similar malls near you. You reach the mall and go to the mobile store, where the shop cameras watch which section you spend the most time in and the shop suggests similar sections to you. Now you pick up a gadget, and they work out your interest and recommend gadgets with similar features and functionality, at a discount. (After all, they also want to grow their business :) ). The result: you left home having decided on gadget A, and you actually bought gadget B because of the attractive offer on it.

From healthcare to sports, from retail stores to e-banking, from business to social networking, to the way we commute to the office, Bigdata is making big changes to the way we live our lives. The internet in particular is becoming more and more important in everyone's life every day; everyone likes to share information on social sites, and social networking sites are becoming very popular in the business world. Businesses are becoming more and more consumer centric with the help of social networking and easily available information. They use this information to find customer trends and to make business out of them. Think about this and you get a reason why e-commerce businesses are becoming more and more popular these days, how weather forecasts manage to be right so often, why healthcare programs are arranged on particular days of the year, and how fraud is detected in a bank among millions of transactions per day.

This is all about Bigdata. We are surrounded by it because we are the ones generating it, and businesses use it for their own purposes and to help us, so ultimately both sides benefit. We are happy because we get a better and more convenient solution even when we had not thought about it, and it directly impacts the annual revenue of businesses.