Tuesday, December 20, 2016

Apache Spark, Apache Flink & Apache Storm

Apache Spark: Apache Spark is a batch processing engine that emulates streaming via micro-batching. It has a well-developed ecosystem and, besides its Scala and Java APIs, offers Python and R APIs as well. Apache Spark integrates very well with components of the Apache Hadoop ecosystem.
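To make micro-batching concrete, here is a minimal plain-Python sketch. It is not Spark's API. The incoming stream is cut into fixed time intervals by a hypothetical `micro_batches` helper, using an assumed batch interval of 1 second, and an ordinary batch job (`batch_word_count`, also hypothetical) runs once per batch. The sketch assumes events arrive in timestamp order and silently skips intervals that contain no events.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

# (arrival time in seconds, payload); assumed to arrive in time order
Event = Tuple[float, str]

def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[str]]:
    """Hypothetical helper: cut a time-ordered stream into batches of `interval` seconds."""
    # Events whose timestamps fall into the same interval share a batch key;
    # groupby only merges consecutive items, hence the time-order assumption.
    for _, group in groupby(events, key=lambda e: int(e[0] // interval)):
        yield [payload for _, payload in group]

def batch_word_count(batch: List[str]) -> Counter:
    """An ordinary batch job, executed once per micro-batch."""
    return Counter(word for line in batch for word in line.split())

events = [(0.1, "a b"), (0.7, "b"), (1.2, "a"), (2.5, "c a")]
for i, batch in enumerate(micro_batches(events, interval=1.0)):
    # Results for a batch become available only after its interval closes.
    print(i, dict(batch_word_count(batch)))
```

The trade-off this illustrates is that a result for any event appears only when its batch closes, so latency can never drop below the batch interval. In exchange, each job processes many records at once, which is where the batch engine's throughput comes from.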

Apache Flink: Apache Flink is a streaming dataflow engine. It can be programmed in Scala and Java. You can emulate batch processing on top of it, but at its core it is a native streaming engine. Flink shines through its features under the hood, such as exactly-once fault tolerance, high throughput, automated memory management and advanced streaming capabilities. Apache Flink also integrates very well with the Apache Hadoop ecosystem.
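The contrast with micro-batching can be sketched as a record-at-a-time operator. Below is a toy single-operator example in plain Python. It is not Flink's API, and the class and method names are hypothetical. Each record updates running state and emits a result immediately. A snapshot stores the state together with the source offset, so after a simulated failure the operator restores the snapshot and replays the source from that offset. Flink's real mechanism takes asynchronous, distributed snapshots coordinated by checkpoint barriers across all operators. This sketch only shows the core idea, assuming a replayable source: a record may be processed twice, but its effect on the state is counted exactly once.

```python
from collections import Counter
from typing import Dict, List, Tuple

class StreamingWordCount:
    """Hypothetical record-at-a-time operator with running state (not Flink's API)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.offset = 0  # index of the next source record to read

    def process(self, record: str) -> List[Tuple[str, int]]:
        # Each record updates state and emits results immediately;
        # nothing waits for a batch to fill up.
        out = []
        for word in record.split():
            self.counts[word] += 1
            out.append((word, self.counts[word]))
        self.offset += 1
        return out

    def snapshot(self) -> Dict:
        # State and source offset are captured together, so a restore
        # replays exactly the records not yet reflected in the state.
        return {"counts": Counter(self.counts), "offset": self.offset}

    @classmethod
    def restore(cls, snap: Dict) -> "StreamingWordCount":
        op = cls()
        op.counts = Counter(snap["counts"])
        op.offset = snap["offset"]
        return op

source = ["a b", "b", "a", "c a"]  # assumed replayable, like a log
op = StreamingWordCount()
for record in source[:2]:
    print(op.process(record))
checkpoint = op.snapshot()
op.process(source[2])  # processed, then the job "fails" before the next checkpoint

# Recovery: roll back to the checkpoint and replay from its offset.
op = StreamingWordCount.restore(checkpoint)
for record in source[op.offset:]:
    op.process(record)
print(dict(op.counts))  # the "a" from the lost record is counted only once
```

This guarantee covers the operator's state. Results emitted before the failure could still reach a downstream system twice, which is why end-to-end exactly-once output additionally needs idempotent or transactional sinks.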

Apache Storm: Apache Storm is a technology created by Nathan Marz. Compared to Flink and Spark, it has a compositional API: you build your own topology (the program flow) from basic building blocks, namely sources (spouts) and operators (bolts), and wire them together explicitly.
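The idea of composing a topology from spouts and bolts can be sketched in a few lines of single-process Python. This is a toy, not Storm's API. Real Storm topologies are typically wired in Java with `TopologyBuilder` (`setSpout`, `setBolt` plus a stream grouping such as `shuffleGrouping`) and run distributed across a cluster. All class and method names below (`SentenceSpout`, `SplitBolt`, `CountBolt`, `Topology.set_spout`, `Topology.set_bolt`) are hypothetical. The sketch shows only the compositional part: small components, each subscribing to one named upstream component.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

class SentenceSpout:
    """Source: emits one tuple per sentence (hypothetical toy, not Storm's API)."""
    def __init__(self, sentences: Iterable[str]) -> None:
        self.sentences = list(sentences)

    def tuples(self) -> Iterator[Tuple[str]]:
        for sentence in self.sentences:
            yield (sentence,)

class SplitBolt:
    """Operator: splits a sentence tuple into one tuple per word."""
    def execute(self, tup: Tuple) -> List[Tuple]:
        return [(word,) for word in tup[0].split()]

class CountBolt:
    """Operator: keeps a running count per word and emits (word, count)."""
    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: Tuple) -> List[Tuple]:
        word = tup[0]
        self.counts[word] += 1
        return [(word, self.counts[word])]

class Topology:
    """Wires named components together; each bolt subscribes to one upstream component."""
    def __init__(self) -> None:
        self.spouts: Dict[str, SentenceSpout] = {}
        self.bolts: Dict[str, object] = {}
        self.subscribers: Dict[str, List[str]] = {}

    def set_spout(self, name: str, spout: SentenceSpout) -> None:
        self.spouts[name] = spout
        self.subscribers.setdefault(name, [])

    def set_bolt(self, name: str, bolt, source: str) -> None:
        # The upstream component must already be registered.
        self.bolts[name] = bolt
        self.subscribers.setdefault(name, [])
        self.subscribers[source].append(name)

    def run(self) -> None:
        # Push every emitted tuple depth-first through its downstream bolts.
        def emit(component: str, tup: Tuple) -> None:
            for sub in self.subscribers[component]:
                for out in self.bolts[sub].execute(tup):
                    emit(sub, out)

        for name, spout in self.spouts.items():
            for tup in spout.tuples():
                emit(name, tup)

counter = CountBolt()
topology = Topology()
topology.set_spout("sentences", SentenceSpout(["the cow", "the moon"]))
topology.set_bolt("split", SplitBolt(), source="sentences")
topology.set_bolt("count", counter, source="split")
topology.run()
print(dict(counter.counts))
```

Because the program flow is nothing more than this explicit wiring, changing the pipeline means adding or re-subscribing components rather than rewriting a single chained expression, as one would do with Spark's or Flink's higher-level APIs.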

For a more detailed comparison of other streaming and batch processing frameworks, leave a comment here and I'll try to reply as soon as possible.