| System | Language | Backer | Purpose | Reliable? | Ordered? | pub/sub? | Content transformable? | HDFS Integration? | Host discovery? | Open Source? | Active Community? | Link | Integrates w/ Storm? | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kafka | Scala (JVM) | | Distributed message queue / log aggregation / activity stream | yes* | no | yes | yes | yes | yes | yes | yes | http://incubator.apache.org/kafka/ | https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka | * Relies on OS file cache for persistence, so durability is dependent on filesystem. |
| Scribe | C++ | | Scalable and reliable file logging | yes | no | no | no | yes | no+ | yes | no | https://github.com/facebook/scribe | https://github.com/nathanmarz/storm-contrib/tree/master/storm-scribe | + Twitter has done some work for remote discovery: https://groups.google.com/forum/?fromgroups#!msg/scribe-server/zjHc6B7sF94/3F3iDjtM0MYJ |
| Flume | Java | Cloudera | HDFS logging | yes* | no | no | yes | yes | yes | yes | yes | http://incubator.apache.org/flume/ | no(?) | * Reliability level is configurable. |
| Kestrel | Scala (JVM) | | Message Queue | yes | no | yes | no | no | no | yes | yes | http://robey.github.com/kestrel/ | https://github.com/nathanmarz/storm/wiki/Kestrel-and-Storm | |
| Fluentd | Ruby | | Log collection | yes | no? | no | yes | yes | no | yes | yes | http://fluentd.org/ | no | |
| RabbitMQ | Erlang | VMware | Message Queue | yes | yes | yes | yes | no | no? | yes | yes | http://rabbitmq.com/ | https://github.com/Xorlev/storm-amqp-spout | ^ Order requirements are configurable |
| Hedwig/BookKeeper | Java | Yahoo | Distributed write ahead log, cross data center pub/sub logging | yes | yes | yes | no | no | yes | yes | no | http://zookeeper.apache.org/bookkeeper/ | no | |
| ActiveMQ | Java | | Message Queue | yes | yes^ | yes | | | | yes | yes | http://activemq.apache.org/ | no(?) | ^ Order requirements are configurable |
| Calligraphus | Java | | Pub/Sub HDFS log aggregation | yes? | no | yes | no | yes | yes | no | no | | no | Works like a pubsub on top of HDFS. HDFS needs custom patches for this to work (sync(), concurrent file reads). Not openly available. |
| Chukwa | Java | Yahoo | Log aggregation + batch processing analysis | yes* | yes^ | no | no (not in stream, only in batch) | yes | no? | yes | | http://incubator.apache.org/chukwa/ | no | * Durability level is configurable. ^ Order is configurable by using sequence numbers(?) |
| ZeroMQ | C++ | iMatix | concurrency framework / fancy networking library | | | | | | | | | http://zeromq.org/ | | (otto) Too general, this is a fancy networking library (dsc) I dunno about that. http://www.zeromq.org/whitepapers:brokerless (otto) Yea but we'd have to implement any of that ourselves. And isn't what they are describing there basically ZooKeeper? (dsc) I mean, yea. You're right. Wasn't really suggesting it, just surprised they have a paper on brokers. |

| Attribute | Kafka | Scribe | Flume | Kestrel | Fluentd | RabbitMQ | Hedwig/BookKeeper | ActiveMQ | Calligraphus | Chukwa | ZeroMQ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Language | Scala (JVM) | C++ | Java | Scala (JVM) | Ruby | Erlang | Java | Java | Java | Java | C++ |
| Backer | | | Cloudera | | | VMware | Yahoo | | | Yahoo | iMatix |
| Purpose | Distributed message queue / log aggregation / activity stream | Scalable and reliable file logging | HDFS logging | Message Queue | Log collection | Message Queue | Distributed write ahead log, cross data center pub/sub logging | Message Queue | Pub/Sub HDFS log aggregation | Log aggregation + batch processing analysis | concurrency framework / fancy networking library |
| Reliable? | yes* | yes | yes* | yes | yes | yes | yes | yes | yes? | yes* | |
| Ordered? | no | no | no | no | no? | yes | yes | yes^ | no | yes^ | |
| pub/sub? | yes | no | no | yes | no | yes | yes | yes | yes | no | |
| Content transformable? | yes | no | yes | no | yes | yes | no | | no | no (not in stream, only in batch) | |
| HDFS Integration? | yes | yes | yes | no | yes | no | no | | yes | yes | |
| Host discovery? | yes | no+ | yes | no | no | no? | yes | | yes | no? | |
| Open Source? | yes | yes | yes | yes | yes | yes | yes | yes | no | yes | |
| Active Community? | yes | no | yes | yes | yes | yes | no | yes | no | | |
| Link | http://incubator.apache.org/kafka/ | https://github.com/facebook/scribe | http://incubator.apache.org/flume/ | http://robey.github.com/kestrel/ | http://fluentd.org/ | http://rabbitmq.com/ | http://zookeeper.apache.org/bookkeeper/ | http://activemq.apache.org/ | | http://incubator.apache.org/chukwa/ | http://zeromq.org/ |
| Integrates w/ Storm? | https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka | https://github.com/nathanmarz/storm-contrib/tree/master/storm-scribe | no(?) | https://github.com/nathanmarz/storm/wiki/Kestrel-and-Storm | no | https://github.com/Xorlev/storm-amqp-spout | no | no(?) | no | no | |
| Notes | * Relies on OS file cache for persistence, so durability is dependent on filesystem. | + Twitter has done some work for remote discovery: https://groups.google.com/forum/?fromgroups#!msg/scribe-server/zjHc6B7sF94/3F3iDjtM0MYJ | * Reliability level is configurable. | | | ^ Order requirements are configurable | | ^ Order requirements are configurable | Works like a pubsub on top of HDFS. HDFS needs custom patches for this to work (sync(), concurrent file reads). Not openly available. | * Durability level is configurable. ^ Order is configurable by using sequence numbers(?) | (otto) Too general, this is a fancy networking library (dsc) I dunno about that. http://www.zeromq.org/whitepapers:brokerless (otto) Yea but we'd have to implement any of that ourselves. And isn't what they are describing there basically ZooKeeper? (dsc) I mean, yea. You're right. Wasn't really suggesting it, just surprised they have a paper on brokers. |

| Attribute | Kafka | Scribe | Flume |
|---|---|---|---|
| Language | Scala (JVM) | C++ | Java |
| Backer | | | Cloudera |
| Purpose | Distributed message queue / log aggregation / activity stream | Scalable and reliable file logging | HDFS logging |
| Reliable? | yes* | yes | yes* |
| Ordered? | no | no | no |
| pub/sub? | yes | no | no |
| Content transformable? | yes | no | yes |
| HDFS Integration? | yes | yes | yes |
| Host discovery? | yes | no+ | yes |
| Open Source? | yes | yes | yes |
| Active Community? | yes | no | yes |
| Link | http://incubator.apache.org/kafka/ | https://github.com/facebook/scribe | http://incubator.apache.org/flume/ |
| Integrates w/ Storm? | https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka | https://github.com/nathanmarz/storm-contrib/tree/master/storm-scribe | no(?) |
| Notes | * Relies on OS file cache for persistence, so durability is dependent on filesystem. | + Twitter has done some work for remote discovery: https://groups.google.com/forum/?fromgroups#!msg/scribe-server/zjHc6B7sF94/3F3iDjtM0MYJ | * Reliability level is configurable. |