I’ve been using Camus, LinkedIn’s Kafka->HDFS pipeline. Unfortunately, in my case the generated HDFS files are too small (somewhere between 20k and 4m). Such small files are a killer for the MapReduce jobs that run afterwards: their processing time was up to 5 hours per job.
The post Camus Sweeper appeared first on FaKod's Blog.