Getting Creative with MapReduce

One problem with many existing MapReduce abstraction layers is the utter difficulty of testing queries and workflows. End-to-end tests are maddening to craft in vanilla Hadoop and frustrating at best in Pig and Hive. The difficulty of testing MapReduce workflows makes it scary to change code, and destroys your desire »

Simple Hadoop Clusters

I'm excited to announce Pallet-Hadoop [https://github.com/pallet/pallet-hadoop], a configuration library written in Clojure for Apache's Hadoop [http://hadoop.apache.org/]. In the tutorial, we're going to see how to create a three-node Hadoop cluster on EC2 and run a word »
