Due Wed Oct 21 at 11:59pm on GitHub.
Project demos on Thu Oct 22 and Tue Oct 27.
You must work in groups of two.
For this project you are going to implement a reliable, distributed MapReduce framework in Python using ZeroRPC and Gevent. You will also develop tests that validate your implementation.
1. Start the master:
$ python mr_master.py <port> <data_dir>
The data_dir is the location in the file system where input files can be found and where resulting output files will be placed.
2. Start the workers (locally or remotely):
$ python mr_worker.py <ip_address_master:port> [<ip_address_worker:port>]
The workers will register with the master.
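The registration step can be sketched as a small master-side registry. This is only an illustration under stated assumptions: the class name `Master`, the method names `register` and `ready_workers`, and the "ready" status string are hypothetical and not part of the assignment. In the real project an object like this would presumably be exposed with `zerorpc.Server` and called by each worker through a `zerorpc.Client`; that wiring is left out so the sketch runs with the standard library alone.

```python
class Master:
    """Hypothetical master-side registry of workers (names are assumptions)."""

    def __init__(self):
        # Maps "ip:port" of each worker to a simple status string.
        self.workers = {}

    def register(self, worker_address):
        """Record a worker; return True if it was not known before."""
        is_new = worker_address not in self.workers
        self.workers[worker_address] = "ready"
        return is_new

    def ready_workers(self):
        """Return the addresses of all workers currently marked ready."""
        return sorted(addr for addr, status in self.workers.items()
                      if status == "ready")


master = Master()
master.register("10.0.0.2:5001")
master.register("10.0.0.3:5001")
print(master.ready_workers())
```

Returning whether the worker is new lets the master tell a first registration from a worker that restarted and registered again, which may be useful when handling failures.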
3. Start a MapReduce job:
$ python mr_job.py <ip_address_master:port> [<name> | <mr_class.py>] <split_size> <num_reducers> [<input_filename> | <base_filename>_] <output_filename_base>
$ python mr_job.py <ip_address_master:port> wordcount 100000 4 book.txt count
This runs wordcount across all workers and produces four output files: count_00, count_01, count_02, and count_03.
And for sequential execution (no master or workers):
$ python mr_seq.py [<name> | <mr_class.py>] <split_size> <num_reducers> [<input_filename> | <base_filename>_] <output_filename_base>
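To make the roles of `<split_size>`, `<num_reducers>`, and `<output_filename_base>` concrete, here is a minimal in-memory sketch of a sequential word count. It is not the required `mr_seq.py`: all function names are hypothetical, the input is a string rather than a file, results are returned in a dictionary instead of being written to disk, and the choice of CRC32 for partitioning is an assumption. A stable hash is used because Python's built-in `hash()` for strings varies between processes, which would send the same key to different reducers on different workers.

```python
from collections import defaultdict
import zlib


def split_text(text, split_size):
    """Cut text into chunks of about split_size characters, on whitespace."""
    splits, start = [], 0
    while start < len(text):
        end = min(start + split_size, len(text))
        # Extend the chunk to the next whitespace so no word is cut in half.
        while end < len(text) and not text[end].isspace():
            end += 1
        splits.append(text[start:end])
        start = end
    return splits


def map_words(split):
    """Map phase: emit (word, 1) for every word in one split."""
    for word in split.split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce phase: sum all counts emitted for one word."""
    return sum(values)


def partition(key, num_reducers):
    """Pick a reducer with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_sequential(text, split_size, num_reducers, output_base):
    """Run map, partition, and reduce in one process; return name -> pairs."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in split_text(text, split_size):
        for key, value in map_words(split):
            buckets[partition(key, num_reducers)][key].append(value)
    outputs = {}
    for i, bucket in enumerate(buckets):
        # Output names follow the count_00, count_01, ... pattern.
        name = f"{output_base}_{i:02d}"
        outputs[name] = sorted((k, reduce_counts(k, vs))
                               for k, vs in bucket.items())
    return outputs


outputs = run_sequential("a b a c b a", 4, 4, "count")
print(sorted(outputs))
```

Because each key is routed to exactly one reducer, every word appears in exactly one of the output partitions, and sorting within each partition keeps a later merge simple.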
4. Collect results from workers:
$ python mr_collect.py <filename_base> <output_filename>
$ python mr_collect.py count count_all
This collects the distributed results and puts them into a file called count_all. You need to ensure that ordering in the output file is preserved. You may consider turning this into a MapReduce program with one reducer on the local machine.
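One way to preserve ordering while collecting is a streaming merge of the partial files. The sketch below uses `heapq.merge` rather than the single-reducer MapReduce job suggested above, and it rests on assumptions that the assignment does not state: each partial file is named `<filename_base>_NN`, is already sorted, and holds one newline-terminated record per line. The function name `collect` and the `data_dir` parameter are hypothetical.

```python
import contextlib
import glob
import heapq
import os


def collect(filename_base, output_filename, data_dir="."):
    """Merge sorted partial files into one sorted file; return file count."""
    # Match only names such as count_00, count_01, ... and not count_all.
    pattern = os.path.join(glob.escape(data_dir),
                           glob.escape(filename_base) + "_[0-9][0-9]")
    paths = sorted(glob.glob(pattern))
    with contextlib.ExitStack() as stack:
        parts = [stack.enter_context(open(path)) for path in paths]
        out = stack.enter_context(
            open(os.path.join(data_dir, output_filename), "w"))
        # heapq.merge reads the inputs lazily and yields lines in sorted
        # order, provided every input is itself already sorted.
        out.writelines(heapq.merge(*parts))
    return len(paths)
```

Calling `collect("count", "count_all")` would correspond to the example command above. The merge compares whole lines, so it orders records by key only when the key is the first field on each line.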
You need to write and demonstrate the execution of the following MapReduce programs:
You also need to write automated benchmarking and testing code:
You will be graded on the completeness of your solution and your ability to demonstrate the required functionality.
You can receive extra credit for implementing the following features:
You can also propose your own extra credit.
Points will be determined by the completeness and quality of the extra credit feature. Please get all the standard features working before attempting the extra credit.