

MicroSpark: A Spark Subset and Extension Research

Due: TBA

You must work in groups of three or four.

Your goal is to implement a subset of Spark using Python, Gevent, and ZeroRPC.

MicroSpark Subset (50%)

Your MicroSpark implementation should include the following:
  • Driver and worker support
  • Dynamic creation of workers
  • Script and interactive usage
  • RDD implementation
  • Closure and code shipping support
  • Fault tolerance of workers
  • Ability to run the following applications:
    • Word Count
    • Page Rank
    • Interactive queries on log files
  • Performance results
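As a starting point for the RDD and Word Count items above, here is a minimal single-process sketch of an RDD that records its lineage and computes partitions lazily. It is only an illustration under stated assumptions: the class name `RDD` and the methods `parallelize`, `flat_map`, `reduce_by_key`, `compute`, and `collect` are hypothetical names chosen for this example, not a required interface. The shuffle in `reduce_by_key` runs on the driver for simplicity, and no workers, Gevent, or ZeroRPC are involved.

```python
class RDD:
    """Hypothetical minimal RDD: source partitions plus a lazy lineage chain."""

    def __init__(self, parent=None, transform=None, partitions=None):
        self.parent = parent          # upstream RDD, or None for a source RDD
        self.transform = transform    # per-partition function: iterator -> iterator
        self.partitions = partitions  # list of lists; only set on source RDDs

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split the input round-robin into num_partitions lists.
        data = list(data)
        parts = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(partitions=parts)

    def num_partitions(self):
        # Narrow transformations keep the partition count of the source.
        rdd = self
        while rdd.parent is not None:
            rdd = rdd.parent
        return len(rdd.partitions)

    def compute(self, index):
        # Rebuild one partition by walking the lineage back to the source.
        # Nothing is cached, so a lost partition can simply be recomputed.
        if self.parent is None:
            return iter(self.partitions[index])
        return self.transform(self.parent.compute(index))

    def map(self, f):
        return RDD(parent=self, transform=lambda it: (f(x) for x in it))

    def flat_map(self, f):
        return RDD(parent=self,
                   transform=lambda it: (y for x in it for y in f(x)))

    def filter(self, pred):
        return RDD(parent=self,
                   transform=lambda it: (x for x in it if pred(x)))

    def collect(self):
        # Action: evaluate every partition and gather the results.
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute(i))
        return out

    def reduce_by_key(self, f):
        # Simplification: the shuffle happens on the driver, not on workers.
        merged = {}
        for key, value in self.collect():
            merged[key] = f(merged[key], value) if key in merged else value
        return RDD.parallelize(sorted(merged.items()), self.num_partitions())


# Word Count over three in-memory lines.
lines = RDD.parallelize(["a b a", "b c", "a"], num_partitions=2)
counts = (lines.flat_map(str.split)
               .map(lambda word: (word, 1))
               .reduce_by_key(lambda a, b: a + b))
result = dict(counts.collect())
```

In a distributed version, each call to `compute(index)` would presumably run on a worker, which is where closure shipping comes in: the functions passed to `map` and `filter` would have to be serialized and sent along with the task. Recomputing a partition from its lineage is also one possible basis for worker fault tolerance.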

Extension Research (50%)

For the second half of the project, your group must propose a MicroSpark extension, then implement and test it. This part is open-ended, but the extension should provide functionality not currently supported in Spark, and you should test your idea in MicroSpark.

You will need to send me your proposals by Friday, November 20th, 2015.


Deliverables

  • A MicroSpark implementation with applications
  • Extension Research
  • Performance results
  • Failure testing
  • A research paper with design, implementation, and results
  • A final presentation with slides

Greg Benson,
Nov 12, 2015, 5:59 PM