Alright, this felt like a reasonable point to pause and get feedback before continuing.
This diff introduces a PySpark EMR deployment, which can be produced as either (1) a zip of a folder/set of Python files to stash on the PYTHONPATH on the EMR cluster, or (2) an sdist of a Python module, which will similarly be installed on the PYTHONPATH on the cluster.
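For reviewers who want a concrete picture of option (1), here is a rough sketch of zipping a folder of Python files. It is not the code in this diff: `zip_python_sources` and its arguments are hypothetical names, and it assumes only `.py` files need to ship.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_python_sources(src_dir: str, zip_path: str) -> List[str]:
    """Zip every .py file under src_dir (hypothetical helper, not from this diff).

    Archive names are kept relative to src_dir so that packages stay
    importable once the zip is placed on the PYTHONPATH.
    """
    root = Path(src_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.py")):
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            archived.append(arcname)
    return archived
```

One common way to get a zip like this onto the PYTHONPATH is `spark-submit --py-files`, though the deployment in this diff may wire it up differently.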
Not yet covered:
- As discussed on zoom w/ Alex, rethink using the selector to choose
- In both cases, we should install stuff into a virtualenv instead of the default system python
- Need to handle requirements for a module install