Page MenuHomePhabricator

[EMR 1/N] Extract EMR implementation
ClosedPublic

Authored by nate on Sun, Dec 1, 5:50 AM.

Details

Summary

Adds an EMR implementation based on Yelp's mrjob https://github.com/Yelp/mrjob - which is far more battle-tested than my previous impl.

In subsequent diffs, will pull in other parts of mrjob where it makes sense e.g. for synthesizing step definitions for pyspark/spark/etc. jobs on EMR.

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

nate created this revision.Sun, Dec 1, 5:50 AM
nate edited the summary of this revision. (Show Details)Sun, Dec 1, 6:07 AM
nate added reviewers: schrockn, alangenfeld, max.
nate updated this revision to Diff 7025.Sun, Dec 1, 6:28 AM
nate edited the summary of this revision. (Show Details)

rebuild

alangenfeld requested changes to this revision.Mon, Dec 2, 9:54 PM

unclear whats from mrjob and whats not - sending to your queue for another clean up pass. seems overall fine though

python_modules/libraries/dagster-aws/dagster_aws/emr/emr.py
18–19

should we default this?

55

is this our issue #?

199–200

?

203

?

python_modules/libraries/dagster-aws/dagster_aws/vendor/retry.py
2 ↗(On Diff #7025)

not a huge fan of vendor directory name

This revision now requires changes to proceed.Mon, Dec 2, 9:54 PM
nate marked 5 inline comments as done.Mon, Dec 2, 10:56 PM
nate added inline comments.
python_modules/libraries/dagster-aws/dagster_aws/emr/emr.py
18–19

yeah, on further thought I think we shouldn't. mrjob defaults to us-west-2 but I would prefer to force the user to explicitly choose. Removed

55

ugh, copypasta. removed.

199–200

yeahhhhh I didn't port over mrjobs log-capturing machinery because it is pretty tightly coupled to a bunch of their other systems, e.g. file system abstractions. and this bootstrap action check parses the logs and decides if that caused the failure.

TL;DR; going to remove these two lines for now and will file an issue to deal with bootstrap failures later

203

same as above

python_modules/libraries/dagster-aws/dagster_aws/vendor/retry.py
2 ↗(On Diff #7025)

discussed on slack, utils/mrjob

nate updated this revision to Diff 7053.Mon, Dec 2, 10:56 PM
nate marked 5 inline comments as done.

up

schrockn added inline comments.Tue, Dec 3, 6:57 PM
python_modules/libraries/dagster-aws/dagster_aws/emr/solids.py
40

why do we have this both here and in the resource?

python_modules/libraries/dagster-aws/dagster_aws/utils/mrjob/retry.py
37

This class is a bit bananas.

I'd much rather have a more functional eventually that is more like but this current works

def _core_fn(some_arg):
    # do something that might fail

ret_val_of_core = with_retry(lambda: _core_fn(some_arg),  backoff=blah, multiplier=2, ...)
37

to be clear no action to take here just noting it

python_modules/libraries/dagster-aws/dagster_aws/utils/mrjob/utils.py
59

this function is epic

64

lol

max requested changes to this revision.Tue, Dec 3, 9:25 PM

requesting changes just for the apache 2 license link in emr.py

python_modules/libraries/dagster-aws/dagster_aws/emr/emr.py
2

i'd like those portions to be clearly marked and for us to link to the apache 2.0 license here

27

this makes me uneasy -- is there a good reason not to feed the execution context through to the methods on EMRJobRunner and use context.log.*?

30

EmrJobRunner pls

199–200

reasonable to add an issue for this and link in the code?

209

or set explicit roles in your config, etc

python_modules/libraries/dagster-aws/dagster_aws/emr/resources.py
20

EmrPySparkResource pls

23

doesn't this reproduce the functionality of EMRJobRunner.make_emr_client?

This revision now requires changes to proceed.Tue, Dec 3, 9:25 PM
nate marked 12 inline comments as done.Thu, Dec 5, 8:50 PM
nate added inline comments.
python_modules/libraries/dagster-aws/dagster_aws/emr/emr.py
2

good call. probably too tough to call out which parts are theirs and which are ours, but I will include Apache license and copyright notices here which will ensure Apache compliance

27

sounds good, updated throughout

30

done, thx

199–200

Yes done

209

I'm replacing this warning entirely because that link doesn't even work anymore anyway

python_modules/libraries/dagster-aws/dagster_aws/emr/resources.py
20

thx done

23

good catch, thanks - this should have been removed and now is removed

python_modules/libraries/dagster-aws/dagster_aws/emr/solids.py
40

mainly keeping this solid around to avoid breaking changes before 0.7.0, I plan to remove this in 0.7.0 in favor of just providing the resource.

a couple of other differences to note: (1) this solid uses run_job_flow vs. add_job_flow_steps, (2) this is for generic job flows vs. the pyspark-specific resource

python_modules/libraries/dagster-aws/dagster_aws/utils/mrjob/utils.py
64

it took more characters to write the comment about the cute string prefix than to just include the real strings in the line of code below

nate updated this revision to Diff 7123.Thu, Dec 5, 8:51 PM
nate marked 9 inline comments as done.

address comments

alangenfeld resigned from this revision.Fri, Dec 6, 12:23 AM
max accepted this revision.Tue, Dec 10, 7:19 PM
max added inline comments.
python_modules/libraries/dagster-aws/dagster_aws/emr/solids.py
40

this makes sense to me, my only question is whether we want to just land the breaking change, maybe in a quick follow-on diff (we can cherry-pick for patch releases before 0.7.0)

python_modules/libraries/dagster-aws/dagster_aws_tests/emr_tests/test_pyspark.py
81

if i understand this correctly, we have to do this here because resource setup/teardown is improperly scoped atm?

This revision is now accepted and ready to land.Tue, Dec 10, 7:19 PM
nate added inline comments.Tue, Dec 10, 11:38 PM
python_modules/libraries/dagster-aws/dagster_aws_tests/emr_tests/test_pyspark.py
81

nah, this is because run_job_flow isn't plumbed through to a resource yet - I want to think a little more about how best to do that before proceeding. but adding a comment to the code to explain this.

nate updated this revision to Diff 7540.Tue, Dec 10, 11:38 PM

rebase/comment