
[EMR 3/N] Spark refresh
Needs Revision, Public

Authored by nate on Dec 27 2019, 2:09 AM.

Details

Reviewers
schrockn
Summary

This updates dagster-spark to provide a @spark_solid decorator, aligning its API with @pyspark_solid.

In a follow-up diff, I'll add the EMR implementation of this (also based on the mrjob code), as I did with pyspark; I wanted to split the work into more reviewable chunks.

Test Plan

unit

Diff Detail

Repository
R1 dagster
Branch
emr_spark
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

nate created this revision.Dec 27 2019, 2:09 AM
nate updated this revision to Diff 8193.Dec 27 2019, 2:37 AM

fix test failures

nate retitled this revision from [EMR 3/N] EMR Spark refresh to [EMR 3/N] Spark refresh.Dec 27 2019, 3:04 AM
nate edited the summary of this revision.
nate added a reviewer: schrockn.
schrockn requested changes to this revision.Dec 28 2019, 12:30 AM
schrockn added inline comments.
python_modules/libraries/dagster-spark/dagster_spark/configs.py, line 56

What's going on in this file?

python_modules/libraries/dagster-spark/dagster_spark_tests/test_decorators.py, line 44

Should we be encoding these in config or in code? This seems like another spot where the config is more code than config. I would recommend encoding information like the main class in the solid itself, using the body of the function to specify it rather than config.

e.g.

@spark_solid
def first_pi():
    return {  # could also use strongly typed namedtuples
        'main_class': 'some_class',
    }
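To make the namedtuple variant of this suggestion concrete, here is a minimal sketch. Everything in it is hypothetical: `SparkJobSpec`, its fields, and this toy `spark_solid` decorator are illustrative stand-ins, not dagster-spark's actual API. The only real-world detail it relies on is that `spark-submit` takes the application's main class via `--class`, followed by the application JAR and its arguments.

```python
import functools
from collections import namedtuple

# Hypothetical typed spec (the "strongly typed namedtuples" idea); field names
# are illustrative and not part of dagster-spark.
SparkJobSpec = namedtuple("SparkJobSpec", ["main_class", "application_jar", "application_args"])


def spark_solid(fn):
    """Hypothetical stand-in for @spark_solid: calls the decorated function to
    get its SparkJobSpec and turns it into a spark-submit command line."""

    @functools.wraps(fn)
    def build_command():
        spec = fn()
        # Reject untyped return values, e.g. a plain dict with a typo'd key.
        if not isinstance(spec, SparkJobSpec):
            raise TypeError("@spark_solid function must return a SparkJobSpec")
        # spark-submit [--class <main class>] <application jar> [app args...]
        return ["spark-submit", "--class", spec.main_class, spec.application_jar] + list(
            spec.application_args
        )

    return build_command


@spark_solid
def first_pi():
    # The main class lives in code, not config; the JAR name is a placeholder.
    return SparkJobSpec(
        main_class="org.apache.spark.examples.SparkPi",
        application_jar="spark-examples.jar",
        application_args=["10"],
    )


print(first_pi())
```

With the main class defined in the solid body, a misspelled field fails when the namedtuple is constructed rather than surfacing later as a bad config value, and config is left for values that genuinely vary between runs.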
This revision now requires changes to proceed.Dec 28 2019, 12:30 AM