Page MenuHomePhabricator

Dask integration
ClosedPublic

Authored by natekupp on May 15 2019, 3:09 AM.

Details

Summary

Add Dask as an execution target

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
natekupp updated this revision to Diff 864.May 21 2019, 6:44 PM

Updating

natekupp added inline comments.May 21 2019, 6:46 PM
python_modules/dagster-dask/dagster_dask/engine.py
79 ↗(On Diff #864)

this parsing shouldn't happen here, but wanted to get eyes on the overall approach here before I go write a DagsterEvent parser

Harbormaster failed remote builds in B708: Diff 864!
natekupp updated this revision to Diff 879.May 21 2019, 9:44 PM

get things working

natekupp updated this revision to Diff 898.May 21 2019, 10:32 PM

Fix failing tests

natekupp retitled this revision from First cut at Dask integration to Dask integration.May 21 2019, 11:06 PM
natekupp edited the summary of this revision. (Show Details)
natekupp edited the test plan for this revision. (Show Details)
natekupp added reviewers: max, alangenfeld.
alangenfeld requested changes to this revision.May 22 2019, 3:40 PM

I dont see anything to object with here, i think its safe to button it up a bit and get it ready to land

python_modules/dagster-dask/dagster_dask/engine.py
97 ↗(On Diff #898)

well need a SolidHandle.from_id() or some such that splits on the . and reconstructs the parent hierarchy

79 ↗(On Diff #864)

ya we should be able to set up a test that ensures we have an example event for every entry in the event type enum, and that each of those examples can round trip through graphql correctly.

It might make sense to have a central GraphQL fragment that different executors can include in their query to ensure all the fields the centralized parser will need to reconstruct.

python_modules/dagster/dagster_tests/core_tests/test_pipeline_execution.py
310–355 ↗(On Diff #898)

?

This revision now requires changes to proceed.May 22 2019, 3:40 PM
natekupp updated this revision to Diff 916.May 22 2019, 8:11 PM

Add testing

alangenfeld requested changes to this revision.May 22 2019, 8:23 PM

spoke offline

python_modules/dagster-graphql/dagster_graphql_tests/test_util.py
127–185

this chunk should be provided as a fragment by dagster_graphql.util and used in both places

This revision now requires changes to proceed.May 22 2019, 8:23 PM
alangenfeld added inline comments.May 22 2019, 9:03 PM
python_modules/dagster-graphql/dagster_graphql/util.py
17–19

maybe namedtuple?

python_modules/dagster-graphql/dagster_graphql_tests/test_util.py
148–149

i think this is worth digging in to

natekupp updated this revision to Diff 926.May 22 2019, 9:30 PM

namedtuple

natekupp marked an inline comment as done.May 22 2019, 9:31 PM
natekupp added inline comments.
python_modules/dagster-graphql/dagster_graphql/util.py
17–19

good call, updated

python_modules/dagster-graphql/dagster_graphql_tests/test_util.py
148–149

I might pick up https://github.com/dagster-io/dagster/issues/1379 and take care of this by flipping to a consolidated API that uses pipeline run storage, vs. trying to hack this into executePlan

natekupp updated this revision to Diff 946.May 23 2019, 1:53 AM
natekupp marked an inline comment as done.

use variables

alangenfeld added inline comments.May 23 2019, 3:05 PM
python_modules/dagster-graphql/dagster_graphql_tests/test_util.py
148–149

you can use startPipelineExecution instead of executePlan in this test if that gives you run storage - as long as we are using the same fragment it should not matter which mutation we use

natekupp updated this revision to Diff 956.May 23 2019, 5:52 PM

Move to startPipelineExecution

python_modules/dagster-graphql/dagster_graphql_tests/test_util.py
148–149

Yup, moved to startPipelineExecution, now going to update tests to pair up dagster events

natekupp updated this revision to Diff 957.May 23 2019, 5:54 PM
natekupp marked 3 inline comments as done.

lint

natekupp updated this revision to Diff 960.May 23 2019, 6:34 PM

Add event counts to test

natekupp updated this revision to Diff 969.May 23 2019, 10:17 PM

Add tests and include skip events

Harbormaster failed remote builds in B793: Diff 970!
natekupp updated this revision to Diff 973.May 23 2019, 11:00 PM

Fix errors from rebasing

natekupp updated this revision to Diff 975.May 23 2019, 11:08 PM

Make variables a dict instead of roundtripping json

max added a comment.May 24 2019, 12:12 AM

i wonder if we should factor out the common stuff between this and dagster-airflow for a common out-of-process toolkit

python_modules/dagster-dask/dagster_dask/config.py
5

strong reason not to use the namedtuple pattern we often follow?

36

hmmm, 0.0.0.0? this seems brittle but if we're just using it for a seatbelt, ok

44

did this name change?

max accepted this revision.May 24 2019, 12:18 AM

I'd like to have a README warning people explicitly about the limitations here

natekupp updated this revision to Diff 983.May 24 2019, 4:57 AM
natekupp marked 2 inline comments as done.

comments

natekupp updated this revision to Diff 999.May 24 2019, 5:46 PM

Rebase and add README

natekupp updated this revision to Diff 1000.May 24 2019, 5:54 PM

Add dask to buildkite

This revision is now accepted and ready to land.May 24 2019, 6:10 PM
This revision was automatically updated to reflect the committed changes.