Page MenuHomeElementl

Move user-code-dependent executor validation to execution plan generation, rather than the run worker
ClosedPublic

Authored by dgibson on Apr 5 2021, 3:39 PM.

Details

Summary

Continuing the work of moving logic that requires loading definitions in the run worker to execution plan generation.

Test Plan

BK (verify that leaving this out breaks tests)
launch a multiprocess run in dagit without setting an intermediate storage - now pops up an error right away rather than waiting until the run worker starts

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 5 2021, 4:00 PM
Harbormaster failed remote builds in B28432: Diff 34889!

OK, this balooned a bit, as these changes are wont to do. We need to be able to load the Executor itself within the plan generation, which requires passing the instance down into ExecutionPlan.build and create_execution_plan, which kind of spreads throughout the system in lots of places.

dgibson retitled this revision from Move persistent storage requirement check to execution plan generation, rather than in the run worker to Move user-code-dependent executor validation to execution plan generation, rather than the run worker.Apr 5 2021, 6:39 PM
Harbormaster returned this revision to the author for changes because remote builds failed.Apr 5 2021, 7:01 PM
Harbormaster failed remote builds in B28443: Diff 34903!
Harbormaster returned this revision to the author for changes because remote builds failed.Apr 5 2021, 9:15 PM
Harbormaster failed remote builds in B28448: Diff 34911!
dgibson published this revision for review.Apr 5 2021, 9:39 PM

probably 80% of this diff is passing DagsterInstances around so that we can access it during plan creation to validate the executor, I could split that part out into a separate diff if that would make it easier to review

python_modules/dagster/dagster/core/execution/plan/plan.py
915–925

oof - all this threading of instance to construct and discard an Executor.

What about a property on ExecutorDefinition? One of the few custom executors I know have been built is in process so likely wouldn't want to enforce these constraints

to your Q for question

This revision now requires changes to proceed.Apr 5 2021, 9:57 PM

enum on the ExecutorDefinition

@alex you have accepted the two follow-on diffs to this but the base of the stack remains :) ExecutorProcessSetting giving you pause?

ExecutorProcessSetting giving you pause?

Yea I was giving my self some time to think on it since theoretically this is public API surface area that we cant change easily.

  • The framing of "process setting" doesn't feel super intuitive to me
  • Theres a fundamental question of whether we go this direction of trying to capture some broad property of the executor and enforce requirements based on that, or whether the executor should directly communicate what requirements it should enforce, ie ExecutorRequirement.NO_IN_MEMORY_IO or something. If we get it right, the former allows us to build additional stuff around this captured understanding, but if we're wrong its a pain. In this specific case it's unclear to me if we should try to capture the distinction between multiple processes on one machine versus multiple processes on multiple machines.

What do you think of the approach of having a requirements: Optional[List[ExecutorRequirement]] ? Existing custom executors will gracefully enforce nothing and we can enforce what we want on our custom executors. If we continue to build on this the list of enum entries feels like it can grow somewhat gracefully.

python_modules/dagster/dagster/core/definitions/executor.py
19–21

this should have a doc block and an apidocs entry

145–161

these docs should be updated

ExecutorProcessSetting => List[ExecutorRequirement]

This revision is now accepted and ready to land.Apr 9 2021, 2:34 PM
This revision was landed with ongoing or failed builds.Apr 9 2021, 3:07 PM
This revision was automatically updated to reflect the committed changes.