Page MenuHomePhabricator

thread pipeline_run through context
ClosedPublic

Authored by prha on Sat, Oct 26, 12:52 AM.

Details

Reviewers
alangenfeld
Group Reviewers
Restricted Project
Commits
R1:5dad924b6802: thread pipeline_run through context
Summary

This scrubs run config from all the lower level execution APIs, only exposing pipeline_run on the attribute.
The top-level execution API functions now contain the following:

  • execute_run and execute_run_iterator take a PipelineRun object.
  • execute_pipeline and execute_pipeline_iterator take in a RunConfig and construct a PipelineRun that is persisted using the instance run storage
  • execute_plan and execute_plan_iterator take in a RunConfig and construct an ephemeral PipelineRun that is not persisted.

In order to make this work with the multiprocessing engine, we have to be able to reconstruct a run config from a pipeline run and use it to set up the environment/context again, using execute_plan_iterator.

We also have a slightly confusing API now with the context mapper. The config traversal context is now different than the execution context in that the config traversal context contains a RunConfig while the execution context contains a PipelineRun. Would love to discuss the context mapper and what the appropriate API should be.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha created this revision.Sat, Oct 26, 12:52 AM
prha updated this revision to Diff 6045.Mon, Oct 28, 2:00 AM

update, for dagstermill tests

prha edited the summary of this revision. (Show Details)Mon, Oct 28, 2:17 AM
prha added a reviewer: Restricted Project.

pretty heroic refactor :) nice! let's sync on the config mapping context tomorrow.

LGTM, but I won't accept so that @alangenfeld can also take a look

python_modules/dagster/dagster_tests/core_tests/execution_tests/test_run_config.py
7

should we use execution/__init__.py for exported names vs. using api.py?

prha retitled this revision from RFC: thread pipeline_run through context to thread pipeline_run through context.Mon, Oct 28, 3:54 PM
alangenfeld requested changes to this revision.Mon, Oct 28, 5:15 PM
alangenfeld added inline comments.
python_modules/dagster/dagster/core/engine/engine_multiprocess.py
31–34

why does plan iterator need a run config?

python_modules/dagster/dagster/core/execution/api.py
446–451

my initial reaction: seems weird to do this overwrite dance if EXECUTION_TIME_KEY is already there

looks like there is some context in the blame rev https://dagster.phacility.com/D585 probably worth hoisting to a comment in line here

python_modules/dagster/dagster/core/execution/config.py
68–72

same as above

python_modules/dagster/dagster/core/execution/context/system.py
117–126

re back-compat: expose on run_config also

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
94–96

this i think could be alleviated by IRunConfig implementing PipelineRun

python_modules/dagster/dagster/core/storage/pipeline_run.py
77–85

[1]

136–143

feels like RunConfig -> PipelineRun should be a one way street

Should we avoid the need for this by making a RunConfig interface and having PipelineRun implement that?

python_modules/dagstermill/dagstermill/manager.py
81

instead of adding the to/from dict stuff why not switch this over to just using the serdes stuff?

This revision now requires changes to proceed.Mon, Oct 28, 5:15 PM

this is such an improvement

yesss

prha updated this revision to Diff 6046.Mon, Oct 28, 8:37 PM
prha edited the summary of this revision. (Show Details)

dagstermill serdes
backward compat API for run_config on context
add common interface, IRunConfig
switch execute_plan_iterator to take pipeline_run

alangenfeld added inline comments.Mon, Oct 28, 9:05 PM
python_modules/dagster/dagster/core/execution/api.py
345–381

_create_ephemeral_run feels off to me at a conceptual level

python_modules/dagster/dagster/core/execution/context/system.py
119–125

lets create a GH task and link it here

python_modules/dagster/dagster/core/types/evaluator/traversal_context.py
28–30

maybe slip a comment here since this is the one place that takes IRunConfig

python_modules/dagstermill/dagstermill/manager.py
175–199

is it not possible to just use the real parent pipeline_run object instead of this weird temp one?

alangenfeld added inline comments.Mon, Oct 28, 9:06 PM
python_modules/dagster/dagster/core/engine/engine_multiprocess.py
28–29

lets pass this information through some other way - PipelineRun is so close to only representing the overall run

alangenfeld added inline comments.Mon, Oct 28, 9:08 PM
python_modules/dagstermill/dagstermill/manager.py
175–191

comment that this is a fake pipeline run for out of pipeline execution

prha updated this revision to Diff 6047.Tue, Oct 29, 12:26 AM
  • switch execute_plan_iterator to take pipeline_run
  • force pipeline run for execute_plan
  • update graphql endpoint to require pipeline_run for execute plan
prha updated this revision to Diff 6048.Tue, Oct 29, 12:31 AM
prha marked 4 inline comments as done.
  • remove _create_ephemeral_run
prha updated this revision to Diff 6049.Tue, Oct 29, 12:51 AM
  • resolve airflow tests
alangenfeld accepted this revision.Tue, Oct 29, 6:07 PM

its beautiful

finallydone

This revision is now accepted and ready to land.Tue, Oct 29, 6:07 PM
prha updated this revision to Diff 6072.Tue, Oct 29, 9:27 PM
  • fix airflow tests
This revision was automatically updated to reflect the committed changes.