Page MenuHomeElementl

[RFC] Add all executable steps to known execution state
AbandonedPublic

Authored by rexledesma on Apr 8 2021, 3:11 AM.

Details

Reviewers
yuhan
alangenfeld
Summary

Context: https://demo.elementl.dev/instance/runs/d942aadc-8d9b-4cef-80d4-962147b3733b

The run above tries to read from the parent run's fs storage, even though all steps were selected
for re-execution. This was because the input manager determines where to read the upstream output
by checking if the execution plan has executed the upstream step (https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster/dagster/core/execution/context/system.py#L573)

If the execution plan is delegated to other machines and processes (say, in the multiprocess executor), the execution plan is no longer
the source of truth. Rather, we should rely on the known execution state is determine if the upstream step has been executed.

Here, we add all the executable steps to the known execution state and use that information to determine if a plan has executed a step.

Test Plan

added test cases failing on master, passing after this diff

Diff Detail

Repository
R1 dagster
Branch
rl/fix-multiprocess-load-input-execution (branched from master)
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

all_executable_steps -> step_keys_to_execute

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 8 2021, 3:34 AM
Harbormaster failed remote builds in B28579: Diff 35077!

the diagnosis looks right to me

in terms of testing, you may also need to test the graphql query that does the re-execution for dagit. i.e. adding multiproc tests in https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_reexecution.py
for more context, reexecute_pipeline is just the python api to reexecute a pipeline. the path dagit hits is`instance.submit_run` call directly. see in https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/dagster-graphql/dagster_graphql/implementation/execution/launch_execution.py#L24:8
the re-execution isn't very blessed atm bc we are duplicating logics in the react/gql and python path. where the paths get merged is the instance level (or maybe not anymore since we've separated submit_run and launch_run).

step_keys_to_execute is already part of PipelineRun right? If that's the case, would we be able to just access it "statically" instead of needing to pass it via KnownState?

+ the system.py changes

python_modules/dagster/dagster/core/execution/plan/plan.py
834–840

delete this method

python_modules/dagster/dagster/core/execution/plan/state.py
2–100

drop this

This revision now requires changes to proceed.Apr 8 2021, 5:24 PM

Dropping in favor of inspecting the pipeline run's step_keys_to_execute