Page MenuHomePhabricator

step-selection-1 parse step selection

Authored by yuhan on Aug 5 2020, 6:02 PM.



resolve step queries to step keys when building execution plan. so we will have step_selection (List[str]) prior to conversion and step_keys_to_execute (Frozenset[str]) post conversion.

this is going to a stack of diffs

  1. this one: parse step selection & rename reexecute_pipeline(step_keys_to_execute) to reexecute_pipeline(step_selection) (0.9.0)
    • there are many places marked with the github issue #2605 in case i can't get in the internal refactor diff in time
  2. rename pre-conversion to step_selection internally and change post-conversion step_keys_to_execute to frozenset + all necessary refactor (similar to the stack of D3225)
    • this could be post 0.9.0 which is also a safer choice. i may not have enough time get this in before 0.9.0 release, but want to at least make sure public api renaming is out.
Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

yuhan created this revision.Aug 5 2020, 6:02 PM
yuhan requested review of this revision.Aug 5 2020, 6:24 PM
yuhan edited the summary of this revision. (Show Details)Aug 5 2020, 6:29 PM
yuhan added reviewers: alangenfeld, schrockn, max.
yuhan added inline comments.Aug 5 2020, 6:31 PM

@schrockn can i loose this constraint introduced from D2850?
as we enable DSL in step_keys_to_execute (, we wont keep the step_keys_to_execute and execution_plan_snapshot.step_keys_to_execute the same.

wanted to get your idea first before doing more refactoring. if that doesnt sound right, i will need to persist step_selection (pre-resolve list of step queries) in execution_plan_snapshot) -- i was thinking of only keeping step_keys_to_execute in execution plan snapshot

schrockn added inline comments.Aug 5 2020, 7:09 PM

I think we should persist both things. I think this invariant can catch some really pernicious errors that would result in us reporting inaccurately that an execution is a subset when in fact it is not.

I think the "right" way to fix this would be to introduce a similar concept that PipelineSnapshot has with PipelineSnapshotLineage.

Would be curious to get @alangenfeld's thoughts

yuhan updated this revision to Diff 20198.EditedAug 5 2020, 9:14 PM
  • rename reexecute_pipeline(step_keys_to_execute) to reexecute_pipeline(step_selection)
  • persist both step_selection and step_keys_to_execute
yuhan edited the summary of this revision. (Show Details)Aug 5 2020, 9:15 PM
yuhan planned changes to this revision.Aug 10 2020, 9:26 PM