Page MenuHomePhabricator

#2346-1 [RFC] multi-step (arbitrary subset) re-execution through dagit Run
AbandonedPublic

Authored by yuhan on Apr 1 2020, 4:47 PM.

Details

Summary

TODO

  • Enable ComputeLogLink to take multiple selected steps
  • Address confusion between step subset selector (GraphQueryInput in GaantChart) and log filter (FilterInputGroup in LogsToolBar). Currently, the step subset selector only changes GaantChart.state.query while the log filter input changes the Run.state.logFilter which affects the following components
    • LogProvider.state.filter
    • RunWithData.logsFilter
    • LogsToolbar.filter
    • LogsScrollingTable.filterkey
    • selectedSteps in RunActionsButtons (which passes the step keys to execution params), GaantChart (to change highlight and focus), LogsToolbar (which passes step keys and states to ComputeLogLink)

Done

  • Toy pipeline to showcase status quo: how a model training pipeline would look like in order to enable subset re-execution with intermediates
    • read_csv and preprocess takes 2 sec each so when we don't want to fully re run the pipeline just to tune the hyper-parameters
  • enabled multi-step re-execution in Run
    • allowed users to re-execute any arbitrary subset, e.g. only re-execute "preprocess -> model_one" to skip "read_csv" in this example

  • added group tags for
    • single-step re-execution
    • multi-step re-execution from dagit

status quo:


now:

Related Issues

Test Plan
  • ran the example in dagit
  • bk

Diff Detail

Repository
R1 dagster
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

yuhan created this revision.Apr 1 2020, 4:47 PM
yuhan edited the summary of this revision. (Show Details)Apr 1 2020, 4:54 PM
yuhan edited the summary of this revision. (Show Details)Apr 1 2020, 5:05 PM
yuhan edited the summary of this revision. (Show Details)
yuhan edited the summary of this revision. (Show Details)Apr 1 2020, 6:26 PM
yuhan updated this revision to Diff 11259.Apr 1 2020, 6:26 PM

simplify pipeline code

yuhan added inline comments.Apr 1 2020, 6:35 PM
examples/dagster_examples/toys/run_group_sub_cache.py
19–60

these are still required when the data type is unpickleable. it makes reloading intermediates an unpleasant dev experience when there are custom hydration/materialization schemes

yuhan edited the summary of this revision. (Show Details)Apr 1 2020, 6:40 PM
yuhan updated this revision to Diff 11400.Apr 3 2020, 8:40 PM

single-step retry: add parent/run id tags

yuhan retitled this revision from #1993 [run groupnig] cache capability for solids to [WIP] #1993 [run groupnig] cache capability for solids.Apr 3 2020, 8:42 PM
yuhan edited the summary of this revision. (Show Details)
yuhan updated this revision to Diff 11457.Apr 6 2020, 4:30 PM

enable multi-step re-execution from dagit

yuhan retitled this revision from [WIP] #1993 [run groupnig] cache capability for solids to #2346-1 [run groupnig] multi-step (arbitrary subset) re-execution through dagit Run.Apr 6 2020, 4:41 PM
yuhan edited the summary of this revision. (Show Details)
yuhan edited the test plan for this revision. (Show Details)
yuhan added reviewers: max, prha, alangenfeld, bengotow.
yuhan added inline comments.Apr 6 2020, 4:45 PM
js_modules/dagit/src/gaant/GaantChart.tsx
396–408

js_modules/dagit/src/runs/LogsProvider.tsx
250–255

yuhan edited the summary of this revision. (Show Details)Apr 6 2020, 5:59 PM
yuhan updated this revision to Diff 11497.Apr 7 2020, 1:32 AM
yuhan edited the summary of this revision. (Show Details)

stack

yuhan retitled this revision from #2346-1 [run groupnig] multi-step (arbitrary subset) re-execution through dagit Run to #2346-1 [run grouping] multi-step (arbitrary subset) re-execution through dagit Run.Apr 7 2020, 1:47 AM

This looks great! Just added a few comments on the JS side of things—very cool to see multiple selection in the gaant view! I think that some of the changes to the "structured contents of the search bar" overlap with work I did but it might be easier for you to land this first, will check and see.

js_modules/dagit/src/gaant/GaantChart.tsx
371

This first ? might be unnecessary here! It may also be worth investigating whether we need selectedSteps to be optional since [] seems like it probably represents the same state as undefined.

This code here is responsible for the "auto-scroll to the selected node" and I think just looking at the first selected node is fine, but if things feel funny we could make this fancier in a follow-up!

401

This is a minor nit but I think this sort function is a bit unstable in the case where one node is missing a start, and if two happen to be missing a start somehow, the highlightedMs result might randomly switch between them? Since we ignore things without a start below and you can't have an end without a start, we may just want to do this to clean them out early:

const sortedSelectedSteps = selectedMeta.filter(m => m?.start).sort((a, b) => a.start - b.start)
js_modules/dagit/src/runs/LogsProvider.tsx
79

I think I also fixed this in D2364, we may be able to merge the changes with this file with the changes there but mine didn't support multiple selection - this really needed to be better structured!

256

This looks good! Just want to be sure though, it looks like this is either structured search OR a freeform term but not the combination of both? Wondering if the outer structuredFields if statement on 247 isn't necessary?

js_modules/dagit/src/runs/RunUtils.tsx
127

Totally optional but this is a great use case for .every, which sort of sums up what it's verifying:

if (!stepKeys.every(key => run.executionPlan.steps.find(s => s.key === key))) {
   return;
}
schrockn requested changes to this revision.Apr 7 2020, 6:31 PM
schrockn added a subscriber: schrockn.

We need to consider what this means from a product perspective. This diff leaves the world with overlapping notions of step subset selection that leaves the world in a more confused state.

Take this:

The logs are filtered down to those two steps selected, and the retry button text indicates that.

However the step subset selector is blank.

Versus:

I think we have to think more deeply about what we are doing from a product perspective before we add something like this incrementally.

This revision now requires changes to proceed.Apr 7 2020, 6:31 PM
max added inline comments.Apr 7 2020, 6:46 PM
examples/dagster_examples/toys/environments/run_group_sub_cache.yaml
18

We need to figure out a way to add a default input hydration config for all python types so they can read from the intermediates store. This should really be sth like

input_data:
  intermediate:
    run_id: 19dd35ed-d9d2-4a82-a5d3-a550679f584c
    solid: read_csv
    output: result
examples/dagster_examples/toys/run_group_sub_cache.py
11

as a rule we are trying to keep TODOs in GH issues - can you reference an issue here?

max added a comment.Apr 7 2020, 6:47 PM

@schrockn it might make sense for us to do this work against a feature branch rather than master

yuhan updated this revision to Diff 11551.EditedApr 8 2020, 1:14 AM
yuhan marked 4 inline comments as done.

update. gonna wait to address the confusion between FilterInputGroup and Step Subset selector until D2364 is merged.

yuhan added inline comments.Apr 8 2020, 1:15 AM
js_modules/dagit/src/runs/LogsProvider.tsx
256

I actually wonder why we structured it as {step, type}? it makes sense when there's only single filter element.
But here we made it possible to have multiple fields, would it make more sense to change the format, e.g. {steps: [...], types: [...]}?

yuhan retitled this revision from #2346-1 [run grouping] multi-step (arbitrary subset) re-execution through dagit Run to #2346-1 [RFC] multi-step (arbitrary subset) re-execution through dagit Run.Apr 8 2020, 1:27 AM
yuhan edited the summary of this revision. (Show Details)
yuhan planned changes to this revision.Apr 8 2020, 9:01 PM
yuhan abandoned this revision.Apr 24 2020, 7:05 PM

going to rework this