Page MenuHomePhabricator

Patition run matrix for schedule+solid execution history
ClosedPublic

Authored by bengotow on Aug 4 2020, 5:05 PM.

Details

Summary

This diff adds a matrix of partitions x runs to the bottom of the Schedule UI that makes it easier to see trends in step success / failure.

  • The search bar on the left allows you to filter the UI to a particular subset of the pipeline.
  • The run filter bar on the top right allows you to scope the UI to runs with a particular tag (for backfills)
  • The partial red squares indicate when steps failed and then later succeeded in a subsequent run of that partition.
  • The percent stats shown for each step indicate how often they transiently fail / terminally fail in partitions.
  • Clicking a partition shows the individual runs beneath the matrix so you can dig deeper.

I'm not 100% sure where this UI should live - right now it's at the bottom of the schedules UI because it can become pretty long, but we could also give it a dedicated tab / page.

I've tested this with a 10x200 grid and performance is pretty good because the grid is virtualized, but we may need to do more performance tweaking as we go.

Test Plan

Run tests

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

bengotow created this revision.Aug 4 2020, 5:05 PM
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 4 2020, 5:21 PM
Harbormaster failed remote builds in B16439: Diff 20050!
bengotow updated this revision to Diff 20434.Aug 11 2020, 6:31 PM

Refine partition matrix and improve performance by making grid virtualized

bengotow retitled this revision from Grid view for schedule+solid execution history to Patition run matrix for schedule+solid execution history.Aug 11 2020, 6:41 PM
bengotow edited the summary of this revision. (Show Details)
bengotow requested review of this revision.Aug 11 2020, 6:48 PM
prha requested changes to this revision.Aug 17 2020, 4:46 PM

overall, code looks good!

Currently, it feels a little bit off stashed at the bottom. It's also pretty different from the other graphs. Might be better to put up top, and then change the top run chart to use the run matrix component instead of the chart.js implementation we're currently using (though I can do that in a subsequent diff)

js_modules/dagit/src/gaant/useViewport.tsx
102–106

this is for when the container's DOM reference changes? when does this happen?

js_modules/dagit/src/schedules/PartitionRunMatrix.tsx
266–273

is there a way to draw the step dependency lines?

as is, it's a little confusing how the steps relate, and somewhat misleading as the indentation doesn't match the true sequence

455

I think the height of the scrollbar is throwing off the container heights and forcing a vertical scroll?

https://dagster.phacility.com/F253782

This revision now requires changes to proceed.Aug 17 2020, 4:46 PM
bengotow planned changes to this revision.Aug 18 2020, 7:16 PM
bengotow added inline comments.
js_modules/dagit/src/gaant/useViewport.tsx
102–106

Ahh I'll add a comment here—this can happen because the useViewporthook can't be conditional, but it's pretty easy to get into a situation where you render a loading spinner instead of the actual viewport div for a moment, and then render it (attaching the ref to another box or to a box for the first time)

js_modules/dagit/src/schedules/PartitionRunMatrix.tsx
266–273

Yeah I totally agree about this, going to see if I can improve the logic here a bit in a follow up.

455

Ahh yeah it is... I'll fix this. I think our best bet is probably to add 16-20px of padding to the bottom of this container all the time, but it might look /slightly/ odd when folks are using a trackpad.

bengotow updated this revision to Diff 20665.Aug 18 2020, 7:18 PM
  • Time slice bar, colorization of age of runs
  • Default to 30 partitions, add 120 partition option
  • Move the partition run matrix to the top
prha accepted this revision.Aug 18 2020, 8:21 PM

I think this looks good. We should get this in and iterate though.

Couple points:

  • Moving it to the top makes the distinct sections a little less clear. It'd be nice if the RunTable section felt more a part of the matrix component.
  • The default blank state for RunTable feels aggressive. Do we need to show anything if there are no partitions selected?
  • Ideally, we could use the same step filter from the run matrix to filter the graphs as well
  • The value of the Runs by Partition graph seems to have fallen a lot and it feels visually out of sync with the run matrix. We should consider dropping it.
This revision is now accepted and ready to land.Aug 18 2020, 8:21 PM
bengotow updated this revision to Diff 20681.Aug 18 2020, 9:33 PM
  • Add a small shift click note
This revision was landed with ongoing or failed builds.Aug 18 2020, 9:34 PM
This revision was automatically updated to reflect the committed changes.