Page MenuHomePhabricator

Add query to get runs grouped by root run id

Authored by max on Apr 5 2020, 1:21 AM.



This is to support run group-oriented UI

Test Plan


Diff Detail

R1 dagster
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sashank added inline comments.Apr 5 2020, 6:45 AM



Should this be called all_child_runs? It's just a little confusing, since we're querying for all runs that have a root run tag, not root runs themselves.


Hm if you flip these two statements around, the test will fail.

for root_run in root_runs:
    for _ in range(5):

I think the usage of cursor and limit is a bit confusing here. If I pass a limit to get_run_groups, it feels like I should be applying the limit to run groups as a whole.

i.e. if I say limit=3, I should get back a max of 3 run groups. But instead, it's limiting the number of runs we're querying for to work with to generate the run groups.

Not sure if that was intended.



max added inline comments.Apr 5 2020, 11:57 PM

this is intended, i will add a comment

yuhan added inline comments.Apr 6 2020, 1:21 AM

how about all_runs_with_root_tag and make the select more clear, like

all_runs_with_root_tag = ([
	.where(RunTagsTable.c.key == ROOT_RUN_ID_TAG)

also, a nit, it could be more clear to keep the variable name and the alias name same.


bc this is a left join, it should also include the root runs whose root_run_id would be NULL. So should we call it all_runs instead of child_runs?


s/RunTable/runs/ ?


Not sure if i'm processing this query correctly. Is it like this in SQL?

SELECT DISTINCT(RunsTable.run_id) AS run_id,
FROM RunsTable, child_runs
WHERE RunsTable.run_id = child_runs.run_id OR RunsTable.run_id = child_runs.root_run_id

Will this give you all the runs rather than the root runs?


Can we derive the count info from the child_runs subquery instead of joining three tables?

max marked 4 inline comments as done.Apr 6 2020, 3:36 AM
max added inline comments.Apr 6 2020, 4:17 AM

no, because the counts need to include all the other children (that aren't being returned in this query) -- this is to display UI hints like "...and 5 more"

max updated this revision to Diff 11460.Apr 6 2020, 5:32 PM

Clarify names

max marked 5 inline comments as done.Apr 6 2020, 5:37 PM
max added inline comments.

updates names hopefully to be clearer


this isn't a left join


we only do it this way so we can reuse the runs query logic in SqlRunStorage._runs_query


bad first name -- this gives us all the runs in our query, plus their roots

max marked 4 inline comments as done.Apr 6 2020, 6:38 PM
max added inline comments.Apr 6 2020, 10:35 PM

sorry, this is a left join, but not an outer join. i misunderstood what you were asking

yuhan accepted this revision.Apr 7 2020, 5:03 PM
This revision is now accepted and ready to land.Apr 7 2020, 5:03 PM
max planned changes to this revision.Apr 8 2020, 8:45 PM
max updated this revision to Diff 12490.Apr 23 2020, 6:44 PM

Rebase (stacked on D2646)

This revision is now accepted and ready to land.Apr 23 2020, 6:44 PM
max updated this revision to Diff 12523.Apr 23 2020, 11:06 PM


how do we fetch more / paginate runs from a single group after an initial fetch using this scheme?


hm this counting decision seems odd to me, the root feels like part of the group and there for part of the total count to me


this is a novel scheme but it took me a long time to understand what was happening


is there a better name for limit given the complex behavior here?


this is the final ordering of the returned results or am i missing something?

yuhan requested changes to this revision.Apr 24 2020, 10:44 PM

back to your queue b/c of the new comments

This revision now requires changes to proceed.Apr 24 2020, 10:44 PM
yuhan commandeered this revision.Sat, May 2, 5:45 AM
yuhan edited reviewers, added: max; removed: yuhan.
This revision now requires review to proceed.Sat, May 2, 5:45 AM
yuhan planned changes to this revision.Sat, May 2, 5:47 AM

splitting this and D2470 into

  • get_run_group, PipelineRunGroup
  • get_run_groups, PipelineRunGroups
max commandeered this revision.Wed, May 13, 8:00 PM
max added a reviewer: yuhan.
max updated this revision to Diff 13956.Thu, May 14, 12:44 AM


max planned changes to this revision.Thu, May 14, 12:44 AM
max requested review of this revision.Thu, May 14, 12:48 AM
max added a reviewer: alangenfeld.

Reopening this for comment, now with pseudosql in the comments as well as more color on why I still think this is the right way to do it.

max added inline comments.Thu, May 14, 12:58 AM

happy to make this change


could call it max_n_groups or something

as well as more color on why I still think this is the right way to do it

did you mean to update the diff summary or did i miss where this was added?

max added a comment.Thu, May 14, 6:57 PM

I just expanded the discussion of alternatives in the comments, but I'll expand on it here.

Why apply the filter machinery etc. first, and only then go and get the run groups to which the filtered runs belong? Doing it the other way breaks what I think is the sensible reverse chronological default; I don't know how to reason about "reverse chronological, but only for those runs which are roots or singletons". Suppose I have just been retrying runs belonging to a group created long ago; it makes sense to bump these to the top of the interface rather than burying them deeply paginated down. Suppose I have been rerunning exisitng runs to pick up code changes, and marking the new runs with user tags, like {'revised_revenue_per_user_definition': '2020Q2'}. When I later query with these tags, I don't want to retrieve only root runs and singletons that have those tags set -- even if only children in a run group have those tags set, I still want the run group.

The other thing that might be contentious is the choice to not return all of the runs for each run group we return. I'm less sure that this is right and happy to adjust this.

alangenfeld added reviewers: prha, bengotow.EditedThu, May 14, 7:47 PM
alangenfeld added subscribers: prha, bengotow.

are we pretty locked on the product experience ending up how you are describing? I think im convinced this is a good implementation of whats necessary to power what you are arguing for

+ @prha for another pair of eyes
+ @bengotow re: dagit experience

yuhan added a comment.EditedThu, May 14, 9:11 PM

I think it makes sense to return a group where ANY child runs can be applied to the filter, rather than a group where ALL child runs or the root applied. -- thinking about the email filter machinery: it shows the entire thread as long as there's one message matches the filter. also, it could be very real that I filter "status: failure" and I expect to see all the groups where at least one child run failed.

prha added inline comments.Thu, May 14, 9:24 PM

should there be an expectation around the ordering when there is a limit, cursor that is smaller than the number of groups?

Specifically, around the precise scenario where you have an older root run with a newer child run, vs a newer root run with an older child run.

max added a comment.Fri, May 15, 4:22 PM

@yuhan I think this observes those semantics, will add a test to demonstrate


I'll add a test to demonstrate. The root runs are always older than their own children.

max updated this revision to Diff 14150.Mon, May 18, 4:17 AM

Improve tests

yuhan accepted this revision.Mon, May 18, 9:51 PM
This revision is now accepted and ready to land.Mon, May 18, 9:51 PM
This revision was automatically updated to reflect the committed changes.