Page MenuHomePhabricator

Remove reexecution_config, add memoization strategy for reexecution
ClosedPublic

Authored by prha on Wed, Oct 30, 11:42 PM.

Details

Summary

This diff restructures the reexecution flow to use previous_run_id and the existing step_key arguments on ExecutionParams instead of a separate ReexecutionConfig. It moves the step_keys_to_execute onto ExecutionPlan to thread through the APIs and engine. Adds a RetryMemoization class that handles the intermediate management across runs.

This contains a lot of the major refactor changes in D1294 (minus the resume/retry), but builds upon the RunConfig => PipelineRun change from D1329.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

prha created this revision.Wed, Oct 30, 11:42 PM
prha updated this revision to Diff 6103.Thu, Oct 31, 12:12 AM

updated aws tests, reran graphql type gen

prha edited the summary of this revision. (Show Details)Thu, Oct 31, 6:11 AM
prha added a reviewer: Restricted Project.

This is the Lord's work. This looks great. Let me stew on it for a hot but I think this is quite the improvement.

Amazing. thanks for doing all this cleanup!

schrockn requested changes to this revision.Thu, Oct 31, 6:56 PM

This is a huge improvement. My questions/concerns are mostly around the memoization strategy class.

python_modules/dagster/dagster/core/execution/api.py
258

If there are "missing steps" which is the execution plan ever constructed?

288

so much better

python_modules/dagster/dagster/core/execution/memoization.py
51

manage_intermediates is a bit vague. had to read the function to understand why you named it as such.

I'm actually having a hard time following the function. i think a docblock describing the logic could be very useful.

71โ€“79

a comment or encapsulating this (this being what is in the if statement) in function with a descriptive name could be more clear

88โ€“98

I think we should resolve the TODO before we merge. This could easily become cost-prohibitive.

This revision now requires changes to proceed.Thu, Oct 31, 6:56 PM
prha added inline comments.Thu, Oct 31, 8:57 PM
python_modules/dagster/dagster/core/execution/api.py
258

Yeah, I'll move this to the PlanBuilder

python_modules/dagster/dagster/core/execution/memoization.py
51

๐Ÿ‘

71โ€“79

๐Ÿ‘

88โ€“98

๐Ÿ‘

prha updated this revision to Diff 6161.Fri, Nov 1, 7:05 PM

refactor memoization strategy, add single-step reexecution test

prha planned changes to this revision.Fri, Nov 1, 7:11 PM
prha updated this revision to Diff 6162.Fri, Nov 1, 7:36 PM

move missing step check to execution plan constructor

prha updated this revision to Diff 6177.Mon, Nov 4, 4:24 PM

rebase

alangenfeld added inline comments.
js_modules/dagit/src/schema.graphql
225

my last hang up is if this name sufficiently communicates whats going to happen if you set it. Some options would be a better name, a sibling arg, or a wrapper arg to communicate that this implies that we are retrying based on the results of said previous run.

prha planned changes to this revision.Mon, Nov 4, 7:24 PM
prha added inline comments.
js_modules/dagit/src/schema.graphql
225

Yeah, I erred on the side of just making the GraphQL API descriptive (e.g. "here is the previous run id"), allowing for different future behaviors in the memoization strategy (making it pluggable).

I feel comfortable renaming this retryRunId or something like that until we have a better sense of what those other behaviors look like.

prha updated this revision to Diff 6185.Mon, Nov 4, 7:40 PM

rename graphql parameter previousRunId to retryRunId

alangenfeld accepted this revision.Mon, Nov 4, 9:29 PM

A+ diff would review again

redfordnod

prha updated this revision to Diff 6197.Mon, Nov 4, 9:52 PM

handle multiprocess intermediates

schrockn requested changes to this revision.Tue, Nov 5, 1:21 AM
schrockn added inline comments.
python_modules/dagster/dagster/core/execution/api.py
270

why is this a separate function?

python_modules/dagster/dagster/core/execution/memoization.py
14

per slack thread let's just make this into a single function copy_required_intermediates_for_execution or something similar

31

can we just put this into init?

This revision now requires changes to proceed.Tue, Nov 5, 1:21 AM
prha updated this revision to Diff 6224.Tue, Nov 5, 8:06 AM
  • move from class to simple functions
max added a subscriber: max.Wed, Nov 6, 5:12 PM
max added inline comments.
js_modules/dagit/src/schema.graphql
225

or indeed parentRunId? but actually i don't think i understand the difference between retryRunId and previous_run_id

python_modules/dagster/dagster/core/execution/memoization.py
23

it might be nice to print the name or class of the intermediates manager here to aid debugging

max resigned from this revision.Wed, Nov 6, 5:13 PM
This revision is now accepted and ready to land.Wed, Nov 6, 6:54 PM
prha updated this revision to Diff 6299.Wed, Nov 6, 7:04 PM

add intermediates manager class name in error message

prha updated this revision to Diff 6306.Wed, Nov 6, 9:34 PM

rebase