Page MenuHomeElementl

[Memoization 12/13] Enable memoization on all out-of-the-box executors

Authored by cdecarolis on Aug 11 2021, 12:19 AM.
Referenced Files
Unknown Object (File)
Dec 27 2022, 7:35 PM
Unknown Object (File)
Dec 25 2022, 10:19 AM
Unknown Object (File)
Dec 17 2022, 5:41 AM
Unknown Object (File)
Dec 13 2022, 10:51 PM
Unknown Object (File)
Dec 11 2022, 5:33 AM
Unknown Object (File)
Dec 5 2022, 5:07 PM
Unknown Object (File)
Nov 30 2022, 5:38 PM
Unknown Object (File)
Nov 28 2022, 4:21 PM



This diff enables memoization on all out-of-the-box executors. More specifically, that includes multiprocess, k8s, celery, k8s-celery, and dask.

This means carrying the dagster instance across the execution plan creation process boundary, and carrying over step output versions to child workers so they know not to recompute memoization information for every step.

Test Plan

Added tests to ensure proper memoization behavior for all executors. Also verified proper memoization behavior via Dagit. integration

Diff Detail

R1 dagster
Lint Not Applicable
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

what about all the other executors? Just fixing it for multiprocess seems wrong

65–67 ↗(On Diff #42861)

big yikes

This revision now requires changes to proceed.Aug 12 2021, 7:46 PM

Add tests for celery, dask, and k8s executors, carry forward step output versions on known state instead of execution plan.

cdecarolis added inline comments.
65–67 ↗(On Diff #42861)

yea this is awful

Add integration k8s memoization test, switch integration to use s3 pickle io manager

Use s3_pickle_io_manager in place of intermediate storage for k8s modes

Migrate over more tests to use s3 io manager

Fix run config for integration test

Include instance ref when getting external execution plan

Update get_external_execution_plan callsites to feed through instance ref, add celery-k8s memoization test, change instance to persistent (because it is persistent more or less)

Remove intermediate storage changes from demo pipeline. Too invasive to loop into this change

cdecarolis retitled this revision from [Memoization 12/13] Fix bug with out-of-process memoization to [Memoization 12/13] Enable memoization on all out-of-the-box executors.Aug 18 2021, 3:42 AM
cdecarolis edited the summary of this revision. (Show Details)
cdecarolis edited the test plan for this revision. (Show Details)

Make s3 io manager memoizable, add instance ref to k8s run launcher test instance

Fix weird opt inst bug and airflow tests

Add bs list to see what is going on

alangenfeld added subscribers: dgibson, johann.
alangenfeld added inline comments.

make this instance: DagsterInstance


@dgibson / @johann where do we stand on this instance_ref shenanigans

I think I can do this without the instance ref shenanigans

Remove instance ref shenanigans (except across GRPC endpoint)

Get rid of nested DagsterInstance.get()

Fix memoization tests, scheduler

Add instance to execution plan creation during execution iteration

see inline - esp the contextmanager thing. I'll sleep easier when this passes your new back-compat integration tests, and you'll want to have an internal diff ready to go before landing this i think


finally still runs if there's an exception so this will run cleanup_memoized_results on an error




consider passing in the pipeline here vs assuming its define_memoization_pipeline?


this should be a contextmanager so that it cleans up

with (DagsterInstance.from_ref(args.instance_ref) if args.instance_ref else nullcontext()) as instance:


this should be ok. will require a small internal diff, and you'll want to keep instance_ref optional

This revision is now accepted and ready to land.Sep 8 2021, 3:28 PM