Page MenuHomeElementl

[Memoization 12/13] Enable memoization on all out-of-the-box executors
AcceptedPublic

Authored by cdecarolis on Aug 11 2021, 12:19 AM.

Details

Summary

This diff enables memoization on all out-of-the-box executors. More specifically, that includes multiprocess, k8s, celery, k8s-celery, and dask.

This means carrying the dagster instance across the execution plan creation process boundary, and carrying over step output versions to child workers so they know not to recompute memoization information for every step.

Test Plan

Added tests to ensure proper memoization behavior for all executors. Also verified proper memoization behavior via Dagit. integration

Diff Detail

Repository
R1 dagster
Branch
memoization_out_of_process_executor_fix
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster returned this revision to the author for changes because remote builds failed.Aug 11 2021, 12:44 AM
Harbormaster failed remote builds in B34642: Diff 42856!

what about all the other executors? Just fixing it for multiprocess seems wrong

python_modules/dagster/dagster/core/executor/multiprocess.py
65–67 ↗(On Diff #42861)

big yikes

This revision now requires changes to proceed.Aug 12 2021, 7:46 PM

Add tests for celery, dask, and k8s executors, carry forward step output versions on known state instead of execution plan.

cdecarolis added inline comments.
python_modules/dagster/dagster/core/executor/multiprocess.py
65–67 ↗(On Diff #42861)

yea this is awful

Add integration k8s memoization test, switch integration to use s3 pickle io manager

Use s3_pickle_io_manager in place of intermediate storage for k8s modes

Migrate over more tests to use s3 io manager

Fix run config for integration test

Include instance ref when getting external execution plan

Update get_external_execution_plan callsites to feed through instance ref, add celery-k8s memoization test, change instance to persistent (because it is persistent more or less)

Remove intermediate storage changes from demo pipeline. Too invasive to loop into this change

cdecarolis retitled this revision from [Memoization 12/13] Fix bug with out-of-process memoization to [Memoization 12/13] Enable memoization on all out-of-the-box executors.Aug 18 2021, 3:42 AM
cdecarolis edited the summary of this revision. (Show Details)
cdecarolis edited the test plan for this revision. (Show Details)

Make s3 io manager memoizable, add instance ref to k8s run launcher test instance

Fix weird opt inst bug and airflow tests

Add bs list to see what is going on

alangenfeld added subscribers: dgibson, johann.
alangenfeld added inline comments.
python_modules/dagster/dagster/core/execution/plan/plan.py
743

make this instance: DagsterInstance

python_modules/dagster/dagster/grpc/types.py
18–50

@dgibson / @johann where do we stand on this instance_ref shenanigans

I think I can do this without the instance ref shenanigans

Remove instance ref shenanigans (except across GRPC endpoint)

Get rid of nested DagsterInstance.get()

Fix memoization tests, scheduler

Add instance to execution plan creation during execution iteration

see inline - esp the contextmanager thing. I'll sleep easier when this passes your new back-compat integration tests, and you'll want to have an internal diff ready to go before landing this i think

integration_tests/test_suites/celery-k8s-integration-test-suite/test_integration.py
566–570

finally still runs if there's an exception so this will run cleanup_memoized_results on an error

integration_tests/test_suites/k8s-integration-test-suite/test_executor.py
509–513

same

python_modules/dagster-test/dagster_test/test_project/__init__.py
36

consider passing in the pipeline here vs assuming its define_memoization_pipeline?

python_modules/dagster/dagster/grpc/impl.py
341

this should be a contextmanager so that it cleans up

with (DagsterInstance.from_ref(args.instance_ref) if args.instance_ref else nullcontext()) as instance:

python_modules/dagster/dagster/grpc/types.py
18–50

this should be ok. will require a small internal diff, and you'll want to keep instance_ref optional

This revision is now accepted and ready to land.Sep 8 2021, 3:28 PM