Add instance to execution plan creation during execution iteration
Fri, Aug 27 2021
Fix memoization tests, scheduler
Thu, Aug 26 2021
Get rid of nested DagsterInstance.get()
Tue, Aug 24 2021
Remove instance ref shenanigans (except across GRPC endpoint)
I think I can do this without the instance ref shenanigans
Aug 19 2021
Fix integration tests
Aug 18 2021
More verbose bs test
Add bs list to see what is going on
Fix weird opt inst bug and airflow tests
Make s3 io manager memoizable, add instance ref to k8s run launcher test instance
Remove intermediate storage changes from demo pipeline. Too invasive to loop into this change
Update get_external_execution_plan callsites to feed through instance ref, add celery-k8s memoization test, change instance to persistent (because it is persistent more or less)
Include instance ref when getting external execution plan
Aug 17 2021
Fix run config for integration test
Migrate over more tests to use s3 io manager
Update integration tests
Update integration test
Update memoization test
Use s3_pickle_io_manager in place of intermediate storage for k8s modes
Add integration k8s memoization test, switch integration to use s3 pickle io manager
Add tests for celery, dask, and k8s executors, carry forward step output versions on known state instead of execution plan.
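The "carry forward step output versions on known state" change can be sketched as a small record that travels with the run and accumulates versions, instead of recomputing them from the execution plan. Class and field names below are illustrative, not Dagster's actual internals:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical "known state" record: step output versions ride along with
# the run, so downstream processes don't rederive them from the plan.
@dataclass(frozen=True)
class KnownState:
    # maps (step_key, output_name) -> version string
    step_output_versions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def with_versions(self, versions: Dict[Tuple[str, str], str]) -> "KnownState":
        # Return a new record with the additional versions merged in.
        merged = dict(self.step_output_versions)
        merged.update(versions)
        return KnownState(step_output_versions=merged)
```

An immutable record like this is easy to serialize across the process boundaries that celery/dask/k8s executors introduce.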
Aug 13 2021
cc @sandyryza since this stack is making heavy use of the stacked diffs workflow atm, I think switching to a PR might be a bit rough. i'll definitely put up screenshots though.
Address comments. Add tests for guide. Add tags to execute_in_process. Add resource_key to the version strategy so that you can differentiate between resources if you choose.
Aug 12 2021
Aug 11 2021
Aug 10 2021
Add an API reference page for memoization
Aug 9 2021
Aug 5 2021
Aug 4 2021
Properly fail if attempting to use mapping keys with versioning.
Move version inclusion to output_identifier fxn on output context
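The output-identifier change above amounts to: when memoization is on, address an output by its version rather than by run id, so a rerun that computes the same version resolves to the same storage path. The list-of-path-components shape here is an illustrative sketch, not Dagster's exact identifier format:

```python
# Hypothetical output identifier: version (when present) replaces run_id
# as the leading path component, making results stable across runs.
def output_identifier(step_key, output_name, run_id, version=None):
    if version is not None:
        return [version, step_key, output_name]
    return [run_id, step_key, output_name]
```

With this, two runs that produce the same version for a step read and write the same location, which is what makes memoized reexecution skip the step.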
Aug 3 2021
Update for usage with version strategy
Aug 2 2021
Get rid of code example
Rebase + allow tag to toggle memoization off
Jul 31 2021
Jul 30 2021
to clarify, i agree that tags are less robust, as they don't have good checks and could get lost - that may be the trade-off we are making, or config may be a better alternative because it's schematized.
Which brings us back to Alex's initial q: what's the right way to toggle memoization on and off?
I'd like to switch it on/off in dagit, so making it toggleable via either config or tag sounds good to me - if we only have it defined on jobs, imo it'd be less efficient for development workflows like ml training.
cc @alangenfeld regarding tags:
Account for run tags to determine whether to use memoization
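The tag-based toggle discussed above can be sketched as a small check: an explicit run tag overrides whatever the job-level default is. The tag key below is an assumption about the convention, not a confirmed Dagster constant:

```python
# Assumed tag key for illustration; not verified against Dagster's source.
MEMOIZED_RUN_TAG = "dagster/is_memoized_run"

def should_memoize(run_tags, default=False):
    """Decide whether this run uses memoized execution: a run tag, if
    present, overrides the default configured on the job."""
    value = run_tags.get(MEMOIZED_RUN_TAG)
    if value is None:
        return default
    return str(value).lower() == "true"
```

This is the shape that lets a user flip memoization off for a single run from dagit without redefining the job.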
Add mypy typing to execution plan snapshot, add a test to ensure that we don't carry around None values for step_output_versions
Jul 29 2021
Reimplement using VersionStrategy instance, add code example
Implement VersionStrategy class
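The class-based strategy is roughly "a pluggable object that maps a solid to a version string." Here is a self-contained sketch of that idea; the class and method names are hypothetical stand-ins for the real interface, and it hashes compiled bytecode as a cheap proxy for "version changes when the code changes":

```python
import hashlib

class SourceHashVersionStrategy:
    """Illustrative version strategy (names are hypothetical, not the
    actual Dagster interface): derive a solid's version from a hash of
    its compute function, so the version changes with the code."""

    def get_solid_version(self, solid_fn) -> str:
        # Hash the function's compiled bytecode as a stand-in for
        # hashing its source text.
        code = solid_fn.__code__.co_code
        return hashlib.sha256(code).hexdigest()[:16]


def add_one(x):
    return x + 1
```

Packaging this as a class rather than a free function leaves room to grow the interface later - e.g. a separate method keyed by resource_key for resource versions.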
Jul 28 2021
This strategy only really works for solids. A follow-up diff will address the resources situation. Not sure whether the best solution there is to keep the version argument on the resource decorator (making it optional for memoization), or to add a resource_versioning_strategy argument (also not required) as well.
Hmm, I feel like I need to know the whole end state to evaluate this.
I think the f(solid) -> version pattern is reasonable, but should it be passed in as just a free function, or as something like a class with staticmethods that meets some interface like MemoizationStrategy?
While I can def see the utility of the idea, this interface feels a bit off to me. From what I understand, the difference between nodes and logical assets isn't entirely cut and dry, and I'm wondering if people shouldn't be able to just add nodes to their list of assets, and we infer them as such / foist them into an asset?
Jul 27 2021
cc @yuhan @alangenfeld might abandon this in favor of https://dagster.phacility.com/D9085, but the discussion is still relevant I think. I'm certainly open to having a versioned_outputs directory or something at that layer.
This doesn't actually work
@alangenfeld upon further examination, I don't think the execute_plan solution can work. Even if we keep around the original executors, we still need to create a new pipeline/mode in order to swap out the io manager. Then we'll get a different set of complaints about using a non-persistent IO manager with an out-of-process executor. I think we're forced to either do the config replacement or the permissive thing.
cc @alangenfeld ^ bc I forgot to tag you
Have the "special" in-process executor change make the whole execution section permissive at RunConfigSchema generation time - I think this is promising.
I know this isn't a huge regression, but it still feels like a regression to me. It feels like a pretty core change for what is fundamentally a workaround.