Page MenuHomePhabricator

[Asset Store Versioning 3 of 3] Enable use of asset store to decide which steps to rerun during memoized execution, and enable retrieval of versioned assets.
Needs RevisionPublic

Authored by cdecarolis on Tue, Nov 10, 5:31 PM.

Details

Summary

Added decision logic for memoized re-execution using asset store. This will only work with a VersionedAssetStore, as it has a special has_asset_with_version method.

Test Plan

Added a full dev loop test using the asset store.

Diff Detail

Repository
R1 dagster
Branch
executememoizedplanwithassets
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Tue, Nov 10, 5:51 PM
Harbormaster failed remote builds in B20949: Diff 25411!

Added repo to memoized_dev_loop_pipeline

Rebased on top of Versioned Asset Store changes

yuhan added inline comments.
python_modules/dagster/dagster/core/execution/context/system.py
310–314

nit: can be version = step_output_versions.get(step_output_handle, None) instead

python_modules/dagster/dagster/core/instance/__init__.py
536

what's this for?

559

nit: execution_plan.pipeline_def

570

is asset_store_key or "config" guaranteed to be in the resource? if not, lets do environment_config.resources.get(resource_name, {}).get("config") to be safer

570–575

this piece seems to be initing resource outside the resource_initialization_manager and im not confident about the run_id="" workaround either.
cc @prha: would like to let you weight in as you've touched most of resource/context init work

python_modules/dagster/dagster/core/instance/__init__.py
536

this is for line 570, necessary to determine if we're using intermediate storage or not.

570–575

Yea the run_id workaround in particular makes me pretty uncomfortable as well. I don't think there's currently a context for this use case.

python_modules/dagster/dagster/core/instance/__init__.py
532

should this be on DagsterInstance ? maybe better as a free method that takes an instance and an execution plan and lives somewhere else. This file is gnarly enough

570–575

woof ya i think if we want to do this we need a dedicated api for spinning up access to resources outside of an execution context - needs to be a context manager since the resource api supports that

worth taking a step back and thinking for a minute how this flow should actually work. What is the error experience for the user going to be if the asset store resource fails to init or if the invocation on the asset store throws

in a deployed world with separated user code when/where does this resolution happen

python_modules/dagster/dagster/core/instance/__init__.py
570–575

We should be able to do this within the execution context, right? After the context is built but before iterating through the plan?

python_modules/dagster/dagster/core/instance/__init__.py
570–575

After the context is built but before iterating through the plan

ya that is one path, my worry about that direction is how we will build the almost certainly required feature of being able to preview the memoized plan before hitting execute.

At the end of the day i think the core problem boils down to that - we need to spin up a context to do the resolution, so we either make a dedicated context spin up api or we require booting the execution context for the preview case

ok we talked through different options on how to proceed here and the end consensus was that its ok to land it in this known-janky state since this api is experimental and limited any way

follow ups:

  • make sure something clear is logged when a memoized execution plan transform is run
  • inline comments
python_modules/dagster/dagster/core/instance/__init__.py
532

move this out

570–575

clear comment block, tag with issue

This revision now requires changes to proceed.Mon, Nov 23, 8:20 PM