Added decision logic for memoized re-execution using asset store. This will only work with a VersionedAssetStore, as it has a special has_asset_with_version method.
Details
Added a full dev loop test using the asset store.
Ran pipeline to see how engine event registered in dagit:
Diff Detail
- Repository
- R1 dagster
- Branch
- executememoizedplanwithassets
- Lint
Lint Passed - Unit
No Test Coverage
Event Timeline
python_modules/dagster/dagster/core/execution/context/system.py | ||
---|---|---|
311–315 ↗ | (On Diff #26021) | nit: can be version = step_output_versions.get(step_output_handle, None) instead |
python_modules/dagster/dagster/core/instance/__init__.py | ||
528 | what's this for? | |
554 | nit: execution_plan.pipeline_def | |
565 | is asset_store_key or "config" guaranteed to be in the resource? if not, lets do environment_config.resources.get(resource_name, {}).get("config") to be safer | |
565–570 | this piece seems to be initing resource outside the resource_initialization_manager and im not confident about the run_id="" workaround either. |
python_modules/dagster/dagster/core/instance/__init__.py | ||
---|---|---|
523 | should this be on DagsterInstance ? maybe better as a free method that takes an instance and an execution plan and lives somewhere else. This file is gnarly enough | |
565–570 | woof ya i think if we want to do this we need a dedicated api for spinning up access to resources outside of an execution context - needs to be a context manager since the resource api supports that worth taking a step back and thinking for a minute how this flow should actually work. What is the error experience for the user going to be if the asset store resource fails to init or if the invocation on the asset store throws in a deployed world with separated user code when/where does this resolution happen |
python_modules/dagster/dagster/core/instance/__init__.py | ||
---|---|---|
565–570 | We should be able to do this within the execution context, right? After the context is built but before iterating through the plan? |
python_modules/dagster/dagster/core/instance/__init__.py | ||
---|---|---|
565–570 |
ya that is one path, my worry about that direction is how we will build the almost certainly required feature of being able to preview the memoized plan before hitting execute. At the end of the day i think the core problem boils down to that - we need to spin up a context to do the resolution, so we either make a dedicated context spin up api or we require booting the execution context for the preview case |
ok we talked through different options on how to proceed here and the end consensus was that its ok to land it in this known-janky state since this api is experimental and limited any way
follow ups:
- make sure something clear is logged when a memoized execution plan transform is run
- inline comments
python_modules/dagster/dagster/core/instance/__init__.py | ||
---|---|---|
523 | move this out | |
565–570 | clear comment block, tag with issue |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/run_lifecycle.py | ||
---|---|---|
29–34 ↗ | (On Diff #26580) | this wont be visible to anyone really - will just be in the stdout logs of the webserver. I think you need to report an engine event after the run is created |
python_modules/dagster/dagster/cli/pipeline.py | ||
475–480 ↗ | (On Diff #26580) | did you test this? when does it show up? |
python_modules/dagster/dagster/scheduler/sensor.py | ||
331–336 ↗ | (On Diff #26580) | does this show up anywhere useful? |
python_modules/dagster/dagster/cli/pipeline.py | ||
---|---|---|
475–480 ↗ | (On Diff #26580) | yea I got rid of this. Only kept the run_lifecycle.py path due to it being the relevant path for a playground-launched execution. |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/run_lifecycle.py | ||
---|---|---|
29–34 ↗ | (On Diff #26580) | that makes sense, will do. |
python_modules/dagster/dagster/core/execution/resolve_versions.py | ||
---|---|---|
146 ↗ | (On Diff #26719) | Arguably, we don't need to tie this to versions at all anymore? I.e. the memoization can just be based on has_asset, and it's up to the asset store to implement has_asset using versions? |
177 ↗ | (On Diff #26719) | Not all asset stores have a has_asset method, right? Can we raise a helpful error if one is missing? |
python_modules/dagster-graphql/dagster_graphql/implementation/execution/run_lifecycle.py | ||
---|---|---|
48 ↗ | (On Diff #26719) | nit: link to issue |
Added docs link to engine event. Also linked to issue to change memoized re-execution code path.