Page MenuHomePhabricator

asset-store introduce AssetStoreContext
ClosedPublic

Authored by yuhan on Tue, Nov 10, 10:35 PM.

Details

Summary

per offline dicussion https://dagster.slack.com/archives/C019S1KD4NM/p1604968321000900
we decided not to expose StepOutputHandle and instead introduce da new context to capture all the information asset store may need

AssetStoreContext
- pipeline_run
- run_id
- step_key
- output_name
- asset_matadata

next step: provide a util method for path construction (get_keys)

Test Plan

asset store unit & docs preview

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

python_modules/dagster/dagster/core/execution/context/system.py
386

The naming of most of our contexts refers to the execution scenario where the context is used. E.g. HookContext is "the context we use while executing a hook." TypeCheckContext is "the context we use while executing a type check." By that pattern, StepOutputContext sounds like "the context we use while executing a step output", which I think might be confusing in the get_asset setting.

Maybe something like AssetOperationContext? Or splitting into SetAssetContext and GetAssetContext?

393

Any particular reason not to include the asset_metadata here? One of the things that excited me about this approach was the opportunity to remove all the boilerplate params that someone implementing an AssetStore had to include.

399

Do we need to set the log_manager and step?

413

I'm apprehensive about exposing this surface area at this point. I agree it's likely that we'll want to end up adding something like this, but it's probably worth at least some back-and-forth on what the name is for this collection thing before adding it to the public API. If you feel strongly about it, we could make it experimental?

python_modules/dagster/dagster/core/execution/context/system.py
386

My idea was to make it a shared context that other code paths can also use it.

393

Same reason as above. I'm open to make it an AssetStoreContext instead

python_modules/dagster/dagster/core/execution/context/system.py
386

When you say "other code paths", which code paths are you thinking of?

python_modules/dagster/dagster/core/execution/context/system.py
386
393

i think asset_metadata should an explicit argument on the function signature (set_asset, get_asset) rather than bury it in context. i view variables like step_output_handle, output_name, etc as boilerplates but not the obj and asset_metadata

413

i feel strongly about having a function returning a collection of keys - but im open to moving the reexecution logic out of here

python_modules/dagster/dagster/core/execution/context/system.py
386

the context we use while executing a step output

not sure what you mean by "executing" an output - but along this line of thought names like:

StepOutputStoreContext StepInputLoadContext
StepOutputStorageContext StepInputLoadingContext

might be good.

386

lets not inherit from SystemExecutionContext - I think we want to make this thing easy to construct ad-hoc for tests and only contain the information we want to expose

413

i agree we should have some back and forth on naming here - and I think it would be cleaner to have that happen in a diff stacked on top of this one. A lot going on in this diff as it stands.

feel pretty strong about not inheriting from SystemExecutionContext, so to your queue for at least that

This revision now requires changes to proceed.Thu, Nov 12, 6:32 PM
python_modules/dagster/dagster/core/storage/asset_store.py
165

@cdecarolis was talking earlier about cases where we may need to interact with an asset store outside the scope of an execution - if we expect to use these same get/set methods we should be thoughtful about what on this new context might be None able such as pipeline_def_name or run_id

yuhan marked 2 inline comments as done.

AssetStoreContext

yuhan retitled this revision from asset-store introduce StepOutputContext to asset-store introduce AssetStoreContext.Sat, Nov 14, 12:11 AM
yuhan edited the summary of this revision. (Show Details)
yuhan added inline comments.
python_modules/dagster/dagster/core/execution/context/system.py
386

ok. renamed it to AssetStoreContext - we could split it into get and set context later if needed

python_modules/dagster/dagster/core/storage/asset_store.py
165

@cdecarolis in that case, would we have the pipeline_run available?

yuhan edited the summary of this revision. (Show Details)

doc

python_modules/dagster/dagster/core/execution/context/system.py
393

Do you have particular thinking on why asset_metadata shouldn't be considered boilerplate, but step_output_handle etc. should?

When I was working on defining asset stores for the tutorial and the basic pyspark example, I found the biggest friction to be that I had to write out _asset_metadata so many times.

I definitely agree that obj should be there, because it's nearly impossible to think of a set_asset function that wouldn't use obj. However, I think many set_asset and get_asset functions won't need asset_metadata. For example, the Dagster-provided asset_stores like mem_asset_store and fs_asset_store don't use asset_metadata. Users who want to define similar asset stores to the Dagster-provided ones are likely to encounter this friction.

python_modules/dagster/dagster/core/execution/context/system.py
386

AssetStoreContext sounds good to me!

python_modules/dagster/dagster/core/execution/context/system.py
393

Ok I'm buying it. I think it makes sense to treat it similarly as context.solid_config. will update it soon

This revision is now accepted and ready to land.Tue, Nov 17, 2:50 PM
This revision was automatically updated to reflect the committed changes.
python_modules/dagster/dagster/core/storage/asset_store.py
165

we wouldn't have the pipeline run available, it would be at pipeline run creation time most likely.