Page MenuHomePhabricator

asset-store-1 yield AssetMaterialization #3138
ClosedPublic

Authored by yuhan on Fri, Oct 30, 5:29 PM.

Details

Summary

This allows users to yield AssetMaterialization in AssetStore.set_asset
dagit changes in D4956

Test Plan

unit

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

python_modules/dagster/dagster/core/execution/plan/execute_step.py
373–388

when replacing object store + intermediate storage with asset store, we should not emit AssetMaterialization in the default asset store. otherwise, there will be many unexpected assets shown up in users' asset catelog which would be a breaking change and regression - current users likely don't want to have all "intermediates" between solids shown up in the asset page.

to distinguish AssetMaterialization yielded by the system (from the default asset store) and the ones yielded by the user, we could introduce a param on AssetMaterialization(tracked=True) and make the one in PickledObjectFilesystemAssetStore to be "not tracked" and then exclude the non-tracked in the asset page

yuhan requested review of this revision.Fri, Oct 30, 5:51 PM
python_modules/dagster/dagster/core/execution/plan/execute_step.py
347–349

I think AssetStores should be able to return zero or multiple AssetMaterializations.

Zero: As you mentioned below, we probably don't want to flood the asset manager with materializations from every time an intermediate is written. Could we just omit the materializations in that default case and let people emit materializations when they do want something to show up in the asset manager?

Multiple: it's possible that a step writes data to multiple partitions, and we'd want an AssetMaterialization for each partition.

Thoughts?

python_modules/dagster/dagster/core/execution/plan/execute_step.py
347–349

Could we just omit the materializations in that default case and let people emit materializations when they do want something to show up in the asset manager?

makes total sense. I'd like to propose the default fs one and the mem in D4976 omit the event and the other built-in custom_path_filesystem_asset_store emits the event

Multiple: it's possible that a step writes data to multiple partitions, and we'd want an AssetMaterialization for each partition.

Right! I forgot about the partition case. I've only thought about step output and asset materialization being 1:1 relation.

zero or multiple AssetMaterialization events

This looks good aside from a couple comments.

python_modules/dagster/dagster/core/execution/plan/execute_step.py
351

Because of how generators / yield work, this will mean we yield the AssetStoreOperation event before we actually complete the contents of set_asset. Is that intended? I think ObjectStoreOperation is currently yielded after set_intermediate completes.

python_modules/dagster/dagster/core/storage/asset_store.py
163

It's a little unclear to me what the "right" behavior is in this case.

One of the uses of the asset catalog is as a reverse index that maps addresses to the steps themselves. E.g. if I'm using the file at path a/b/c, I want to use the asset catalog to find which pipeline/step generated it. So with that in mind, it could make sense to make the path itself the asset key.

That potentially presents some challenges though, because the user might not want the earlier parts of the path in the asset key. E.g. I don't think I'd want every asset key to start with "/Users/sryza".

Which is all to say that it might be difficult to anticipate what users want here, and might make sense to leave it up to them to implement an AssetStore if they want to use the Asset Catalog.

This isn't exposed publicly yet, so not super high stakes. If we want to leave it this way now and discuss before we make it public, that's OK with me.

This revision is now accepted and ready to land.Wed, Nov 4, 6:32 PM
python_modules/dagster/dagster/core/execution/plan/execute_step.py
351

makes sense. will move it after yield materialization

python_modules/dagster/dagster/core/storage/asset_store.py
163

One of the uses of the asset catalog is as a reverse index that maps addresses to the steps themselves

can we allow users to search the path in the metadata entry and show them the assets with that metadata?

this is the current view of the Asset page

i'd like to leave it as it is now - i will have a better idea of it once i'm done with the example docs