asset-storage-1 Asset
AbandonedPublic

Authored by yuhan on Sep 22 2020, 6:52 PM.

Details

Reviewers
None
Summary

Implementation of Asset Storage

This diff creates an Asset object that stores either a reference to a physical data object or the value of a data object.
An Asset is uniquely identified by the combination of its key and its reference or value.

It lays the foundation for unifying intermediates and AssetMaterialization: each asset will point to a unique loaded or materialized data object, whether that object came through dagster type loaders/materializers or through intermediates. The asset would therefore be the base unit of a "data object" loaded into a step_input or generated by a step_output.

We will maintain a mapping from asset.key to a list of assets to support versioning and memoization.
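
A minimal sketch of the key-to-versions mapping described above, to make the identity and memoization idea concrete. This is not the code in this diff: the `Asset` fields, the `identity` property, the `AssetRegistry` class and its method names are all hypothetical, and the sketch assumes an asset holds exactly one of a reference or a value.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, DefaultDict, List, Optional


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for the Asset object in the summary."""

    key: str
    reference: Optional[str] = None  # pointer to the physical data object
    value: Any = None  # or the data object's value itself

    def __post_init__(self):
        # Assumption: an asset carries a reference or a value, never both.
        if (self.reference is None) == (self.value is None):
            raise ValueError("an Asset holds exactly one of reference or value")

    @property
    def identity(self):
        # Unique identifier: the key combined with the reference or value.
        payload = self.reference if self.reference is not None else self.value
        return (self.key, payload)


class AssetRegistry:
    """Hypothetical mapping from asset.key to the list of assets seen."""

    def __init__(self):
        self._versions: DefaultDict[str, List[Asset]] = defaultdict(list)

    def record(self, asset: Asset) -> bool:
        """Append a new version; return False if this asset is already known."""
        versions = self._versions[asset.key]
        if any(existing.identity == asset.identity for existing in versions):
            return False  # memoization hit: same key and same reference/value
        versions.append(asset)
        return True

    def versions(self, key: str) -> List[Asset]:
        """Return every recorded version for a key, oldest first."""
        return list(self._versions.get(key, []))

    def latest(self, key: str) -> Optional[Asset]:
        """Return the most recently recorded version, or None."""
        versions = self._versions.get(key)
        return versions[-1] if versions else None
```

Under these assumptions, recording the same key with a new reference appends a version, while recording an identical (key, reference) pair is a no-op, which is the check memoization would rely on.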

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. Sep 22 2020, 7:09 PM
Harbormaster failed remote builds in B18584: Diff 22563!
yuhan requested review of this revision. Sep 22 2020, 9:58 PM
python_modules/dagster/dagster/core/definitions/asset.py
16

Will linking to a step input / output be optional? Curious how this would work with materializations that don't fit into being a step input or output (if I understand correctly).

python_modules/dagster/dagster/core/definitions/asset.py
16

I'm kind of imagining a world where we encourage users to use a materializer and yield its corresponding Output to invoke it, so the asset will always be tied to an input or output.
We can't control user code, and users can still call dataset.to_db() in a solid, but in that case they won't get the asset benefits because dagster isn't aware of it.