ran simple_lakehouse_pipeline and verified columns show up in dagit assets
Details
Diff Detail
- Repository
- R1 dagster
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
examples/legacy_examples/dagster_examples/simple_lakehouse/lakehouse.py | ||
---|---|---|
15 ↗ | (On Diff #18528) | not on the critical path of this change, but made it easier to run inside an instance with an asset store. |
python_modules/libraries/lakehouse/lakehouse/asset.py | ||
---|---|---|
38–56 | huh, i thought source assets did not end up in the DAG, what would the metadata entries here entail? |
Seems like metadata_entries should be a callback
python_modules/libraries/lakehouse/lakehouse/house.py | ||
---|---|---|
179–182 | seems like this should be a function that returns metadata entries rather than a hardcoded set. don't we want to be able to compute the metadata entries based on result? |
python_modules/libraries/lakehouse/lakehouse/house.py | ||
---|---|---|
179–182 | Maybe this is premature, but the world I'm imagining is one where the asset graph is useful even before we've computed the assets using dagster. Even when someone can't yet migrate a computation to dagster, they can still document it with an asset and have displayed it in a tool. I had been imagining the metadata entries as additive. The asset itself has some metadata, and then in the future we can allow the computation to emit some metadata as well. cc @alangenfeld - up there is the reasoning for including metadata_entries on source assets. |
So I think this exposes some confusion probably around the naming. It's weird because typically metadata entries are applied to something emitted by the the system at runtime, but that's not what the Asset class is. I think this could be really confusing and lead people understandably to think that they should be yielding an Asset just like they do a an AssetMaterialization.
I think at a minimum we could renamed the metadata_entries variable to make it explicit that these metadata entries apply to the materialization that is emitted as the result of a computation, rather than being a property of the abstract asset itself.
However I do think that the a callback actually clarifies the relationship here. The way the user knows that this is a function that is invoked every time the abstract Asset actually is invoked and produces an AssetMaterialization.
I think at a minimum we could renamed the metadata_entries variable to make it explicit that these metadata entries apply to the materialization that is emitted as the result of a computation, rather than being a property of the abstract asset itself.
FWIW, the way I'm looking at that attribute is as properties of the abstract asset itself. E.g. the columns of a table don't depend on the computation. Including them with the AssetMaterialization is meant to be kind of a nice convenience, so that info shows up in the Assets page of Dagit, given that we don't currently have a way to represent assets independent of computation. Ideally, they'd instead be attached to some UI-consumable representation of the abstract asset - maybe the dagster type?
Am I making sense? Maybe better for me to retire this change until we have some better entity to attach this metadata to?
Maybe better for me to retire this change until we have some better entity to attach this metadata to?
this seems like a reasonable path to me but ill leave it up to you two