Page MenuHomePhabricator

lakehouse generate asset materializations
ClosedPublic

Authored by sandyryza on Jul 11 2020, 12:13 AM.

Details

Test Plan

ran simple_lakehouse_pipeline and verified columns show up in dagit assets

Diff Detail

Repository
R1 dagster
Branch
lakehouse-materialize-1 (branched from master)
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

examples/legacy_examples/dagster_examples/simple_lakehouse/lakehouse.py
15 ↗(On Diff #18528)

not on the critical path of this change, but made it easier to run inside an instance with an asset store.

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 11 2020, 12:24 AM
Harbormaster failed remote builds in B15118: Diff 18528!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 11 2020, 12:53 AM
Harbormaster failed remote builds in B15120: Diff 18530!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 11 2020, 1:31 AM
Harbormaster failed remote builds in B15122: Diff 18532!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 11 2020, 4:55 PM
Harbormaster failed remote builds in B15131: Diff 18541!
python_modules/libraries/lakehouse/lakehouse/asset.py
38–56

huh, i thought source assets did not end up in the DAG, what would the metadata entries here entail?

Seems like metadata_entries should be a callback

python_modules/libraries/lakehouse/lakehouse/house.py
179–182

seems like this should be a function that returns metadata entries rather than a hardcoded set. don't we want to be able to compute the metadata entries based on result?

This revision now requires changes to proceed.Jul 14 2020, 4:14 AM
sandyryza added inline comments.
python_modules/libraries/lakehouse/lakehouse/house.py
179–182

Maybe this is premature, but the world I'm imagining is one where the asset graph is useful even before we've computed the assets using dagster. Even when someone can't yet migrate a computation to dagster, they can still document it with an asset and have displayed it in a tool.

I had been imagining the metadata entries as additive. The asset itself has some metadata, and then in the future we can allow the computation to emit some metadata as well.

cc @alangenfeld - up there is the reasoning for including metadata_entries on source assets.

So I think this exposes some confusion probably around the naming. It's weird because typically metadata entries are applied to something emitted by the the system at runtime, but that's not what the Asset class is. I think this could be really confusing and lead people understandably to think that they should be yielding an Asset just like they do a an AssetMaterialization.

I think at a minimum we could renamed the metadata_entries variable to make it explicit that these metadata entries apply to the materialization that is emitted as the result of a computation, rather than being a property of the abstract asset itself.

However I do think that the a callback actually clarifies the relationship here. The way the user knows that this is a function that is invoked every time the abstract Asset actually is invoked and produces an AssetMaterialization.

This revision now requires changes to proceed.Jul 14 2020, 7:51 PM

I think at a minimum we could renamed the metadata_entries variable to make it explicit that these metadata entries apply to the materialization that is emitted as the result of a computation, rather than being a property of the abstract asset itself.

FWIW, the way I'm looking at that attribute is as properties of the abstract asset itself. E.g. the columns of a table don't depend on the computation. Including them with the AssetMaterialization is meant to be kind of a nice convenience, so that info shows up in the Assets page of Dagit, given that we don't currently have a way to represent assets independent of computation. Ideally, they'd instead be attached to some UI-consumable representation of the abstract asset - maybe the dagster type?

Am I making sense? Maybe better for me to retire this change until we have some better entity to attach this metadata to?

Maybe better for me to retire this change until we have some better entity to attach this metadata to?

this seems like a reasonable path to me but ill leave it up to you two

clearing my queue. readd if you start pushing on this again

sandyryza added reviewers: schrockn, alangenfeld, prha, yuhan.

I pared this down to leave out the metadata for now

This revision is now accepted and ready to land.Sep 24 2020, 3:12 PM
This revision was automatically updated to reflect the committed changes.