Page MenuHomeElementl

[caprisun] @asset and build_assets_job
Needs ReviewPublic

Authored by sandyryza on Jul 20 2021, 5:28 PM.

Details

Summary
@asset
def asset1():
    return 1

@asset
def asset2(asset1):
    assert asset1 == 1

job = build_assets_job("job_name", [asset1, asset2])
Test Plan

bk

Diff Detail

Repository
R1 dagster
Branch
assets (branched from master)
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jul 20 2021, 6:13 PM
Harbormaster failed remote builds in B33920: Diff 41896!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 20 2021, 9:54 PM
Harbormaster failed remote builds in B33942: Diff 41927!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 20 2021, 11:37 PM
Harbormaster failed remote builds in B33955: Diff 41943!
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 21 2021, 10:13 PM
Harbormaster failed remote builds in B34022: Diff 42023!
sandyryza retitled this revision from assets to [RFC] @asset and build_assets_job.Jul 21 2021, 11:29 PM
sandyryza edited the summary of this revision. (Show Details)

Everything thats here seems reasonable from my perspective

One interesting problem that we've discussed in past iterations of this direction is how different @assets may materialize their input assets in to different in-memory types, controlled separately from how they are persisted. Thoughts on that? Something for a follow up?

python_modules/dagster/dagster/assets/decorators.py
11 ↗(On Diff #42032)

I think this is good, but just highlighting the decision here that we'll only have two-level-deep structure in asset keys.

@alangenfeld

One interesting problem that we've discussed in past iterations of this direction is how different @assets may materialize their input assets in to different in-memory types, controlled separately from how they are persisted. Thoughts on that? Something for a follow up?

I think this is a question even in the non-@asset world of solids and IOManagers. Our current recommendation is that, in their IOManager, users switch on the Python type attached to the InputDefinition. Here's an example in the hacker news pipeline: https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/parquet_io_manager.py#L46.

current recommendation is that, in their IOManager, users switch on the Python type attached to the InputDefinition

Makes sense - I think the ideas we were kicking around in the past (that I found intriguing) can be viewed as abstractions that provide "declarative IOManager creation". Some scheme where for some set of types we provide a toolkit for, you can declaratively specify the storage scheme and then the different in memory types you can sink-as and source-as.

As you allude to, if it ever makes sense to do something like that, it will make sense in the non-@asset world as well.

I think thats the only bit of context I had to add

python_modules/dagster/dagster/assets/decorators.py
87 ↗(On Diff #42032)

nit: targeting _Op is slightly more forward leaning

@alangenfeld

One interesting problem that we've discussed in past iterations of this direction is how different @assets may materialize their input assets in to different in-memory types, controlled separately from how they are persisted. Thoughts on that? Something for a follow up?

I think this is a question even in the non-@asset world of solids and IOManagers. Our current recommendation is that, in their IOManager, users switch on the Python type attached to the InputDefinition. Here's an example in the hacker news pipeline: https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/parquet_io_manager.py#L46.

I'm still pretty pro-io_manager_key in the @asset definition, mostly because of these analogs between the non-asset world and the asset world. Theoretically, this same exact scheme of having a single mono-IOManager fo all inputs/outputs would work essentially the same way in the non-asset world, so it seems a bit weird to force this pattern for the asset use cases and not even encourage it for the others. When I was refactoring the hacker_news IOManagers, the pattern of one IOManager per storage medium seemed to work quite well in terms of keeping things organized and understandable (and made it easy to swap out specific parts of the logic between different modes).

There's definitely some elegance to a pattern that allows writers of @assets to forget that io_managers exist at all, and having the mono-IOManager figure out what to do purely based off of types / metadata (especially for less technical users), but having an io_manager_key parameter seems purely additive (after all, the default io_manager_key is 'io_manager' -- if users want to go down that route, there's nothing stopping them).

TLDR; users familiar w/ io_manager_key patterns for their solids will likely miss them if they try out @assets, and people who want to have a single mega IOManager should be unaffected by the existence of an io_manager_key parameter.

python_modules/dagster/dagster/assets/assets_job.py
59 ↗(On Diff #42032)

nit: extra apostrophe, and maybe this should read "@assets must have exactly one output"?

@owen - I updated the diff to include io_manager_keys

sandyryza retitled this revision from [RFC] @asset and build_assets_job to @asset and build_assets_job.Jul 29 2021, 3:26 AM
sandyryza marked an inline comment as done.
sandyryza retitled this revision from @asset and build_assets_job to [RFC] @asset and build_assets_job.

up

I added docstrings and beefed up the tests - I think this is ready to go in for others to build on top of. It'll be hidden behind dagster.assets, so still not part of the public API.

sandyryza retitled this revision from [RFC] @asset and build_assets_job to @asset and build_assets_job.Jul 29 2021, 4:43 PM
sandyryza retitled this revision from @asset and build_assets_job to [caprisun] @asset and build_assets_job.Aug 2 2021, 8:13 PM