@asset def asset1(): return 1 @asset def asset2(asset1): assert asset1 == 1 job = build_assets_job("job_name", [asset1, asset2])
Details
bk
Diff Detail
- Repository
- R1 dagster
- Branch
- assets (branched from master)
- Lint
Lint Passed - Unit
No Test Coverage
Event Timeline
Everything thats here seems reasonable from my perspective
One interesting problem that we've discussed in past iterations of this direction is how different @assets may materialize their input assets in to different in-memory types, controlled separately from how they are persisted. Thoughts on that? Something for a follow up?
python_modules/dagster/dagster/assets/decorators.py | ||
---|---|---|
11 ↗ | (On Diff #42032) | I think this is good, but just highlighting the decision here that we'll only have two-level-deep structure in asset keys. |
One interesting problem that we've discussed in past iterations of this direction is how different @assets may materialize their input assets in to different in-memory types, controlled separately from how they are persisted. Thoughts on that? Something for a follow up?
I think this is a question even in the non-@asset world of solids and IOManagers. Our current recommendation is that, in their IOManager, users switch on the Python type attached to the InputDefinition. Here's an example in the hacker news pipeline: https://github.com/dagster-io/dagster/blob/master/examples/hacker_news/hacker_news/resources/parquet_io_manager.py#L46.
current recommendation is that, in their IOManager, users switch on the Python type attached to the InputDefinition
Makes sense - I think the ideas we were kicking around in the past (that I found intriguing) can be viewed as abstractions that provide "declarative IOManager creation". Some scheme where for some set of types we provide a toolkit for, you can declaratively specify the storage scheme and then the different in memory types you can sink-as and source-as.
As you allude to, if it ever makes sense to do something like that, it will make sense in the non-@asset world as well.
I think thats the only bit of context I had to add
python_modules/dagster/dagster/assets/decorators.py | ||
---|---|---|
87 ↗ | (On Diff #42032) | nit: targeting _Op is slightly more forward leaning |
I'm still pretty pro-io_manager_key in the @asset definition, mostly because of these analogs between the non-asset world and the asset world. Theoretically, this same exact scheme of having a single mono-IOManager fo all inputs/outputs would work essentially the same way in the non-asset world, so it seems a bit weird to force this pattern for the asset use cases and not even encourage it for the others. When I was refactoring the hacker_news IOManagers, the pattern of one IOManager per storage medium seemed to work quite well in terms of keeping things organized and understandable (and made it easy to swap out specific parts of the logic between different modes).
There's definitely some elegance to a pattern that allows writers of @assets to forget that io_managers exist at all, and having the mono-IOManager figure out what to do purely based off of types / metadata (especially for less technical users), but having an io_manager_key parameter seems purely additive (after all, the default io_manager_key is 'io_manager' -- if users want to go down that route, there's nothing stopping them).
TLDR; users familiar w/ io_manager_key patterns for their solids will likely miss them if they try out @assets, and people who want to have a single mega IOManager should be unaffected by the existence of an io_manager_key parameter.
python_modules/dagster/dagster/assets/assets_job.py | ||
---|---|---|
59 ↗ | (On Diff #42032) | nit: extra apostrophe, and maybe this should read "@assets must have exactly one output"? |
I added docstrings and beefed up the tests - I think this is ready to go in for others to build on top of. It'll be hidden behind dagster.assets, so still not part of the public API.