This diff presents a (messy) prototype for how resource key remapping could work. It aims to solve the problem outlined in https://github.com/dagster-io/dagster/issues/2112. This diff includes implementations of resource re-mapping for solids and intermediate storage.
Details
- Reviewers
alangenfeld yuhan sandyryza schrockn nate
Added unit tests for most pertinent use cases (planning on expanding this)
Diff Detail
- Repository
- R1 dagster
- Branch
- resource_mapping
- Lint
Lint OK - Unit
No Unit Test Coverage
Event Timeline
i think this looks right overall
python_modules/dagster/dagster/core/definitions/solid.py | ||
---|---|---|
150–153 | hmm the name thing here is kind of brutal - same goes for the configured version above, wonder if there isn't a clever solution |
python_modules/dagster/dagster/core/definitions/intermediate_storage.py | ||
---|---|---|
36 | We could potentially draw inspiration from Nick's approach to configured to avoid introducing an extra argument here. In that world, required_resource_keys would accept something like Union[str, ResourceMapping], and the internal representation would be some sort of ResourceMapping object that services the required_resource_keys and resource_mapping properties. Thoughts @schrockn ? | |
102 | should this use orig_key and new_key? | |
python_modules/dagster/dagster/core/definitions/resource.py | ||
235 | There's a lot of duplication between this and build - would it make sense to add the resource mapping argument to to build? | |
python_modules/dagster/dagster/core/definitions/resource_mappable.py | ||
4 | These should include documentation | |
14 | IMO a name like "remap_resource" would be more ergonomic and about as clear. | |
python_modules/dagster/dagster/core/definitions/solid.py | ||
150 | Do we need to accept a new name here? The user can always invoke alias if they want a new name. | |
python_modules/dagster/dagster/core/execution/context/compute.py | ||
58 | Do these attributes get used for anything by other methods of the class? If not, do we need them here? |
Something I'm particularly concerned about here is the nomenclature surrounding "original key" and "new key". I definitely played it a bit fast and loose with variable names here, and would like feedback on whether things seem clearly named.
python_modules/dagster/dagster/core/definitions/solid.py | ||
---|---|---|
150 | I think a naming parameter is inevitable, given that solid definitions need to have unique names. |
python_modules/dagster/dagster/core/definitions/solid.py | ||
---|---|---|
150 | @schrockn @alangenfeld curious as to your thoughts on this; Sandy and I were having a discussion as to whether it's worth including the name parameter at all. The name parameter is only necessary when we use the same solid definition more than once within the same pipeline. If this is the case, having an optional name parameter seems convenient (as a user doesn't have to further alias), but also is kind of combining two operations into one. |
python_modules/dagster/dagster/core/definitions/solid.py | ||
---|---|---|
150 | we definitely need the name parameter since definition names have to be unique per-repository |
I've been thinking about this a bunch and name issue continues to gnaw at me. configured, while elegant, expose this problem but it is this diff that in particularly makes it more acute.
There are a few issues at play here.
- We are not consistent, system-wide, regarding whether or not "definitions" are named or not. Let's ignore, executors and intermediates stores (which hopefully will become resources) and talk about the different ways that solid definitions and resource definitions are treated.
Here is one way to sketch out where we are at. Note that is assuming we have boiled down the system to resources and solids only!
- configured and solid aliasing work on different layers of abstraction levels. configured creates new definitions that are potentially repo-scoped, where as alias creates new instances of definitions that are graph-scoped. I am not saying that is a mistake but it is a schism. I also use the word "potentially" in the configured case because that pattern lends itself to creating ephemeral definitions which are never actually registered at the repo level. If you create solid definition B using a configured call on solid definition A and only use B in a graph, A will not even exist from the standpoint of the repository in question. If this same standard applies that only a rename/re-aliasing in a different scope should not create a new definition type, then resource key remapping should not create a new definition. It should only apply to the instance that is graph-scoped in that case, in which case it would apply to CallableNode, just aliasing does.
So to summarize we are in a bit of mess. I don't have any clever solutions right now, but I wanted to at least serialize my thoughts to spurn discussion
python_modules/dagster/dagster/core/definitions/resource_mappable.py | ||
---|---|---|
100 | inherit from namedtuple and you'll get __eq__ and __hash__ for free |
Do we need this in for 0.10.0? If yes we should figure out a path forward, likely punting on solving the name issue. If not lets revisit post release