Page MenuHomePhabricator

[RFC] Resource keys re-mapping
AbandonedPublic

Authored by cdecarolis on Dec 4 2020, 5:55 PM.

Details

Summary

This diff presents a (messy) prototype for how resource key remapping could work. It aims to solve the problem outlined in https://github.com/dagster-io/dagster/issues/2112. This diff includes implementations of resource re-mapping for solids and intermediate storage.

Test Plan

Added unit tests for most pertinent use cases (planning on expanding this)

Diff Detail

Repository
R1 dagster
Branch
resource_mapping
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

cdecarolis added inline comments.
python_modules/dagster/dagster/core/definitions/intermediate_storage.py
47

will be getting rid of this

python_modules/dagster/dagster/core/definitions/resource_mappable.py
18

I'm probably going to get rid of this method, unless I end up with more logic here.

i think this looks right overall

python_modules/dagster/dagster/core/definitions/solid.py
151–154

hmm the name thing here is kind of brutal - same goes for the configured version above, wonder if there isn't a clever solution

python_modules/dagster/dagster/core/definitions/intermediate_storage.py
36

We could potentially draw inspiration from Nick's approach to configured to avoid introducing an extra argument here. In that world, required_resource_keys would accept something like Union[str, ResourceMapping], and the internal representation would be some sort of ResourceMapping object that services the required_resource_keys and resource_mapping properties.

Thoughts @schrockn ?

103

should this use orig_key and new_key?

python_modules/dagster/dagster/core/definitions/resource.py
236

There's a lot of duplication between this and build - would it make sense to add the resource mapping argument to to build?

python_modules/dagster/dagster/core/definitions/resource_mappable.py
5

These should include documentation

15

IMO a name like "remap_resource" would be more ergonomic and about as clear.

python_modules/dagster/dagster/core/definitions/solid.py
151

Do we need to accept a new name here? The user can always invoke alias if they want a new name.

python_modules/dagster/dagster/core/execution/context/compute.py
59

Do these attributes get used for anything by other methods of the class? If not, do we need them here?

Addressed comments. Added ResourceKeyMapping class.

Something I'm particularly concerned about here is the nomenclature surrounding "original key" and "new key". I definitely played it a bit fast and loose with variable names here, and would like feedback on whether things seem clearly named.

python_modules/dagster/dagster/core/definitions/solid.py
151

I think a naming parameter is inevitable, given that solid definitions need to have unique names.

python_modules/dagster/dagster/core/definitions/solid.py
151

@schrockn @alangenfeld curious as to your thoughts on this; Sandy and I were having a discussion as to whether it's worth including the name parameter at all. The name parameter is only necessary when we use the same solid definition more than once within the same pipeline. If this is the case, having an optional name parameter seems convenient (as a user doesn't have to further alias), but also is kind of combining two operations into one.

python_modules/dagster/dagster/core/definitions/solid.py
150–169

we definitely need the name parameter since definition names have to be unique per-repository

python_modules/dagster/dagster/core/definitions/resource_mappable.py
38

should this be plural, i.e. remap_resource_keys?

69

{"foo": "not_foo"}

I've been thinking about this a bunch and name issue continues to gnaw at me. configured, while elegant, expose this problem but it is this diff that in particularly makes it more acute.

There are a few issues at play here.

  1. We are not consistent, system-wide, regarding whether or not "definitions" are named or not. Let's ignore, executors and intermediates stores (which hopefully will become resources) and talk about the different ways that solid definitions and resource definitions are treated.

    Here is one way to sketch out where we are at. Note that is assuming we have boiled down the system to resources and solids only!

    image.png (576×790 px, 62 KB)
  1. configured and solid aliasing work on different layers of abstraction levels. configured creates new definitions that are potentially repo-scoped, where as alias creates new instances of definitions that are graph-scoped. I am not saying that is a mistake but it is a schism. I also use the word "potentially" in the configured case because that pattern lends itself to creating ephemeral definitions which are never actually registered at the repo level. If you create solid definition B using a configured call on solid definition A and only use B in a graph, A will not even exist from the standpoint of the repository in question. If this same standard applies that only a rename/re-aliasing in a different scope should not create a new definition type, then resource key remapping should not create a new definition. It should only apply to the instance that is graph-scoped in that case, in which case it would apply to CallableNode, just aliasing does.

So to summarize we are in a bit of mess. I don't have any clever solutions right now, but I wanted to at least serialize my thoughts to spurn discussion

python_modules/dagster/dagster/core/definitions/resource_mappable.py
101

inherit from namedtuple and you'll get __eq__ and __hash__ for free

Do we need this in for 0.10.0? If yes we should figure out a path forward, likely punting on solving the name issue. If not lets revisit post release

to your queue

This revision now requires changes to proceed.Jan 5 2021, 4:09 PM

Abandoning this for now; @alangenfeld and I cooked up a much similar implementation that would work at the pipeline level. Essentially, directly providing a list of mappings to an execution.