Page MenuHomeElementl

[RFC] Resource keys re-mapping

Authored by cdecarolis on Dec 4 2020, 5:55 PM.



This diff presents a (messy) prototype for how resource key remapping could work. It aims to solve the problem outlined in This diff includes implementations of resource re-mapping for solids and intermediate storage.

Test Plan

Added unit tests for most pertinent use cases (planning on expanding this)

Diff Detail

R1 dagster
Lint Passed
No Test Coverage

Event Timeline

cdecarolis added inline comments.

will be getting rid of this


I'm probably going to get rid of this method, unless I end up with more logic here.

i think this looks right overall


hmm the name thing here is kind of brutal - same goes for the configured version above, wonder if there isn't a clever solution


We could potentially draw inspiration from Nick's approach to configured to avoid introducing an extra argument here. In that world, required_resource_keys would accept something like Union[str, ResourceMapping], and the internal representation would be some sort of ResourceMapping object that services the required_resource_keys and resource_mapping properties.

Thoughts @schrockn ?


should this use orig_key and new_key?


There's a lot of duplication between this and build - would it make sense to add the resource mapping argument to to build?


These should include documentation


IMO a name like "remap_resource" would be more ergonomic and about as clear.


Do we need to accept a new name here? The user can always invoke alias if they want a new name.


Do these attributes get used for anything by other methods of the class? If not, do we need them here?

Addressed comments. Added ResourceKeyMapping class.

Something I'm particularly concerned about here is the nomenclature surrounding "original key" and "new key". I definitely played it a bit fast and loose with variable names here, and would like feedback on whether things seem clearly named.


I think a naming parameter is inevitable, given that solid definitions need to have unique names.


@schrockn @alangenfeld curious as to your thoughts on this; Sandy and I were having a discussion as to whether it's worth including the name parameter at all. The name parameter is only necessary when we use the same solid definition more than once within the same pipeline. If this is the case, having an optional name parameter seems convenient (as a user doesn't have to further alias), but also is kind of combining two operations into one.


we definitely need the name parameter since definition names have to be unique per-repository


should this be plural, i.e. remap_resource_keys?


{"foo": "not_foo"}

I've been thinking about this a bunch and name issue continues to gnaw at me. configured, while elegant, expose this problem but it is this diff that in particularly makes it more acute.

There are a few issues at play here.

  1. We are not consistent, system-wide, regarding whether or not "definitions" are named or not. Let's ignore, executors and intermediates stores (which hopefully will become resources) and talk about the different ways that solid definitions and resource definitions are treated.

    Here is one way to sketch out where we are at. Note that is assuming we have boiled down the system to resources and solids only!

    image.png (576×790 px, 62 KB)
  1. configured and solid aliasing work on different layers of abstraction levels. configured creates new definitions that are potentially repo-scoped, where as alias creates new instances of definitions that are graph-scoped. I am not saying that is a mistake but it is a schism. I also use the word "potentially" in the configured case because that pattern lends itself to creating ephemeral definitions which are never actually registered at the repo level. If you create solid definition B using a configured call on solid definition A and only use B in a graph, A will not even exist from the standpoint of the repository in question. If this same standard applies that only a rename/re-aliasing in a different scope should not create a new definition type, then resource key remapping should not create a new definition. It should only apply to the instance that is graph-scoped in that case, in which case it would apply to CallableNode, just aliasing does.

So to summarize we are in a bit of mess. I don't have any clever solutions right now, but I wanted to at least serialize my thoughts to spurn discussion


inherit from namedtuple and you'll get __eq__ and __hash__ for free

Do we need this in for 0.10.0? If yes we should figure out a path forward, likely punting on solving the name issue. If not lets revisit post release

to your queue

This revision now requires changes to proceed.Jan 5 2021, 4:09 PM

Abandoning this for now; @alangenfeld and I cooked up a much similar implementation that would work at the pipeline level. Essentially, directly providing a list of mappings to an execution.