Page MenuHomePhabricator

[RFC] Add NamedConfigurable and AnonymousConfigurable, add configured to API docs.
ClosedPublic

Authored by cdecarolis on Jan 22 2021, 7:46 PM.

Details

Summary

This diff aims to solve the naming issue that was introduced by the configured API. It also adds configured to API docs.

Naming Issue
The configured API performs a Definition -> Definition transformation. There are various definition types where configurable is useful, hence having a single interface between each, but there is variation between the required parameters of different definition types. This creates awkward API ergonomics under certain scenarios. This is most acute in the case of the name parameter. Consider that SolidDefinitions must have globally unique names, while ResourceDefinition has no name parameter whatsoever.

Possible Solutions
Current Solution
The current solution is that name exists as an optional parameter on the configured function, which will cause an error if not provided to a SolidDefinition, and cause an error if provided to a ResourceDefinition.

CallableNode Solution
One possible solution is to change the configured API to emit a CallableNode instead of a SolidDefinition, similar to how the alias, tag, and with_hook functions work (for the solid case). CallableNode(s) can be auto-aliased, avoiding the need for the name parameter.

However, CallableNodes are pipeline scoped, meaning they can only be declared within the context of a pipeline (can't be reused between pipelines). This would create a discrepancy between how configured is used on solids, vs how it is used on other definition types.

I think putting configured on CallableNode is also a conceptual mismatch. configured is a form of currying. By transforming the set of inputs (considering config schema as an input), configured is fundamentally changing the solid, whereas fxns like alias and tag are changing metadata surrounding the solid.

In Dagit, we can see this dichotomy as a solid "invocation" vs a solid definition. CallableNodes are linked back to the original definition when invoked.

Overall, it seems that solving the problem in this way leads to:

  1. constraining the scenarios in which one can use the API
  2. Conceptual thrash (we don't treat CallableNodes as distinct entities, while viewing things from a currying perspective suggests we should)
  3. Represented weirdly in Dagit

Split interface solution
The other solution (and the one I will be advocating for) would be to simply have different interfaces for different types of definitions, to mirror the parameters that they require.
To illustrate this, we would have an API NamedConfigurableDefinition interface which has a positional arg for name on the configured function, and an AnonymousConfigurableDefinition interface which has no name arg.

This trades off API parity between configured on different definition types, for clarity about required parameters. While this could potentially create confusion for someone using configured on two different APIs, I think this is a much more momentary confusion than the confusing cases in each alternative.

Tradeoffs

Confusing case in new interfaces approach: hm I used solid.configured and it wanted a name, and I used resource.configured and it didn't want a name.
Confusing case in CallableNode approach: hm I tried to use solid.configured outside of pipeline scope and it didn't work
Confusing case in current approach: hm I tried to use solid.configured, name was a positional arg so I didn't provide it, but then I received an error. Why is name a positional arg if it is required? || hm I tried to use resource.configured, name was a positional arg so I provided it, but then I received an error. Why is name a positional arg if it isn't usable?

In my opinion, surmounting the confusing case in the "new interfaces approach" will be a lot easier for a user, and also a lot easier for us to communicate (we can add a note to the documentation saying "hey solids have this constraint of being uniquely named so we have to provide a name parameter", and that is (imo) less conceptual overhead than communicating in the other scenarios.

From an internal perspective, I think having different interfaces for different definitions is an acknowledgement of this uniqueness constraint on solids vs other types of definitions, and is better than workarounds to pretend it isn't there.

Other potential weirdness
The configured decorator. Name is still a positional arg on this, which is probably fine because we just steal the function name, but it is a bit confusing.

Since providing a string is potentially valid config, mismatch between name and config is possible.

Test Plan

Wrote up a test for how I'm envisioning configured to be used, and what it looks like to use configured on different types of definitions.\

How configured displays in API docs for each definition type:
solid:


executor:

logger:

resource:

composite solid:

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 22 2021, 8:05 PM
Harbormaster failed remote builds in B24725: Diff 30115!
Harbormaster failed remote builds in B24726: Diff 30116!
cdecarolis retitled this revision from Add NamedConfigurable and AnonymousConfigurable to [RFC] Add NamedConfigurable and AnonymousConfigurable.Jan 22 2021, 9:18 PM
cdecarolis edited the summary of this revision. (Show Details)
cdecarolis edited the test plan for this revision. (Show Details)
cdecarolis edited the summary of this revision. (Show Details)

Fixed issue with configured decorator

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 25 2021, 6:07 PM
Harbormaster failed remote builds in B24813: Diff 30215!

Fixed a bunch of errors, got rid of a confusing code path.

python_modules/dagster/dagster_tests/core_tests/config_types_tests/evaluator_tests/test_composite_descent.py
808

Got rid of these tests because they reflect improper usage (configured is supposed to be used as a decorator, and if a fxn is provided, this error will never occur.

nice thanks for the detailed write up

one question in my head is how to handle the breaking change here - having name as a later positional arg avoids breaking anybody - but does feel a little goofy. Is the amount of goofy it feels worth the breaking change? We could pull that out in to a separate diff that could land later for 0.11.0 which i think is what we would do if we decide that it's the right choice.

nice thanks for the detailed write up

one question in my head is how to handle the breaking change here - having name as a later positional arg avoids breaking anybody - but does feel a little goofy. Is the amount of goofy it feels worth the breaking change? We could pull that out in to a separate diff that could land later for 0.11.0 which i think is what we would do if we decide that it's the right choice.

I think it only feels silly from our end, is my guess. Keeping some shenanigans on configured solids regarding required keyword params seems better than potentially breaking someone in a weird way. Since names were disallowed on things that weren't solids, executors, or pipelines anyway, removing the name arg from those should be fine, and keeping name as a keyword on solid/pipeline will not introduce any new weirdness

That said, for resource remapping, I think we should go ahead with the name as positional

Moved name param to later positional arg, so as to not break people in between releases.

the executor breaking change would be unlikely to affect anyone and is probably on trajectory with resourceification

python_modules/dagster/dagster/core/definitions/configurable.py
56–57

"that dont have a name" would be true except for ExecutorDefinition ....

113–119

ordering here

python_modules/dagster/dagster/core/definitions/executor.py
18

im not sure anyone is using this practice, but this actually removes the ability to name the configured executor which is a breaking change

21

nit: this isnt optional

python_modules/dagster/dagster/core/definitions/executor.py
18

Right...it might be a bit pre-emptive to do this. I'll keep it named for now and we can change it accordingly based upon what we decide to do with executors.

python_modules/dagster/dagster/core/definitions/configurable.py
113–119

^

python_modules/dagster/dagster/core/definitions/executor.py
18

to avoid any breaking changes - we should override configured on this class to default out name and grab it from the executor if its not passed in

I think this is on trajectory to be landable, should any docs be updated?

This revision now requires changes to proceed.Thu, Feb 4, 9:39 PM

@alangenfeld will go on a docs hunt to see where configured might be used, and change accordingly. I think it might also be worth wrapping up a docs change that clearly communicates the difference between configurable on solids and everything else.

Updated note on solid.configured in docs, made executor accept name as a kwarg for backcompat

take a peak at how these methods /classes show in api docs at least

This revision is now accepted and ready to land.Fri, Feb 5, 12:00 AM
cdecarolis retitled this revision from [RFC] Add NamedConfigurable and AnonymousConfigurable to [RFC] Add NamedConfigurable and AnonymousConfigurable, add configured to API docs..Fri, Feb 5, 4:29 PM
cdecarolis edited the summary of this revision. (Show Details)
cdecarolis edited the test plan for this revision. (Show Details)
cdecarolis edited the test plan for this revision. (Show Details)

Added modifications to rst files to display configured methods on API docs