Page MenuHomeElementl

RFC (for discussion): Make executor definitions resource definitions
Needs RevisionPublic

Authored by schrockn on Dec 2 2020, 3:01 PM.

Details

Summary

I am exploring this now as I want to potentially make this
happen in the 0.11.0 timeframe, but that might require deprecations in
the 0.10.0 timeframe.

The idea here is that we move all artifacts attached to mode to be
resources. The way we would explain modes is that they are a set of
resource definitions. We could even rename mode to resource set.

The invariant in the system is that the entire purpose of a mode to
provide a set of pluggable resources, and the the purpose of a resource
is to be a thing that can very its implmentation based on mode. Any
future abstractions or concepts that vary by mode will be resources.
Full stop.

There are weird inconsistencies between resources, executors,
storages, etc atm. Examples:

  1. Only resource definitions can create their resources with a context manager allowing for cleanup. Other resource-ish things do not support that
  1. There are different argument orders to ctors
  1. Executors, storages etc have embedded names and resources do not.
  1. Only intermediate storage supports dependencies on other resources. This should be a generalized capability.

I also believe this final merge will cleanup the context initialization
process considerably.

In order to pull this off we will have to formalize the notion that some
resources are "special". For example an executor

  • Is automatically available to the entire computation and to the system
  • There can only be one of them
    • Note: to support the current default behavior where you can switch

between multiprocess and in-process out-of-the-box with just a config
change, we would need to have an executor definition that makes a
decision about which executor to issue, which would require changes in
this proposal. In particular the logic around
requires_multiprocess_safe_env

I think it will be interesting to *completely* eliminate
ExecutorDefintion entirely and have all the special behavior encoded
in the state of the ResourceDefinition itself, but we can defer that
discussion as it doesn't change the essence of what is going on here.

So what does this do diff do precisely?

  1. ExecutorDefinition now inherits from ResourceDefinition
  2. InitExecutorContext is gone. InitResourceContext has been expanded to cover the properties that InitExecutorContext has
  3. If you pass in a resource definition that is an executor definition, this "takes over" and eliminates the execution section of the config. Mode def constructor ensures that the user does not pass in more than one.
  4. Moves the check around where an environment is multiprocess-safe into the core. I believe that this should be a generalized feature of resources, but this diff does not do that. I think this is clear code quality and modularity win and I think we should do that change standlone.
Test Plan

BK

Diff Detail

Repository
R1 dagster
Branch
system-resource-exploration-merged
Lint
Lint Passed
Unit
No Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Dec 2 2020, 3:20 PM
Harbormaster failed remote builds in B22057: Diff 26779!
Harbormaster returned this revision to the author for changes because remote builds failed.Dec 2 2020, 4:53 PM
Harbormaster failed remote builds in B22080: Diff 26806!
schrockn retitled this revision from Merged All the things to RFC (for discussion): Make executor definitions resource definitions.
schrockn edited the summary of this revision. (Show Details)

up

python_modules/dagster/dagster/core/definitions/environment_configs.py
156–159

no executor defs mean that it was specified in the resource_defs dictionary

python_modules/dagster/dagster/core/definitions/executor.py
11

this now inherits from ResourceDefinition. It would be potentially cool to eliminate this entirely and just make it a direct instantiation of ResourceDefinition (which would have to have additional state)

132

this function now receives an InitResourceContext

202–218

all this logic was moved the core in ensure_multiprocess_safe.py. this made it so we could reduce the surface area of InitExecutorContext and consolidate on InitResourceContext

python_modules/dagster/dagster/core/definitions/mode.py
69–78

here is the logic to grab the executor if it exists. there can only be zero or 1 of them. If zero we fall back to the old behavior

python_modules/dagster/dagster/core/definitions/resource.py
15–22

i made this during the creation this diff and don't strictly need it.

90–95

I wrapped this check behind a property to allow for future experimentation.

For example I have a diff locally that changes the ResourceDefinition to take the expected type of the returned resource itself (e.g. IExecutor or IStepLauncher) and make that the test in question, rather than checking the instance of the definition type.

python_modules/dagster/dagster/core/execution/context/init.py
70–85

there are properties on the old InitExecutorContext that are no longer supported

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
48–61

again, config hijacking in the executor-is-a-resource case

python_modules/libraries/dagster-celery-docker/dagster_celery_docker/executor.py
132

see we can eliminate all of these

schrockn published this revision for review.Dec 2 2020, 5:19 PM

Very in favor of getting rid of stuff on mode.

There was some discussion of moving executor to the instance and combining it with run launcher. Is that no longer under consideration?

python_modules/dagster/dagster/core/definitions/resource.py
91

It feels weird for ResourceDefinition to be aware of its subtypes. This also adds surface area to the public API that we'd likely want to eliminate later. If we want to abstract out this check, maybe that should live inside ModeDefinition for now?

I think you bring up a great point with the Launcher vs Executor front. @alangenfeld @sashank where did we end up on that?

python_modules/dagster/dagster/core/definitions/resource.py
91

The circular dep makes it feel like a hack. one could just define that property and the override it to be true in th ExecutorDefinition. Or have an enumeration that encodes all of the "special" resources. Call it resource_kind. This feels a little less hacky (no circular dep) but is in effect that same thing. This is how the config and the dagster type system encode this type of thing.

Either way however there is centralized knowledge somewhere that there are "special" resources that the system knows about and treats differently.

There was some discussion of moving executor to the instance and combining it with run launcher. Is that no longer under consideration?
I think you bring up a great point with the Launcher vs Executor front. @alangenfeld @sashank where did we end up on that?

@sashank will have to speak to anything he discerned in the brief time he spent exploring before getting pulled in to other stuff.

The big question that comes to mind for me is what do we want running in a host process and what should run in a user process.

Currently its launcher in host and executor in user, I *think* what we want is to move towards them both to being host processes, theres not really a way to get to multi-container graphs without that. It's not a small task to get there. There could be a rational argument to having the launch be user process driven thing too, but it's less clear to me off the bat.

clearing queue for 0.10.0 burn down - request review post release if there is more discussion this can generate

This revision now requires changes to proceed.Jan 5 2021, 4:10 PM