Page MenuHomePhabricator

Only initialize required resources during step execution
ClosedPublic

Authored by prha on Nov 26 2019, 10:23 PM.

Details

Summary

Uses required_resource_keys to provide Resources objects in the context with *only* those keys. If the user does not specify required_resource_keys in a solid, no resources are provided to the context. This allows us to be more selective about what resources are spun up for different pipeline subset executions.

For example, a pipeline with an expensive resource (such as a Spark context) will spin up the context even if the current pipeline or execution plan subset does not require it. With this change, solids are forced to declare their resource dependencies and therefore only the resources required by the solids about to be executed will be booted up.

Custom types and plugins within types that required resources are not covered in this diff and will be addressed in a following diff.

Resolves https://github.com/dagster-io/dagster/issues/1937

Test Plan

BK

Diff Detail

Repository
R1 dagster
Branch
scope-resource-creation
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Harbormaster failed remote builds in B5615: Diff 6958!
Harbormaster failed remote builds in B5614: Diff 6957!
schrockn updated this revision to Diff 6972.Nov 27 2019, 11:39 PM
schrockn edited the summary of this revision. (Show Details)

upmessage

schrockn updated this revision to Diff 6984.Nov 28 2019, 12:47 AM
schrockn retitled this revision from Only boot up required resources to Add strict mod to pipelines and only initialized required resources.
schrockn edited the summary of this revision. (Show Details)
schrockn added reviewers: alangenfeld, max, nate.

upmessage

schrockn updated this revision to Diff 6986.Nov 28 2019, 12:52 AM
schrockn edited the summary of this revision. (Show Details)

upmessage

nate accepted this revision.Dec 1 2019, 4:46 PM
nate added inline comments.
python_modules/dagster/dagster/core/definitions/system_storage.py
81

I doubt anyone is using system_storage in the wild, but note this is a small breaking external API change

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
102

in the short term, it would be nice to be able to set an environment variable like DAGSTER_RESOURCES_STRICT_MODE to override this; that would make it pretty easy to burn down the list of tests and ensure all resource requirements are specified

python_modules/dagster/dagster_tests/core_tests/test_strict_resources.py
274

maybe worth also adding a test raising an error when a solid tries to consume a resource it didn't specify as a requirement?

This revision is now accepted and ready to land.Dec 1 2019, 4:46 PM

Migrating the test suite is a pretty modest change: https://dagster.phacility.com/D1524

schrockn marked an inline comment as done.Dec 2 2019, 10:25 PM
schrockn added inline comments.
python_modules/dagster/dagster_tests/core_tests/test_strict_resources.py
274

test_filter_out_resources does not. currently we don't raise a special error for that situation. I could change it so that the generated class has a special method that gives a nicer error message in that case.

schrockn added inline comments.Dec 2 2019, 10:35 PM
python_modules/dagster/dagster_tests/core_tests/test_strict_resources.py
274

test_filter_out_resources does this test, i meant

schrockn retitled this revision from Add strict mod to pipelines and only initialized required resources to Add strict mode to pipelines and only initialized required resources.Dec 2 2019, 10:40 PM
schrockn edited the summary of this revision. (Show Details)

should there be a test for multiprocess engine / execute plan APIs that ensures unused resources are not instantiated?

python_modules/dagster/dagster/core/definitions/decorators.py
820

still feels off to me to spec this here - feels like it at least belongs on mode next to the other resource information

python_modules/dagster/dagster/core/definitions/solid.py
392–393

if I add a new solid to a composite and don't specify its requirements (where previously all of them did) what failure case do i hit?

python_modules/dagster/dagster/core/definitions/system_storage.py
81

if were going to land this diff we should note it in changes.md so we dont forget

python_modules/dagster/dagster/core/execution/api.py
194

this is a little confusing since its just the non-composite solid names [1]

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
156

this argument feels really weird compared to the others

158

doubly odd since its very specific information that is just getting passed in to this sibling argument

python_modules/dagster/dagster/core/execution/plan/plan.py
319–320

[1]

Do we have an exact game plan for all this yet? I would push for deferring landing anything and trying to jump right to the end state unless there is a clear user need to satisfy.

πŸ‘πŸ» @alangenfeld agree let's chat when you are in tmrw

alangenfeld requested changes to this revision.Dec 3 2019, 9:03 PM

from in person discussion:

  • move strict mode arg to mode (maybe rename it to describe the behavior of selective instantiation it turns on)
  • remove required_resource_keys from composite solid and resolve that change up through the graphql layer
  • explore cleaner options for providing the information currently handled by solid_def_names_in_execution
This revision now requires changes to proceed.Dec 3 2019, 9:03 PM

@alangenfeld I've been trying to isolate what bothers me about attaching this behavior to mode. I think it is the following scenario.

You have two modes, one strict, one not.

The non-strict mode is used for local testing.

@solid
def something(context):
   context.resources.some_resource.it_works()

If you run this in a the non-strict local testing model it works fine as the resources are unconditionally created. However in strict mode some_resource would not be created and this error would only be caught at runtime. Seems pretty wrong to me. I think having this behavior attached to Mode actually allows for too much flexibility and can get a user into quite the confusing situation.

alangenfeld requested changes to this revision.Dec 10 2019, 10:25 PM

thats fair, though now that we're on the 0.7.0 train should we just skip the resources_strict_mode flag and just go for it?

thoughts on [A] & [B] from previous comments still hold

python_modules/dagster/dagster/core/definitions/solid.py
388–396

[A]

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
156

[B]

This revision now requires changes to proceed.Dec 10 2019, 10:25 PM
prha commandeered this revision.Jan 23 2020, 11:25 PM
prha added a reviewer: schrockn.
prha planned changes to this revision.Jan 24 2020, 12:33 AM
prha updated this revision to Diff 9011.Jan 28 2020, 1:51 AM
  • remove strict mode flag
  • add change log
  • add better error message
  • added composite resource test
  • switched solid resource argument to execution plan
  • removed NoResourceReqsSpecifiedSentinel
  • marked resources for airline_demo notebooks (HEAD -> prha/required_resources)
alangenfeld accepted this revision.Jan 28 2020, 10:11 PM

I'm ok handling the clean-up refactoring in a follow up but lets make sure not to lose track of any of it

python_modules/dagster/dagster/core/definitions/resource.py
139–157

can drop this right

python_modules/dagster/dagster/core/definitions/solid.py
183–187

is there not opt_set_param? maybe add it?

python_modules/dagster/dagster/core/definitions/system_storage.py
81

make sure to mention this in changes.md

python_modules/dagster/dagster/core/errors.py
247–262 β†—(On Diff #9011)

do we have enough info on the context to know if the resource was on the mode def we are using?

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
150–170

should we refactor this to just thread through execution_plan since the info is all on there?

python_modules/dagster/dagster/core/execution/plan/plan.py
319–320

I am pretty sure this is wrong (not the impl of the function as its named but what its used for) but "fixing" this requires resource mapping working / to be fixed

322–323

think this could look more like [2] - double check the composition of these functions

335

[2]

This revision is now accepted and ready to land.Jan 28 2020, 10:11 PM
prha marked 8 inline comments as done.Jan 28 2020, 11:53 PM
prha added inline comments.Jan 29 2020, 12:27 AM
python_modules/dagster/dagster/core/errors.py
247–262 β†—(On Diff #9011)

We have enough info on the context, but we'd have to plumb that through to the ScopedResource object if we wanted to add a more descriptive error message. What did you have in mind?

python_modules/dagster/dagster/core/execution/context_creation_pipeline.py
150–170

I have this done in the next diff in the stack (factoring in the input_hydration_config for custom types)

prha updated this revision to Diff 9059.Jan 29 2020, 12:32 AM
  • comments from alex
prha retitled this revision from Add strict mode to pipelines and only initialized required resources to Only initialize required resources during step execution.Jan 29 2020, 12:57 AM
prha edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.