Page MenuHomePhabricator

(value-resource-3) Add value_resource
Changes PlannedPublic

Authored by sandyryza on Jun 21 2020, 6:36 PM.

Details

Reviewers
max
schrockn
Summary

It is a relatively common desire to want to have static values
available to all the solids in a pipeline. The way to model this is to
have a resource that ends up being the configuration value itself.
Without a specific top-level API, this is actually quite non-obvious.

This adds a new top-level api value_resource that does exactly that.

It takes a config_schema argument. It ensure that value pass in via
config fulfills that value and then makes the resource the config
value itself. There one can do something like fill out a dictionary in
config and have that dictionary avaiable across the pipeline.

Test Plan

BK

Diff Detail

Repository
R1 dagster
Branch
value-resource-3
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

schrockn created this revision.Jun 21 2020, 6:36 PM
Harbormaster returned this revision to the author for changes because remote builds failed.Jun 21 2020, 6:49 PM
Harbormaster failed remote builds in B13882: Diff 17066!
schrockn requested review of this revision.Jun 21 2020, 8:17 PM
schrockn updated this revision to Diff 17070.Jun 21 2020, 8:26 PM
schrockn edited the summary of this revision. (Show Details)
schrockn added reviewers: max, sandyryza.

up

max added a comment.Jun 22 2020, 11:53 AM

can we either create an issue to document this, or add some documentation/an example to this diff? the sugar is nice but the real issue is discoverability of this solution

Yeah definitely will document etc. Curious if you think this is good/if there is a better name

max added a comment.Jun 22 2020, 2:17 PM

it's not a thrilling name but i don't think i have anything better. plain value is too general. maybe we could do something like global_value, configured_value, configurable_value. scalar_resource feels wrong.

alternatively we *could* sugar this a level above, in resource_defs, perhaps allowing ppl to pass a config field directly as a value in that dict (?) or maybe piggybacking on dataclasses as in https://github.com/dagster-io/dagster/issues/2545

I'm pretty skeptical about approaching the dataclass approach in the core at this point tbh. I'd be curious about something building it as a layer on top, but ideally that would be ecosystem-driven. It's a lot of surface area to add

sandyryza added a comment.EditedJun 22 2020, 3:09 PM

Not sure whether I think this is better, but wanted to throw out an idea:

It seems like half the advantage of this API is that, when a resource needs no initializer, users get to avoid the boilerplate of defining a resource_fn that passes through the init_context.

I suspect that there are other situations where it would be nice to avoid that boilerplate. E.g. what if users could do:

@no_init_resource(config_schema={'root': str})
class LocalFileSystem():
    def get_absolute_path(relative_path: str):
        return os.path.join(self.config['root'], relative_path)

and then they could get something similar to value_resource with no_init_resource(config_schema=some_schema)(object)

The implementation would look something like:

def no_init_resource(config_schema):
    def _no_init_resource(callable):
        @resource
        def inner_resource(init_context):
            resource_obj = callable()
            resource_obj.config = init_context.resource_config
            return resource_obj
        return inner_resource
    return _no_init_resource
python_modules/dagster/dagster/core/definitions/resource.py
177

If I understand correctly, "value" here corresponds to the "resource_config" member of the init_context passed in at resource creation. That didn't pop out as obvious to me until I read this implementation. I wonder whether this would be clearer as config_value_resource or pure_config_resource or direct_config_resource or something?

schrockn added a comment.EditedJun 22 2020, 3:27 PM

In general I have a pretty strong reaction against schemes which completely abstract away object construction like that. It is quite magical, makes things difficult to test, feel brittle, creates an artificial 1:1 mapping between class and resource etc. In particular having the interface be "you must have a __init__ with no parameters a public settable property" really grates.

The entire codebase skews heavily towards immutable objects and I want to keep it that way.

Having a some decorators apply to classes and some to function would also leave us in a world where some resources are snake_case_resource and some are CamelCaseResource which feels awful.

config_value_resource is probably a better name than value_resource.

sandyryza added a comment.EditedJun 22 2020, 4:04 PM

Having a some decorators apply to classes and some to function would also leave us in a world where some resources are snake_case_resource and some are CamelCaseResource which feels awful.

FWIW if I were a dagster user I would use the @resource decorator on classes. I.e. this works and has less boilerplate than the alternative:

@resource(config_schema={'root': str})
class LocalFileSystem():
    def __init__(self, init_context):
        self.config = init_context.resource_config

    def get_absolute_path(relative_path: str):
        return os.path.join(self.config['root'], relative_path)
max added a comment.Jun 23 2020, 6:14 PM

pass_through_config_value, pipeline_config_value

@sandyryza

feel free to commandeer this and incorporate it into configured work. this is by no means urgent

sandyryza commandeered this revision.Jun 24 2020, 5:38 PM
sandyryza edited reviewers, added: schrockn; removed: sandyryza.
sandyryza planned changes to this revision.Jun 24 2020, 5:41 PM