Page MenuHomePhabricator

(value-resource-3) Add value_resource

Authored by sandyryza on Jun 21 2020, 6:36 PM.



It is a relatively common desire to want to have static values
available to all the solids in a pipeline. The way to model this is to
have a resource that ends up being the configuration value itself.
Without a specific top-level API, this is actually quite non-obvious.

This adds a new top-level api value_resource that does exactly that.

It takes a config_schema argument. It ensure that value pass in via
config fulfills that value and then makes the resource the config
value itself. There one can do something like fill out a dictionary in
config and have that dictionary avaiable across the pipeline.

Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 21 2020, 6:49 PM
Harbormaster failed remote builds in B13882: Diff 17066!
schrockn edited the summary of this revision. (Show Details)
schrockn added reviewers: max, sandyryza.


can we either create an issue to document this, or add some documentation/an example to this diff? the sugar is nice but the real issue is discoverability of this solution

Yeah definitely will document etc. Curious if you think this is good/if there is a better name

it's not a thrilling name but i don't think i have anything better. plain value is too general. maybe we could do something like global_value, configured_value, configurable_value. scalar_resource feels wrong.

alternatively we *could* sugar this a level above, in resource_defs, perhaps allowing ppl to pass a config field directly as a value in that dict (?) or maybe piggybacking on dataclasses as in

I'm pretty skeptical about approaching the dataclass approach in the core at this point tbh. I'd be curious about something building it as a layer on top, but ideally that would be ecosystem-driven. It's a lot of surface area to add

Not sure whether I think this is better, but wanted to throw out an idea:

It seems like half the advantage of this API is that, when a resource needs no initializer, users get to avoid the boilerplate of defining a resource_fn that passes through the init_context.

I suspect that there are other situations where it would be nice to avoid that boilerplate. E.g. what if users could do:

@no_init_resource(config_schema={'root': str})
class LocalFileSystem():
    def get_absolute_path(relative_path: str):
        return os.path.join(self.config['root'], relative_path)

and then they could get something similar to value_resource with no_init_resource(config_schema=some_schema)(object)

The implementation would look something like:

def no_init_resource(config_schema):
    def _no_init_resource(callable):
        def inner_resource(init_context):
            resource_obj = callable()
            resource_obj.config = init_context.resource_config
            return resource_obj
        return inner_resource
    return _no_init_resource

If I understand correctly, "value" here corresponds to the "resource_config" member of the init_context passed in at resource creation. That didn't pop out as obvious to me until I read this implementation. I wonder whether this would be clearer as config_value_resource or pure_config_resource or direct_config_resource or something?

In general I have a pretty strong reaction against schemes which completely abstract away object construction like that. It is quite magical, makes things difficult to test, feel brittle, creates an artificial 1:1 mapping between class and resource etc. In particular having the interface be "you must have a __init__ with no parameters a public settable property" really grates.

The entire codebase skews heavily towards immutable objects and I want to keep it that way.

Having a some decorators apply to classes and some to function would also leave us in a world where some resources are snake_case_resource and some are CamelCaseResource which feels awful.

config_value_resource is probably a better name than value_resource.

Having a some decorators apply to classes and some to function would also leave us in a world where some resources are snake_case_resource and some are CamelCaseResource which feels awful.

FWIW if I were a dagster user I would use the @resource decorator on classes. I.e. this works and has less boilerplate than the alternative:

@resource(config_schema={'root': str})
class LocalFileSystem():
    def __init__(self, init_context):
        self.config = init_context.resource_config

    def get_absolute_path(relative_path: str):
        return os.path.join(self.config['root'], relative_path)

pass_through_config_value, pipeline_config_value


feel free to commandeer this and incorporate it into configured work. this is by no means urgent

sandyryza edited reviewers, added: schrockn; removed: sandyryza.