Page MenuHomeElementl

RFC make_values_resource
ClosedPublic

Authored by yuhan on Apr 16 2021, 6:55 PM.

Details

Summary

depends on D7446

before: https://github.com/dagster-io/dagster/discussions/3213
after the change, users can pass in config:

@solid(required_resource_keys={"my_str"})
def solid1(context):
    my_str = context.resources.my_str
    # do stuff

@solid(required_resource_keys={"my_str"})
def solid2(context):
    my_str = context.resources.my_str
    # do stuff

@pipeline(mode_defs=[ModeDefinition(resource_defs={"my_str": make_values_resource()})])
def my_pipeline():
    solid1()
    solid2()

yaml:

resources:
  my_str:
    config: some_value
Test Plan

bk

docs:

Screen Shot 2021-04-26 at 12.03.03 PM.png (2×1 px, 410 KB)

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuhan requested review of this revision.Apr 16 2021, 7:37 PM

top level export + update tutorial too?

im meh on make_ but Its the best of the ideas ive seen/had. Open if someone else has a better one

sandyryza added inline comments.
python_modules/dagster/dagster/core/definitions/resource.py
287

It would be helpful to include type annotations.

291

Would be helpful to include doc similar to the config_schema docs in this diff: https://dagster.phacility.com/D7446.

292

Nit: no square brackets needed around Optional

295

Nits:

  • Square brackets not needed around ResourceDefinition.
  • Looks like sentence is incomplete.
This revision now requires changes to proceed.Apr 20 2021, 3:17 PM

should we do make_object_resource here as well ? or separate diff? feels like introducing them together in the same release makes sense

should we do make_object_resource here as well ? or separate diff? feels like introducing them together in the same release makes sense

Not convinced that it's the right thing to do, but I think it's worth considering just accepting objects anywhere ResourceDefinitions are accepted, instead of requiring to wrap in make_object_resource.

yuhan marked 4 inline comments as done.

top level expoert + update example + update api docs
follow up diff: make_object_resource

re: make_object_resource vs accepting objects anywhere ResourceDefinitions are accepted

make_object_resource is a pretty trivial change - users can do the same thing with the existing ResourceDefinition.hardcoded_resource api but make_object_resource makes it less verbose.

the latter sounds like which sounds like we will end up loosing the resource definition check by allowing Any - also the resource machinery may need to special case resource objects (which need to skip the resource init) vs resource defs (which needs resource init), which sounds like a lot system complexity but not very much user value gain (?)

This name is definitely a bit tough. If I wanted to get global variables into the pipeline make_config_resource wouldn't really lead me to it.

How about we consider offering a variables resource out of box to solve this use case directly? That what people can just add an arbitrary number rather than declare individual resources. It would be totally permissive by default but could optionally be provided config schema

context.resources.variables.a # if we want to make it feel like a tuple
context.resources.variables["a"] # if we want to make it feel like a dict

@solid(required_resource_keys={"variables"})
def solid1(context):
    my_str = context.resources.variables.my_str
    # do stuff

@solid(required_resource_keys={"variables"})
def solid2(context):
    my_str = context.resources.variables.my_str
    # do stuff

@pipeline(mode_defs=[ModeDefinition(resource_defs={"variables": make_variables_resource()})]) # config schema optional
def my_pipeline():
    solid1()
    solid2()

Opt-in typing via config system:

@pipeline(mode_defs=[ModeDefinition(resource_defs={"variables": make_variables_resource(my_str=str)})]) # config schema optional
def my_pipeline():
    solid1()
    solid2()

@schrockn are you suggesting renaming make_config_resource to make_variables_resource or making "variables" a reserved resource key?

to clarify, make_config_resource can take multiple variables at once and pass in a dict - you don't have to declare individual resources to pass in multiple values:

@solid(required_resource_keys={"variables"})
def my_solid(context):
    my_str = context.resources.variables['my_str']
    my_int = context.resources.variables['my_int']

@pipeline(
    mode_defs=[
        ModeDefinition(
            resource_defs={
                "variables": make_config_resource({"my_str": str, "my_int": int}),
            },
        )
    ]
)
def my_pipeline():
    my_solid()

if this was confusing, i will update the example on the docs

Not suggesting a reserved keyword. This is mostly about naming and norms. I think people are likely to understand intuitively what a variables_resource is used for and the config variant is harder to understand. This also allows the use of the variables resource without specifying a config schema.

If I understand correctly what Nick is suggesting, it's:

@resource
def variables(init_context):
    return init_context.resource_config
docs/content/concepts/configuration/config-schema.mdx
161

Yes effectively that with a little sugar and an approachable framing.

A couple variants:

from dagster import pipeline, resource, solid, ModeDefinition, execute_pipeline, Any

from collections import namedtuple


class Variables:
    pass


# tuple version
def make_variables_resource(**kwargs):
    class _ScopedResources(
        namedtuple("_ScopedResources", list(kwargs.keys())), Variables  # type: ignore[misc]
    ):
        def __getattr__(self, attr):
            raise Exception(f"Unknown variable {attr}")

    @resource(config_schema=kwargs)
    def _variables_resource(init_context):
        return _ScopedResources(**init_context.resource_config)

    return _variables_resource


# dict_version
# def make_variables_resource(**kwargs):
#     @resource(config_schema=kwargs or Any)
#     def _variables_resource(init_context):
#         return init_context.resource_config

#     return _variables_resource


@solid(required_resource_keys={"variables"})
def a_solid_dict_variable(context):
    print(f"VALUE: {context.resources.variables['foo']}")
    return context.resources.variables["foo"]


@solid(required_resource_keys={"variables"})
def a_solid_accessor(context):
    print(f"VALUE: {context.resources.variables.foo}")
    return context.resources.variables.foo


@pipeline(mode_defs=[ModeDefinition(resource_defs={"variables": make_variables_resource(foo=int)})])
def a_pipeline():
    a_solid_accessor()


execute_pipeline(a_pipeline, run_config={"resources": {"variables": {"config": {"foo": 2}}}})

all discussed options are reasonable to me - will defer to y'all

  • make_config_resource -> make_variables_resource (dict version)
  • update docs & examples
yuhan retitled this revision from RFC make_config_resource 1/ to RFC make_variables_resource.Apr 26 2021, 7:11 PM
yuhan edited the summary of this revision. (Show Details)
yuhan edited the test plan for this revision. (Show Details)
yuhan added a reviewer: schrockn.

@schrockn i've updated the diff to be the dict version of the make_variables_resource - plz see the docs screenshot in test plan. lmk how you feel about it

I think my preference is for the dict version. I can foresee the slack questions where someone tries to use __getitem__ where __getattr__ is expected and vice versa. IMO consistency beats convenience in this situation.

I'm also a little leery of introducing a new noun ("variable"), but don't feel strongly. Happy for someone else to make the call on that.

Sorry to keep bikeshedding @yuhan.

Reflecting on this I think the major con of this approach is that the term "variable" (1) makes it feel like an integrated feature as opposed to a user-space feature that a user could easily implement on their own, (2) it occupies the term "variable" which could potentially be used for a feature in the future and (3) it could cause confusion with Airflow variables.

other possibilities:

values
bag
dict
dictionary
globals
properties

make_dict_resource might be the most obvious since the actual resource object ends up being a dict and therefore doesn't require a new word

thoughts?

Another idea: make_ naming pattern could be a def suffix.

make_dict_resource --> dict_resource_def

@schrockn are there particular reservations you have about using "config" for the name? I suspect that most users looking for this functionality will be searching with something along the lines of "configure multiple solids" or "pipeline-level configuration".

Yeah that name in particular I find very confusing. Config is the way that you specify how something is constructed, rather than the output of that construction process. It also makes it sound like a property or feature of the config system, rather than the other way around. I'm concerned about the reaction of "Oh a config resource. I guess this means the config system has a resource concept as well?"

The question I ask is what is attached to the context.resources object. *That* is the resource. So config is attached the resources object as well?

The question I ask is what is attached to the context.resources object. *That* is the resource. So config is attached the resources object as well?

Got it. I think that line of reasoning makes total sense.

let's do make_dict_resource then?

my reservation on dict_resource_def is that most of our exported nouns don't require invocation like fs_io_manager, s3_resource, so some verb-led name would be good in this case

Cool. Agree with your reasoning on the _def suffix! Was just spitballing. Req'ing for q mgmt.

This revision now requires changes to proceed.Apr 27 2021, 8:22 PM
yuhan edited the summary of this revision. (Show Details)

make_values_resource

as i was updating the docs, i realized that make_dict_resource could be confusing bc if the user doesn't specify a schema, the actual resource object will be in any type not dict, like:

image.png (1×1 px, 231 KB)

so im renaming it to be make_values_resource, reasoning:

  • the con of "values" is the term could mean anything but i did find myself writing "you can share values between solids", so i think it's not a bad idea.
  • alternatively, we could still use the name`make_dict_resource` and always return a dict - when there's no config schema, the returned object could be {"value": ...}. but i dont like this idea bc it's verbose

sorry for the back and forth

No problem on the thrash. think I'm the one that started (or at least continued) the bikeshedding party! We should probably use the term "values" instead of "variables" in examples, tests, and docs, but I think this is good.

docs/content/concepts/configuration/config-schema.mdx
174

probably want to change the resource key to values

examples/docs_snippets/docs_snippets/concepts/configuration/make_values_resource_config_schema.py
5–27

s/variables/values?

yuhan retitled this revision from RFC make_variables_resource to RFC make_values_resource.Apr 27 2021, 10:25 PM
yuhan edited the summary of this revision. (Show Details)
This revision was not accepted when it landed; it landed in state Needs Review.Apr 27 2021, 11:55 PM
This revision was automatically updated to reflect the committed changes.