Page MenuHomePhabricator

configured
ClosedPublic

Authored by sandyryza on May 28 2020, 8:43 PM.

Details

Summary

This adds the configured API and makes resources config-mappable.

Test Plan

get feedback on direction before testing

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sandyryza created this revision.May 28 2020, 8:43 PM
sandyryza edited the summary of this revision. (Show Details)May 28 2020, 8:50 PM
sandyryza edited the summary of this revision. (Show Details)
sandyryza updated this revision to Diff 15066.May 28 2020, 8:52 PM
sandyryza edited the summary of this revision. (Show Details)

fix example

Harbormaster returned this revision to the author for changes because remote builds failed.May 28 2020, 9:05 PM
Harbormaster failed remote builds in B12265: Diff 15066!
sandyryza requested review of this revision.May 28 2020, 9:35 PM

my only concern is i think we will want the variant that decorates a function and allows exposing different config - that could cover this static case as well, albeit in a more involved way than this.

Do we think its worth having a static and dynamic API?

my_s3 = configured_resource('my_s3', s3_resource, {'my': 'config'})

@reconfigurable_resource(config={'less': str}, target_resource=s3_resource)
def my_s3(init_context): # name from fn name
  return {
    'this':  {
      'type': {'of_thing': init_context.resource_config['less'] }
    }
  }

That said - I do think we want to do this for solid as well - I think having to use composite just for config_mapping has tripped people up since they then have to figure out I/O mapping too

python_modules/dagster/dagster/core/definitions/bound_resource.py
7 ↗(On Diff #15073)

I think we need to force a name selection too since the resource name is shown in tools and wrapped for all of them is not great

Do we think its worth having a static and dynamic API?

Do we have ideas for what scenarios we'd expect people to use the dynamic API in? My personal bias is towards delaying adding API surface until it addresses concrete need.

If I were to guess about a use case for a dynamic API, it would be that someone wants their resource to be configured at runtime, but it has lots of fancy nitty-gritty config and they want to simplify it? FWIW this feels qualitatively different to me than not wanting a resource to be configured at runtime.

python_modules/dagster/dagster/core/definitions/bound_resource.py
7 ↗(On Diff #15073)

I clicked around in dagit a little, and if I understand correctly, the displayed name comes from the resource key - ResourceDefinitions themselves don't have names. Does sound right to you?

alangenfeld added inline comments.May 29 2020, 5:24 PM
python_modules/dagster/dagster/core/definitions/bound_resource.py
7 ↗(On Diff #15073)

oh nice, you are totally right - ResourceDefinition does not even have name

"Do we have ideas for what scenarios we'd expect people to use the dynamic API in? My personal bias is towards delaying adding API surface until it addresses concrete need."

I normally agree with you but I think it is worth thinking about it since we have to support new APIs for a long time. In this case the partial application case is more general and would supplant the full curry.

Example scenario:

I am a data infra eng. I am responsible for how spark contexts are configured prod, stagings test, etc.. However I still want my ops people and devs to be able to control a limited subset of the configuration. E.g. maybe I just want to expose a memory limit and nothing else (contrived example).

@reconfigurable_resource(config={'mem': int}, target_resource=pyspark_resource)
def myco_prod_spark_context(cfg):
   return _build_spark_config(cfg['mem'])
 
def _build_spark_config(mem_limit):
   return {....} # massive config dict

However you very well maybe be right that a full curry is a common case that it warrants its own top-level API. Also translating these same concepts to solids very interesting to me:

preconfigured_resource and preconfigured_solid

Not that in the full preconfiguration case would could actually write tools the leverage the config editor to build these in a more WYSIWYG style way

reconfigured_resource and reconfigured_solid

^--- this would be the partial application case. Note that reconfigured_solid would be a nicer way of doing a single solid composite whose only purpose is config mappiing

and finally composite_solid

python_modules/dagster/dagster/core/definitions/bound_resource.py
7 ↗(On Diff #15073)

Yeah think this a gap actually. The key in the dictionary is analogous solid name and we don't have a definition name.

schrockn requested changes to this revision.May 29 2020, 5:42 PM

bouncing to your queue for discussion

This revision now requires changes to proceed.May 29 2020, 5:42 PM

@schrockn

I am a data infra eng. I am responsible for how spark contexts are configured prod, stagings test, etc.. However I still want my ops people and devs to be able to control a limited subset of the configuration. E.g. maybe I just want to expose a memory limit and nothing else (contrived example).

That sounds legit.

I normally agree with you but I think it is worth thinking about it since we have to support new APIs for a long time. In this case the partial application case is more general and would supplant the full curry.

Definitely. The way I'm imagining this could ultimately end up is distinct APIs with concentric circles of implementation. Something like the following:

def config_mapped_resource(resource: ResourceDefinition, config_spec: Any):
    def _config_mapped_resource(fn: Callable[[Dict], Dict]) -> ResourceDefinition:
        @resource(config=config_spec)
        def map_config():
            ...
            return ResourceDefinition()
    return _config_mapped_resource


def partially_configured_resource(resource: ResourceDefinition, partial_config: Dict) -> ResourceDefinition:
    @config_mapped_resource(resource, curried_config_spec(resource.config_spec, partial_config))
    def resource_with_configs(runtime_config: Dict) -> ResourceDefinition:
        return deep_merge_dicts(runtime_config, config)

    return resource_with_configs


def fully_configured_resource(resource: ResourceDefinition, config: Dict):
    partially_configured = partially_configured_resource(resource, config)
    check.invariant(len(partially_configured.config) == 0, 'missing config')
    return partially_configured

So a user could theoretically use config_mapped_resource for everything, but in my opinion, expecting them to write a mapping function when they want to fully/partially configure a resource is too heavyweight.

fully_configured_resource and partially_configured_resource should maybe be mushed into a single function with a can_be_partial=False argument?

Thoughts?

I originally started implementing a more general case - something like the partially_configured_resource described above. My impression though was that producing a curried config spec would require working pretty deeply inside the config machinery. I was grepping field snaps and EvaluationStacks, but then stepped back and thought it might not be a good use of my time if my immediate goal was fully configured resources. Maybe I was overcomplicating though and deep surgery isn't required.

Either way, I agree that the API should be designed with an eye towards what the general case will look like.

sandyryza updated this revision to Diff 15604.Jun 4 2020, 12:54 AM
  • use resource_with_config and comment config
  • resource_with_config
  • IConfigMappable

@schrockn @alangenfeld I updated the approach to try to generalize across config-mappable types, starting with resources and storages. The idea is to have an IConfigMappable interface implemented by ResourceDefinition, SystemStorageDefinition etc., that puts those types in charge of delegating config-processing.

There are definitely a bunch of loose ends to tie up, but interested in your initial reactions to this direction.

I still feel strongly that users shouldn't need to decorate a function to do basic config-currying, so configured accepts either a function or a config.

good shit

python_modules/dagster/dagster/core/definitions/config_mappable.py
28–36

redfordnod

python_modules/dagster/dagster/core/definitions/resource.py
13–119

can you tell me about how you ended up at this set up of IConfigMappable + I___Definition + ConfigMapped____Definition ? it seems like a lot at first glance and i cant tell what constraints led you here

This is great.

Having a ConfigMapped* variant of every definition type feels a little heavy. Seems like we should be able to do it in cross-cutting way a little more elegantly. But that is an implementation detail and I think we can iterate from here.

I think this is going to be *really* useful on SolidDefinition too.

python_modules/dagster/dagster/core/definitions/config_mappable.py
18–47

looksgood

most excellent

sandyryza updated this revision to Diff 15631.Jun 4 2020, 2:49 PM
  • Consolidate ResourceDefinition and ConfigMappedResourceDefinition

@schrockn -

Having a ConfigMapped* variant of every definition type feels a little heavy. Seems like we should be able to do it in cross-cutting way a little more elegantly. But that is an implementation detail and I think we can iterate from here.

The constraint I'm working with is that calling configured on a ResourceDefinition should return something that looks and acts like a ResourceDefinition. Does that sound right to you? A couple ideas on how to satisfy this constraint while avoiding a ConfigMapped variant of every definition:

  • Smush ConfigMappedResourceDefinition and ResourceDefinition together. In my opinion, that's a little awkward because then ResourceDefinition kind of has two internal modes, one where half of its attrs are relevant, and one where the other half are relevant. I updated the revision to showcase this approach, and if you prefer it, I can go with it.
  • Pull something fancy like the following:
def configured(configurable, config_schema=None):
    def _configured(config_or_config_fn):
        if isinstance(config_or_config_fn, Callable):
            config_fn = config_or_config_fn
        else:

            def config_fn(_):
                return config_or_config_fn

        return ConfigMappingWrapper(configurable)


class ConfigMappingWrapper(object):
    def __init__(self, wrapped, config_fn, config_field):
        self._config_field = config_field
        self._wrapped = wrapped
        self._config_fn = config_fn

    def __getattr__(self, name):
        if (name in ['_wrapped', '_config_fn', '_config_field']):
            return super(ConfigMappingWrapper, self).__getattr__(name)
        else:
            return self._wrapped.__getattr__(name)

    def process_config(self, config):
        config_evr = process_config(self._config_field, config)
        if config_evr.success:
            return self._wrapped.process_config(self._config_fn(config))

        return config_evr

cc @alangenfeld

python_modules/dagster/dagster/core/definitions/resource.py
13–119

Yeah, very fair question. The constraints:

  • A single function (configured) can be used for config mapping across resources, storages, etc.
  • Config validation at pipeline initialization time, not resource initialization time.
  • Config mapping of arbitrary depth, i.e. you should be able to config-map a config-mapped resource.

IConfigMappable defines an explicit contract for what a type needs to support for configured to be used on it. I think the alternative would be for configured and EnvironmentConfig.build to have heavy implementations that know about how each type gets wrapped.

I think the alternative to ConfigMappedResourceDefinition, would be making _wrapped_resource_fn and _config_fn attrs of ResourceDefinition. In my opinion, that's a little awkward because then ResourceDefinition kind of has two internal modes, one where half of its attrs are relevant, and one where the other half are relevant.

I don't feel super strongly about that though. I just pushed a diff that shows what it would like like to consolidate ConfigMappedResourceDefinition into ResourceDefinition. Does this version look better to you?

schrockn requested changes to this revision.Jun 4 2020, 10:47 PM

I like the IConfigMappable approach. I think you can make it so that you can just construct a wholly new definition object without the lineage. I think that will be a lot cleaner.

python_modules/dagster/dagster/core/definitions/resource.py
98–104

I think this would be a lot cleaner if you didn't pass in the wrapped_resource_def.

I think it's also important that the configurable allows you to override the description. I think passing along the the inner description can actually be pretty misleading

This revision now requires changes to proceed.Jun 4 2020, 10:47 PM
sandyryza added a comment.EditedJun 4 2020, 11:17 PM

@schrockn:

I like the IConfigMappable approach. I think you can make it so that you can just construct a wholly new definition object without the lineage. I think that will be a lot cleaner.

I agree that it would be way cleaner without the lineage. The reason I put the lineage in there is so we can validate the "curried" config at the time we're validating the rest of the config (which is what we already do with composite solid config-mapping case). I think, without the lineage, we'd need to wait until resource initialization to do validation. Does that line up with your understanding?

Edit: an idea for how to address the above without wrapping a full resource would be for resources to have a process_config function (maybe even replaces config_field?). Then the new resource that's produced by configured could have a process_config that calls the process_config of the resource it's configuring.

sandyryza retitled this revision from resource_with_config to configured.Jun 26 2020, 1:11 AM
sandyryza updated this revision to Diff 17740.Jul 1 2020, 12:02 AM
  • process_config wrapping
sandyryza updated this revision to Diff 17742.Jul 1 2020, 12:16 AM

pare down

sandyryza edited the summary of this revision. (Show Details)Jul 1 2020, 12:17 AM
schrockn accepted this revision.Jul 1 2020, 1:33 PM

most excellent

python_modules/dagster/dagster/core/definitions/resource.py
103

Better error message. Indicates an error in the user specification of configured

This revision is now accepted and ready to land.Jul 1 2020, 1:33 PM
schrockn requested changes to this revision.Jul 1 2020, 1:33 PM

Actually one more thing:

  • Just make that error message better when the configured function returns bad config
  • Add a test for it
This revision now requires changes to proceed.Jul 1 2020, 1:33 PM
schrockn accepted this revision.Jul 6 2020, 6:28 PM

most excellent

impressive

This revision is now accepted and ready to land.Jul 6 2020, 6:28 PM
This revision was automatically updated to reflect the committed changes.