Page MenuHomePhabricator

#1848 output_materialization_config should be able to yield arbitrarily many Materializations
ClosedPublic

Authored by yuhan on Mon, Mar 16, 7:19 AM.

Details

Summary

Issue: https://github.com/dagster-io/dagster/issues/1848
Enabled @output_materialization_config to yield multiple Materializations

  • Addition to returning a Materialization, output_materialization_config can also yield multiple Materializations (a generator of Materializations.
    • To treat return and yield the same way in the core, we are going to check ensure_gen in OutputSchemaForDecorator.materialize_runtime_values every time this gets called.
  • Changed intro_tutorial/output_materialization.py to yield two Materializations (use case is to output both csv and json)
Test Plan
  • unit
  • ran output_materialization in tutorial through CLI and dagit
  • BK

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

yuhan created this revision.Mon, Mar 16, 7:19 AM
yuhan updated this revision to Diff 10592.Mon, Mar 16, 6:22 PM

There still will be one case broken in dagster-pandas.
Because I don't think @output_selector_schema should do yield multiple Materializations... So I will have to find a way to try-catch the "not a generator" error

yuhan edited the summary of this revision. (Show Details)Mon, Mar 16, 6:25 PM
yuhan added reviewers: max, prha.
prha requested changes to this revision.Mon, Mar 16, 6:54 PM

I think we should probably use ensure_gen instead of converting the type signature?

We do this with resources (see resources_init.py) to support both functions w/ return types and for generators.

This revision now requires changes to proceed.Mon, Mar 16, 6:54 PM
yuhan updated this revision to Diff 10615.Tue, Mar 17, 12:38 AM

switch back to return Materialization for outputting single materialization. use ensure_gen to guard

prha added a comment.Tue, Mar 17, 2:07 AM

looking pretty good! A couple of small suggestions for you...

examples/dagster_examples/intro_tutorial/output_materialization.py
41–44

yayyyyy... it's awesome you updated the intro tutorial to showcase this.

58

might make sense to keep this as data_frame_csv?

78

maybe data_frame_json?

python_modules/dagster/dagster/core/engine/engine_inprocess.py
677

Maybe this should be materialize_runtime_values if it can yield multiple materializations?

681–692

It could be cleaner if you could put the ensure_gen call within the output_materialization_config decorator...

This comment was removed by prha.
prha added inline comments.Tue, Mar 17, 2:11 AM
python_modules/dagster/dagster/core/engine/engine_inprocess.py
681–692

We should also make sure we incur the ensure_gen is called at definition time and not at runtime.

yuhan updated this revision to Diff 10617.Tue, Mar 17, 3:12 AM
yuhan marked 3 inline comments as done.

update

prha requested changes to this revision.Tue, Mar 17, 3:33 AM
prha added inline comments.
python_modules/dagster/dagster/core/types/config_schema.py
160

Let's move this so that this happens at definition time, in the constructor....

self._func = ensure_gen(check.callable_param(func, 'func'))
This revision now requires changes to proceed.Tue, Mar 17, 3:33 AM
yuhan added inline comments.Tue, Mar 17, 6:54 AM
python_modules/dagster/dagster/core/types/config_schema.py
160
class OutputSchemaForDecorator(OutputMaterializationConfig):
    def __init__(self, config_type, func, required_resource_keys):
        self._func = ensure_gen(check.callable_param(func, 'func'))

    def materialize_runtime_values(self, context, config_value, runtime_value):
        return self._func(context, config_value, runtime_value)

gave me TypeError: 'generator' object is not callable.

Shouldn't ensure_* all happen at run_time? my understanding is it needs to check the actual return/yield of the func at runtime and then it returns a generator. If it gets init-ed at once, how would the real value gets checked?

prha added a comment.Tue, Mar 17, 4:26 PM

ah, you're right (of course).

I *think* it should be okay to incur this cost at runtime, but let's check in with @max.

prha accepted this revision.Thu, Mar 19, 12:48 AM

approving to remove the req'd changes in phabricator, but let's make sure we check in on the runtime inspect

This revision is now accepted and ready to land.Thu, Mar 19, 12:48 AM
yuhan edited the summary of this revision. (Show Details)Thu, Mar 19, 1:11 AM
yuhan edited the summary of this revision. (Show Details)Thu, Mar 19, 1:13 AM
yuhan updated this revision to Diff 10717.Thu, Mar 19, 6:31 PM

rebase master