Page MenuHomePhabricator

RFC: Composite solid subselection
Needs RevisionPublic

Authored by max on May 12 2020, 5:14 PM.

Details

Summary

The main things to note about this are:

a) This distinguishes between selecting a composite solid ['composite_a'] and selecting all of the composite's children ['composite_a.solid_a', 'composite_a.solid_b']. In the first case, config mapping is respected; in the second, it is not. Subselecting from a composite ['composite_a.solid_a'] does not respect config mapping.

b) This collapses composite subselections, so that, e.g., 'composite_a.composite_b.solid_c' takes config like {'solids': {'composite_a.composite_b.solid_c': {...}}}. So this does synthetic naming in the subselection.

In order to enable the alternate approach, where 'composite_a.composite_b.solid_c' takes config like {'solids': {'composite_a': {'solids': {'composite_b': {'solids': {'solid_c': {...}}}}}}}, we would either need to inject synthetic composite solids into the subpipeline (hard to understand how this would work when we no longer subclass PipelineDefinition), or we would need to specialize environment schema generation for the subpipeline (but see https://github.com/dagster-io/dagster/issues/2322 -- we may have to do this anyway).

Test Plan

Unit

Diff Detail

Repository
R1 dagster
Branch
solid-subset-selection
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

max planned changes to this revision.May 12 2020, 5:14 PM
max created this revision.
max updated this revision to Diff 13809.May 13 2020, 1:12 AM

fix output mapping

max planned changes to this revision.May 13 2020, 1:12 AM
max retitled this revision from RFC (1): Composite solid subselection to RFC: Composite solid subselection.May 13 2020, 1:35 AM
max edited the summary of this revision. (Show Details)
max added reviewers: alangenfeld, prha, schrockn.
max updated this revision to Diff 13817.May 13 2020, 1:36 AM
max edited the summary of this revision. (Show Details)

Lint

max edited the summary of this revision. (Show Details)May 13 2020, 1:36 AM
max requested review of this revision.May 13 2020, 1:51 AM

I think this overall makes sense - interested in more eyes on this for sure though.

python_modules/dagster/dagster/core/definitions/pipeline.py
519–780

this is Some Real Shit - i think some walk through comments could be a nice addition . The existing comments are a great start - i just mean some outline of "we do this then this then this"

871

why are we iterating defs when we are building handles? shouldnt it be against instances? this and the addition of parent_name to iterate defs confuses me

python_modules/dagster/dagster_tests/core_tests/test_pipeline_execution.py
1073–1075

we could make a different choice here right? - a pipeline with everything subset is equivalent to the full pipeline, curious on your thinking here.

1305

spooky - fix now?

max added inline comments.May 13 2020, 6:47 PM
python_modules/dagster/dagster_tests/core_tests/test_pipeline_execution.py
1073–1075

yes, we could def do that. i think it seemed more consistent to do this given that if we subset all of the children of a composite, we aren't resolving that to the composite

1305

sorry, this is just a legacy of dev

max updated this revision to Diff 13959.May 14 2020, 1:34 AM

nits

python_modules/dagster/dagster/core/definitions/pipeline.py
871

yep, this is cruft

max updated this revision to Diff 13967.May 14 2020, 2:43 AM

Add comments

woah nice work on the comments - thats really nice. I would like @prha / @schrockn to also take a pass, mostly for opinion on the design choices highlighted in summary.

prha added a comment.May 15 2020, 7:31 PM

would there be a universe in which we would need (or want) to support both:

{'solids': {'composite_a.composite_b.solid_c': {...}}}

and

{'solids': {'composite_a': {'solids': {'composite_b': {'solids': {'solid_c': {...}}}}}}}
schrockn requested changes to this revision.May 15 2020, 7:39 PM

I think supporting this for subset selection is fine (since it is totally unsupported now) but I am very wary about supporting this in the configuration context. This would open two different ways of doing things which complicates everything: the code, our documentation, stack overflow questions, etc. We need to decide on one way to stick with it.

This revision now requires changes to proceed.May 15 2020, 7:39 PM
alangenfeld added a comment.EditedMay 15 2020, 7:59 PM

I have a hard time reconciling a mental model that makes sense for subsetting in to a composite where maintaining the original config structure makes sense. I agree that this is a gnarly user experience - but I think its because subsetting in to a composite is fundamentally complex and that complexity is accurately exposed to the user. I think maintaining the config structure will mask the complexity causing greater confusion.

If somebody has a mental model that feels right to them for keeping it consistent, it would be great if you could update the drawing in the comment above to help communicate it.

max added a comment.May 18 2020, 11:03 PM

Just for the avoidance of doubt, I agree with Alex and I think this is clearer and more consistent than the alternative

met to discuss this - @schrockn was able to help outline how using the dotted notation for all composite config (subselected or not) instead of the hierarchal structure or a hybrid would make sense. The example of vscode settings helps us believe this as likely a better end user experience.

Action items:

  • gather some early feedback from users on the proposed change
  • a separate RFC for migrating the current system to this approach and the necessary amount of back compat
yuhan added a subscriber: yuhan.Jun 5 2020, 5:03 PM

FYI
in D3221 i renamed PipelineDefinition.subset_for_execution(solid_subset) to PipelineDefinition.get_pipeline_subset_def(solids_to_execute), because previously both ExecutablePipeline and PipelineDefinition have a func called subset_for_execution so i did the renaming to avoid confusing.
it may be worth reading the summary sections in D3221 and D3225 to get the context of the subsetting refactor we did after this diff