Page MenuHomeElementl

(make-pipeline-a-solid-1) Allow pipeline to be used as composite_solid

Authored by schrockn on Oct 21 2020, 3:30 AM.



This is committable, but it is also a larger diff than I
generally do for this type of thing, so I could easily be persuaded to
break it up.

In order to reconcile composite_solid and pipeline into graph, we
have to take a piecemeal approach.

What this diff does is introduce a new base class GraphDefinition
that both CompositeSolidDefinition and PipelineDefinition inherit
from. The eventual long-term goal would be a delete both CompositeSolidDefinition
and PipelineDefinition, with only GraphDefinition surviving. The
0.10.0 goal would be to have CompositeSolidDefinition and
PipelineDefiniton be trivial subclasses, with only backcompat (e.g.
argument ordering) issues involved.

The net result of this diff is that you can use @pipeline
anywhere you can use @composite_solid. E.g. you can embed
pipelines in pipelines, making it a unit of composition.

Next steps would be to make the config system work with top-level
config, inputs, and outputs keys. Right now if you tried to
use execute_pipeline on a pipeline with a config mapping or an
unfulfilled input it would break.

Test Plan


Diff Detail

R1 dagster
Lint Passed
No Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This is cleaner than I thought it would be.


I think this could benefit from some docstring.

293 ↗(On Diff #24311)

Returns (GraphDefinition)

amazed how clean this is


for clarity maybe decorator_name, graph_name


maybe outputs_are_explicit, outputs_are_implicit, outputs_are_provided; explicit outputs feels very much ~ provided outputs, esp since we use explicit three lines below to mean provided


this should pass through decorator_name instead of hardcoding "@composite_solid"

99 ↗(On Diff #24344)


104 ↗(On Diff #24344)


110 ↗(On Diff #24344)


133 ↗(On Diff #24344)


277 ↗(On Diff #24344)

a new GraphDefinition i think (?)

282 ↗(On Diff #24344)


283 ↗(On Diff #24344)


288 ↗(On Diff #24344)

graph. i'm not clear on what the actual uniqueness constraints for names are here. is it that at any given level of a composite graph structure all the names need to be unique?

297 ↗(On Diff #24344)

graphs; but this should probably be possible for pipelines, à la presets

306 ↗(On Diff #24344)


335 ↗(On Diff #24344)

would be clearer if we called this validate_input_mappings

397 ↗(On Diff #24344)


551 ↗(On Diff #24311)

it'd be nice to have a test for the breaking case marked xfail

99 ↗(On Diff #24344)

nope they are solids

104 ↗(On Diff #24344)


110 ↗(On Diff #24344)


133 ↗(On Diff #24344)

same. solids and instances that contain an ISolidDefinition, which can be a SolidDefinition or GraphDefinition

277 ↗(On Diff #24344)

it's actually GraphDefinition or SolidDefinition depending on caller

335 ↗(On Diff #24344)

just moving code for now. can rename in followup

397 ↗(On Diff #24344)


99 ↗(On Diff #24344)

top level solids in the graph (not pipeline), etc.


here is the concrete cost of the proposed backwards compat strategy. for use within composition contexts you *must* explicitly defined the output_defs

99 ↗(On Diff #24371)

whoops will fix


i'm not sure i completely understand


because of this

We previously ignored return values out of @pipeline. People could return whatever it wants. The ability to have outputs on pipelines changes that.

The proposed solution in that diff is to continue to ignore the output for now when no output defs are explicitly provided. That way users can continue to return stuff.

The cost of that is here, where you *must* provide an OutputDefinition in the case where you actually want to absence of output definition to mean a single output of the Any type.

... which can be a SolidDefinition or GraphDefinition

[2] What is the right thing to call these methods that can return a SolidDefinition or a GraphDefinition ? I don't think ISolidDefinition is as fitting in graph world as it was in composite solid world.

I don't expect this diff in particular to solve it, but it would be good to have a plan since I think its a pretty confusing intermediate state to be in so I want to make sure we move through it swiftly


nit: maybe move this out - could avoid inline require [1]



42 ↗(On Diff #24378)

why this - i don't recall this one being problematic

99 ↗(On Diff #24378)

in the pipeline

in the graph

157–161 ↗(On Diff #24378)



always worry about these type of callsites missing new args when they get added


me too

schrockn retitled this revision from RFC (ish): Allow pipeline to be used as composite_solid to Allow pipeline to be used as composite_solid.


schrockn retitled this revision from Allow pipeline to be used as composite_solid to (make-pipeline-a-solid-1) Allow pipeline to be used as composite_solid.Oct 23 2020, 9:30 PM

@alangenfeld good call on moving do_composition 👍🏻

42 ↗(On Diff #24396)

regarding ^ previous comment

throwing out some ideas for what to do with ISolidDefinition:

  • AbstractSolidDefinition, get_abstract_solid()
  • GraphNodeDefinition, get_node()
  • NodeDefinition, get_node()

I'm not sure if we should rename that right now. In my mind a Graph *is* a solid. You can treat it as one, essentially. So it makes actually makes sense from a certain point of view. In that frame of mind the Playground it actually a thing that loads solid and you can execute them. Introducing another name might obfuscate things

I'm not sure if we should rename that right now

Yea I am not advocating for a rename in this diff necessarily, but I do want to try to reach some shared understanding of ontology. To me, this diff leaves the code base in a less readable / intuitive state, which is fine temporarily but not for very long.

The heart of the issue is that we have 3 things and currently only 2 names:

  • SolidDefinition - a unit of user defined computation
  • GraphDefinition - a graph of solids
  • ??? - the abstract unit representing either

In my mind a Graph *is* a solid. You can treat it as one, essentially. So it makes actually makes sense from a certain point of view. In that frame of mind the Playground it actually a thing that loads solid and you can execute them. Introducing another name might obfuscate things

I think with this framing - you then need to have a name for the leaf / compute solid for when you need to be able to distinguish it. For example, how would you explain the execution plan creation process in this framing? I think with a reasonable name for the @solid solid ie ComputeSolid and then renaming methods that only deal in those ie get_compute_solids you get to an OK spot, but i do worry its still a bit confusing.

To me, thinking about trying to explain this to some one new - I really keep wanting a distinct third noun to distinguish from the others instead of relying on an adjective - which is why I suggested Node above.

This revision is now accepted and ready to land.Oct 27 2020, 3:55 PM