Page MenuHomePhabricator

(make-pipeline-a-solid-1) Allow pipeline to be used as composite_solid
ClosedPublic

Authored by schrockn on Oct 21 2020, 3:30 AM.

Details

Summary

This is committable, but it is also a larger diff than I
generally do for this type of thing, so I could easily be persuaded to
break it up.

In order to reconcile composite_solid and pipeline into graph, we
have to take a piecemeal approach.

What this diff does is introduce a new base class GraphDefinition
that both CompositeSolidDefinition and PipelineDefinition inherit
from. The eventual long-term goal would be a delete both CompositeSolidDefinition
and PipelineDefinition, with only GraphDefinition surviving. The
0.10.0 goal would be to have CompositeSolidDefinition and
PipelineDefiniton be trivial subclasses, with only backcompat (e.g.
argument ordering) issues involved.

The net result of this diff is that you can use @pipeline
anywhere you can use @composite_solid. E.g. you can embed
pipelines in pipelines, making it a unit of composition.

Next steps would be to make the config system work with top-level
config, inputs, and outputs keys. Right now if you tried to
use execute_pipeline on a pipeline with a config mapping or an
unfulfilled input it would break.

Test Plan

BK

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This is cleaner than I thought it would be.

python_modules/dagster/dagster/core/definitions/decorators/composite_solid.py
10

I think this could benefit from some docstring.

python_modules/dagster/dagster/core/definitions/graph.py
293

Returns (GraphDefinition)

amazed how clean this is

python_modules/dagster/dagster/core/definitions/decorators/composite_solid.py
11

for clarity maybe decorator_name, graph_name

20

maybe outputs_are_explicit, outputs_are_implicit, outputs_are_provided; explicit outputs feels very much ~ provided outputs, esp since we use explicit three lines below to mean provided

27

this should pass through decorator_name instead of hardcoding "@composite_solid"

python_modules/dagster/dagster/core/definitions/graph.py
99

graph

104

graph

110

graph

133

graph

277

a new GraphDefinition i think (?)

282

graph's

283

graph's

288

graph. i'm not clear on what the actual uniqueness constraints for names are here. is it that at any given level of a composite graph structure all the names need to be unique?

297

graphs; but this should probably be possible for pipelines, à la presets

306

graph

335

would be clearer if we called this validate_input_mappings

397

similarly

python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_composition.py
551–552

it'd be nice to have a test for the breaking case marked xfail

python_modules/dagster/dagster/core/definitions/graph.py
99

nope they are solids

104

same

110

same

133

same. solids and instances that contain an ISolidDefinition, which can be a SolidDefinition or GraphDefinition

277

it's actually GraphDefinition or SolidDefinition depending on caller

335

just moving code for now. can rename in followup

397

same

python_modules/dagster/dagster/core/definitions/graph.py
99

top level solids in the graph (not pipeline), etc.

python_modules/dagster/dagster_tests/core_tests/test_composites.py
418

here is the concrete cost of the proposed backwards compat strategy. for use within composition contexts you *must* explicitly defined the output_defs

python_modules/dagster/dagster/core/definitions/graph.py
99

whoops will fix

python_modules/dagster/dagster_tests/core_tests/test_composites.py
418

i'm not sure i completely understand

python_modules/dagster/dagster_tests/core_tests/test_composites.py
418

because of this

https://dagster.phacility.com/D4884

We previously ignored return values out of @pipeline. People could return whatever it wants. The ability to have outputs on pipelines changes that.

The proposed solution in that diff is to continue to ignore the output for now when no output defs are explicitly provided. That way users can continue to return stuff.

The cost of that is here, where you *must* provide an OutputDefinition in the case where you actually want to absence of output definition to mean a single output of the Any type.

... which can be a SolidDefinition or GraphDefinition

[2] What is the right thing to call these methods that can return a SolidDefinition or a GraphDefinition ? I don't think ISolidDefinition is as fitting in graph world as it was in composite solid world.

I don't expect this diff in particular to solve it, but it would be good to have a plan since I think its a pretty confusing intermediate state to be in so I want to make sure we move through it swiftly

python_modules/dagster/dagster/core/definitions/decorators/composite_solid.py
10

nit: maybe move this out - could avoid inline require [1]

python_modules/dagster/dagster/core/definitions/decorators/pipeline.py
50–68

[1]

python_modules/dagster/dagster/core/definitions/graph.py
42

why this - i don't recall this one being problematic

99

in the pipeline

in the graph

157–161

[2]

python_modules/dagster/dagster/core/definitions/pipeline.py
235–257

always worry about these type of callsites missing new args when they get added

python_modules/dagster/dagster/core/definitions/pipeline.py
256

me too

schrockn retitled this revision from RFC (ish): Allow pipeline to be used as composite_solid to Allow pipeline to be used as composite_solid.

up

schrockn retitled this revision from Allow pipeline to be used as composite_solid to (make-pipeline-a-solid-1) Allow pipeline to be used as composite_solid.Oct 23 2020, 9:30 PM

@alangenfeld good call on moving do_composition 👍🏻

python_modules/dagster/dagster/core/definitions/graph.py
42

regarding ^ previous comment

throwing out some ideas for what to do with ISolidDefinition:

  • AbstractSolidDefinition, get_abstract_solid()
  • GraphNodeDefinition, get_node()
  • NodeDefinition, get_node()

I'm not sure if we should rename that right now. In my mind a Graph *is* a solid. You can treat it as one, essentially. So it makes actually makes sense from a certain point of view. In that frame of mind the Playground it actually a thing that loads solid and you can execute them. Introducing another name might obfuscate things

I'm not sure if we should rename that right now

Yea I am not advocating for a rename in this diff necessarily, but I do want to try to reach some shared understanding of ontology. To me, this diff leaves the code base in a less readable / intuitive state, which is fine temporarily but not for very long.

The heart of the issue is that we have 3 things and currently only 2 names:

  • SolidDefinition - a unit of user defined computation
  • GraphDefinition - a graph of solids
  • ??? - the abstract unit representing either

In my mind a Graph *is* a solid. You can treat it as one, essentially. So it makes actually makes sense from a certain point of view. In that frame of mind the Playground it actually a thing that loads solid and you can execute them. Introducing another name might obfuscate things

I think with this framing - you then need to have a name for the leaf / compute solid for when you need to be able to distinguish it. For example, how would you explain the execution plan creation process in this framing? I think with a reasonable name for the @solid solid ie ComputeSolid and then renaming methods that only deal in those ie get_compute_solids you get to an OK spot, but i do worry its still a bit confusing.

To me, thinking about trying to explain this to some one new - I really keep wanting a distinct third noun to distinguish from the others instead of relying on an adjective - which is why I suggested Node above.

This revision is now accepted and ready to land.Tue, Oct 27, 3:55 PM