Page MenuHomePhabricator

Add pipe dep operator to easily express serial deps of single input/single output solids
AbandonedPublic

Authored by schrockn on Nov 29 2019, 7:34 PM.

Details

Summary

It's a common pattern to have a long chain of solids and it's
often inconvenient to use the function notion in order to do these
operations specifically:

solid_4(solid_3(solid_2(solid_1())))

This adds the "|" operator to the DSL, allowing single input/output
solids to be piped left-to-right execution order

solid_1 | solid_2 | solid_3 | solid_4

This proposal uses the "|" rather than the "<<" operator used
in other workflow APIs such as Airflow & Beam.

I believe that using the bitshift operators is confusing because
the directionality of A >> B could be read as "A executes then B"
OR it can be read as "A depends on B". Inevitiably users will
then asked for the bitshift operator in the other direction
compounding the confusion. I often see Airflow DAGs specified
as so:

A >> B
C << B

which I find quite difficult to understand, even in the simple cases.

Instead this just uses the Unix analogy of pipes and only allows
one direction. Everyone understands pipes as left-to-right execution
so I think this is more clear overall.

Test Plan

BK

Diff Detail

Repository
R1 dagster
Branch
pipe-operator-rfc-D1512
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

schrockn created this revision.Nov 29 2019, 7:34 PM
schrockn updated this revision to Diff 7014.Nov 30 2019, 1:53 AM
schrockn retitled this revision from Another chaining attempt using bitwise or to Add chain operator to allow serial chains of single input/single output solids.
schrockn edited the summary of this revision. (Show Details)
schrockn edited the test plan for this revision. (Show Details)
schrockn added reviewers: max, nate, alangenfeld, sashank.

upmessage

schrockn edited the summary of this revision. (Show Details)Nov 30 2019, 1:55 AM
schrockn added a comment.EditedNov 30 2019, 1:59 AM

Also a tip of the hat to one @alangenfeld for the excellent implementation of the dep dsl. This was very straightforward to add. Vast, vast majority of the time was spent writing tests and delineating error states and helpful error messages. The core functionality is implemented in only 3 one-liner methods.

def __or__(self, other): # in CallableSolidNode
    return other(self())
 
def __or__(self, other): # in InvokeSolidOutputHandle
    return other(self)

def __or__(self, other): # in ISolidDefinition
    return other(self())

That's it. 🀯🀯🀯

chefkiss

schrockn edited the summary of this revision. (Show Details)Nov 30 2019, 2:01 AM
schrockn edited the summary of this revision. (Show Details)
nate resigned from this revision.Dec 1 2019, 4:53 PM

This is sweet! I find your argument re: the Unix analogy compelling, I'm on board with |.

Resigning in favor of letting Alex have the final say here.

python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_chain_operator.py
345 β†—(On Diff #7014)

yes definitely agree here

schrockn updated this revision to Diff 7035.Dec 1 2019, 6:25 PM
schrockn retitled this revision from Add chain operator to allow serial chains of single input/single output solids to Add pipe dep operator to easily express serial deps of single input/single output solids.
schrockn edited the summary of this revision. (Show Details)
schrockn removed a reviewer: nate.

upmessage

schrockn added inline comments.Dec 1 2019, 6:27 PM
python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_pipe_dep_operator.py
23

can we disable this conditionally the same way we did for the dsl?

Harbormaster completed remote builds in B5677: Diff 7035.
schrockn marked an inline comment as not done.Dec 1 2019, 6:34 PM
schrockn added inline comments.
python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_pipe_dep_operator.py
23

Meaning that in order for this to not mess with pylint we need to inform pylint that the pipe operator in this context actually does have side effects.

I have some hesitancy about this - I agree it is valuable but I feel like it makes things more confusing in particular where the function-based invocations meet the piped ones. I understand it since i know how it works - but I don't know of a good way to explain it to someone without just explaining how it works. To frame this feeling in another way - I think the combination of the function based capability and the pipe based capability is less than the sum of both since they combine in an unintuitive way.

python_modules/dagster/dagster/core/definitions/composition.py
128

tuple_typle

python_modules/dagster/dagster_tests/core_tests/definitions_tests/test_pipe_dep_operator.py
23

I dont see anything like signature-mutators for operator overloads

54

look at these examples - I worry about how confusing the use/absence of () will be. Do we need to start to allow input-less solids to work without "invoking" them?

schrockn marked an inline comment as not done.Dec 2 2019, 7:26 PM

@alangenfeld i agree it's a concern and there's no rush to merge. in fact totally willing to wait for more instances of user feedback since it does muddy the model and injects complexity (as is evidenced by all the error conditions I needed to test for)

schrockn planned changes to this revision.Dec 2 2019, 10:24 PM

I'm going to pop this off folks' queues for now. Let's see if we actually need this.

schrockn updated this revision to Diff 7307.Dec 8 2019, 4:32 PM

updating for new local branch name

alangenfeld requested changes to this revision.Dec 9 2019, 4:47 PM

Q

This revision now requires changes to proceed.Dec 9 2019, 4:47 PM
schrockn abandoned this revision.Jan 23 2020, 1:28 PM

Abandoning for now.