Page MenuHomePhabricator

RFC: chain function for building strings of solids
Needs RevisionPublic

Authored by schrockn on Wed, Nov 6, 2:02 AM.

Details

Reviewers
max
alangenfeld
Group Reviewers
Restricted Project
Summary

An occasionaly usability issue has been a common case where
one is simply trying to execute a few solids in order is a bit
cumbersome.

This proposes a helper method chain, which allows one to string
together a set of solids in linear order and specify
them in order of execute

Alternative would be something like

(a | b | c | d)(inputs)

but dealing with the order of precendence is fraught

Test Plan

View in dagit

Diff Detail

Repository
R1 dagster
Branch
2chainz
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

schrockn created this revision.Wed, Nov 6, 2:02 AM
max accepted this revision.Wed, Nov 6, 5:24 PM
max added a subscriber: max.

i like this; i think some fp purists would want to see chain(a, b, c, d) rather than chain(a, b, c, d)() -- or chain(a, b, c, d) rather than chain(b, c, d)(a), etc.

examples/dagster_examples/pyspark_pagerank/pyspark_pagerank_pipeline.py
83

oh no

This revision is now accepted and ready to land.Wed, Nov 6, 5:24 PM
max added a comment.Wed, Nov 6, 5:26 PM

clojure calls this -> ("threading")

did you consider some hot operator overloading bullshit?

I had to.

examples/dagster_examples/pyspark_pagerank/pyspark_pagerank_pipeline.py
83

2chainz

schrockn requested review of this revision.Wed, Nov 6, 7:53 PM

@alangenfeld for sure. look at diff summary. considered pipe operator

alangenfeld added a comment.EditedWed, Nov 6, 8:22 PM

would an operator with clear left and right delineation ie >> / > resolve the precedence issue?

edit: oh i bet you mean python resolution order not user perception

throwing other ideas around:

with chained_sequence():
    parse_pagerank_data()
    compute_links()
    calculate_ranks()
    log_ranks()

TBH I'm kind of partial to the bitshift >> overloading especially since it'll be familiar to users coming from the Airflow world

My issue with the bitshifts is that I can never remember which order the dependency order is. Especially if you also support the opposite direction

If we support both directions, isn't the ordering clear by the direction the operators are pointing? I mean we could also keep things simple and say that you can only go in one direction IE (a >> b >> c >> d or a << b << c << d). Then the dependency ordering becomes trivial right?

max added a comment.Tue, Nov 12, 7:09 PM

I also dislike the bitshift because I don't know which direction is which

themissinghlink added a comment.EditedTue, Nov 12, 7:25 PM

Pipe? a | b | c | d??? If it works for bash it works for us XD Oh didn't see the alternative

alangenfeld requested changes to this revision.Thu, Nov 14, 6:58 PM

I think a context manager that alters the composition context that:

  • ensures the first invocation is fully resolved and has a single output
  • ensures that subsequent invocations have a single unspecified input and single output
  • wires everything together

would allow it to work if there were inputs to some of the solids other than the "chained" one so I'm partial to that approach.

chain is definitely simpler but it doesn't feel "right" to me personally. Nothing I can effectively articulate though.

to your queue to process feedback

examples/dagster_examples/pyspark_pagerank/pyspark_pagerank_pipeline.py
97

as max said, worth considering invoking the chain for you instead of having user do it. Unclear what the value of leaving it open is. If it's to supply other args to the first solid in the chain that seems like it could be confusing

chain(parse_pagerank_data, compute_links, calculate_ranks, log_ranks)
This revision now requires changes to proceed.Thu, Nov 14, 6:58 PM