
Move tutorial into main dagster package
Needs Revision, Public

Authored by max on Feb 17 2020, 9:52 PM.



The rationale is to enable a better quickstart. Right now, if someone pip installs dagster and then tries to follow the intro tutorial, we tell them to refer to the source code on GitHub or to clone the repository. After this move, it will be possible to run dagster tutorial from the CLI, call a function like dagster.tutorial.get_tutorial(), or refer to datasets like dagster.tutorial.cereal_dataset.
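As a rough illustration of what referring to a packaged dataset could look like, here is a minimal sketch that resolves the path of a file shipped inside an installed package. The helper name packaged_file is hypothetical and is not part of dagster; the sketch only assumes the data file sits next to the package's __init__.py.

```python
import importlib.util
import os


def packaged_file(package_name, filename):
    """Return the path of a file that ships inside an installed package.

    Hypothetical helper for illustration only; not a dagster API.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ImportError("package not found: " + package_name)
    # spec.origin points at the package's __init__.py, so its directory
    # is where files bundled with the package would live.
    path = os.path.join(os.path.dirname(spec.origin), filename)
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return path
```

With the tutorial inside the main package, a call along the lines of packaged_file("dagster.tutorial", "cereal.csv") would then work straight after pip install; both the module name and the file name here are assumptions.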

Test Plan


Diff Detail

R1 dagster
Lint OK
No Unit Test Coverage

Event Timeline

max created this revision. Feb 17 2020, 9:52 PM
max retitled this revision from "Cp - move tutorial" to "Move tutorial into main dagster package". Feb 17 2020, 9:55 PM
max edited the summary of this revision.
max added reviewers: nate, sashank, schrockn, alangenfeld.

Is there prior art in other systems here? I'm a little wary of putting this in core. How about making a separate installable dagster-tutorial module?

Just feels a bit wrong to force people to deploy the tutorial the world over

max added a comment. Feb 17 2020, 11:07 PM

I'm not sure what you mean by prior art.

Airflow ships with its examples embedded, enabling a quick start like this:

Another approach we could take is to include a function in dagster that downloads the tutorial from a known location and puts it somewhere local. (This is typically how large datasets, e.g. in NLP, get packaged.)
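A minimal sketch of the downloader idea, assuming the tutorial is published as a single file at a known URL. The function name fetch_tutorial is hypothetical, and no real hosting location is implied.

```python
import os
import shutil
import urllib.parse
import urllib.request


def fetch_tutorial(url, dest_dir):
    """Download the file at `url` into `dest_dir` and return its local path.

    Hypothetical helper for illustration only; not a dagster API.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Derive the local file name from the last path segment of the URL.
    filename = os.path.basename(urllib.parse.urlparse(url).path) or "tutorial"
    target = os.path.join(dest_dir, filename)
    # Skip the download if a previous call already fetched the file.
    if not os.path.exists(target):
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

The trade-off is that core would carry network code and a dependency on an external location staying available, in exchange for not shipping the 240K of tutorial files.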

The tutorial is 240K.

Harbormaster failed remote builds in B7884: Diff 9735!
schrockn requested changes to this revision. Feb 17 2020, 11:56 PM

I'm not convinced that this is what we want. A simpler solution would seem to be to have a separate repo for examples that is up-to-date with the latest public version. Implementing a tutorial downloader in the dagster core seems very odd to me.

I think there is a broader conversation to be had as well about how to structure all of our examples, including the tutorial.

This revision now requires changes to proceed. Feb 17 2020, 11:56 PM
sashank added a comment. Edited Feb 18 2020, 12:45 AM

I agree with the motivation here - I think it's a good idea to package up a few example solids, pipelines, etc under dagster.examples, so that users can do things like:

$ dagster pipeline execute -m dagster.examples -n example_pipeline
$ dagit -m dagster.examples -n define_repo

This lets people open dagit and play with it _immediately_ after downloading.
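For readers unfamiliar with the -m / -n pattern used above, this sketch shows the general idea of resolving a module name plus a function name into a definition. It is an assumption-laden illustration of the pattern, not dagster's actual loader, and resolve_definition is a hypothetical name.

```python
import importlib


def resolve_definition(module_name, fn_name):
    """Import `module_name`, look up `fn_name`, and call it with no arguments.

    Illustrative only; dagster's real loading code may differ.
    """
    module = importlib.import_module(module_name)
    fn = getattr(module, fn_name)
    if not callable(fn):
        raise TypeError("{}.{} is not callable".format(module_name, fn_name))
    # The named function is expected to build and return the definition
    # (for example a pipeline or a repository).
    return fn()
```

Because the lookup only needs an importable module, anything shipped under a module such as dagster.examples would be reachable right after installation, with no clone or cd step.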

However, I don't know if it's a good idea to directly copy over _all_ of the intro tutorial files. I think it would be better to have a few cleaned-up, final-state examples (like the final state of the cereal repository) available here.

Re: a separate repo - I think it's a good idea too, but it adds too much friction for the on-boarding and tutorial use case. Ideally, I shouldn't have to do another git clone and cd to the correct place to try things out.

With something like this, a user could literally try out Dagster with just:

$ pip install dagster dagit
$ dagit -m dagster.examples -n define_bay_bikes_repository

And our upcoming Dagit tutorials etc could take advantage of this.

Good example:

See timestamp 1:03 in the video

I'd be more open to making the tutorials available to dagit. I just want to keep dagster core as lean as possible.