The rationale for doing this is to enable a better quickstart. Right now, if someone pip installs dagster and then tries to follow the intro tutorial, we tell them to refer to the source code on GitHub or to clone the repository. After this move, it will be possible to do things like run dagster tutorial from the CLI, call a function like dagster.tutorial.get_tutorial(), or refer to datasets like dagster.tutorial.cereal_dataset.
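To make the "refer to datasets" idea concrete, here is a minimal sketch of how a dataset shipped inside an installed package could be read with the standard library. The helper name `load_packaged_csv` is hypothetical, and so are the package and file names in the usage note; nothing like this exists in dagster today.

```python
import csv
import io
import pkgutil


def load_packaged_csv(package, resource):
    """Read a CSV file shipped inside an importable package.

    Hypothetical helper for illustration only; it is not part of dagster.
    `package` is a dotted package name, `resource` a path relative to it.
    """
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # pkgutil returns None when the package cannot be located or its
        # loader does not support reading data files.
        raise FileNotFoundError(
            "could not load {!r} from package {!r}".format(resource, package)
        )
    # Each row becomes a mapping keyed by the CSV header.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
```

Assuming the tutorial were packaged as `dagster.tutorial` with a `cereal.csv` file next to its `__init__.py` (both names are assumptions), something like `load_packaged_csv("dagster.tutorial", "cereal.csv")` would work right after a pip install, with no clone needed.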
I'm not sure what you mean by prior art.
Airflow ships with its examples embedded, enabling a quick start like this: https://airflow.apache.org/docs/stable/start.html.
Another approach we could take is to include a function in dagster that downloads the tutorial from a known location and puts it somewhere on disk. (This is typically how large datasets, e.g. in NLP, get packaged.)
The tutorial is 240K.
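For reference, the downloader approach could look roughly like the sketch below. It assumes the tutorial is published as a zip archive at some URL; the function name `fetch_tutorial` and that hosting arrangement are both hypothetical, not something dagster provides.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def fetch_tutorial(url, dest_dir):
    """Download a zip archive from `url` and unpack it into `dest_dir`.

    Hypothetical helper: neither this function nor a hosted tutorial
    archive exists in dagster. Returns the destination as a Path.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "tutorial.zip"
        # Fetch the archive to a scratch location first, so a failed
        # download never leaves a partial file in the destination.
        urllib.request.urlretrieve(url, str(archive))
        with zipfile.ZipFile(str(archive)) as zf:
            zf.extractall(str(dest))
    return dest
```

The trade-off is that this needs network access at tutorial time and a stable place to host the archive, whereas files shipped inside the package are available as soon as the install finishes.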
I'm not convinced that this is what we want. A simpler solution would seem to be to have a separate repo for examples that is up-to-date with the latest public version. Implementing a tutorial downloader in the dagster core seems very odd to me.
I think there is a broader conversation to be had as well about how to structure all of our examples, including the tutorial.
I agree with the motivation here - I think it's a good idea to package up a few example solids, pipelines, etc under dagster.examples, so that users can do things like:
    $ dagster pipeline execute -m dagster.examples -n example_pipeline
    $ dagit -m dagster.examples -n define_repo
This lets people open dagit and play with it _immediately_ after downloading.
However, I don't know if it's a good idea to directly copy over _all_ of the intro tutorial files. I think it would be better to have a few cleaned-up, final-state examples (like the final state of the cereal repository) available here.
Re: a separate repo - I think it's a good idea in general, but it adds too much friction for the onboarding and tutorial use case. Ideally, I shouldn't have to do another git clone and cd to the correct place to try things out.
With something like this, a user could literally try out Dagster with just:
    $ pip install dagster dagit
    $ dagit -m dagster.examples -f define_bay_bikes_repository
And our upcoming Dagit tutorials etc could take advantage of this.