
Revamp the configuration concept section
ClosedPublic

Authored by sandyryza on Apr 14 2021, 9:44 PM.

Details

Summary

Preview: https://dagster-git-config-concept-elementl.vercel.app

  • Explained the basic idea of config (you pass it to pipelines at runtime).
  • Explained what situations users should use config in.
  • Took out the config-mapping example, because I think it violates our guidance on when to use config (the Spark parameters that are the same across all runs shouldn't be config at all).
  • Added in an example of how to make a config value available to multiple solids.

Test Plan

manual inspection

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. Apr 14 2021, 10:03 PM
Harbormaster failed remote builds in B28867: Diff 35429!

minor changes but bouncing back to your queue

docs/content/concepts/configuration/config-schema.mdx
18

I think it's also worth mentioning the partitioned schedule and sensor use cases. E.g., for sensors you might want to provide the name of the file in the S3 bucket, and for partitions the partition key must be provided.
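
To make the sensor and partition cases concrete, here is a minimal sketch of the kind of helper that turns a partition key or a newly seen S3 object key into run config. The solid name `process_file`, the config keys, and both helper functions are hypothetical; only the `solids` / solid name / `config` nesting is meant to mirror the run config layout this revision documents.

```python
from datetime import date


def run_config_for_partition(partition_key: date) -> dict:
    """Build run config for one daily partition (hypothetical helper)."""
    return {
        "solids": {
            "process_file": {  # hypothetical solid name
                "config": {"date": partition_key.isoformat()}
            }
        }
    }


def run_config_for_s3_key(s3_key: str) -> dict:
    """Build run config for a file a sensor noticed in a bucket (hypothetical helper)."""
    return {"solids": {"process_file": {"config": {"s3_key": s3_key}}}}


if __name__ == "__main__":
    print(run_config_for_partition(date(2021, 4, 14)))
    print(run_config_for_s3_key("incoming/events.csv"))
```

In both cases the value is only known when the run is launched, which is the situation the section says config is for.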

24

s/strongly typed/gradually typed/

28–29

Add that it makes ad hoc execution of deployed pipelines self-documenting and way easier.

This revision now requires changes to proceed. Apr 15 2021, 4:24 PM
yuhan added inline comments.
docs/content/_navigation.json
139

cc @sashank who is working on content for "pipeline - run config"

docs/content/concepts/configuration/config-schema.mdx
123–132

the indentation seems off when the start/end tags are inside a func

docs/content/concepts/configuration/configured.mdx
86

make sure you update the file path

docs/content/concepts/configuration/config-schema.mdx
123–132

any ideas on how to deal with this?

docs/content/concepts/configuration/config-schema.mdx
123–132

sadly, my workaround is to move it out of the function or make it its own file. long term, we should fix the snapshot parser esp when we address issues like https://github.com/dagster-io/dagster/issues/3975

@schrockn mind taking another look? I updated to lead with a non-schema'd example.

The preview link looks out-of-date?

This is great overall. Just a few suggestions.

docs/content/concepts/configuration/config-schema.mdx
18

"For example, you might want whoever is running a pipeline to decide what dataset it operates on."

A little ambiguous as to whether that person is doing this at authorship time or post deployment. Maybe something like:

"For example, you might want to enable someone to manually operate a deployed pipeline and vary what dataset it operates on."

20

A subtle point but want to make sure that people understand that we aren't super tied to YAML in essence. It's more our "default serialization format."

When you execute a pipeline with the Python API, you supply run configuration as a Python dictionary. Our web and CLI tools have explicit support for YAML.
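
A small sketch of the point that the dictionary and the YAML are the same data in two serializations. The solid name `load_dataset` and the key `dataset_name` are hypothetical, and `to_yaml` is a toy emitter written for this illustration, not part of Dagster or of any YAML library. With the Python API, the dictionary itself would be what gets passed as `run_config` to `execute_pipeline`.

```python
def to_yaml(value, indent=0):
    """Render nested dicts of plain strings/numbers as block-style YAML.

    Illustration only: no quoting, lists, or escaping, so this is not a
    general-purpose YAML serializer.
    """
    lines = []
    pad = "  " * indent
    for key, item in value.items():
        if isinstance(item, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(item, indent + 1))
        else:
            lines.append(f"{pad}{key}: {item}")
    return "\n".join(lines)


# Run config as a Python dictionary (names below are hypothetical).
run_config = {
    "solids": {
        "load_dataset": {
            "config": {"dataset_name": "users"}
        }
    }
}

# The same run config in the form the web and CLI tools accept.
print(to_yaml(run_config))
```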

26–44

I think we can just start immediately with an example of invoking this pipeline with execute_pipeline. With the change to default config_schema to Any, *every* solid is configurable by default, so having this separate section isn't necessary

135

extra credit points for gifs of this

159
  1. Great to explicitly document this
  2. Excited to make this more smooth :-)
docs/content/concepts/configuration/configured.mdx
19

I think a concrete example would be good here.

When is this useful? Often library authors provide very flexible and configurable solids that can be used in a wide variety of operational contexts. For example, in our dbt integration, there is a solid that could allow a user to run arbitrary dbt commands on a deployed instance, and leverage our config editor to make this easier.

However, you typically do not want this level of flexibility in a deployed pipeline. You want most configuration options set in code and fixed once deployed. configured provides the bridge between these worlds.
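
The bridging idea can be sketched in plain Python, independent of Dagster's actual `configured` API. Everything here is hypothetical: `run_dbt` stands in for a flexible library solid, and `fix_config` is an illustrative helper that pins its config in code so nothing is left to supply at run time.

```python
from typing import Callable, Dict


def run_dbt(config: Dict[str, str]) -> str:
    """Stand-in for a flexible library solid: builds whatever command config asks for."""
    return f"dbt {config['command']} --target {config['target']}"


def fix_config(
    fn: Callable[[Dict[str, str]], str], fixed: Dict[str, str]
) -> Callable[[], str]:
    """Return a version of `fn` whose config is set in code (illustrative helper)."""

    def configured_fn() -> str:
        # Pass a copy so the wrapped function cannot mutate the fixed values.
        return fn(dict(fixed))

    return configured_fn


# The flexible solid accepts arbitrary commands; the deployed variant does not.
run_dbt_prod = fix_config(run_dbt, {"command": "run", "target": "prod"})
print(run_dbt_prod())
```

The flexible version stays available for ad hoc use, while the pinned version is what a deployed pipeline would include.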

This revision now requires changes to proceed. Apr 21 2021, 2:31 PM

The preview link looks out-of-date?

@schrockn you need to select "master" from the versions list (or at least that was something I ran into).

@schrockn - I incorporated all your suggestions except the one I commented on.

docs/content/concepts/configuration/config-schema.mdx
26–44

I'm not entirely sure I follow. Every solid is configurable by default, but people execute pipelines in different ways, and I don't think we should necessarily privilege execute_pipeline. So I think it's helpful to separate "how to use configuration inside a solid" from "how to provide configuration to a pipeline", because the latter depends on how you're executing the pipeline.

I made some changes to the section below to more clearly indicate that the decision on how to provide run configuration depends on how you're launching your pipeline.

docs/content/concepts/configuration/config-schema.mdx
44

Maybe it is just the title? "Writing a configurable solid" is weird since every solid is configurable by default now.

great stuff. consider final comment but big improvement

This revision is now accepted and ready to land. Apr 21 2021, 10:24 PM
This revision was automatically updated to reflect the committed changes.