- Explained the basic idea of config (you pass it to pipelines at runtime).
- Explained what situations users should use config in.
- Took out the config-mapping example, because I think it violates our guidance on when to use config (the Spark parameters that are the same across all runs shouldn't be config at all).
- Added in an example of how to make a config value available to multiple solids.
minor changes but bouncing back to your queue
I think it's also worth mentioning the partitioned schedule and sensor use cases. E.g., for sensors you might want to provide the name of the file in the S3 bucket, and for partitioned schedules the partition key must be provided.
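A rough sketch of what those two cases could look like, assuming run config is just a nested dictionary keyed by solid name. The solid names (`process_day`, `process_file`) and config keys (`date`, `s3_key`) are hypothetical, not taken from the diff:

```python
def run_config_for_partition(partition_key: str) -> dict:
    """Build run config for one partition (hypothetical solid and key names)."""
    return {"solids": {"process_day": {"config": {"date": partition_key}}}}


def run_config_for_file(s3_key: str) -> dict:
    """Build run config for a sensor-triggered run on one new S3 object."""
    return {"solids": {"process_file": {"config": {"s3_key": s3_key}}}}
```

In both cases the value only exists at launch time, which is why it has to come in through config rather than being fixed in code.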
s/strongly typed/gradually typed/
Add: makes ad hoc execution of deployed pipelines self-documenting and way easier
cc @sashank who is working on content for "pipeline - run config"
the indentation seems off when the start/end tags are inside a func
90 (On Diff #35899)
make sure you update the file path
This is great overall. Just a few suggestions.
"For example, you might want whoever is running a pipeline to decide what dataset it operates on."
A little ambiguous as to whether that person is doing this at authorship time or post deployment. Maybe something like:
"For example, you might want to enable someone to manually operate a deployed pipeline and vary what dataset it operates on."
A subtle point, but I want to make sure people understand that we aren't tied to YAML in essence. It's more our "default serialization format."
When you execute a pipeline with the Python API, you supply run configuration as a Python dictionary. Our web and CLI tools have explicit support for YAML.
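Something like this might make the point concrete. It is only a sketch: the solid name `read_dataset` and the `dataset` key are made up, and the `solids` / `config` nesting assumes the usual run config layout for solids. JSON stands in for "some serialization" only because it ships with the standard library:

```python
import json

# Hypothetical run config: the solid name and config key are illustrative only.
run_config = {
    "solids": {
        "read_dataset": {
            "config": {"dataset": "s3://my-bucket/events.csv"},
        }
    }
}

# The same data written as YAML, the form the web and CLI tools accept:
#
# solids:
#   read_dataset:
#     config:
#       dataset: s3://my-bucket/events.csv

# Serializing and parsing gives back the same dictionary; the format is
# incidental, the dictionary is what the pipeline receives.
round_tripped = json.loads(json.dumps(run_config))
```

The dictionary is what you would hand to the Python API; the YAML is just one way of writing it down.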
I think we can just start immediately with an example of invoking this pipeline with execute_pipeline. With the change to default config_schema to Any, *every* solid is configurable by default, so having this separate section isn't necessary
extra credit points for gifs of this
19 (On Diff #35912)
I think a concrete example would be good here.
When is this useful? Often library authors provide very flexible and configurable solids that can be used in a wide variety of operational contexts. For example, in our dbt integration, there is a solid that could allow a user to run arbitrary dbt commands on a deployed instance, and leverage our config editor to make this easier.
However, typically you do not want this level of flexibility in a deployed pipeline. You want most configuration options set in code and fixed for deployment. configured provides the bridge between these worlds.
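To illustrate the idea only, here is a plain-Python analogy, not Dagster's configured API. `run_dbt_command` is a stand-in for a flexible library solid and `configured_sketch` is a hypothetical helper; both names are assumptions for the sake of the example:

```python
from functools import partial


def run_dbt_command(config: dict) -> str:
    """Stand-in for a flexible library solid: config decides the whole command."""
    return f"dbt {config['command']} --target {config['target']}"


def configured_sketch(fn, fixed_config: dict):
    """Return a version of fn with its config fixed in code (not Dagster's API)."""
    # Copy the dict so later changes to fixed_config do not leak in.
    return partial(fn, dict(fixed_config))


# Deployed variant: every option is set in code, nothing left to supply at run time.
run_dbt_prod = configured_sketch(run_dbt_command, {"command": "run", "target": "prod"})
```

The flexible version stays available for ad hoc use, while the deployed pipeline only ever sees `run_dbt_prod`.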
@schrockn - I incorporated all your suggestions except the one I commented on.
I'm not entirely sure I follow. Every solid is configurable by default, but people execute pipelines in different ways, and I don't think we should necessarily privilege execute_pipeline. So I think it's helpful to separate "how to use configuration inside a solid" from "how to provide configuration to a pipeline", because the latter depends on how you're executing the pipeline.
I made some changes to the section below to more clearly indicate that the decision on how to provide run configuration depends on how you're launching your pipeline.