docs/sections/intro_tutorial/inputs.rst
- This file was moved from docs/sections/learn/tutorial/inputs.rst.

.. py:currentmodule:: dagster

Parametrizing solids with inputs
--------------------------------

So far, we've only seen solids whose behavior is the same every time they're run:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/serial_pipeline.py
   :lines: 6-15
   :linenos:
   :lineno-start: 6
   :caption: serial_pipeline.py

In general, though, rather than relying on hardcoded values like ``dataset_path``, we'd like to be
able to parametrize our solid logic. Appropriately parametrized solids are more testable and
more reusable. Consider the following more generic solid:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs.py
   :lines: 6-12
   :linenos:
   :lineno-start: 6
   :caption: inputs.py

Here, rather than hardcoding the value of ``dataset_path``, we use an input, ``csv_path``. It's
easy to see why this is better. We can reuse the same solid wherever we need to read a CSV file
from a filepath. We can test the solid by pointing it at a known test CSV file. And we can use the
output of an upstream solid to determine which file to load.
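
To make the testability point concrete, here is a minimal, dependency-free sketch of the same
idea. It is a plain Python function, not the tutorial's solid: the Dagster decorator and context
argument are deliberately left out, and the function body is an assumption about what a CSV-reading
solid might do.

.. code-block:: python

    import csv


    def read_csv(csv_path):
        """Read the CSV file at csv_path into a list of row dicts.

        Hypothetical stand-in for the parametrized solid, without Dagster.
        """
        # newline='' is the csv module's recommended way to open CSV files.
        with open(csv_path, 'r', newline='') as fd:
            return list(csv.DictReader(fd))

Because the path is an argument rather than a hardcoded value, a test can point the function at
any small fixture file it creates itself.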

Let's rebuild a pipeline we've seen before, but this time using our newly parametrized solid.

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs.py
   :lines: 1-36
   :linenos:
   :emphasize-lines: 36
   :caption: inputs.py

As you can see above, what's missing from this setup is a way to specify the ``csv_path``
input to our new ``read_csv`` solid in the absence of any upstream solids whose outputs we can
rely on.

We previously encountered the :py:func:`execute_pipeline` function. Pipeline configuration is
specified by the second argument to this function, which must be a dict (the "environment dict").
This dict contains all of the user-provided configuration with which to execute a pipeline. As such,
it can have :ref:`a lot of sections <config_schema>`, but we'll only use one of them here:
per-solid configuration, which is specified under the key ``solids``:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs.py
   :linenos:
   :lineno-start: 40
   :lines: 40-44
   :dedent: 4
   :caption: inputs.py

The ``solids`` dict is keyed by solid name, and each solid is configured by a dict that may itself
have several sections. In this case we are only interested in the ``inputs`` section, so
that we can specify the value of the input ``csv_path``.
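
The nesting described above can be sketched as a plain Python dict. The keys ``solids``,
``read_csv``, ``inputs``, and ``csv_path`` come from the text; the innermost ``value`` key and the
file name ``cereal.csv`` are assumptions made for illustration.

.. code-block:: python

    # Top level: one section per kind of configuration; only 'solids' is used.
    environment_dict = {
        'solids': {
            # Keyed by solid name.
            'read_csv': {
                # Per-solid sections; only 'inputs' is needed here.
                'inputs': {
                    # Keyed by input name. The 'value' key and the file name
                    # are assumptions, not taken from the tutorial source.
                    'csv_path': {'value': 'cereal.csv'},
                },
            },
        },
    }

    # Walking the dict mirrors the description: solid name, section, input name.
    csv_path_config = environment_dict['solids']['read_csv']['inputs']['csv_path']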

Now you can pass this environment dict to :py:func:`execute_pipeline`:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs.py
   :linenos:
   :lines: 45-47
   :dedent: 4
   :lineno-start: 45
   :caption: inputs.py

Specifying config using YAML fragments and the dagster CLI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When executing pipelines with the dagster CLI, we'll need to provide the environment dict in a
config file. We use YAML for the file-based representation of an environment dict, but the values
are the same as before:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs_env.yaml
   :language: YAML
   :linenos:
   :caption: inputs_env.yaml

We can pass config files in this format to the dagster CLI tool with the ``-e`` flag.

.. code-block:: console
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines | |||||
the environment dict to do this, but you can also pass input values directly using the
:py:func:`execute_solid` API. This can be especially useful when it is cumbersome or impossible to
parametrize an input through the environment dict.

For example, we may want to test ``sort_by_calories`` on a controlled data set where we know the
most and least caloric cereals in advance, but without having to flow its input from an upstream
solid implementing a data ingest process.

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/test_inputs.py
   :lines: 9-22
   :lineno-start: 9
   :linenos:
   :caption: test_inputs.py

When we execute this test (e.g., using pytest), we'll be reminded again of one of the reasons why
it's always a good idea to write unit tests, even for the most seemingly trivial components.
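
As a dependency-free illustration of testing on a controlled data set, the sketch below uses a
hypothetical plain-Python stand-in for ``sort_by_calories``. It does not call
:py:func:`execute_solid`, and the cereal names and calorie values are made up. The rows use string
values because that is what Python's ``csv.DictReader`` yields.

.. code-block:: python

    def sort_by_calories(cereals):
        """Hypothetical stand-in: sort row dicts by calorie count, ascending."""
        # CSV readers yield strings, so convert before comparing. Sorting the
        # raw strings would compare them lexicographically.
        return sorted(cereals, key=lambda cereal: int(cereal['calories']))


    def test_sort_by_calories():
        # Controlled data: the least and most caloric rows are known in advance.
        cereals = [
            {'name': 'cereal_a', 'calories': '100'},
            {'name': 'cereal_b', 'calories': '200'},
            {'name': 'cereal_c', 'calories': '70'},
        ]
        result = sort_by_calories(cereals)
        assert result[0]['name'] == 'cereal_c'   # least caloric
        assert result[-1]['name'] == 'cereal_b'  # most caloric

A test like this is cheap to write, and it catches mistakes such as comparing numeric fields as
strings, which a quick look at the code can easily miss.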

.. thumbnail:: inputs_figure_four.png

By default, every untyped value in Dagster is assigned the catch-all type :py:class:`Any`. This means that
any errors in the config won't be surfaced until the pipeline is executed.

For example, when we execute our pipeline with this config, it'll fail at runtime:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs_env_bad.yaml
   :language: YAML
   :linenos:
   :caption: inputs_env_bad.yaml

When we enter this mistyped config in Dagit and execute our pipeline, we'll see that an error
appears in the structured log viewer pane of the **Execute** tab:

.. thumbnail:: inputs_figure_five.png

Click on "View Full Message" or on the red dot on the execution step that failed, and a detailed
stack trace will pop up.

.. thumbnail:: inputs_figure_six.png

It would be better if we could catch this error earlier, when we specify the config. So let's
make the inputs typed.

A user can apply types to inputs and outputs using Python 3's type annotation syntax. In this case,
we just want to type the input as the built-in ``str``.

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/inputs_typed.py
   :lines: 6-12
   :emphasize-lines: 2
   :linenos:
   :lineno-start: 6
   :caption: inputs_typed.py

By using a typed input, we can catch this error before execution and reduce the surface
area we need to test and guard against in user code.

.. thumbnail:: inputs_figure_seven.png
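
To see why an annotation makes an early check possible at all, consider the following sketch. It
is not how Dagster validates config. It is a hypothetical helper, ``check_inputs``, that reads a
function's annotations with the standard library's ``typing.get_type_hints`` and compares supplied
values against them before the function is ever called. It only handles plain classes such as
``str``, and the typed ``read_csv`` below is a stand-in rather than the tutorial's solid.

.. code-block:: python

    from typing import get_type_hints


    def read_csv(csv_path: str):
        """Hypothetical typed stand-in; the body is irrelevant to the check."""
        return csv_path


    def check_inputs(fn, inputs):
        """Raise TypeError if a supplied input does not match fn's annotation."""
        hints = get_type_hints(fn)
        for name, value in inputs.items():
            expected = hints.get(name)
            # Unannotated inputs are accepted as-is, like a catch-all type.
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    '{} should be {}, got {}'.format(
                        name, expected.__name__, type(value).__name__
                    )
                )


    # A string passes silently; a non-string value would raise TypeError
    # before read_csv itself runs.
    check_inputs(read_csv, {'csv_path': 'cereal.csv'})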