docs/sections/intro_tutorial/hello_cereal.rst
- This file was moved from docs/sections/learn/tutorial/hello_cereal.rst.
.. py:currentmodule:: dagster

Hello, cereal!
---------------

In this tutorial, we'll explore the feature set of Dagster with small examples that are intended to
be illustrative of real data problems.

We'll build these examples around a simple but scary .csv dataset, ``cereal.csv``, which contains
nutritional facts about 80 breakfast cereals. You can find this dataset on
`GitHub <https://raw.githubusercontent.com/dagster-io/dagster/master/examples/dagster_examples/intro_tutorial/cereal.csv>`_.
Or, if you've cloned the dagster git repository, you'll find this dataset at
``dagster/examples/dagster_examples/intro_tutorial/cereal.csv``.

To get the flavor of this dataset, let's look at the header and the first five rows:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/cereal.csv
   :linenos:
   :lines: 1-6
   :caption: cereal.csv
   :language: text

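
If you'd like to produce the same preview yourself, the following is a minimal sketch using only the Python standard library. The helper name ``preview_csv`` is hypothetical and is not part of the tutorial code; it simply returns the header and the first few data rows of any CSV file.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return (header, rows): the header line and the first n_rows data rows.

    Hypothetical helper for illustration; not part of the Dagster tutorial.
    """
    with open(path, newline="") as fd:
        reader = csv.reader(fd)
        header = next(reader)               # first line holds the column names
        rows = list(islice(reader, n_rows))  # stop after n_rows data rows
    return header, rows
```

Calling ``preview_csv("cereal.csv")`` from the directory that holds the dataset should give you the column names plus five rows, which corresponds to lines 1-6 of the file.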
Hello, solid!
^^^^^^^^^^^^^

Let's write our first Dagster solid and save it as ``hello_cereal.py``.

(You can also find this file, and all of the tutorial code, on
`GitHub <https://github.com/dagster-io/dagster/tree/master/examples/dagster_examples/intro_tutorial>`__
or, if you've cloned the git repo, at ``dagster/examples/dagster_examples/intro_tutorial/``.)

A solid is a unit of computation in a data pipeline. Typically, you'll define solids by
annotating ordinary Python functions with the :py:func:`@solid <solid>` decorator.

The logic in our first solid is very straightforward: it just reads in the CSV from a hardcoded path
and logs the number of rows it finds.

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/hello_cereal.py
   :linenos:
   :lines: 1-18
   :caption: hello_cereal.py

In this simplest case, our solid takes no inputs except for the
:py:class:`context <SystemComputeExecutionContext>` in which it executes
(provided by the Dagster framework as the first argument to every solid), and also returns no
outputs. Don't worry, we'll soon encounter solids that are much more dynamic.

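
To make the solid's logic concrete without any framework machinery, here is a plain-Python sketch of the same idea: read a CSV and log how many rows were found. This is not the tutorial's ``hello_cereal.py``. The function name ``count_csv_rows`` and the logger name are assumptions made for illustration, and the sketch uses the standard ``logging`` module where a real solid would log through the ``context`` it receives.

```python
import csv
import logging

# Hypothetical logger name; a Dagster solid would log via its context instead.
logger = logging.getLogger("hello_cereal_sketch")


def count_csv_rows(path):
    """Read a CSV file, log the number of data rows, and return that count."""
    with open(path, newline="") as fd:
        # DictReader consumes the header line, so only data rows are counted.
        rows = list(csv.DictReader(fd))
    logger.info("Found %d rows", len(rows))
    return len(rows)
```

Unlike the solid described above, this sketch returns the count so that it is easy to check by hand; the solid itself produces no outputs.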
Hello, pipeline!
^^^^^^^^^^^^^^^^

To execute our solid, we'll embed it in an equally simple pipeline.

A pipeline is a set of solids arranged into a DAG (or
`directed acyclic graph <https://en.wikipedia.org/wiki/Directed_acyclic_graph>`_) of computation.
You'll typically define pipelines by annotating ordinary Python functions with the
:py:func:`@pipeline <pipeline>` decorator.

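
If the idea of a DAG of computation is new to you, the sketch below may help. It is a framework-free illustration and not how Dagster itself schedules solids: each step runs only after the steps it depends on, using ``graphlib`` from the standard library (Python 3.9 or later). The step names, the ``run_dag`` helper, and the sample values are all hypothetical.

```python
from graphlib import TopologicalSorter


def run_dag(steps, dependencies):
    """Run every step after its dependencies; return (order, results).

    steps maps a step name to a callable; dependencies maps a step name
    to the list of upstream step names whose results it receives.
    """
    order = []
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = steps[name](*upstream)
        order.append(name)
    return order, results


# Hypothetical four-step graph: one source, two middle steps, one sink.
steps = {
    "load": lambda: [3, 1, 2],
    "sort": lambda rows: sorted(rows),
    "count": lambda rows: len(rows),
    "report": lambda ordered, n: f"{n} rows, smallest is {ordered[0]}",
}
dependencies = {
    "load": [],
    "sort": ["load"],
    "count": ["load"],
    "report": ["sort", "count"],
}

order, results = run_dag(steps, dependencies)
```

Here ``load`` always runs first and ``report`` always runs last, while ``sort`` and ``count`` may run in either order because neither depends on the other.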
.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/hello_cereal.py
   :linenos:
   :lineno-start: 21
   :lines: 21-23
   :caption: hello_cereal.py

Here you'll see that we call ``hello_cereal()``. This call doesn't actually execute the solid
-- within the body of functions decorated with :py:func:`@pipeline <pipeline>`, we use

[81 lines hidden in this view]

In this view, you can filter and search through the logs corresponding to your pipeline run.

Using the Python API to execute a pipeline
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you'd rather execute your pipelines as a script, you can do that without using the dagster CLI
at all. Just add a few lines to ``hello_cereal.py``:

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/hello_cereal.py
   :linenos:
   :lineno-start: 26
   :lines: 26-28
   :caption: hello_cereal.py

Now you can just run:

.. code-block:: console

[11 lines hidden in this view]

expected. We'll use :py:func:`execute_pipeline` to test our pipeline, as well as
:py:func:`execute_solid` to test our solid in isolation.

These functions synchronously execute a pipeline or solid and return result objects (the
:py:class:`SolidExecutionResult` and :py:class:`PipelineExecutionResult`) whose methods let us
investigate, in detail, the success or failure of execution, the outputs produced by solids, and
(as we'll see later) other events associated with execution.

.. literalinclude:: ../../../examples/dagster_examples/intro_tutorial/hello_cereal.py
   :linenos:
   :caption: hello_cereal.py
   :lineno-start: 31
   :lines: 31-40

Now you can use pytest, or your test runner of choice, to run unit tests as you develop your
data applications.

[13 lines hidden in this view]