A User Code Deployment runs a gRPC server and responds to Dagit's requests for information (such as: "List all of the pipelines in each repository" or "What is the dependency structure of pipeline X?"). The user-provided image for the User Code Deployment must contain a [repository definition](/concepts/repositories-workspaces/repositories) and all of the packages needed to execute within the repository.

Users can have multiple User Code Deployments. A common pattern is for each User Code Deployment to correspond to a different repository.

This component can be updated independently from other Dagster components, including Dagit. As a result, updates to repositories can occur without causing downtime to any other repository or to Dagit. After updating, if there is an error with any repository, an error is surfaced for that repository within Dagit; all other repositories and Dagit will still operate normally.
## Walkthrough

We'll use [docker-desktop](https://docs.docker.com/desktop/kubernetes/) to set up a local k8s cluster to develop against; feel free to substitute with another k8s cluster as desired.

### Configure kubectl

First, configure the `kubectl` CLI to point to the local k8s cluster set up by `docker-desktop`.

```bash
$ kubectl config set-context dagster --namespace default --cluster docker-desktop --user=docker-desktop
$ kubectl config use-context dagster
```
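To confirm that `kubectl` is pointed at the right cluster, a quick sanity check can help (output will vary with your setup):

```bash
# Should print "dagster"
$ kubectl config current-context

# Should list the docker-desktop node in a Ready state
$ kubectl get nodes
```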
### Build Docker image for User Code

_Skip this step if using Dagster's example User Code image [dagster/user-code-example](https://hub.docker.com/r/dagster/user-code-example)._

Build a Docker image containing your Dagster repository and any dependencies needed to execute the business logic in your code.

For reference, here is an example [Dockerfile](https://github.com/dagster-io/dagster/blob/master/python_modules/automation/automation/docker/images/k8s-example/Dockerfile) and the corresponding [User Code directory](https://github.com/dagster-io/dagster/tree/master/examples/deploy_k8s/example_project). Here, we install all the Dagster-related dependencies in the Dockerfile, and then copy the directory with the implementation of the Dagster repository into the root folder. We'll need to remember the path of this repository in a [subsequent step](/deployment/guides/kubernetes/deploying-with-helm#configure-your-user-deployment) to set up the gRPC server as a deployment.

For projects with many dependencies, it is recommended that you publish your Python project as a package and install that package in your Dockerfile.
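As a sketch, a minimal build command might look like the following; the registry host, image name, and tag are illustrative, and the Dockerfile is assumed to sit at the root of your project:

```bash
# Build the image from your project root
$ docker build -t my-registry.example.com/my-dagster-project:v1 .
```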
### Push Docker image to registry

_Skip this step if using Dagster's example User Code image._

Publish the image to a registry that is accessible from the Kubernetes cluster, such as AWS ECR or DockerHub.
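Continuing the illustrative build command from the previous step (for a registry like ECR, you'd authenticate with the registry first):

```bash
# Authenticate with your registry if needed, then push
$ docker push my-registry.example.com/my-dagster-project:v1
```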
### Set up S3 (Optional)

The [dagster/user-code-example](https://hub.docker.com/r/dagster/user-code-example) uses an [S3 IO Manager](/deployment/guides/aws#using-s3-for-io-management).

Therefore, if you'd like to run the pipeline in the `default` mode, you'll need an AWS S3 bucket available, along with a pair of `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values. This is because the IO Manager uses [boto](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html).

This tutorial also has the option of using [`minio`](https://min.io/) to mock an S3 endpoint locally in K8s. Note that this option uses `host.docker.internal` to access a host from within Docker; this behavior has only been tested on macOS, so it may need different configuration on other platforms.
#### Using AWS S3

_Skip this step if you'd like to use minio for a local S3 endpoint._

If using S3, create a bucket in your AWS account; for this tutorial, we'll create a bucket called `test-bucket`. Also, keep your `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` credentials handy. Now, you can create your k8s secrets:

```bash
$ kubectl create secret generic dagster-aws-access-key-id --from-literal=AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY ID>
$ kubectl create secret generic dagster-aws-secret-access-key --from-literal=AWS_SECRET_ACCESS_KEY=<SECRET ACCESS KEY>
```
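You can verify that the secrets were created (the values themselves stay base64-encoded and are not printed):

```bash
# Both secrets should be listed, each with one data entry
$ kubectl get secrets dagster-aws-access-key-id dagster-aws-secret-access-key
```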
#### Using Local S3 - Minio

_Skip this step if you're using AWS S3._

First, set up minio locally:

```bash
brew install minio/stable/minio # server
brew install minio/stable/mc # client
mkdir $HOME/miniodata # Prepare a directory for data
minio server $HOME/miniodata # start a server with default user/pass and no TLS
# The server runs in the foreground, so run the following in a second shell
mc --insecure alias set minio http://localhost:9000 minioadmin minioadmin
# See it work
mc ls minio
date > date1.txt # create a sample file
mc mb minio/testbucket # create a scratch bucket to test the client
mc cp date1.txt minio/testbucket/date1.txt

export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
# See the aws cli work
aws --endpoint-url http://localhost:9000 s3 mb s3://test-bucket
aws --endpoint-url http://localhost:9000 s3 cp date1.txt s3://test-bucket/
```
Now, create your k8s AWS secrets:

```bash
$ kubectl create secret generic dagster-aws-access-key-id --from-literal=AWS_ACCESS_KEY_ID=minioadmin
$ kubectl create secret generic dagster-aws-secret-access-key --from-literal=AWS_SECRET_ACCESS_KEY=minioadmin
```
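As a quick check that the mocked endpoint behaves like S3, you can list the tutorial bucket through the aws CLI (using the credentials exported above):

```bash
# Should show the date1.txt file copied earlier
$ aws --endpoint-url http://localhost:9000 s3 ls s3://test-bucket/
```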
### Add the Dagster Helm chart repository

The Dagster chart repository contains the versioned charts for all Dagster releases. Add the remote URL under the namespace `dagster` to install the Dagster charts.

```bash
$ helm repo add dagster https://dagster-io.github.io/helm
```
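To confirm the chart is available, refresh your local Helm index and search it:

```bash
$ helm repo update
# Should list the dagster/dagster chart and its latest version
$ helm search repo dagster
```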
### Configure your User Deployment

Update the `dagster-user-deployments.deployments` section of the Dagster chart's `values.yaml` to include your deployment. Here, we can specify the configuration of the Kubernetes Deployment that will create the gRPC server for Dagit and the Daemon to access the User Code. The gRPC server is created through the arguments passed to `dagsterApiGrpcArgs`, which expects a list of arguments for [`dagster api grpc`](/concepts/repositories-workspaces/workspaces#running-your-own-grpc-server).

To get access to the Dagster `values.yaml`, run:

```bash
$ helm show values dagster/dagster > values.yaml
```

The following snippet works for Dagster's example User Code image. Since our Dockerfile contains the repository definition in a path, we specify arguments for the gRPC server to find this path under `dagsterApiGrpcArgs`. Note that if you haven't set up an S3 endpoint, you can only run the pipeline in `test` mode.
```yaml
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-example-user-code-1"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
```
`dagsterApiGrpcArgs` also supports loading repository definitions from a package name. To find the applicable arguments, [read here](/concepts/repositories-workspaces/workspaces#running-your-own-grpc-server).
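For intuition, the arguments map directly onto the `dagster api grpc` invocation that runs inside the user code container. As a rough sketch, the snippet above corresponds to starting the server by hand with:

```bash
# What the user code container runs, per the dagsterApiGrpcArgs above
$ dagster api grpc --python-file /example_project/example_repo/repo.py --port 3030
```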
#### Running the `default` mode (Optional)

You'll need a slightly different configuration to run the `default` mode as well. This is because the user code uses an AWS `S3IOManager` in the `default` mode, so you'll need to provide the user code k8s pods with AWS S3 credentials.

See the [set up S3](/deployment/guides/kubernetes/deploying-with-helm#set-up-s3-optional) section for setup instructions. The snippet below works for both AWS S3 and a local S3 endpoint via `minio`.
```yaml
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-example-user-code-1"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
      envSecrets:
        - name: dagster-aws-access-key-id
        - name: dagster-aws-secret-access-key

runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: dagster-aws-access-key-id
        - name: dagster-aws-secret-access-key
```
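After installing the chart (next step), you can spot-check that the credentials made it into the user code pod's environment; the pod name below is a placeholder, and you can find yours with `kubectl get pods`:

```bash
# Should print AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
$ kubectl exec <user-code-pod-name> -- env | grep AWS
```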
### Install the Dagster Helm chart

Install the Helm chart and create a release. Below, we've named our release `dagster`. We use `helm upgrade --install` to create the release if it does not exist; otherwise, the existing `dagster` release will be modified:

```bash
$ helm upgrade --install dagster dagster/dagster -f /path/to/values.yaml
```
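You can check on the release itself with Helm; the status should read `deployed` once the install completes:

```bash
$ helm status dagster
$ helm list
```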
Helm will launch several pods including PostgreSQL. You can check the status of the installation with `kubectl`; note that it might take a few minutes for the pods to move to a `Running` state.

If everything worked correctly, you should see output like the following:

```bash
$ kubectl get pods
NAME                                              READY   STATUS    RESTARTS   AGE
dagster-dagit-645b7d59f8-6lwxh                    1/1     Running   0          11m
dagster-k8s-example-user-code-1-88764b4f4-ds7tn   1/1     Running   0          9m24s
dagster-postgresql-0                              1/1     Running   0          17m
```
### Run a pipeline in your deployment

After Helm has successfully installed all the required Kubernetes resources, start port forwarding to the Dagit pod via:

```bash
export DAGIT_POD_NAME=$(kubectl get pods --namespace default \
  -l "app.kubernetes.io/name=dagster,app.kubernetes.io/instance=dagster,component=dagit" \
  -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace default port-forward $DAGIT_POD_NAME 8080:80
```

Now try running a pipeline. Visit <http://127.0.0.1:8080>, navigate to the [playground](http://127.0.0.1:8080/workspace/example_repo@k8s-example-user-code-1/pipelines/example_pipe/playground), select the `test` mode, provide the following config, and click _Launch Execution_.
```yaml
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
```
You can introspect the jobs that were launched with `kubectl`:

```bash
$ kubectl get jobs
NAME                                               COMPLETIONS   DURATION   AGE
dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630   1/1           4s         11s
```
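If you'd like to follow a run at the Kubernetes level, you can tail the logs of its job; substitute the job name from your own `kubectl get jobs` output:

```bash
# Stream logs from the run's job pod
$ kubectl logs -f job/dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630
```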
Now, you can try a full run. Using the `default` mode, provide the following config:
#### Using AWS S3

```yaml
resources:
  io_manager:
    config:
      s3_bucket: "test-bucket"
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
```
#### Using minio

```yaml
# Go to the playground and prepare a configuration
resources:
  io_manager:
    config:
      s3_bucket: "test-bucket"
  s3:
    config:
      # This use of host.docker.internal is unique to Mac
      endpoint_url: http://host.docker.internal:9000
      region_name: us-east-1
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
```
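If you're using minio, you can confirm from your host machine that the run's outputs landed in the mocked bucket (using the credentials exported in the minio setup step):

```bash
# Outputs written by the IO Manager should appear under the bucket
$ aws --endpoint-url http://localhost:9000 s3 ls s3://test-bucket --recursive
```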
Again, you can view the launched jobs:

```bash
$ kubectl get jobs
NAME                                               COMPLETIONS   DURATION   AGE
dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630   1/1           4s         11s
dagster-run-733baf75-fab2-4366-9542-0172fa4ebc1f   1/1           4s         100s
```

Within Dagit, you can watch the pipeline's progress update live and see it succeed!
## Debugging

Some of the following commands will be useful if you'd like to debug issues with deploying on Helm:

```bash
# Get the Dagit pod's name
$ export DAGIT_POD_NAME=$(kubectl get pods --namespace default \
    -l "app.kubernetes.io/name=dagster,app.kubernetes.io/instance=dagster,component=dagit" \
    -o jsonpath="{.items[0].metadata.name}")

# Start a shell in the Dagit pod
$ kubectl exec --stdin --tty $DAGIT_POD_NAME -- /bin/bash

# Get debug data from $RUN_ID
$ kubectl exec $DAGIT_POD_NAME -- dagster debug export $RUN_ID debug_info.gzip

# Get a list of recently failed runs
$ kubectl exec $DAGIT_POD_NAME -- dagster debug export fakename fakename.gzip

# Get debug output of a failed run - note that this information is also available in Dagit
$ kubectl exec $DAGIT_POD_NAME -- dagster debug export 360d7882-e631-4ac7-8632-43c75cb4d426 debug.gzip

# Extract the debug.gzip from the pod
$ kubectl cp $DAGIT_POD_NAME:debug.gzip debug.gzip

# List config maps
$ kubectl get configmap # Make note of the "user-deployments" configmap
$ kubectl get configmap dagster-dagster-user-deployments-$NAME
```
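A couple of additional commands often help when pods fail to start (these reuse the same `$DAGIT_POD_NAME` as above):

```bash
# Tail Dagit's logs
$ kubectl logs -f $DAGIT_POD_NAME

# Inspect recent cluster events, e.g. for image pull or scheduling failures
$ kubectl get events --sort-by=.metadata.creationTimestamp
```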
## Conclusion

We deployed Dagster, configured with the default <PyObject module="dagster_k8s" object="K8sRunLauncher" />, onto a Kubernetes cluster using Helm.