Page MenuHomeElementl

rfc: add repository_name to SensorEvaluationContext + make pipeline_failure_sensor repo scoped
ClosedPublic

Authored by yuhan on Jun 30 2021, 11:19 PM.

Details

Summary

bc pipeline_failure_sensor listens to all pipelines across repos, when repo1.im_in_repo1 failed, repo2.slack_on_pipeline_failure will fire too

the ideal use-facing behavior should be that pipeline faliure sensor only acts on the failure in the same repo, bc we ask users to add the sensor to repo.

this diff rfc for the least invasive approach: add repository_name to SensorEvaluationContent and check if the failed pipeline is in the same repo in the wrapper

Test Plan

toy workspace:

import os
from dagster import repository, pipeline, solid
from dagster_slack import make_slack_on_pipeline_failure_sensor
from dagster_slack.hooks import slack_on_success


@solid
def solid1():
    raise Exception()


@pipeline
def im_in_repo1():
    solid1()


slack_on_piepline_failure_in_repo1 = make_slack_on_pipeline_failure_sensor(
    channel="#yuhan-test", slack_token=os.getenv("SLACK_DAGSTER_ETL_BOT_TOKEN")
)


@pipeline
def im_in_repo2():
    solid1()


@repository
def repo1():
    return [im_in_repo1, slack_on_piepline_failure_in_repo1]


slack_on_piepline_failure_in_repo2 = make_slack_on_pipeline_failure_sensor(
    channel="#yuhan-test",
    slack_token=os.getenv("SLACK_DAGSTER_ETL_BOT_TOKEN"),
    name="slack_on_piepline_failure_in_repo2",
)


@repository
def repo2():
    return [im_in_repo2, slack_on_piepline_failure_in_repo2]

slack_on_piepline_failure_in_repo1 only acts on im_in_repo1

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

yuhan edited the test plan for this revision. (Show Details)
yuhan added reviewers: prha, sandyryza.
python_modules/dagster/dagster/core/definitions/pipeline_sensor.py
170–187

[2]

python_modules/dagster/dagster/daemon/sensor.py
275 ↗(On Diff #40627)

[1]

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 30 2021, 11:48 PM
Harbormaster failed remote builds in B32964: Diff 40627!
yuhan retitled this revision from rfc: add repository_name to SensorEvaluationContent + make pipeline_failure_sensor repo scoped to rfc: add repository_origin_id to SensorEvaluationContent + make pipeline_failure_sensor repo scoped.Jul 1 2021, 4:07 PM
yuhan edited the summary of this revision. (Show Details)
yuhan edited the summary of this revision. (Show Details)

repository_origin_id

yuhan requested review of this revision.Jul 1 2021, 4:33 PM

Sorry for the thrash, but perhaps origin id should be an internal-only concept?

cc @cdecarolis

python_modules/dagster/dagster/core/definitions/sensor.py
370

yikes, this might be an argument against repository_origin_id, to be honest.

having users construct this themselves seems a little hairy.

python_modules/dagster/dagster/core/definitions/sensor.py
370

agreed. after putting this together, i think i actually prefer repository_name over repository_origin_id.

also, external repo, reconstructable repo, and repo origin are all internal concept. so repository_name solely may be the best option here.

I don't have a ton of context - but would it make sense to make this behaviour opt-in? That is, maybe it makes sense in some cases for users to want sensor to fire across repos, and in some cases want to filter based on repo.

@cdecarolis you are right and that is the proposal here. currently a sensor's evaluation function does not have access to its repo info, so user code can't have logic based on the repo. what this diff does is to expose the repo info via SensorEvaluationContext so inside the body of the sensor, users can opt in to the filter logic if needed.

the reservation here is which type of repo info we want to expose. the initial thinking was repo_origin_id but it feels more of an internal-only concept.

yuhan retitled this revision from rfc: add repository_origin_id to SensorEvaluationContent + make pipeline_failure_sensor repo scoped to rfc: add repository_origin_id to SensorEvaluationContext + make pipeline_failure_sensor repo scoped.Jul 1 2021, 5:10 PM

@prha
here's the version of repository_name: https://dagster.phacility.com/D8648?id=40696 (the current revision)
and here's the version repository_origin_id or anything origin related: https://dagster.phacility.com/D8648?id=40655

i like the repository_name bc it's much cleaner both internal and external facing

yeah, I like this better, I think

This revision is now accepted and ready to land.Jul 1 2021, 8:08 PM
In D8648#225346, @yuhan wrote:

@cdecarolis you are right and that is the proposal here. currently a sensor's evaluation function does not have access to its repo info, so user code can't have logic based on the repo. what this diff does is to expose the repo info via SensorEvaluationContext so inside the body of the sensor, users can opt in to the filter logic if needed.

the reservation here is which type of repo info we want to expose. the initial thinking was repo_origin_id but it feels more of an internal-only concept.

ah gotcha, thanks for the clarification. Repository name def seems better from my perspective as well, fwiw.

yuhan retitled this revision from rfc: add repository_origin_id to SensorEvaluationContext + make pipeline_failure_sensor repo scoped to rfc: add repository_name to SensorEvaluationContext + make pipeline_failure_sensor repo scoped.Jul 1 2021, 8:36 PM
yuhan edited the summary of this revision. (Show Details)