
RFC: Set up sensor definition, set up boiler-plate for API evaluation
Abandoned · Public

Authored by prha on Oct 26 2020, 4:07 PM.

Details

Summary

Set up sensor in terms of job, evaluate both sensor / job parameters.

Sensors are passed a hard-coded job definition. The external API evaluates the sensor definition and, if that succeeds, the resulting job's run-time execution params.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Branch
prha/sensor
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

prha retitled this revision from Set up sensor definition, set up boiler-plate for API evaluation to RFC: Set up sensor definition, set up boiler-plate for API evaluation. Oct 26 2020, 4:33 PM
prha edited the summary of this revision.
prha added reviewers: dgibson, schrockn, alangenfeld.
prha requested review of this revision. Oct 26 2020, 4:35 PM

My major question is more high-level, in terms of our final ontology of Job, Schedule, Sensor, etc.

This proposal is that a SensorDefinition *contains* a job definition instance.

How does this relate to the ScheduleDefinition?

Did we consider a world where the Sensor *is a* job?

Wondering what the plan is here.
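To make the two shapes concrete, here is a minimal sketch in plain Python. It does not use dagster's actual classes: `JobDefinition`, `SensorContainsJob`, `SensorIsAJob`, and every field name are hypothetical and exist only to contrast "contains a job" with "is a job".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class JobDefinition:
    """Hypothetical: resolves run-time execution params for one pipeline."""
    name: str
    pipeline_name: str
    run_config_fn: Callable[[], Dict[str, Any]]


# Option A ("contains"): the sensor holds a job and only decides whether to run it.
@dataclass
class SensorContainsJob:
    name: str
    job: JobDefinition
    should_execute: Callable[[], bool]


# Option B ("is a"): the sensor is itself a job that also carries a trigger condition.
@dataclass
class SensorIsAJob(JobDefinition):
    should_execute: Callable[[], bool]


job = JobDefinition("update_table", "my_pipeline", lambda: {"solids": {}})
sensor_a = SensorContainsJob("on_new_file", job, lambda: True)
sensor_b = SensorIsAJob("on_new_file", "my_pipeline", lambda: {"solids": {}}, lambda: True)
```

In option A the same `job` object could be shared with a schedule; in option B the job params and the trigger condition live in a single definition.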

couple high-level questions to start, kinda related to nick's question

python_modules/dagster/dagster/api/snapshot_sensor.py
13–28

fwiw people have to opt in to use the CLI API now - its days are numbered and we could probably just not support new features on it if we want

python_modules/dagster/dagster/core/definitions/decorators/sensor.py
8–15

do we think job is a user-facing feature? or something that features like sensors use under the hood? Kinda feels like a one-two punch of two new concepts at once currently.

10

would there be like a check_frequency argument here? or is the guarantee just ASAP / that we'll be checking it in a reasonably tight loop?
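For illustration only, one way an optional interval could sit next to an "ASAP" default. This is a sketch, not the decorator signature in this diff: the `sensor` decorator below, its `job_name` and `check_frequency` parameters, and the `is_due` helper are all hypothetical.

```python
from typing import Callable, Optional


def sensor(job_name: str, check_frequency: Optional[float] = None) -> Callable:
    """Hypothetical decorator, not the API proposed in this diff.

    check_frequency is the minimum number of seconds between evaluations;
    None means "evaluate on every pass of the polling loop".
    """
    def wrap(fn: Callable[[], bool]) -> Callable[[], bool]:
        # Attach the metadata to the function so a daemon loop can read it.
        fn.job_name = job_name
        fn.check_frequency = check_frequency
        return fn
    return wrap


def is_due(sensor_fn: Callable[[], bool], last_checked: Optional[float], now: float) -> bool:
    """Return True if the polling loop should evaluate this sensor at time `now`."""
    if last_checked is None or sensor_fn.check_frequency is None:
        return True
    return now - last_checked >= sensor_fn.check_frequency


@sensor(job_name="update_table", check_frequency=30)
def new_file_sensor() -> bool:
    return True


@sensor(job_name="update_table")
def asap_sensor() -> bool:
    return True
```

With this shape, omitting `check_frequency` gives the tight-loop behavior, and passing it only rate-limits evaluation.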

python_modules/dagster/dagster/core/definitions/decorators/sensor.py
15

Yeah this is really concerning. I am really trying to reduce the overall number of concepts (the @graph thing is just one dimension of this), and I *really* think we should not add any more "2 concepts at a time" features (like Mode/Resource).

python_modules/dagster/dagster/core/definitions/decorators/sensor.py
15

Yeah, this is definitely adding both the concept of a Job, and the concept of a Sensor.

I can definitely move on this, but I originally conceived of Job as being the representation of the run-time determined execution params for a pipeline run. Sensors, triggers, schedules etc. are all instigation policies that resolve to a Job, which then gets launched as a run. You could then mix and match jobs, which might kick off runs from multiple instigation policies. For instance, you could have the same Job run on a schedule, have a sensor hooked up to it, and also be manually triggered by some push mechanism.
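A rough sketch of that ontology in plain Python, assuming nothing about the real dagster classes: `Job`, `Instigator`, `launch_due_runs`, and the tag names are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Job:
    """Hypothetical: the run-time determined execution params for a pipeline run."""
    name: str
    pipeline_name: str
    run_config_fn: Callable[[], Dict[str, Any]]

    def resolve(self) -> Dict[str, Any]:
        # Execution params are computed at launch time, not at definition time.
        return {"pipeline_name": self.pipeline_name, "run_config": self.run_config_fn()}


@dataclass
class Instigator:
    """Hypothetical: a schedule, sensor, or manual trigger that points at a Job."""
    kind: str  # "schedule", "sensor", or "manual"
    job: Job
    should_fire: Callable[[], bool]


def launch_due_runs(instigators: List[Instigator]) -> List[Dict[str, Any]]:
    """Resolve the Job of every instigator that fires and tag the run with its origin."""
    runs = []
    for inst in instigators:
        if inst.should_fire():
            run = inst.job.resolve()
            run["tags"] = {"job": inst.job.name, "instigated_by": inst.kind}
            runs.append(run)
    return runs


update_table = Job("update_table", "my_pipeline", lambda: {"solids": {}})
runs = launch_due_runs([
    Instigator("schedule", update_table, lambda: True),
    Instigator("sensor", update_table, lambda: False),
    Instigator("manual", update_table, lambda: True),
])
```

Here one Job is reachable from three instigation policies, and every resulting run carries the same `job` tag, so runs can be grouped by Job regardless of what kicked them off.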

It's a fair question to wonder if this level of flexibility is worth the complexity. It's also been a helpful exercise to see if we could flatten out the APIs to partitions, schedules, sensors, etc. Right now, it's still TBD whether the API feels good. A lot of this is still RFC because I want to hook up simple sensors, and then see if the API is right for doing very complicated conditions (e.g. sequential partition dependency sensors).

https://excalidraw.com/#json=5713369029935104,EW3dMUf0EGgki-VYdNQwSA

python_modules/dagster/dagster/core/definitions/decorators/sensor.py
15

> It's a fair question to wonder if this level of flexibility is worth the complexity

I think it's fairly common for people to manually trigger one-offs of regularly scheduled jobs. E.g. you push some new code and want to update the table immediately instead of waiting for the next scheduled run.

It's pretty natural to want to group the scheduled runs with the manually-triggered runs. E.g. if you're trying to answer the question "given that some new code was pushed, do I need to kick off a job to update the table?", you want to see both in one place.

It's a little difficult for me to think of situations where you'd want to see _only_ the scheduled runs or _only_ the manual runs.

rumor has it there are changes coming - to your queue

This revision now requires changes to proceed. Oct 28 2020, 5:45 PM