Page MenuHomePhabricator

[sensors-10] schedule def new API
ClosedPublic

Authored by prha on Nov 24 2020, 11:41 PM.

Details

Summary

Should note: this will require a wipe for folks who are running off of master. This changes the schema by dropping run_key as a column, and changing the tick data schema.

Thought about doing a migration, but it might save some complexity to require the wipe.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Nov 24 2020, 11:57 PM
Harbormaster failed remote builds in B21735: Diff 26391!
Harbormaster returned this revision to the author for changes because remote builds failed.Nov 25 2020, 3:58 PM
Harbormaster failed remote builds in B21754: Diff 26418!
Harbormaster returned this revision to the author for changes because remote builds failed.Nov 25 2020, 10:25 PM
Harbormaster failed remote builds in B21807: Diff 26488!
  • test multi run scheduler
  • get rid of run_key on ticks table
  • store multiple run_ids instead of single run_id
Harbormaster returned this revision to the author for changes because remote builds failed.Dec 1 2020, 3:45 AM
Harbormaster failed remote builds in B21911: Diff 26606!
prha requested review of this revision.Dec 1 2020, 5:02 AM
prha added reviewers: dgibson, schrockn.
prha added a reviewer: sashank.

accepting but we should probably sort out the range query thing before landing

python_modules/dagster-graphql/dagster_graphql/schema/jobs.py
34

this is O(n) sql queries in the number of runs, do we anticipate sensors returning lots of runs?

python_modules/dagster-graphql/dagster_graphql/schema/schedules/ticks.py
4–6

does whatever UI this is powering need to be updated? Technically after your work it's possible to have a success tick but no first run Id right?

python_modules/dagster/dagster/cli/api.py
593

as discussed offline this should probably wait until all runs are created?

We don't need to be as careful for this path though since we don't have the same failure recovery guarantees

python_modules/dagster/dagster/core/definitions/job.py
58

sensor/schedule?

65

i missed that this is now required, excellent

python_modules/dagster/dagster/core/definitions/schedule.py
65–81

should we indicate that some of these are deprecated now

python_modules/dagster/dagster/core/storage/schedules/sql_schedule_storage.py
155–172

as discussed offline it seems super important that the query we are replacing this with be possible without a full table scan on run storage - that seems blocking for landing this so that existing schedule runs on master doesn't slow to a crawl on large run DBs

python_modules/dagster/dagster/scheduler/scheduler.py
53

oops

314

if we're going to show run key in dagit for sensors should we also show this? Maybe in a less gross-looking format / in the actual execution timezone? Shouldn't block this massive diff though of course

316–321

this is what we need to make efficient looks like

python_modules/dagster/dagster_tests/scheduler_tests/test_scheduler_failure_recovery.py
69

Should we include how many runs? Maybe their IDs?

This revision is now accepted and ready to land.Dec 1 2020, 6:00 PM
prha edited the summary of this revision. (Show Details)
  • comments

i'll let dgibson handle the rest of this, but i'm really excited to see this api consolidation with sensors

python_modules/dagster/dagster_tests/scheduler_tests/test_scheduler_run.py
195–202

really exciting to see this

fix check for wrong config

This revision was automatically updated to reflect the committed changes.