Page MenuHomePhabricator

Implement start_schedule and end_schedule for SystemCronTabScheduler
ClosedPublic

Authored by sashank on Aug 15 2019, 7:08 PM.

Details

Summary

The first implementation of the scheduler is SystemCronScheduler. We can interact with the scheduler through dagster-graphql, with the following four mutation:

  • createRunSchedule(schedule: $schedule)
  • startRunSchedule(scheduleId: $scheduleId)
  • endRunSchedule(scheduleId: $scheduleId)
  • deleteRunSchedule(scheduleId: $scheduleId)

When you create a schedule using SystemCronScheduler, a file is persisted to disk containing the schedule_id, name, cron_schedule, and execution_params of the schedule.

When you _start_ a schedule using SystemCronScheduler, a bash script is created that invokes dagster-graphql:

#!/bin/bash
export DAGSTER_HOME={dagster_home}
export LANG=en_US.UTF-8

{dagster_graphql_path} -p startPipelineExecution -v '{variables}' --log -y "{repo_path}" >> {log_file} 2>&1

The dagster_graphql_path and the repo_path are captured during the startRunSchedule. For this v1, the ExecutionTargetHandle on the DagsterGraphQLContext must point to a repository yaml file. A log is maintained, per schedule, to help debug issues with the dagster-graphql invocation.

A job is added to the user's crontab that runs the bash script:

* * * * * /Users/sashankthupukari/dagster/schedules/experimental/toys_repository/many_events_schedule_b215b131-6083-4b7f-993f-1eaa6e295e6d.sh # b215b131-6083-4b7f-993f-1eaa6e295e6d

When you _end_ a schedule, the cron job is removed, the bash script is deleted, and the repo_path and python_path are reset.

Test Plan

test_scheduler.py, and manual e2e tests

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

sashank created this revision.Aug 15 2019, 7:08 PM
sashank planned changes to this revision.Aug 15 2019, 7:08 PM
sashank updated this revision to Diff 3745.Aug 15 2019, 9:04 PM

Get repository path from DagsterGraphQLContext, and python path from sys.executable, and store both on RunSchedule

sashank updated this revision to Diff 3747.Aug 15 2019, 10:16 PM

Add flag to RunSchedule to indictate whether a schedule is runnning

sashank updated this revision to Diff 3754.Aug 15 2019, 10:58 PM

Add dependency

sashank updated this revision to Diff 3770.Aug 16 2019, 8:18 PM

Mutations for start and end schedule, create TestSystemCronScheduler, and write tests for all new queries and mutations

sashank updated this revision to Diff 3772.Aug 16 2019, 8:31 PM

Write to log file with filename {schedule name}_{schedule id}.log in the same directory as the bash script and metadata file

sashank updated this revision to Diff 3778.Aug 16 2019, 8:55 PM

Fix end_schedule, and remove hardcoded DAGSTER_HOME

sashank updated this revision to Diff 3779.Aug 16 2019, 8:55 PM

make graphql

Harbormaster failed remote builds in B3024: Diff 3779!
sashank retitled this revision from [WIP] Implement crontab scheduler start and end to Implement crontab scheduler start and end.Aug 16 2019, 9:08 PM
sashank edited the summary of this revision. (Show Details)
sashank edited the test plan for this revision. (Show Details)
sashank added a reviewer: Restricted Project.
sashank updated this revision to Diff 3780.Aug 16 2019, 9:15 PM
sashank edited the summary of this revision. (Show Details)

Mock $DAGSTER_HOME for test

sashank retitled this revision from Implement crontab scheduler start and end to Implement start_schedule and end_schedule for SystemCronTabScheduler.Aug 16 2019, 9:16 PM
sashank updated this revision to Diff 3781.Aug 16 2019, 9:19 PM

Add bash script deletion check in test_start_and_end_schedule()

looks pretty solid to me

python_modules/dagster-graphql/dagster_graphql/schema/run_schedule.py
125–140

kwargs['schedule_id']

why not just define schedule_id in the function signature

python_modules/dagster-graphql/dagster_graphql_tests/graphql/setup.py
82–98

hm i worry about the test one inheriting from the real one a bit - is it not feasible to have a TestScheduler that inherits from the base?

python_modules/dagster/requirements.txt
14

@max this seem fine?

sashank updated this revision to Diff 3783.Aug 16 2019, 9:40 PM

Remove use of kwargs in start and end schedule mutations

sashank added inline comments.Aug 16 2019, 9:55 PM
python_modules/dagster-graphql/dagster_graphql/schema/run_schedule.py
125–140

good point, fixed

python_modules/dagster-graphql/dagster_graphql_tests/graphql/setup.py
82–98

I wanted to test the file read/write functionality (like creating the bash script) that is specific to SystemCronScheduler, and just prevent touching the crontab. I could inherit from the base class, but I would end up copying almost all the code over.

sashank updated this revision to Diff 3792.Aug 16 2019, 10:54 PM

Add mock to dagster-graphql dev requirements

schrockn added inline comments.
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/scheduler.py
138

In general I think we should bias towards having immutable objects and then having methods like start_schedule and end_schedule create new copies of RunSchedule with the new values. Channel your Rich Hickey (https://www.youtube.com/watch?v=-6BsiVyC1kM)

alangenfeld added inline comments.Mon, Aug 19, 3:26 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/scheduler.py
138

its also worth considering whether a RunSchedule should even have is_running, whether it is running on this particular machine (and py path and repo path) are arguably scheduler impl specific details

alangenfeld requested changes to this revision.Mon, Aug 19, 3:29 PM
This revision now requires changes to proceed.Mon, Aug 19, 3:29 PM
sashank updated this revision to Diff 3810.Mon, Aug 19, 8:47 PM

Make RunSchedule immutable and remove is_running for now

sashank added inline comments.Mon, Aug 19, 8:53 PM
python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/scheduler.py
138

Good point. I'll leave py path and repo path on here for now, and move them to the cron scheduler implementation in the next diff in the stack where I'm making more changes regarding these two fields.

seems good now. will let @alangenfeld approve

python_modules/dagster-graphql/dagster_graphql/implementation/scheduler/scheduler.py
105

i would suggest using kwargs for None arguments for documentation

RunSchedule(
            self.schedule_id, self.name, self.cron_schedule, self.execution_params, python_path=None, repository_path=None
)
python_modules/dagster-graphql/dagster_graphql/schema/run_schedule.py
132

feedback re: kwargs unaddressed

schrockn accepted this revision.Mon, Aug 19, 11:38 PM

Let's get this merged. Please heed final feedback

sashank updated this revision to Diff 3834.Mon, Aug 19, 11:51 PM

Remove use of kwargs

sashank updated this revision to Diff 3835.Mon, Aug 19, 11:56 PM

Use kwargs for RunSchedule

This revision is now accepted and ready to land.Tue, Aug 20, 12:10 AM