Add schedules to weather pipelines

Authored by themissinghlink on Jan 13 2020, 9:16 PM.



This PR removes the original weather pipeline, which did nothing more than generate one huge, one-time dataset. It replaces it with a more realistic pipeline that fetches a single day's weather and is intended to run on a daily schedule, with a separate backfill pipeline to recover from failed runs.


  • Got rid of weather presets
  • Got rid of weather pipeline and added extract_daily_weather_data_pipeline and weather_dataset_backfill pipelines.
  • Added schedules for the extract_daily_weather_data_pipeline pipeline.
  • Added necessary solids to make these pipelines work the way they were intended.
Test Plan


Diff Detail

R1 dagster
implement-retraining-pipelines (branched from master)
Lint OK
No Unit Test Coverage

Event Timeline

themissinghlink retitled this revision from "clean up weather pipelines and make it work for retraining" to "Add schedules to weather pipelines". Jan 13 2020, 9:22 PM
themissinghlink edited the summary of this revision.
  • made bay bike examples mypy compatible
prha added a comment. Jan 14 2020, 12:39 AM

As discussed offline, ideally, we'd run the backfill using a dagster pipeline backfill command instead of a custom pipeline.

We'd then only need the extract_daily_weather_data_pipeline pipeline defined. If the CLI is too limiting, I can help write a custom backfill script using the partition logic.
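A minimal sketch of what such a partition-driven backfill script could look like, written in plain Python rather than against any dagster API. The names `daily_partitions`, `backfill`, `completed`, and `run_for_day` are hypothetical; `run_for_day` stands in for whatever executes extract_daily_weather_data_pipeline for a single date.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List, Set


def daily_partitions(start: date, end: date) -> Iterator[date]:
    """Yield one partition key per day in [start, end], inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(
    start: date,
    end: date,
    completed: Set[date],
    run_for_day: Callable[[date], None],
) -> List[date]:
    """Run the daily job for every partition that has not completed yet.

    `completed` and `run_for_day` are assumptions for illustration: the
    first is the set of days already extracted, the second launches the
    daily pipeline for one day.
    """
    ran = []
    for day in daily_partitions(start, end):
        if day in completed:
            continue  # this day was already extracted, skip it
        run_for_day(day)
        ran.append(day)
    return ran
```

With this shape, only the daily pipeline needs to be defined; the backfill is just the same run repeated over the missing dates.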

prha requested changes to this revision. Jan 14 2020, 6:45 PM

clearing queue

This revision now requires changes to proceed. Jan 14 2020, 6:45 PM

Realized that if we want to switch to the partition API, I need to design my pipelines around transactional consistency. Cloud storage does not provide that, so we need to switch to using a database. I should have done this a long time ago, but I am going to do a series of incremental refactors to get there instead of blowing up this diff.
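To illustrate the kind of consistency meant here, below is a rough sketch of a per-day write that either fully replaces a partition or leaves it untouched. It uses SQLite from the Python standard library purely as a stand-in; the thread does not say which database will be used, and the `weather` table, its columns, and the `replace_day` helper are hypothetical.

```python
import sqlite3
from datetime import date
from typing import Iterable, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical weather table keyed by (day, hour)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "day TEXT NOT NULL, hour INTEGER NOT NULL, temperature REAL, "
        "PRIMARY KEY (day, hour))"
    )


def replace_day(
    conn: sqlite3.Connection, day: date, rows: Iterable[Tuple[int, float]]
) -> None:
    """Atomically replace all rows for one day's partition.

    Using the connection as a context manager commits if the block
    succeeds and rolls back if it raises, so a failed rerun cannot
    leave a half-written day behind.
    """
    key = day.isoformat()
    with conn:
        conn.execute("DELETE FROM weather WHERE day = ?", (key,))
        conn.executemany(
            "INSERT INTO weather (day, hour, temperature) VALUES (?, ?, ?)",
            [(key, hour, temperature) for hour, temperature in rows],
        )
```

Because the delete and the inserts share one transaction, rerunning a partition is idempotent, which is the property a daily schedule plus backfill relies on.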