
Add schedules to weather pipelines
Abandoned | Public

Authored by themissinghlink on Jan 13 2020, 9:16 PM.

Details

Summary

This PR gets rid of the original weather pipeline, which did nothing more than generate a huge one-time dataset. It replaces it with a more realistic pipeline that fetches a single day's weather and is intended to run on a daily schedule, with a separate backfill pipeline to fall back on when a run fails.

Changelog

  • Got rid of weather presets
  • Got rid of weather pipeline and added extract_daily_weather_data_pipeline and weather_dataset_backfill pipelines.
  • Added schedules for the extract_daily_weather_data_pipeline pipeline.
  • Added necessary solids to make these pipelines work the way they were intended.
Test Plan

unit

Diff Detail

Repository
R1 dagster
Branch
implement-retraining-pipelines (branched from master)
Lint
Lint OK
Unit
No Unit Test Coverage

Event Timeline

themissinghlink retitled this revision from "clean up weather pipelines and make it work for retraining" to "Add schedules to weather pipelines". Jan 13 2020, 9:22 PM
themissinghlink edited the summary of this revision.
  • made bay bike examples mypy compatible
prha added a comment. Jan 14 2020, 12:39 AM

As discussed offline, ideally, we'd run the backfill using a dagster pipeline backfill command instead of a custom pipeline.

We'd then only need the extract_daily_weather_data_pipeline pipeline defined. If the CLI is too limiting, I can help write a custom backfill script using the partition logic.
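
To illustrate the partition idea in the comment above, here is a minimal sketch in plain Python, not the dagster partition API. It assumes one partition per calendar day and a caller-supplied function that runs the daily pipeline for a given day. The names `daily_partitions`, `run_backfill`, and `run_for_day` are hypothetical and do not come from this diff.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List


def daily_partitions(start: date, end: date) -> Iterator[date]:
    """Yield one partition key (a calendar day) for each day in [start, end]."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def run_backfill(
    start: date,
    end: date,
    run_for_day: Callable[[date], None],
) -> List[date]:
    """Run the daily job once per partition and return the days that failed.

    `run_for_day` is a hypothetical stand-in for launching
    extract_daily_weather_data_pipeline with that day's configuration.
    """
    failed: List[date] = []
    for day in daily_partitions(start, end):
        try:
            run_for_day(day)
        except Exception:
            # Record the failure and keep going so one bad day does not
            # block the rest of the backfill.
            failed.append(day)
    return failed
```

With this shape, only the daily pipeline needs to be defined. A backfill is the same pipeline run once per missing day, and the returned list of failed days can be retried later.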

prha requested changes to this revision. Jan 14 2020, 6:45 PM

clearing queue

This revision now requires changes to proceed. Jan 14 2020, 6:45 PM

Realized that if we want to switch to the partition API, my pipelines will need transactional consistency. Cloud storage does not provide that, so we need to switch to a database. I should have done this a long time ago, but I am going to get there through a series of incremental refactors instead of blowing up this diff.
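
As a rough illustration of why a database helps here, the sketch below uses Python's built-in sqlite3 module to rewrite one day's partition inside a single transaction. It is only a sketch under assumptions: the table name `hourly_weather`, its columns, and the helper functions are all hypothetical, and this is not the design the follow-up refactors will necessarily use.

```python
import sqlite3
from datetime import date
from typing import List, Tuple


def create_weather_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical table, keyed so each (day, hour) appears once."""
    with conn:
        conn.execute(
            """
            CREATE TABLE IF NOT EXISTS hourly_weather (
                day TEXT NOT NULL,
                hour INTEGER NOT NULL,
                temperature REAL NOT NULL,
                PRIMARY KEY (day, hour)
            )
            """
        )


def write_day(
    conn: sqlite3.Connection,
    day: date,
    readings: List[Tuple[int, float]],
) -> None:
    """Replace all readings for `day` atomically.

    The `with conn:` block commits if every statement succeeds and rolls
    back if any statement raises, so a failed rerun leaves the previous
    data for that day untouched.
    """
    key = day.isoformat()
    with conn:
        conn.execute("DELETE FROM hourly_weather WHERE day = ?", (key,))
        conn.executemany(
            "INSERT INTO hourly_weather (day, hour, temperature) VALUES (?, ?, ?)",
            [(key, hour, temperature) for hour, temperature in readings],
        )


def count_readings(conn: sqlite3.Connection, day: date) -> int:
    """Return how many readings are stored for `day`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM hourly_weather WHERE day = ?",
        (day.isoformat(),),
    ).fetchone()
    return row[0]
```

Because each run replaces its own day in one transaction, rerunning a partition during a backfill is idempotent, and a run that dies halfway does not leave a partially written day behind. Plain object storage gives no equivalent guarantee across multiple writes.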