Consolidate bay bike csv files into one csv file and upload to bucket resource

Authored by themissinghlink on Nov 14 2019, 2:15 AM.

Description

Summary:
Problem

We have about two dozen CSV files sitting in a directory. We want to consolidate them into one large CSV file and upload it to a bucket.
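
The consolidation step can be sketched with the standard library alone. This is a minimal illustration, not the solid from this diff: the plain-function signature is hypothetical, and it assumes every input file shares the same header row. It also skips the target file when it lives inside the source directory, as it does when reading from ./data and writing ./data/consolidated.csv.

```python
import csv
import glob
import os


def consolidate_csv_files(source_dir, target):
    """Concatenate every CSV in source_dir into target, keeping one header row.

    Hypothetical helper for illustration; assumes all inputs share a header.
    """
    target_path = os.path.abspath(target)
    # Sort for a deterministic order, and skip the target itself in case it
    # already exists inside source_dir from a previous run.
    sources = sorted(
        path
        for path in glob.glob(os.path.join(source_dir, "*.csv"))
        if os.path.abspath(path) != target_path
    )
    header = None
    with open(target, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sources:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # First non-empty file defines the header of the output.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError("Header mismatch in {}".format(path))
                # Copy the remaining data rows as-is.
                writer.writerows(reader)
    return target
```

If the monthly files turn out to have different columns, the header check raises instead of silently producing misaligned rows.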

Solution

  • Wrote the consolidate_csv_files and upload_file_to_bucket solids.
  • Extended the pipeline to use these solids and hooked in local mode resources.
  • Defined a poor man's AbstractBucket interface and a LocalBucket resource.
  • Defined a GCS bucket resource for prod mode.
  • Updated the unit tests, which pass.
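
The bucket abstraction described in the list above might look roughly like the following. This is a hedged sketch, not the code in this diff: only the names AbstractBucket, LocalBucket, and upload_file_to_bucket come from the summary, while the upload_file method, its key parameter, and the bucket_path argument are assumptions made for illustration. The upload function is shown as a plain function rather than a Dagster solid so the example stays self-contained.

```python
import abc
import os
import shutil


class AbstractBucket(abc.ABC):
    """Minimal interface a bucket resource would need to satisfy (assumed shape)."""

    @abc.abstractmethod
    def upload_file(self, file_path, key=None):
        """Store the file at file_path under key and return its location."""


class LocalBucket(AbstractBucket):
    """Directory-backed stand-in for a cloud bucket, useful in local mode and tests."""

    def __init__(self, bucket_path):
        self.bucket_path = bucket_path
        os.makedirs(bucket_path, exist_ok=True)

    def upload_file(self, file_path, key=None):
        # Default the object key to the file's base name.
        key = key or os.path.basename(file_path)
        destination = os.path.join(self.bucket_path, key)
        shutil.copyfile(file_path, destination)
        return destination


def upload_file_to_bucket(bucket, file_path):
    """Hypothetical plain-function version of the upload step."""
    return bucket.upload_file(file_path)
```

Because callers depend only on the interface, a prod mode resource backed by GCS could presumably implement the same method and be swapped in without touching the upload step.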

Proof This Works

Run the pipeline in prod mode in dagit with the following config, then check the dagster-scratch bucket (dagster-scratch-ccdfe1e in the config) for the consolidated.csv file:

resources:
  bucket:
    config:
      bucket_name: dagster-scratch-ccdfe1e
solids:
  consolidate_csv_files:
    inputs:
      source_dir:
        value: ./data
      target:
        value: ./data/consolidated.csv
  download_zipfiles_from_urls:
    inputs:
      base_url:
        value: 'https://s3.amazonaws.com/baywheels-data'
      chunk_size:
        value: 8192
      file_names:
          - value: 201801-fordgobike-tripdata.csv.zip
          - value: 201802-fordgobike-tripdata.csv.zip
          - value: 201803-fordgobike-tripdata.csv.zip
          - value: 201804-fordgobike-tripdata.csv.zip
          - value: 201805-fordgobike-tripdata.csv.zip
          - value: 201806-fordgobike-tripdata.csv.zip
          - value: 201807-fordgobike-tripdata.csv.zip
          - value: 201808-fordgobike-tripdata.csv.zip
          - value: 201809-fordgobike-tripdata.csv.zip
          - value: 201810-fordgobike-tripdata.csv.zip
          - value: 201811-fordgobike-tripdata.csv.zip
          - value: 201812-fordgobike-tripdata.csv.zip
          - value: 201901-fordgobike-tripdata.csv.zip
          - value: 201902-fordgobike-tripdata.csv.zip
          - value: 201903-fordgobike-tripdata.csv.zip
          - value: 201904-fordgobike-tripdata.csv.zip
          - value: 201905-baywheels-tripdata.csv.zip
          - value: 201906-baywheels-tripdata.csv.zip
          - value: 201907-baywheels-tripdata.csv.zip
          - value: 201908-baywheels-tripdata.csv.zip
          - value: 201909-baywheels-tripdata.csv.zip
      target_dir:
        value: /tmp
  unzip_files:
    inputs:
      source_dir:
        value: /tmp
      target_dir:
        value: ./data
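
The file_names list in the config above follows a monthly pattern, so it could be generated rather than typed by hand. The helper below is a hypothetical sketch, not part of this diff. Its defaults reproduce the range shown in the config (201801 through 201909), with the prefix switching from fordgobike to baywheels at 201905.

```python
def monthly_file_names(start=(2018, 1), end=(2019, 9), rename_from=(2019, 5)):
    """Build the monthly zip file names for the (year, month) range, inclusive.

    Hypothetical helper; the defaults mirror the file list in the config.
    """
    names = []
    year, month = start
    while (year, month) <= end:
        # The data set was renamed from fordgobike to baywheels partway through.
        prefix = "baywheels" if (year, month) >= rename_from else "fordgobike"
        names.append("{:04d}{:02d}-{}-tripdata.csv.zip".format(year, month, prefix))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# Shape expected by the config: a list of {"value": name} entries.
file_names_config = [{"value": name} for name in monthly_file_names()]
```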

Test Plan: unit

Reviewers: #ft, max

Reviewed By: #ft, max

Subscribers: max

Differential Revision: https://dagster.phacility.com/D1396

Details

Committed
themissinghlink on Nov 14 2019, 10:29 PM
Reviewer
Restricted Project
Differential Revision
D1396: Consolidate bay bike csv files into one csv file and upload to bucket resource
Parents
R1:d8d18647f6c2: [instance] support overrides