
Initial dagster-dbt prototype
Closed, Public

Authored by schrockn on Aug 10 2019, 9:37 PM.

Details

Reviewers
natekupp
Group Reviewers
Restricted Project
Commits
R1:f545529d3bdb: Initial dagster-dbt prototype
Summary

Finally was able to make some time for this! This is a
prototype-quality dbt integration, but it demonstrates what the shape of
this would look like.

I copied the example from https://github.com/fishtown-analytics/jaffle_shop/

This shells out to dbt itself and runs against the database in the
examples Docker container (I had to create the database manually). It
just parses stdout with regexes, which is quite fragile.
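To show what "parsing stdout with regexes" amounts to, here is a minimal sketch. It assumes a success line shaped like dbt's console output from around this time (`N of M OK created view model schema.name.... [CREATE VIEW in 0.10s]`). That line format, the `DbtModelResult` type, and `parse_dbt_stdout` are all hypothetical illustrations, not the code in this diff.

```python
import re
from typing import List, NamedTuple


class DbtModelResult(NamedTuple):
    """One created relation recovered from a dbt stdout line (hypothetical type)."""
    kind: str    # "view" or "table"
    schema: str
    name: str


# Assumed shape of a successful model line, e.g.
#   "21:23:12 | 1 of 5 OK created view model dbt_alice.stg_customers....... [CREATE VIEW in 0.10s]"
# The exact wording differs between dbt versions, which is what makes this fragile.
_CREATED_RE = re.compile(
    r"OK created (?P<kind>view|table) model (?P<schema>\w+)\.(?P<name>\w+)"
)


def parse_dbt_stdout(stdout: str) -> List[DbtModelResult]:
    """Scan dbt's human-readable output and return every view/table it reports creating."""
    results = []
    for line in stdout.splitlines():
        match = _CREATED_RE.search(line)
        if match:
            results.append(
                DbtModelResult(match.group("kind"), match.group("schema"), match.group("name"))
            )
    return results
```

In practice the input would come from something like `subprocess.run(["dbt", "run"], capture_output=True, text=True).stdout`. Any change to dbt's log wording silently drops matches rather than raising an error.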

It would be better if dbt emitted some sort of structured log so that
this could be parsed more reliably. Taylor is excited enough about this
possibility that he might tackle it. See https://github.com/fishtown-analytics/dbt/issues/1237
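For contrast, here is roughly what consuming a structured log could look like. This is a sketch under a purely hypothetical schema: one JSON object per line, with an `event` of `"model_created"` and `materialized`, `schema`, and `name` fields. dbt did not emit this format at the time; the field names are invented for illustration.

```python
import json
from typing import Iterable, List, Tuple


def parse_structured_log(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Collect (kind, schema, name) for each relation reported by a hypothetical JSON-lines log."""
    created = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)  # assumed: exactly one JSON object per line
        if record.get("event") == "model_created":  # hypothetical event name
            created.append((record["materialized"], record["schema"], record["name"]))
    return created
```

The point of the design is that lookups are by field name, so cosmetic changes to dbt's console wording would no longer break the integration. A genuinely malformed record fails loudly in `json.loads` instead of being skipped.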

This emits materializations for each view or table created in the
example. This has not been thoroughly tested.
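The step from parsed results to one materialization per view or table might look like the sketch below. `MaterializationRecord` is a stand-in defined here, not dagster's own class, and the label and description formats are assumptions. Inside a real solid these would instead be yielded as dagster's materialization events, whose constructor varies by dagster version.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class MaterializationRecord:
    """Stand-in for a dagster materialization event (hypothetical; not dagster's class)."""
    label: str
    description: str


def materializations_for(
    created: Iterable[Tuple[str, str, str]],
) -> Iterator[MaterializationRecord]:
    """Yield one record per (kind, schema, name) that dbt reported creating."""
    for kind, schema, name in created:
        if kind not in ("view", "table"):
            continue  # this prototype only surfaces views and tables
        yield MaterializationRecord(
            label=f"{schema}.{name}",
            description=f"dbt created {kind} {schema}.{name}",
        )
```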

Next Steps:

  • Parse materializations out of the dbt project and render them as outputs.
  • Also create a type-per-model and render metadata within dagit.
  • Consume dbt tests during a run and emit Expectations.
  • Model "seeds" as inputs into the dbt node.
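The "consume dbt tests and emit Expectations" step could start from something like this sketch. It assumes `dbt test` prints lines such as `1 of 2 PASS unique_customers_customer_id.... [PASS in 0.05s]` and `2 of 2 FAIL 3 not_null_orders_amount.... [FAIL 3 in 0.04s]`. Both that format and the `DbtTestOutcome` record are hypothetical. Mapping each outcome onto an actual dagster Expectation is left out.

```python
import re
from typing import List, NamedTuple


class DbtTestOutcome(NamedTuple):
    """Hypothetical expectation-like record for one dbt test."""
    name: str
    success: bool
    failures: int  # number of failing rows reported by dbt, 0 if none given


# Assumed line shapes (may differ across dbt versions):
#   "21:23:15 | 1 of 2 PASS unique_customers_customer_id...... [PASS in 0.05s]"
#   "21:23:15 | 2 of 2 FAIL 3 not_null_orders_amount.......... [FAIL 3 in 0.04s]"
_TEST_RE = re.compile(
    r"\d+ of \d+ (?P<status>PASS|FAIL)(?: (?P<failures>\d+))? (?P<name>\w+)"
)


def parse_dbt_test_output(stdout: str) -> List[DbtTestOutcome]:
    """Turn each PASS/FAIL line of dbt test output into a DbtTestOutcome."""
    outcomes = []
    for line in stdout.splitlines():
        match = _TEST_RE.search(line)
        if not match:
            continue  # START lines, summaries, etc.
        failures = int(match.group("failures") or 0)
        outcomes.append(
            DbtTestOutcome(match.group("name"), match.group("status") == "PASS", failures)
        )
    return outcomes
```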

Here is a view of this in action:

Test Plan

Run the jaffle example in dagit. Buildkite

Diff Detail

Repository
R1 dagster
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

schrockn created this revision. Aug 10 2019, 9:37 PM

Confused by test failure:

"+ coverage combine
Can't combine line data with arc data"

natekupp accepted this revision. Aug 12 2019, 12:11 AM
natekupp added a subscriber: natekupp.
natekupp added inline comments.
python_modules/dagster/dagster/core/events/__init__.py (line 364)

maybe else 'Materialized value {label}.'.format(label=materialization.label or '<unknown>')?
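A small sketch of the suggested fallback, assuming the surrounding code prefers a materialization's description when one is set (the "maybe else" implies a conditional, but the actual code at line 364 is not shown here). `FakeMaterialization` and `materialization_message` are hypothetical names; only the `.format(...)` expression comes from the comment.

```python
from typing import NamedTuple, Optional


class FakeMaterialization(NamedTuple):
    """Minimal stand-in with only the fields used here (hypothetical; not dagster's class)."""
    label: Optional[str]
    description: Optional[str] = None


def materialization_message(materialization: FakeMaterialization) -> str:
    # Prefer an explicit description; otherwise fall back to the label,
    # and to '<unknown>' when there is no label either.
    return (
        materialization.description
        if materialization.description
        else 'Materialized value {label}.'.format(label=materialization.label or '<unknown>')
    )
```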

python_modules/libraries/dagster-aws/dagster_aws_tests/cloudwatch_tests/test_loggers.py (line 94)

ha - rebase on master for this

This revision is now accepted and ready to land. Aug 12 2019, 12:11 AM

Looks like a good start. Are there other outputs with semantic significance, besides tables and views, that we should parse? Or is it worth waiting for the structured events for those?

schrockn added a comment (edited). Aug 12 2019, 1:25 AM

Yeah, there is a test concept that I think we can support. Will follow up in another diff.

schrockn updated this revision to Diff 3595. Aug 12 2019, 1:27 AM
schrockn removed a reviewer: natekupp.

refactor

This revision now requires review to proceed. Aug 12 2019, 1:27 AM
max accepted this revision. Aug 12 2019, 9:29 PM
This revision is now accepted and ready to land. Aug 12 2019, 9:29 PM
natekupp accepted this revision. Aug 12 2019, 9:30 PM
schrockn updated this revision to Diff 3616. Aug 12 2019, 9:38 PM
schrockn removed reviewers: max, natekupp.

up

This revision now requires review to proceed. Aug 12 2019, 9:38 PM
natekupp accepted this revision. Aug 12 2019, 9:39 PM
This revision is now accepted and ready to land. Aug 12 2019, 9:39 PM
This revision was automatically updated to reflect the committed changes.

Here is a video of this in action.

schrockn edited the summary of this revision. Aug 13 2019, 6:06 PM