Page MenuHomeElementl

Refactor dagster-dbt integration / add dbt cli resource
ClosedPublic

Authored by cdecarolis on Jun 16 2021, 1:05 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, May 14, 8:48 PM
Unknown Object (File)
Sun, May 14, 5:06 PM
Unknown Object (File)
Sun, May 14, 3:00 AM
Unknown Object (File)
Sat, May 13, 11:25 PM
Unknown Object (File)
Sat, May 13, 7:57 PM
Unknown Object (File)
Sat, May 13, 11:54 AM
Unknown Object (File)
Sat, May 13, 11:30 AM
Unknown Object (File)
Mon, May 1, 10:32 AM
Subscribers
None

Details

Summary

Refactored / reworked a lot of the cli side of the dagster-dbt integration. List of changes:

  • changed output data types so that we aren't passing around untyped dictionaries as much and so that we can have more consistent return types
  • removed the "summary" thing from DbtCliOutput. I can absolutely guarantee that nobody was using this because there was a bug that would make it empty regardless of the input :)
  • removed the weird asset materializations that represent just normal outputs. I feel like these make absolutely zero sense with the story that we want to tell around asset materializations, and will just clutter up the asset catalog while providing basically zero value.

There are a couple breaking changes mixed in here (mostly related to the output types), but migration would be fairly easy for that stuff. Technically people might have been depending on the weird asset materializations, in which case this change would be more disruptive, but tbh I kinda doubt that this was that useful to anyone (famous last words).

To be honest, I'd ideally like to remove a lot of the solids outside of the main path dbt commands (like test and run). These add a ton of maintenance overhead (if dbt adds a flag to any of these, people will be blocked on using it until we update the library), whereas the resource workflow is a lot more flexible (it has a permissive flags dictionary, so you can pass in whatever flag you want, even if we're not aware of it yet).

The main value-add here is the dbt_cli_resource, which will solve basically all the ergonomic issues that I was experiencing w/ the dbt pipeline in demo-land. Now that this is in a working state,I'll try to put out a quick diff on that pipeline and link it to this to give a better sense of how that would look.

Test Plan

bk

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

owen held this revision as a draft.

Requesting changes for queue management

python_modules/libraries/dagster-dbt/dagster_dbt/cli/constants.py
15

I believe that, in our other integrations, we underscores instead of dashes. I think it's worth standardizing if we're providing something new.

python_modules/libraries/dagster-dbt/dagster_dbt/cli/resources.py
124

Do we typically suffix resources in our other integrations with resource?

131

Why can't someone just use context.log?

This revision now requires changes to proceed.Jul 1 2021, 4:08 PM
python_modules/libraries/dagster-dbt/dagster_dbt/cli/constants.py
15

hm in this case, these are representing specific command-line flags (which have the hyphen format). I definitely could do some conversion behind the scenes to make it work, but I think it would be more intuitive to someone familiar with dbt to write the "-" instead of "_"

python_modules/libraries/dagster-dbt/dagster_dbt/cli/resources.py
131

The library produces log messages when running CLI commands, which are generally fairly helpful. However, in order for this to work, the internal library function needs to have access to the logger in the context. Having the logger at the resource level means that you don't have to pass in the context.log into every resource method call

python_modules/libraries/dagster-dbt/dagster_dbt/cli/resources.py
124

as far as I know, yep. we have things like s3_resource, bigquery_resource, snowflake_resource, etc.

  • major refactor
  • make black
python_modules/libraries/dagster-dbt/dagster_dbt/cli/constants.py
15

My instinct is still to go with the underscores.

I think it's fairly common that, when an API is translated to a different setting, the naming convention changes. E.g. our GraphQL APIs use camel case, but our Python wrappers for them use snake case.

To use an example outside of Dagster, Spark has a command-line flag called "--driver-memory", but their corresponding config property is "spark.driver.memory".

If we had a config property that was called something like "cli_flags", then I'd definitely be on board with having it accept hyphen-delimited strings as the values.

Maybe easier to resolve in person if we still see things differently.

  • Updated docs for dbt integration
python_modules/libraries/dagster-dbt/dagster_dbt/cli/constants.py
15

I'm fine with that, but I think I'll keep the solids in the hyphen state for now, just for backcompat. If and when we eventually remove those solids, we can change all of these to underscores, but for now I'll just do it in code.

This looks great!

python_modules/libraries/dagster-dbt/dagster_dbt/cli/resources.py
202

This documentation should include what configuration options are available.

211

dbt_rpc_resource -> dbt_cli_resource

This revision is now accepted and ready to land.Jul 7 2021, 3:25 PM
cdecarolis edited reviewers, added: owen; removed: cdecarolis.

Rebasing + final touches