Page MenuHomeElementl

RFC use custom yaml loader
Needs RevisionPublic

Authored by alangenfeld on Nov 12 2020, 10:56 PM.

Details

Summary

If someone has a yaml file with a datestring like 2020-01-01 , it gets parsed in to a datetime object which will fail when it hits our serdes libraries.

I think if we want to support datetime object creation, it sould be part of the config system's post processing, not the yaml step.

This diff changes our yaml utils to use a custom child of SafeLoader which drops this datetime resolution.

resolves #2654

Test Plan

added test, can add more

Diff Detail

Repository
R1 dagster
Branch
yaml (branched from master)
Lint
Lint Warnings
SeverityLocationCodeMessage
Warningpython_modules/dagster/dagster_tests/general_tests/utils_tests/test_yaml.py:117W0612Unused Variable
Unit
No Test Coverage

Event Timeline

So I guess the question is, would users expect these to be interchangeable?

preset = PresetDefinition.from_files(['file.yaml'])
preset = PresetDefinition(yaml.safe_load(open('file.yaml')))

I think maybe? How difficult is handling this at the serdes level?

To your queue

This revision now requires changes to proceed.Nov 16 2020, 3:43 PM

I think maybe?

That makes sense - but then you are going to fail config validation unless this is in a part of the config that is unchecked like in a Permissive dict.

My intuition was that it would be weird to get a datetime object in your solid_config , but that also comes from a place of being surprised that yaml parsing automatically coerces certain strings in to datetime objects.

Should this diff come with a datetime oriented config type? Would that type allow datetime objects at input time as well as parsing strings in to datetime in post process?

requesting review to gather more thoughts

How difficult is handling this at the serdes level?

very doable if we want to go that path

My intuition was that it would be weird to get a datetime object in your solid_config , but that also comes from a place of being surprised that yaml parsing automatically coerces certain strings in to datetime objects.

Yeah, I definitely agree this is weird.

I don't think I have a strong opinion either way.

What's the error message, or does parsing just proceed without the datetime being parsed? YAML is complex enough that I can imagine someone tearing their hair out about this behavior with no clue how to proceed..

schrockn requested changes to this revision.Dec 1 2020, 6:44 PM

whats the current error message?

This revision now requires changes to proceed.Dec 1 2020, 6:44 PM

the current status is having one of these date yaml entries in a preset file like

(dagenv38) ~/dagster:master$ git diff
diff --git a/python_modules/dagster-test/dagster_test/toys/environments/error.yaml b/python_modules/dagster-test/dagster_test/toys/environments/error.yaml
index 25a36607a..6be2300d9 100644
--- a/python_modules/dagster-test/dagster_test/toys/environments/error.yaml
+++ b/python_modules/dagster-test/dagster_test/toys/environments/error.yaml
@@ -2,6 +2,7 @@ resources:
   errorable_resource:
     config:
       throw_on_resource_init: false
+      date: 2020-01-01

 solids:
   end:

causes

(dagenv38) ~/dagster:master$ dagit
Traceback (most recent call last):
  File "/Users/alangenfeld/venvs/dagenv38/bin/dagit", line 11, in <module>
    load_entry_point('dagit', 'console_scripts', 'dagit')()
  File "/Users/alangenfeld/dagster/python_modules/dagit/dagit/cli.py", line 212, in main
    cli(auto_envvar_prefix="DAGIT")  # pylint:disable=E1120
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/alangenfeld/dagster/python_modules/dagit/dagit/cli.py", line 107, in ui
    host_dagit_ui(
  File "/Users/alangenfeld/dagster/python_modules/dagit/dagit/cli.py", line 133, in host_dagit_ui
    host_dagit_ui_with_workspace(instance, workspace, host, port, path_prefix, port_lookup)
  File "/Users/alangenfeld/dagster/python_modules/dagit/dagit/cli.py", line 140, in host_dagit_ui_with_workspace
    log_workspace_stats(instance, workspace)
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/core/telemetry.py", line 417, in log_workspace_stats
    repo_location = RepositoryLocation.from_handle(repository_location_handle)
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/core/host_representation/repository_location.py", line 143, in from_handle
    return GrpcServerRepositoryLocation(repository_location_handle)
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/core/host_representation/repository_location.py", line 315, in __init__
    external_repositories_list = sync_get_streaming_external_repositories_grpc(
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/api/snapshot_repository.py", line 46, in sync_get_streaming_external_repositories_grpc
    external_repository_chunks = list(
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/grpc/client.py", line 227, in streaming_external_repository
    for res in self._streaming_query(
  File "/Users/alangenfeld/dagster/python_modules/dagster/dagster/grpc/client.py", line 81, in _streaming_query
    yield from response_stream
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/Users/alangenfeld/venvs/dagenv38/lib/python3.8/site-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception iterating responses: Object of type date is not JSON serializable"
	debug_error_string = "{"created":"@1610576893.368408000","description":"Error received from peer unix:/var/folders/yh/518m9l_92qngl6_ldyw54x2m0000gn/T/tmpmjdmmxck","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Exception iterating responses: Object of type date is not JSON serializable","grpc_status":2}"
>
This comment was removed by max.

if we do this, we need to clearly document it

clearing out queue. what is the plan with this?

This revision now requires changes to proceed.Mar 17 2021, 6:39 PM