Page MenuHomePhabricator

#2188 convert local time to utc in PipelineRunStatsSnapshot

Authored by yuhan on Thu, Mar 19, 11:57 PM.

Diff Detail

R1 dagster
Automatic diff as part of commit; lint not applicable.
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

yuhan created this revision.Thu, Mar 19, 11:57 PM
yuhan edited the summary of this revision. (Show Details)Fri, Mar 20, 12:01 AM
yuhan added a reviewer: sashank.
sashank added subscribers: max, alangenfeld.EditedFri, Mar 20, 12:21 AM

I believe the issue is on the datetime creation end rather than the parsing end. In Python, there are two types of datetime objects: naive and aware. For context: The main problem is that we're storing naive datetimes in our database, i.e. datetimes without a timezone.

When we use unix timestamps, we're in the clear because a unix timestamp is consistent across timezones, by definition. However, when we store events in a SQL database, we do the following:$53-58

I believe the culprit is: datetime.datetime.fromtimestamp(event.timestamp), which generates a naive timestamp using the user's timezone. The moment we do this, we lose any relative time information we have because this datetime doesn't have a timezone. I think we should pass fromtimestamp a tz=timezone.utc argument, and when we the date time back out it out converting it back to a unix timestamp using datetime_as_float should work correctly without any changes.

cc @max @alangenfeld since this seems like a pretty big issue for our historical runs.

sashank requested changes to this revision.Fri, Mar 20, 9:16 PM

Back to your queue

This revision now requires changes to proceed.Fri, Mar 20, 9:16 PM
yuhan updated this revision to Diff 10790.Sat, Mar 21, 4:54 AM

fix timezone at creation rather than parsing

@sashank plz see the version tags in the runs. Both runs were started around 3:20PM PST, which is 22:20PM UTC)
with this change,

  • (version: master) the runs started before this change would not have correct timestamp
    • stored in storage as local
    • displayed in ui as localized local (which is messed up).
  • (version: diff w/ utc_datetime_from_timestamp) only the runs started after this change will have the correct timestamp
    • they are stored in storage as utc
    • displayed as local

however, changing EPOCH = datetime.datetime.utcfromtimestamp(0) to EPOCH = datetime.datetime.fromtimestamp(0) gave:

which are all wrong.

What I could do is to add a condition to check if the dt has timezone in datetime_as_float, and then adjust the EPOCH to either local or utc accordingly.
However, the TIMESTAMP type in sqlite ( doesn't seem to have timezone info with it, even though we inserted the datetime with timezone in utc_datetime_from_timestamp(event.timestamp) (#56 of this diff).
The reason is, I believe, sqlite doesn't have a DATETIME type and it only has TIMESTAMP type and a DATETIME func (which allows you to tell it which timezone you want to convert the TIMESTAMP to). So it will convert all datetime obj in python through sqlalchemy to timestamp (i.e. stored everything without timezone). Meaning, although we give python datetime with timezone to sqlite, it will lose the timezone info in its backend.

So, unfortunately, with sqlite we couldn't tell the timezone to make this backwards compatible.

  • well, the only way to do this is to store the timezone explicitly in db, either as an extra column or store time in string. But this would be a bad practice because (1) all time should be in UTC in db level, we shouldn't introduce timezone there (2) different db treats timezone differently, so we will lose the current nice flexibility to use any db if we introduce timezone.
sashank accepted this revision.Wed, Mar 25, 7:17 PM
This revision is now accepted and ready to land.Wed, Mar 25, 7:17 PM

Thanks for taking the time to talk this through!

yuhan updated this revision to Diff 10994.Wed, Mar 25, 7:20 PM

rebase master and run BK before land