Use tz_localize to get naive local timestamp
Closed, Public

Authored by jordansanders on Jul 2 2021, 9:04 PM.
Tags
None
Subscribers
None

Details

Summary

Pandas 1.3.0 introduces several changes and deprecations in how
timestamps are handled. After the upgrade, calling
Timestamp.min.replace() or Timestamp.max.replace() raises an
out-of-bounds error:

pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43

This is presumably the same kind of error that this code is trying to
avoid by converting the min and max timestamps to values slightly
inside the true bounds.

It seems that Timestamp.min.replace(tzinfo=None) and
Timestamp.min.tz_localize(None) are semantically equivalent. As an
added bonus, the latter is more performant and doesn't hit our
out-of-bounds errors:

https://stackoverflow.com/a/34687479
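
As a rough illustration of the equivalence claimed above, here is a minimal sketch. The sample timestamp and time zone are made up for the example and do not come from the dagster code. It compares the two calls on a value well inside the bounds, then applies tz_localize(None) to the boundary values:

```python
import pandas as pd

# Hypothetical sample value: any tz-aware timestamp well inside the bounds works.
aware = pd.Timestamp("2021-07-02 21:04", tz="America/Los_Angeles")

# Both calls drop the timezone and keep the local wall-clock time.
via_replace = aware.replace(tzinfo=None)
via_localize = aware.tz_localize(None)

assert via_replace == via_localize
assert via_localize.tzinfo is None

# At the boundaries, tz_localize(None) is expected to hand back the same
# naive timestamp rather than raising OutOfBoundsDatetime.
naive_min = pd.Timestamp.min.tz_localize(None)
naive_max = pd.Timestamp.max.tz_localize(None)
```

The sketch deliberately does not call replace() on Timestamp.min or Timestamp.max, since whether that raises depends on the installed pandas version.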

I believe the risk of this causing issues is low. If I understand the
surrounding code correctly, it only matters when validating
DataFrames that contain Timestamps very close to the actual
min/max valid Timestamp.

Test Plan

Unit tests

Diff Detail

Repository
R1 dagster
Lint
Lint Not Applicable
Unit
Tests Not Applicable