/*!
Provides a parser for [POSIX's `TZ` environment variable][posix-env].

NOTE: Sadly, at time of writing, the actual parser is in `src/shared/posix.rs`.
This is so it can be shared (via simple code copying) with proc macros like
the one found in `jiff-tzdb-static`. The parser populates a "lowest common
denominator" data type. In normal use in Jiff, this type is converted into
the types defined below. This module still does provide the various time zone
operations. Only the parsing is written elsewhere.

The `TZ` environment variable is most commonly used to set a time zone. For
example, `TZ=America/New_York`. But it can also be used to tersely define DST
transitions. Moreover, the format is not just used as an environment variable,
but is also included at the end of TZif files (version 2 or greater). The IANA
Time Zone Database project also [documents the `TZ` variable][iana-env] with
a little more commentary.

Note that we (along with pretty much everyone else) don't strictly follow
POSIX here. Namely, `TZ=America/New_York` isn't a POSIX compatible usage,
and I believe it technically should be `TZ=:America/New_York`. Nevertheless,
apparently some group of people (IANA folks?) decided `TZ=America/New_York`
should be fine. From the [IANA `theory.html` documentation][iana-env]:

> It was recognized that allowing the TZ environment variable to take on values
> such as 'America/New_York' might cause "old" programs (that expect TZ to have
> a certain form) to operate incorrectly; consideration was given to using
> some other environment variable (for example, TIMEZONE) to hold the string
> used to generate the TZif file's name. In the end, however, it was decided
> to continue using TZ: it is widely used for time zone purposes; separately
> maintaining both TZ and TIMEZONE seemed a nuisance; and systems where "new"
> forms of TZ might cause problems can simply use legacy TZ values such as
> "EST5EDT" which can be used by "new" programs as well as by "old" programs
> that assume pre-POSIX TZ values.

Indeed, even [musl subscribes to this behavior][musl-env]. So that's what we do
here too.
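
The fallback order described above can be sketched roughly as follows. This is
an illustration only and not Jiff's implementation: `TzKind`, `classify` and
the `is_posix_rule` predicate are hypothetical names, and the predicate merely
stands in for a real POSIX `TZ` parser.

```rust
/// How a `TZ` value might be classified. A hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
enum TzKind<'a> {
    /// The value started with `:`. The payload excludes the `:`.
    Implementation(&'a str),
    /// The value was accepted by the POSIX rule parser.
    PosixRule(&'a str),
    /// Not a POSIX rule, so it is tried as an IANA time zone name.
    IanaName(&'a str),
}

/// Classifies a `TZ` value. `is_posix_rule` is an assumption of this sketch:
/// it stands in for a real POSIX `TZ` parser succeeding or failing.
fn classify<'a>(
    tz: &'a str,
    is_posix_rule: impl Fn(&str) -> bool,
) -> TzKind<'a> {
    if let Some(rest) = tz.strip_prefix(':') {
        TzKind::Implementation(rest)
    } else if is_posix_rule(tz) {
        TzKind::PosixRule(tz)
    } else {
        TzKind::IanaName(tz)
    }
}

fn main() {
    // A deliberately crude placeholder, not a real parser: anything with a
    // comma is treated as a rule. (A real parser also accepts `EST5`.)
    let crude = |s: &str| s.contains(',');

    assert_eq!(
        classify(":America/New_York", crude),
        TzKind::Implementation("America/New_York"),
    );
    assert_eq!(
        classify("EST5EDT,M3.2.0,M11.1.0", crude),
        TzKind::PosixRule("EST5EDT,M3.2.0,M11.1.0"),
    );
    assert_eq!(
        classify("America/New_York", crude),
        TzKind::IanaName("America/New_York"),
    );
}
```

The last branch is the non-POSIX one: a value that fails to parse as a rule is
not an error by itself, it is simply retried as a time zone name.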

Note that a POSIX time zone like `EST5` corresponds to the UTC offset `-05:00`,
and `GMT-4` corresponds to the UTC offset `+04:00`. Yes, it's backwards. How
fun.
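
To make the sign inversion concrete, here is a minimal sketch. The helper
names are made up for this example and aren't part of Jiff, and only whole
hours are handled.

```rust
/// Converts the hour value written in a POSIX `TZ` string (e.g., the `5` in
/// `EST5`) to a UTC offset in seconds. POSIX counts hours *west* of UTC, so
/// the sign flips. (Hypothetical helper, whole hours only.)
fn posix_hours_to_utc_offset_seconds(posix_hours: i32) -> i32 {
    -posix_hours * 3600
}

/// Formats a UTC offset in seconds as `+HH:MM` or `-HH:MM`.
fn format_offset(seconds: i32) -> String {
    let sign = if seconds < 0 { '-' } else { '+' };
    let abs = seconds.abs();
    format!("{}{:02}:{:02}", sign, abs / 3600, (abs % 3600) / 60)
}

fn main() {
    // `EST5` is five hours west of UTC.
    let est = posix_hours_to_utc_offset_seconds(5);
    assert_eq!(format_offset(est), "-05:00");

    // `GMT-4` is four hours east of UTC.
    let gmt_minus_4 = posix_hours_to_utc_offset_seconds(-4);
    assert_eq!(format_offset(gmt_minus_4), "+04:00");
}
```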

# IANA v3+ Support

While this module and many of its types are directly associated with POSIX,
this module also plays a supporting role for `TZ` strings in the IANA TZif
binary format for versions 2 and greater. Specifically, for versions 3 and
greater, some minor extensions are supported here via `IanaTz::parse`. But
using `PosixTz::parse` is limited to parsing what is specified by POSIX.
Nevertheless, we generally use `IanaTz::parse` everywhere, even when parsing
the `TZ` environment variable. The reason for this is that it seems to be what
other programs do in practice (for example, GNU date).
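
As a rough illustration of what one of those extensions relaxes, the sketch
below contrasts the hour range POSIX allows for a DST transition time
(`0..=24`, unsigned) with the range TZif v3+ allows (`-167..=167`, per
RFC 9636). `TzFlavor` and `transition_hour_is_valid` are hypothetical names
used only in this example.

```rust
/// Which grammar a `TZ` string is checked against. Hypothetical type.
#[derive(Clone, Copy, Debug)]
enum TzFlavor {
    /// Strict POSIX: transition hours are unsigned and at most 24.
    Posix,
    /// TZif v3+: transition hours may be signed, with magnitude up to 167.
    IanaV3Plus,
}

/// Returns true if `hour` is an acceptable hour component for a DST
/// transition time under the given flavor.
fn transition_hour_is_valid(flavor: TzFlavor, hour: i32) -> bool {
    match flavor {
        TzFlavor::Posix => (0..=24).contains(&hour),
        TzFlavor::IanaV3Plus => (-167..=167).contains(&hour),
    }
}

fn main() {
    // Valid under both grammars.
    assert!(transition_hour_is_valid(TzFlavor::Posix, 2));
    assert!(transition_hour_is_valid(TzFlavor::IanaV3Plus, 2));

    // Only valid with the v3+ extension.
    assert!(!transition_hour_is_valid(TzFlavor::Posix, -1));
    assert!(transition_hour_is_valid(TzFlavor::IanaV3Plus, -1));
    assert!(!transition_hour_is_valid(TzFlavor::Posix, 26));
    assert!(transition_hour_is_valid(TzFlavor::IanaV3Plus, 26));

    // Out of range for both.
    assert!(!transition_hour_is_valid(TzFlavor::IanaV3Plus, 168));
}
```

Since every hour accepted by the strict check is also accepted by the relaxed
one, using the relaxed grammar everywhere doesn't reject any valid POSIX
input.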

# `no-std` and `no-alloc` support

A big part of this module works fine in core-only environments. But because
core-only environments provide no means of indirection, and embedding a
`PosixTimeZone` into a `TimeZone` without indirection would use up a lot of
space (and thereby make `Zoned` quite chunky), we provide core-only support
principally through a proc macro. Namely, a `PosixTimeZone` can be parsed by
the proc macro and then turned into static data.

POSIX time zone support isn't explicitly provided directly as a public API
for core-only environments, but is implicitly supported via TZif. (Since TZif
data contains POSIX time zone strings.)

[posix-env]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
[iana-env]: https://data.iana.org/time-zones/tzdb-2024a/theory.html#functions
[musl-env]: https://wiki.musl-libc.org/environment-variables
*/

use crate::{
    civil::DateTime,
    error::{err, Error, ErrorContext},
    shared,
    timestamp::Timestamp,
    tz::{
        timezone::TimeZoneAbbreviation, AmbiguousOffset, Dst, Offset,
        TimeZoneOffsetInfo, TimeZoneTransition,
    },
    util::{array_str::Abbreviation, escape::Bytes, parse},
};

/// The result of parsing the POSIX `TZ` environment variable.
///
/// A `TZ` variable can either be a time zone string with an optional DST
/// transition rule, or it can begin with a `:` followed by an arbitrary set of
/// bytes that is implementation defined.
///
/// In practice, the content following a `:` is treated as an IANA time zone
/// name. Moreover, even if the `TZ` string doesn't start with a `:` but
/// corresponds to an IANA time zone name, then it is interpreted as such.
/// (See the module docs.) However, this type only encapsulates the choices
/// strictly provided by POSIX: either a time zone string with an optional DST
/// transition rule, or an implementation defined string with a `:` prefix. If,
/// for example, `TZ="America/New_York"`, then that case isn't encapsulated by
/// this type. Callers needing that functionality will need to handle the error
/// returned by parsing this type and layer their own semantics on top.
#[cfg(feature = "tz-system")]
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum PosixTzEnv {
    /// A valid POSIX time zone with an optional DST transition rule.
    Rule(PosixTimeZoneOwned),
    /// An implementation defined string. This occurs when the `TZ` value
    /// starts with a `:`. The string returned here does not include the `:`.
    Implementation(alloc::boxed::Box<str>),
}

#[cfg(feature = "tz-system")]
impl PosixTzEnv {
    /// Parse a POSIX `TZ` environment variable string from the given bytes.
    fn parse(bytes: impl AsRef<[u8]>) -> Result<PosixTzEnv, Error> {
        let bytes = bytes.as_ref();
        if bytes.get(0) == Some(&b':') {
            let Ok(string) = core::str::from_utf8(&bytes[1..]) else {
                return Err(err!(
                    "POSIX time zone string with a ':' prefix contains \
                     invalid UTF-8: {:?}",
                    Bytes(&bytes[1..]),
                ));
            };
            Ok(PosixTzEnv::Implementation(string.into()))
        } else {
            PosixTimeZone::parse(bytes).map(PosixTzEnv::Rule)
        }
    }

    /// Parse a POSIX `TZ` environment variable string from the given `OsStr`.
    pub(crate) fn parse_os_str(
        osstr: impl AsRef<std::ffi::OsStr>,
    ) -> Result<PosixTzEnv, Error> {
        PosixTzEnv::parse(parse::os_str_bytes(osstr.as_ref())?)
    }
}

#[cfg(feature = "tz-system")]
impl core::fmt::Display for PosixTzEnv {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        match *self {
            PosixTzEnv::Rule(ref tz) => write!(f, "{tz}"),
            PosixTzEnv::Implementation(ref imp) => write!(f, ":{imp}"),
        }
    }
}

/// An owned POSIX time zone.
///
/// That is, a POSIX time zone whose abbreviations are inlined into the
/// representation. As opposed to a static POSIX time zone whose abbreviations
/// are `&'static str`.
pub(crate) type PosixTimeZoneOwned = PosixTimeZone<Abbreviation>;

/// A static POSIX time zone whose abbreviations are `&'static str`.
pub(crate) type PosixTimeZoneStatic = PosixTimeZone<&'static str>;

/// A POSIX time zone.
///
/// # On "reasonable" POSIX time zones
///
/// Jiff only supports "reasonable" POSIX time zones. A "reasonable" POSIX time
/// zone is a POSIX time zone that has a DST transition rule _when_ it has a
/// DST time zone abbreviation. Without the transition rule, it isn't possible
/// to know when DST starts and stops.
///
/// POSIX technically allows a DST time zone abbreviation *without* a
/// transition rule, but the behavior is literally unspecified. So Jiff just
/// rejects them.
///
/// Note that if you're confused as to why Jiff accepts `TZ=EST5EDT` (where
/// `EST5EDT` is an example of an _unreasonable_ POSIX time zone), that's
/// because Jiff rejects `EST5EDT` and instead attempts to use it as an IANA
/// time zone identifier. And indeed, the IANA Time Zone Database contains an
/// entry for `EST5EDT` (presumably for legacy reasons).
///
/// We also expect `TZ` strings parsed from IANA v2+ formatted `tzfile`s to
/// be reasonable. Otherwise, parsing fails. This seems to be consistent with
/// the [GNU C Library]'s treatment of the `TZ` variable: it only documents
/// support for reasonable POSIX time zone strings.
///
/// Note that a V2 `TZ` string is precisely identical to a POSIX `TZ`
/// environment variable string. A V3 `TZ` string however supports signed DST
/// transition times, and hours in the range `0..=167`. The V2 and V3 here
/// reference how `TZ` strings are defined in the TZif format specified by
/// [RFC 9636]. V2 is the original version of it straight from POSIX, whereas
/// V3+ corresponds to an extension added to V3 (and newer versions) of the
/// TZif format. V3 is a superset of V2, so in practice, Jiff just permits
/// V3 everywhere.
///
/// [GNU C Library]: https://www.gnu.org/software/libc/manual/2.25/html_node/TZ-Variable.html
/// [RFC 9636]: https://datatracker.ietf.org/doc/rfc9636/
#[derive(Clone, Debug, Eq, PartialEq)]
// NOT part of Jiff's public API
#[doc(hidden)]
// This ensures the alignment of this type is always *at least* 8 bytes. This
// is required for the pointer tagging inside of `TimeZone` to be sound. At
// time of writing (2024-02-24), this explicit `repr` isn't required on 64-bit
// systems since the type definition is such that it will have an alignment of
// at least 8 bytes anyway. But this *is* required for 32-bit systems, where
// the type definition at present only has an alignment of 4 bytes.
#[repr(align(8))]
pub struct PosixTimeZone<ABBREV> {
    inner: shared::PosixTimeZone<ABBREV>,
}

impl PosixTimeZone<Abbreviation> {
    /// Parse an IANA tzfile v3+ `TZ` string from the given bytes.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse(
        bytes: impl AsRef<[u8]>,
    ) -> Result<PosixTimeZoneOwned, Error> {
        let bytes = bytes.as_ref();
        let inner = shared::PosixTimeZone::parse(bytes.as_ref())
            .map_err(Error::shared)
            .map_err(|e| {
                e.context(err!("invalid POSIX TZ string {:?}", Bytes(bytes)))
            })?;
        Ok(PosixTimeZone { inner })
    }

    /// Like `parse`, but parses a POSIX TZ string from a prefix of the
    /// given input. Any remaining input is returned.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse_prefix<'b, B: AsRef<[u8]> + ?Sized + 'b>(
        bytes: &'b B,
    ) -> Result<(PosixTimeZoneOwned, &'b [u8]), Error> {
        let bytes = bytes.as_ref();
        let (inner, remaining) =
            shared::PosixTimeZone::parse_prefix(bytes.as_ref())
                .map_err(Error::shared)
                .map_err(|e| {
                    e.context(err!(
                        "invalid POSIX TZ string {:?}",
                        Bytes(bytes)
                    ))
                })?;
        Ok((PosixTimeZone { inner }, remaining))
    }

    /// Converts from the shared-but-internal API for use in proc macros.
    #[cfg(feature = "alloc")]
    pub(crate) fn from_shared_owned(
        sh: shared::PosixTimeZone<Abbreviation>,
    ) -> PosixTimeZoneOwned {
        PosixTimeZone { inner: sh }
    }
}

impl PosixTimeZone<&'static str> {
    /// Converts from the shared-but-internal API for use in proc macros.
    ///
    /// This works in a `const` context by requiring that the time zone
    /// abbreviations are `static` strings. This is used when converting
    /// code generated by a proc macro to this Jiff internal type.
    pub(crate) const fn from_shared_const(
        sh: shared::PosixTimeZone<&'static str>,
    ) -> PosixTimeZoneStatic {
        PosixTimeZone { inner: sh }
    }
}

impl<ABBREV: AsRef<str>> PosixTimeZone<ABBREV> {
    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// If you need information like whether the offset is in DST or not, or
    /// the time zone abbreviation, then use `PosixTimeZone::to_offset_info`.
    /// But that API may be more expensive to use, so only use it if you need
    /// the additional data.
    pub(crate) fn to_offset(&self, timestamp: Timestamp) -> Offset {
        Offset::from_ioffset_const(
            self.inner.to_offset(timestamp.to_itimestamp_const()),
        )
    }

    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// This also includes whether the offset returned should be considered
    /// to be "DST" or not, along with the time zone abbreviation (e.g., EST
    /// for standard time in New York, and EDT for DST in New York).
    pub(crate) fn to_offset_info(
        &self,
        timestamp: Timestamp,
    ) -> TimeZoneOffsetInfo<'_> {
        let (ioff, abbrev, is_dst) =
            self.inner.to_offset_info(timestamp.to_itimestamp_const());
        let offset = Offset::from_ioffset_const(ioff);
        let abbreviation = TimeZoneAbbreviation::Borrowed(abbrev);
        TimeZoneOffsetInfo { offset, dst: Dst::from(is_dst), abbreviation }
    }

    /// Returns a possibly ambiguous timestamp for the given civil datetime.
    ///
    /// The given datetime should correspond to the "wall" clock time of what
    /// humans use to tell time for this time zone.
    ///
    /// Note that "ambiguous timestamp" is represented by the possible
    /// selection of offsets that could be applied to the given datetime. In
    /// general, it is only ambiguous around transitions to-and-from DST. The
    /// ambiguity can arise as a "fold" (when a particular wall clock time is
    /// repeated) or as a "gap" (when a particular wall clock time is skipped
    /// entirely).
    pub(crate) fn to_ambiguous_kind(&self, dt: DateTime) -> AmbiguousOffset {
        let iamoff = self.inner.to_ambiguous_kind(dt.to_idatetime_const());
        AmbiguousOffset::from_iambiguous_offset_const(iamoff)
    }

    /// Returns the timestamp of the most recent time zone transition prior
    /// to the timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn previous_transition(
        &self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.previous_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }

    /// Returns the timestamp of the soonest time zone transition after the
    /// timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn next_transition(
        &self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.next_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }
}

impl<ABBREV: AsRef<str>> core::fmt::Display for PosixTimeZone<ABBREV> {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        core::fmt::Display::fmt(&self.inner, f)
    }
}

// The tests below require parsing which requires alloc.
#[cfg(feature = "alloc")]
#[cfg(test)]
mod tests {
    use super::*;

    #[cfg(feature = "tz-system")]
    #[test]
    fn parse_posix_tz() {
        // We used to parse this and then error when we tried to
        // convert to a "reasonable" POSIX time zone with a DST
        // transition rule. We never actually used unreasonable POSIX
        // time zones and it was complicating the type definitions, so
        // now we just reject it outright.
        assert!(PosixTzEnv::parse("EST5EDT").is_err());

        let tz = PosixTzEnv::parse(":EST5EDT").unwrap();
        assert_eq!(tz, PosixTzEnv::Implementation("EST5EDT".into()));

        // We require implementation strings to be UTF-8, because we're
        // sensible.
        assert!(PosixTzEnv::parse(b":EST5\xFFEDT").is_err());
    }
}
