/*!
Provides a parser for [POSIX's `TZ` environment variable][posix-env].

NOTE: Sadly, at time of writing, the actual parser is in `src/shared/posix.rs`.
This is so it can be shared (via simple code copying) with proc macros like
the one found in `jiff-tzdb-static`. The parser populates a "lowest common
denominator" data type. In normal use in Jiff, this type is converted into
the types defined below. This module still does provide the various time zone
operations. Only the parsing is written elsewhere.

The `TZ` environment variable is most commonly used to set a time zone. For
example, `TZ=America/New_York`. But it can also be used to tersely define DST
transitions. Moreover, the format is not just used as an environment variable,
but is also included at the end of TZif files (version 2 or greater). The IANA
Time Zone Database project also [documents the `TZ` variable][iana-env] with
a little more commentary.

Note that we (along with pretty much everyone else) don't strictly follow
POSIX here. Namely, `TZ=America/New_York` isn't a POSIX compatible usage,
and I believe it technically should be `TZ=:America/New_York`. Nevertheless,
apparently some group of people (IANA folks?) decided `TZ=America/New_York`
should be fine. From the [IANA `theory.html` documentation][iana-env]:

> It was recognized that allowing the TZ environment variable to take on values
> such as 'America/New_York' might cause "old" programs (that expect TZ to have
> a certain form) to operate incorrectly; consideration was given to using
> some other environment variable (for example, TIMEZONE) to hold the string
> used to generate the TZif file's name. In the end, however, it was decided
> to continue using TZ: it is widely used for time zone purposes; separately
> maintaining both TZ and TIMEZONE seemed a nuisance; and systems where "new"
> forms of TZ might cause problems can simply use legacy TZ values such as
> "EST5EDT" which can be used by "new" programs as well as by "old" programs
> that assume pre-POSIX TZ values.

Indeed, even [musl subscribes to this behavior][musl-env]. So that's what we do
here too.
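
A minimal sketch of the two forms discussed above, using only the standard
library. The names `TzValue` and `classify` are hypothetical and not part of
Jiff, and unlike Jiff's own parsing, this sketch requires the whole value to be
valid UTF-8. A leading `:` marks an implementation-defined name; anything else
is left for the caller to interpret as a POSIX rule string or an unprefixed
IANA name.

```rust
/// Hypothetical type for illustration; not part of Jiff.
#[derive(Debug, PartialEq)]
enum TzValue<'a> {
    /// `TZ=:America/New_York`: the name after the colon.
    Implementation(&'a str),
    /// Anything else, e.g. `America/New_York` or `EST5EDT,M3.2.0,M11.1.0`.
    Unprefixed(&'a str),
}

/// Splits off an optional leading `:`. Returns `None` on invalid UTF-8.
fn classify(bytes: &[u8]) -> Option<TzValue<'_>> {
    let s = std::str::from_utf8(bytes).ok()?;
    Some(match s.strip_prefix(':') {
        Some(name) => TzValue::Implementation(name),
        None => TzValue::Unprefixed(s),
    })
}

fn main() {
    assert_eq!(
        classify(b":America/New_York"),
        Some(TzValue::Implementation("America/New_York")),
    );
    assert_eq!(
        classify(b"America/New_York"),
        Some(TzValue::Unprefixed("America/New_York")),
    );
}
```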

Note that a POSIX time zone like `EST5` corresponds to the UTC offset `-05:00`,
and `GMT-4` corresponds to the UTC offset `+04:00`. Yes, it's backwards. How
fun.
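
The sign flip can be sketched as follows. The helper
`std_only_utc_offset_hours` is a hypothetical name for illustration and is not
Jiff's parser: it only handles a bare alphabetic abbreviation followed by whole
hours, with no DST part, minutes, or quoted abbreviations.

```rust
/// Hypothetical helper (not part of Jiff): splits a DST-free POSIX time zone
/// string such as `EST5` into its abbreviation and UTC offset in hours.
fn std_only_utc_offset_hours(tz: &str) -> Option<(&str, i32)> {
    // The abbreviation is the leading run of ASCII letters.
    let split = tz.find(|c: char| !c.is_ascii_alphabetic())?;
    let (abbrev, hours) = tz.split_at(split);
    if abbrev.len() < 3 {
        return None;
    }
    // POSIX counts hours *west* of UTC, so the sign is flipped.
    let posix_hours: i32 = hours.parse().ok()?;
    Some((abbrev, -posix_hours))
}

fn main() {
    assert_eq!(std_only_utc_offset_hours("EST5"), Some(("EST", -5)));
    assert_eq!(std_only_utc_offset_hours("GMT-4"), Some(("GMT", 4)));
}
```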

# IANA v3+ Support

While this module and many of its types are directly associated with POSIX,
this module also plays a supporting role for `TZ` strings in the IANA TZif
binary format for versions 2 and greater. Specifically, for versions 3 and
greater, some minor extensions are supported here via `IanaTz::parse`. But
using `PosixTz::parse` is limited to parsing what is specified by POSIX.
Nevertheless, we generally use `IanaTz::parse` everywhere, even when parsing
the `TZ` environment variable. The reason for this is that it seems to be what
other programs do in practice (for example, GNU date).
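
To illustrate the kind of extension involved: the TZif specification lets
version 3+ `TZ` strings use signed transition hours from -167 through 167,
whereas POSIX limits them to unsigned values from 0 through 24. Below is a
rough sketch of that range check. The names `TzVersion` and
`transition_hour_is_valid` are hypothetical and are not Jiff's API.

```rust
/// Hypothetical names for illustration; not part of Jiff.
#[derive(Clone, Copy)]
enum TzVersion {
    /// Strict POSIX, as used by TZif version 2 footers.
    Posix,
    /// The extension available in TZif version 3 and newer.
    V3Plus,
}

/// Returns true if `hour` is allowed as the hour of a DST transition time.
fn transition_hour_is_valid(version: TzVersion, hour: i32) -> bool {
    match version {
        TzVersion::Posix => (0..=24).contains(&hour),
        TzVersion::V3Plus => (-167..=167).contains(&hour),
    }
}

fn main() {
    // `M3.5.0/-2` (a transition two hours before midnight) needs V3+.
    assert!(!transition_hour_is_valid(TzVersion::Posix, -2));
    assert!(transition_hour_is_valid(TzVersion::V3Plus, -2));
    assert!(transition_hour_is_valid(TzVersion::Posix, 2));
}
```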

# `no-std` and `no-alloc` support

A big part of this module works fine in core-only environments. But because
core-only environments provide no means of indirection, and embedding a
`PosixTimeZone` into a `TimeZone` without indirection would use up a lot of
space (and thereby make `Zoned` quite chunky), we provide core-only support
principally through a proc macro. Namely, a `PosixTimeZone` can be parsed by
the proc macro and then turned into static data.

POSIX time zone support isn't explicitly provided directly as a public API
for core-only environments, but is implicitly supported via TZif. (Since TZif
data contains POSIX time zone strings.)

[posix-env]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_03
[iana-env]: https://data.iana.org/time-zones/tzdb-2024a/theory.html#functions
[musl-env]: https://wiki.musl-libc.org/environment-variables
*/

use crate::{
    civil::DateTime,
    error::{err, Error, ErrorContext},
    shared,
    timestamp::Timestamp,
    tz::{
        timezone::TimeZoneAbbreviation, AmbiguousOffset, Dst, Offset,
        TimeZoneOffsetInfo, TimeZoneTransition,
    },
    util::{array_str::Abbreviation, escape::Bytes, parse},
};

/// The result of parsing the POSIX `TZ` environment variable.
///
/// A `TZ` variable can either be a time zone string with an optional DST
/// transition rule, or it can begin with a `:` followed by an arbitrary set of
/// bytes that is implementation defined.
///
/// In practice, the content following a `:` is treated as an IANA time zone
/// name. Moreover, even if the `TZ` string doesn't start with a `:` but
/// corresponds to an IANA time zone name, then it is interpreted as such.
/// (See the module docs.) However, this type only encapsulates the choices
/// strictly provided by POSIX: either a time zone string with an optional DST
/// transition rule, or an implementation defined string with a `:` prefix. If,
/// for example, `TZ="America/New_York"`, then that case isn't encapsulated by
/// this type. Callers needing that functionality will need to handle the error
/// returned by parsing this type and layer their own semantics on top.
#[cfg(feature = "tz-system")]
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum PosixTzEnv {
    /// A valid POSIX time zone with an optional DST transition rule.
    Rule(PosixTimeZoneOwned),
    /// An implementation defined string. This occurs when the `TZ` value
    /// starts with a `:`. The string returned here does not include the `:`.
    Implementation(alloc::boxed::Box<str>),
}

#[cfg(feature = "tz-system")]
impl PosixTzEnv {
    /// Parse a POSIX `TZ` environment variable string from the given bytes.
    fn parse(bytes: impl AsRef<[u8]>) -> Result<PosixTzEnv, Error> {
        let bytes = bytes.as_ref();
        if bytes.get(0) == Some(&b':') {
            let Ok(string) = core::str::from_utf8(&bytes[1..]) else {
                return Err(err!(
                    "POSIX time zone string with a ':' prefix contains \
                     invalid UTF-8: {:?}",
                    Bytes(&bytes[1..]),
                ));
            };
            Ok(PosixTzEnv::Implementation(string.into()))
        } else {
            PosixTimeZone::parse(bytes).map(PosixTzEnv::Rule)
        }
    }

    /// Parse a POSIX `TZ` environment variable string from the given `OsStr`.
    pub(crate) fn parse_os_str(
        osstr: impl AsRef<std::ffi::OsStr>,
    ) -> Result<PosixTzEnv, Error> {
        PosixTzEnv::parse(parse::os_str_bytes(osstr.as_ref())?)
    }
}

#[cfg(feature = "tz-system")]
impl core::fmt::Display for PosixTzEnv {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        match *self {
            PosixTzEnv::Rule(ref tz) => write!(f, "{tz}"),
            PosixTzEnv::Implementation(ref imp) => write!(f, ":{imp}"),
        }
    }
}

/// An owned POSIX time zone.
///
/// That is, a POSIX time zone whose abbreviations are inlined into the
/// representation. As opposed to a static POSIX time zone whose abbreviations
/// are `&'static str`.
pub(crate) type PosixTimeZoneOwned = PosixTimeZone<Abbreviation>;

/// A static POSIX time zone whose abbreviations are `&'static str`.
pub(crate) type PosixTimeZoneStatic = PosixTimeZone<&'static str>;

/// A POSIX time zone.
///
/// # On "reasonable" POSIX time zones
///
/// Jiff only supports "reasonable" POSIX time zones. A "reasonable" POSIX time
/// zone is a POSIX time zone that has a DST transition rule _when_ it has a
/// DST time zone abbreviation. Without the transition rule, it isn't possible
/// to know when DST starts and stops.
///
/// POSIX technically allows a DST time zone abbreviation *without* a
/// transition rule, but the behavior is literally unspecified. So Jiff just
/// rejects them.
///
/// Note that if you're confused as to why Jiff accepts `TZ=EST5EDT` (where
/// `EST5EDT` is an example of an _unreasonable_ POSIX time zone), that's
/// because Jiff rejects `EST5EDT` as a POSIX time zone and instead attempts
/// to use it as an IANA time zone identifier. And indeed, the IANA Time Zone
/// Database contains an entry for `EST5EDT` (presumably for legacy reasons).
///
/// We also expect `TZ` strings parsed from IANA v2+ formatted `tzfile`s to
/// be reasonable; otherwise, parsing fails. This seems to be consistent with
/// the [GNU C Library]'s treatment of the `TZ` variable: it only documents
/// support for reasonable POSIX time zone strings.
///
/// Note that a V2 `TZ` string is precisely identical to a POSIX `TZ`
/// environment variable string. A V3 `TZ` string however supports signed DST
/// transition times, and hours in the range `0..=167`. The V2 and V3 here
/// reference how `TZ` strings are defined in the TZif format specified by
/// [RFC 9636]. V2 is the original version of it straight from POSIX, whereas
/// V3+ corresponds to an extension added to V3 (and newer versions) of the
/// TZif format. V3 is a superset of V2, so in practice, Jiff just permits
/// V3 everywhere.
///
/// [GNU C Library]: https://www.gnu.org/software/libc/manual/2.25/html_node/TZ-Variable.html
/// [RFC 9636]: https://datatracker.ietf.org/doc/rfc9636/
#[derive(Clone, Debug, Eq, PartialEq)]
// NOT part of Jiff's public API
#[doc(hidden)]
// This ensures the alignment of this type is always *at least* 8 bytes. This
// is required for the pointer tagging inside of `TimeZone` to be sound. At
// time of writing (2024-02-24), this explicit `repr` isn't required on 64-bit
// systems since the type definition is such that it will have an alignment of
// at least 8 bytes anyway. But this *is* required for 32-bit systems, where
// the type definition at present only has an alignment of 4 bytes.
#[repr(align(8))]
pub struct PosixTimeZone<ABBREV> {
    inner: shared::PosixTimeZone<ABBREV>,
}

impl PosixTimeZone<Abbreviation> {
    /// Parse an IANA tzfile v3+ `TZ` string from the given bytes.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse(
        bytes: impl AsRef<[u8]>,
    ) -> Result<PosixTimeZoneOwned, Error> {
        let bytes = bytes.as_ref();
        let inner = shared::PosixTimeZone::parse(bytes.as_ref())
            .map_err(Error::shared)
            .map_err(|e| {
                e.context(err!("invalid POSIX TZ string {:?}", Bytes(bytes)))
            })?;
        Ok(PosixTimeZone { inner })
    }

    /// Like `parse`, but parses a POSIX TZ string from a prefix of the
    /// given input. Any remaining input is returned.
    #[cfg(feature = "alloc")]
    pub(crate) fn parse_prefix<'b, B: AsRef<[u8]> + ?Sized + 'b>(
        bytes: &'b B,
    ) -> Result<(PosixTimeZoneOwned, &'b [u8]), Error> {
        let bytes = bytes.as_ref();
        let (inner, remaining) =
            shared::PosixTimeZone::parse_prefix(bytes.as_ref())
                .map_err(Error::shared)
                .map_err(|e| {
                    e.context(err!(
                        "invalid POSIX TZ string {:?}",
                        Bytes(bytes)
                    ))
                })?;
        Ok((PosixTimeZone { inner }, remaining))
    }

    /// Converts from the shared-but-internal API for use in proc macros.
    #[cfg(feature = "alloc")]
    pub(crate) fn from_shared_owned(
        sh: shared::PosixTimeZone<Abbreviation>,
    ) -> PosixTimeZoneOwned {
        PosixTimeZone { inner: sh }
    }
}

impl PosixTimeZone<&'static str> {
    /// Converts from the shared-but-internal API for use in proc macros.
    ///
    /// This works in a `const` context by requiring that the time zone
    /// abbreviations are `static` strings. This is used when converting
    /// code generated by a proc macro to this Jiff internal type.
    pub(crate) const fn from_shared_const(
        sh: shared::PosixTimeZone<&'static str>,
    ) -> PosixTimeZoneStatic {
        PosixTimeZone { inner: sh }
    }
}

impl<ABBREV: AsRef<str>> PosixTimeZone<ABBREV> {
    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// If you need information like whether the offset is in DST or not, or
    /// the time zone abbreviation, then use `PosixTimeZone::to_offset_info`.
    /// But that API may be more expensive to use, so only use it if you need
    /// the additional data.
    pub(crate) fn to_offset(&self, timestamp: Timestamp) -> Offset {
        Offset::from_ioffset_const(
            self.inner.to_offset(timestamp.to_itimestamp_const()),
        )
    }

    /// Returns the appropriate time zone offset to use for the given
    /// timestamp.
    ///
    /// This also includes whether the offset returned should be considered
    /// to be "DST" or not, along with the time zone abbreviation (e.g., EST
    /// for standard time in New York, and EDT for DST in New York).
    pub(crate) fn to_offset_info(
        &self,
        timestamp: Timestamp,
    ) -> TimeZoneOffsetInfo<'_> {
        let (ioff, abbrev, is_dst) =
            self.inner.to_offset_info(timestamp.to_itimestamp_const());
        let offset = Offset::from_ioffset_const(ioff);
        let abbreviation = TimeZoneAbbreviation::Borrowed(abbrev);
        TimeZoneOffsetInfo { offset, dst: Dst::from(is_dst), abbreviation }
    }

    /// Returns a possibly ambiguous timestamp for the given civil datetime.
    ///
    /// The given datetime should correspond to the "wall" clock time of what
    /// humans use to tell time for this time zone.
    ///
    /// Note that "ambiguous timestamp" is represented by the possible
    /// selection of offsets that could be applied to the given datetime. In
    /// general, it is only ambiguous around transitions to-and-from DST. The
    /// ambiguity can arise as a "fold" (when a particular wall clock time is
    /// repeated) or as a "gap" (when a particular wall clock time is skipped
    /// entirely).
    pub(crate) fn to_ambiguous_kind(&self, dt: DateTime) -> AmbiguousOffset {
        let iamoff = self.inner.to_ambiguous_kind(dt.to_idatetime_const());
        AmbiguousOffset::from_iambiguous_offset_const(iamoff)
    }

    /// Returns the timestamp of the most recent time zone transition prior
    /// to the timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn previous_transition(
        &self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.previous_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }

    /// Returns the timestamp of the soonest time zone transition after the
    /// timestamp given. If one doesn't exist, `None` is returned.
    pub(crate) fn next_transition(
        &self,
        timestamp: Timestamp,
    ) -> Option<TimeZoneTransition> {
        let (its, ioff, abbrev, is_dst) =
            self.inner.next_transition(timestamp.to_itimestamp_const())?;
        let timestamp = Timestamp::from_itimestamp_const(its);
        let offset = Offset::from_ioffset_const(ioff);
        let dst = Dst::from(is_dst);
        Some(TimeZoneTransition { timestamp, offset, abbrev, dst })
    }
}

impl<ABBREV: AsRef<str>> core::fmt::Display for PosixTimeZone<ABBREV> {
    fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
        core::fmt::Display::fmt(&self.inner, f)
    }
}

// The tests below require parsing which requires alloc.
#[cfg(feature = "alloc")]
#[cfg(test)]
mod tests {
    use super::*;

    #[cfg(feature = "tz-system")]
    #[test]
    fn parse_posix_tz() {
        // We used to parse this and then error when we tried to
        // convert to a "reasonable" POSIX time zone with a DST
        // transition rule. We never actually used unreasonable POSIX
        // time zones and it was complicating the type definitions, so
        // now we just reject it outright.
        assert!(PosixTzEnv::parse("EST5EDT").is_err());

        let tz = PosixTzEnv::parse(":EST5EDT").unwrap();
        assert_eq!(tz, PosixTzEnv::Implementation("EST5EDT".into()));

        // We require implementation strings to be UTF-8, because we're
        // sensible.
        assert!(PosixTzEnv::parse(b":EST5\xFFEDT").is_err());
    }
}