// Copyright (C) 2016 The Qt Company Ltd.
// Copyright (C) 2016 Intel Corporation.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only

/*!
    \class QUrl
    \inmodule QtCore

    \brief The QUrl class provides a convenient interface for working
    with URLs.

    \reentrant
    \ingroup io
    \ingroup network
    \ingroup shared

    \compares weak

    It can parse and construct URLs in both encoded and unencoded
    form. QUrl also has support for internationalized domain names
    (IDNs).

    The most common way to use QUrl is to initialize it via the constructor by
    passing a QString containing a full URL. QUrl objects can also be created
    from a QByteArray containing a full URL using QUrl::fromEncoded(), or
    heuristically from incomplete URLs using QUrl::fromUserInput(). The URL
    representation can be obtained from a QUrl using either QUrl::toString() or
    QUrl::toEncoded().

    URLs can be represented in two forms: encoded or unencoded. The
    unencoded representation is suitable for showing to users, but
    the encoded representation is typically what you would send to
    a web server. For example, the unencoded URL
    "http://bühler.example.com/List of applicants.xml"
    would be sent to the server as
    "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
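
    The path half of that example is ordinary percent-encoding applied to
    UTF-8 bytes. The following minimal sketch in plain C++ (no Qt) illustrates
    only that step. It assumes that the RFC 3986 unreserved characters and the
    "/" separator are left as literals. It is not QUrl's implementation, it does
    not perform the IDNA conversion that turns "bühler" into "xn--bhler-kva",
    and \c percentEncodePath is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: percent-encodes every byte of a UTF-8 path except the
// RFC 3986 "unreserved" characters and the '/' segment separator.
std::string percentEncodePath(const std::string &utf8)
{
    static const char hexDigits[] = "0123456789ABCDEF"; // uppercase HEXDIG
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || c == '/') {
            out += char(c);
        } else {
            out += '%';
            out += hexDigits[c >> 4];   // high nibble
            out += hexDigits[c & 0x0F]; // low nibble
        }
    }
    return out;
}
```

    Because the encoding works byte by byte on UTF-8, a single non-ASCII
    character such as U+00E9 produces two escapes, "%C3%A9". This is the same
    form that QUrl::EncodeUnicode describes.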

    A URL can also be constructed piece by piece by calling
    setScheme(), setUserName(), setPassword(), setHost(), setPort(),
    setPath(), setQuery() and setFragment(). Some convenience
    functions are also available: setAuthority() sets the user name,
    password, host and port. setUserInfo() sets the user name and
    password at once.

    Call isValid() to check if the URL is valid. This can be done at any point
    during the construction of a URL. If isValid() returns \c false, you should
    clear() the URL before proceeding, or start over by parsing a new URL with
    setUrl().

    Constructing a query is particularly convenient through the use of the
    \l QUrlQuery class and its methods QUrlQuery::setQueryItems(),
    QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use
    QUrlQuery::setQueryDelimiters() to customize the delimiters used for
    generating the query string.

    For the convenience of generating encoded URL strings or query
    strings, there are two static functions called
    fromPercentEncoding() and toPercentEncoding() which deal with
    percent encoding and decoding of QString objects.

    fromLocalFile() constructs a QUrl by parsing a local
    file path. toLocalFile() converts a URL to a local file path.

    The human-readable representation of the URL is fetched with
    toString(). This representation is appropriate for displaying a
    URL to a user in unencoded form. The encoded form, however, as
    returned by toEncoded(), is for internal use, passing to web
    servers, mail clients and so on. Both forms are technically correct
    and represent the same URL unambiguously -- in fact, passing either
    form to QUrl's constructor or to setUrl() will yield the same QUrl
    object.

    QUrl conforms to the URI specification from
    \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
    scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
    folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
    Profile for Internationalized Domain Names (IDN)). It is also compatible with the
    \l{http://freedesktop.org/wiki/Specifications/file-uri-spec/}{file URI specification}
    from freedesktop.org, provided that the locale encodes file names using
    UTF-8 (required by IDN).

    \section2 Relative URLs vs Relative Paths

    Calling isRelative() will return whether or not the URL is relative.
    A relative URL has no \l {scheme}. For example:

    \snippet code/src_corelib_io_qurl.cpp 8

    Notice that a URL can be absolute while containing a relative path, and
    vice versa:

    \snippet code/src_corelib_io_qurl.cpp 9

    A relative URL can be resolved by passing it as an argument to resolved(),
    which returns an absolute URL. isParentOf() is used for determining whether
    one URL is a parent of another.
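
    Reference resolution as defined by RFC 3986 (section 5.2) merges the base
    and relative paths and then removes "." and ".." segments. The sketch below
    transcribes only that dot-segment removal step (section 5.2.4) into plain
    C++. It follows the RFC's algorithm rather than QUrl's own code, and
    \c removeDotSegments is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Transcription of RFC 3986, section 5.2.4 ("Remove Dot Segments").
std::string removeDotSegments(std::string in)
{
    std::string out;
    auto startsWith = [&in](const char *prefix) { return in.rfind(prefix, 0) == 0; };

    while (!in.empty()) {
        if (startsWith("../")) {                          // rule A
            in.erase(0, 3);
        } else if (startsWith("./")) {                    // rule A
            in.erase(0, 2);
        } else if (startsWith("/./")) {                   // rule B
            in.replace(0, 3, "/");
        } else if (in == "/.") {                          // rule B
            in = "/";
        } else if (startsWith("/../") || in == "/..") {   // rule C
            const bool exact = (in == "/..");
            in.replace(0, exact ? 3 : 4, "/");
            // drop the last output segment together with its leading '/'
            const auto slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {             // rule D
            in.clear();
        } else {                                          // rule E
            // move the first segment (with its leading '/', if any) to the output
            const auto next = in.find('/', in[0] == '/' ? 1 : 0);
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

    With the RFC's own examples, "/a/b/c/./../../g" reduces to "/a/g" and
    "mid/content=5/../6" reduces to "mid/6".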

    \section2 Error checking

    QUrl is capable of detecting many errors in URLs while parsing them or when
    components of the URL are set with individual setter methods (like
    setScheme(), setHost() or setPath()). If the parsing or setter function is
    successful, any previously recorded error conditions will be discarded.

    By default, QUrl setter methods operate in QUrl::TolerantMode, which means
    they accept some common mistakes and misrepresentations of data. An
    alternate method of parsing is QUrl::StrictMode, which applies further
    checks. See QUrl::ParsingMode for a description of the differences between
    the parsing modes.

    QUrl only checks for conformance with the URL specification. It does not
    try to verify that high-level protocol URLs are in the format they are
    expected to be by handlers elsewhere. For example, the following URIs are
    all considered valid by QUrl, even if they do not make sense when used:

    \list
    \li "http:/filename.html"
    \li "mailto://example.com"
    \endlist

    When the parser encounters an error, it signals the event by making
    isValid() return \c false and toString() / toEncoded() return an empty string.
    If it is necessary to show the user the reason why the URL failed to parse,
    the error condition can be obtained from QUrl by calling errorString().
    Note that this message is highly technical and may not make sense to
    end-users.

    QUrl is capable of recording only one error condition. If more than one
    error is found, it is undefined which error is reported.

    \section2 Character Conversions

    Follow these rules to avoid erroneous character conversion when
    dealing with URLs and strings:

    \list
    \li When creating a QString to contain a URL from a QByteArray or a
       char*, always use QString::fromUtf8().
    \endlist
*/

/*!
    \enum QUrl::ParsingMode

    The parsing mode controls the way QUrl parses strings.

    \value TolerantMode QUrl will try to correct some common errors in URLs.
                        This mode is useful for parsing URLs coming from sources
                        not known to be strictly standards-conforming.

    \value StrictMode Only valid URLs are accepted. This mode is useful for
                      general URL validation.

    \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
                       where percent characters stand for themselves, not as the beginning
                       of a percent-encoded sequence. This mode is only valid for the
                       setters setting components of a URL; it is not permitted in
                       the QUrl constructor, in fromEncoded() or in setUrl().
                       For more information on this mode, see the documentation for
                       \l {QUrl::ComponentFormattingOption}{QUrl::FullyDecoded}.

    In TolerantMode, the parser has the following behavior:

    \list

    \li Spaces and "%20": unencoded space characters will be accepted and will
    be treated as equivalent to "%20".

    \li Single "%" characters: Any occurrence of a percent character "%" not
    followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
    will be replaced by "%25". Note that one lone "%" character will trigger
    the correction mode for all percent characters.

    \li Reserved and unreserved characters: An encoded URL should only
    contain a few characters as literals; all other characters should
    be percent-encoded. In TolerantMode, these characters will be
    accepted if they are found in the URL:
            space / double-quote / "<" / ">" / "\" /
            "^" / "`" / "{" / "|" / "}"
    Those same characters can be decoded again by passing QUrl::DecodeReserved
    to toString() or toEncoded(). In the getters of individual components,
    those characters are often returned in decoded form.

    \endlist
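
    The "single %" rule can be modeled in a few lines of plain C++. This minimal
    sketch is not QUrl's parser. It assumes only what the rule states: if any
    "%" is not followed by two hexadecimal digits, every "%" in the input is
    rewritten as "%25". Spaces and the other tolerated characters are not handled,
    and \c fixLonePercents is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper modeling the TolerantMode "single %" rule.
std::string fixLonePercents(const std::string &in)
{
    auto isHexDigit = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };

    // Does any '%' fail to start a complete "%XX" sequence?
    bool needsFix = false;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && !(i + 2 < in.size() && isHexDigit(in[i + 1]) && isHexDigit(in[i + 2]))) {
            needsFix = true;
            break;
        }
    }
    if (!needsFix)
        return in;

    // One lone '%' switches every '%' to its literal form, "%25".
    std::string out;
    for (char c : in) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    Under this model, "a%20b" is left untouched, while "a%20b%" becomes
    "a%2520b%25": the trailing lone "%" also changes how the valid-looking
    "%20" is read.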

    When in StrictMode, if a parsing error is found, isValid() will return
    \c false and errorString() will return a message describing the error.
    If more than one error is detected, it is undefined which error gets
    reported.

    Note that TolerantMode is not usually enough for parsing user input, which
    often contains more errors and expectations than the parser can deal with.
    When dealing with data coming directly from the user -- as opposed to data
    coming from data-transfer sources, such as other programs -- it is
    recommended to use fromUserInput().

    \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UrlFormattingOption

    The formatting options define how the URL is formatted when written out
    as text.

    \value None            The format of the URL is unchanged.
    \value RemoveScheme    The scheme is removed from the URL.
    \value RemovePassword  Any password in the URL is removed.
    \value RemoveUserInfo  Any user information in the URL is removed.
    \value RemovePort      Any specified port is removed from the URL.
    \value RemoveAuthority The authority (user information, host and port) is removed.
    \value RemovePath      The URL's path is removed, leaving only the scheme,
                           host address, and port (if present).
    \value RemoveQuery     The query part of the URL (following a '?' character)
                           is removed.
    \value RemoveFragment  The fragment part of the URL (following a '#' character)
                           is removed.
    \value RemoveFilename  The filename (i.e. everything after the last '/' in the path) is removed.
                           The trailing '/' is kept, unless StripTrailingSlash is set.
                           Only valid if RemovePath is not set.
    \value PreferLocalFile If the URL is a local file according to isLocalFile()
                           and contains no query or fragment, a local file path is returned.
    \value StripTrailingSlash  The trailing slash is removed from the path, if one is present.
    \value NormalizePathSegments  Modifies the path to remove redundant directory separators,
                                  and to resolve "."s and ".."s (as far as possible). For non-local
                                  paths, adjacent slashes are preserved.

    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.

    The options from QUrl::ComponentFormattingOptions are also possible.

    \sa QUrl::ComponentFormattingOptions
*/

/*!
    \enum QUrl::ComponentFormattingOption
    \since 5.0

    The component formatting options define how the components of a URL will
    be formatted when written out as text. They can be combined with the
    options from QUrl::FormattingOptions when used in toString() and
    toEncoded().

    \value PrettyDecoded   The component is returned in a "pretty form", with
                           most percent-encoded characters decoded. The exact
                           behavior of PrettyDecoded varies from component to
                           component and may also change from Qt release to Qt
                           release. This is the default.

    \value EncodeSpaces    Leave space characters in their encoded form ("%20").

    \value EncodeUnicode   Leave non-US-ASCII characters encoded in their UTF-8
                           percent-encoded form (e.g., "%C3%A9" for the U+00E9
                           codepoint, LATIN SMALL LETTER E WITH ACUTE).

    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
                            This flag has no effect in toString() or toEncoded().

    \value EncodeReserved  Leave US-ASCII characters not permitted in the URL by
                           the specification in their encoded form. This is the
                           default on toString() and toEncoded().

    \value DecodeReserved  Decode the US-ASCII characters that the URL specification
                           does not allow to appear in the URL. This is the
                           default on the getters of individual components.

    \value FullyEncoded    Leave all characters in their properly-encoded form,
                           as this component would appear as part of a URL. When
                           used with toString(), this produces a fully-compliant
                           URL in QString form, exactly equal to the result of
                           toEncoded().

    \value FullyDecoded    Attempt to decode as much as possible. For individual
                           components of the URL, this decodes every percent
                           encoding sequence, including control characters (U+0000
                           to U+001F) and UTF-8 sequences found in percent-encoded form.
                           Use of this mode may cause data loss; see below for more information.

    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behavior is undefined if that happens. They are provided
    as separate values because the behavior of the "pretty mode" with regard
    to reserved characters is different on certain components, and especially on
    the full URL.

    \section2 Full decoding

    The FullyDecoded mode is similar to the behavior of the functions returning
    QString in Qt 4.x, in that every character represents itself and never has
    any special meaning. This is true even for the percent character ('%'),
    which should be interpreted to mean a literal percent, not the beginning of
    a percent-encoded sequence. The same actual character, in all other
    decoding modes, is represented by the sequence "%25".

    Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl,
    care must be taken to use the QUrl::DecodedMode parameter to the setters
    (like setPath() and setUserName()). Failure to do so may cause
    re-interpretation of the percent character ('%') as the beginning of a
    percent-encoded sequence.

    This mode is quite useful when portions of a URL are used in a non-URL
    context. For example, to extract the username, password or file paths in an
    FTP client application, the FullyDecoded mode should be used.

    This mode should be used with care, since there are two conditions that
    cannot be reliably represented in the returned QString. They are:

    \list
    \li \b{Non-UTF-8 sequences:} URLs may contain sequences of
    percent-encoded characters that do not form valid UTF-8 sequences. Since
    URLs need to be decoded using UTF-8, any decoder failure will result in
    the QString containing one or more replacement characters where the
    sequence existed.

    \li \b{Encoded delimiters:} URLs are also allowed to make a distinction
    between a delimiter found in its literal form and its equivalent in
    percent-encoded form. This is most commonly found in the query, but is
    permitted in most parts of the URL.
    \endlist

    The following example illustrates the problem:

    \snippet code/src_corelib_io_qurl.cpp 10

    If the two URLs were used via HTTP GET, the interpretation by the web
    server would probably be different. In the first case, it would interpret
    it as one parameter, with a key of "q" and value "a+=b&c". In the second
    case, it would probably interpret it as two parameters, one with a key of
    "q" and value "a =b", and the second with a key "c" and no value.
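
    The same ambiguity can be reproduced outside Qt with a naive form-style
    query parser. The minimal sketch below is plain C++ and assumes the HTML
    form convention: split on "&", then split on the first "=", treat "+" as a
    space, and then percent-decode. That convention describes a typical server,
    not anything QUrl does. The input strings are hypothetical values modeled
    on the description above rather than the exact contents of the snippet, and
    \c parseQuery and \c formDecode are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Params = std::vector<std::pair<std::string, std::string>>;

// Decodes '+' as a space and "%XX" as a byte (assumes well-formed escapes).
std::string formDecode(const std::string &s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size()) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Splits "k1=v1&k2=v2" on literal delimiters, then decodes keys and values.
Params parseQuery(const std::string &query)
{
    Params params;
    std::string::size_type start = 0;
    while (start <= query.size()) {
        auto end = query.find('&', start);
        if (end == std::string::npos)
            end = query.size();
        const std::string item = query.substr(start, end - start);
        const auto eq = item.find('=');
        if (eq == std::string::npos)
            params.emplace_back(formDecode(item), "");
        else
            params.emplace_back(formDecode(item.substr(0, eq)), formDecode(item.substr(eq + 1)));
        start = end + 1;
    }
    return params;
}
```

    With the escapes in place, "q=a%2B%3Db%26c" yields one parameter whose
    value is "a+=b&c". The fully decoded text "q=a+=b&c" yields two: "q" with
    "a =b", and "c" with no value. In the decoded text, the escaped delimiters
    have become literal ones.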

    \sa QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UserInputResolutionOption
    \since 5.4

    The user input resolution options define how fromUserInput() should
    interpret strings that could either be a relative path or the short
    form of an HTTP URL. For instance, \c{file.pl} can be either a local file
    or the URL \c{http://file.pl}.

    \value DefaultResolution  The default resolution mechanism is to check
                              whether a local file exists, in the working
                              directory given to fromUserInput(), and only
                              return a local path in that case. Otherwise a URL
                              is assumed.
    \value AssumeLocalFile    This option makes fromUserInput() always return
                              a local path unless the input contains a scheme, such as
                              \c{http://file.pl}. This is useful for applications
                              such as text editors, which are able to create
                              the file if it doesn't exist.

    \sa fromUserInput()
*/

/*!
    \enum QUrl::AceProcessingOption
    \since 6.3

    The ACE processing options control the way URLs are transformed to and from
    ASCII-Compatible Encoding.

    \value IgnoreIDNWhitelist  Ignore the IDN whitelist when converting URLs
                               to Unicode.
    \value AceTransitionalProcessing  Use transitional processing described in UTS #46.
                                      This allows better compatibility with the IDNA 2003
                                      specification.

    The default is to use nontransitional processing and to allow non-ASCII
    characters only inside URLs whose top-level domains are listed in the IDN whitelist.

    \sa toAce(), fromAce(), idnWhitelist()
*/

/*!
    \fn QUrl::QUrl(QUrl &&other)

    Move-constructs a QUrl instance, making it point at the same
    object that \a other was pointing to.

    \since 5.2
*/

/*!
    \fn QUrl &QUrl::operator=(QUrl &&other)

    Move-assigns \a other to this QUrl instance.

    \since 5.2
*/

#include "qurl.h"
#include "qurl_p.h"
#include "qplatformdefs.h"
#include "qstring.h"
#include "qstringlist.h"
#include "qdebug.h"
#include "qhash.h"
#include "qdatastream.h"
#include "private/qipaddress_p.h"
#include "qurlquery.h"
#include "private/qdir_p.h"
#include <private/qtools_p.h>

QT_BEGIN_NAMESPACE

using namespace Qt::StringLiterals;
using namespace QtMiscUtils;

inline static bool isHex(char c)
{
    c |= 0x20;
    return isAsciiDigit(c) || (c >= 'a' && c <= 'f');
}

static inline QString ftpScheme()
{
    return QStringLiteral("ftp");
}

static inline QString fileScheme()
{
    return QStringLiteral("file");
}

static inline QString webDavScheme()
{
    return QStringLiteral("webdavs");
}

static inline QString webDavSslTag()
{
    return QStringLiteral("@SSL");
}

class QUrlPrivate
{
public:
    enum Section : uchar {
        Scheme = 0x01,
        UserName = 0x02,
        Password = 0x04,
        UserInfo = UserName | Password,
        Host = 0x08,
        Port = 0x10,
        Authority = UserInfo | Host | Port,
        Path = 0x20,
        Hierarchy = Authority | Path,
        Query = 0x40,
        Fragment = 0x80,
        FullUrl = 0xff
    };

    enum Flags : uchar {
        IsLocalFile = 0x01
    };

    enum ErrorCode {
        // the high byte of the error code matches the Section
        // the first item in each value must be the generic "Invalid xxx Error"
        InvalidSchemeError = Scheme << 8,

        InvalidUserNameError = UserName << 8,

        InvalidPasswordError = Password << 8,

        InvalidRegNameError = Host << 8,
        InvalidIPv4AddressError,
        InvalidIPv6AddressError,
        InvalidCharacterInIPv6Error,
        InvalidIPvFutureError,
        HostMissingEndBracket,

        InvalidPortError = Port << 8,
        PortEmptyError,

        InvalidPathError = Path << 8,

        InvalidQueryError = Query << 8,

        InvalidFragmentError = Fragment << 8,

        // the following three cases are only possible in combination with
        // presence/absence of the path, authority and scheme. See validityError().
        AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000,
        AuthorityAbsentAndPathIsDoubleSlash,
        RelativeUrlPathContainsColonBeforeSlash = Scheme << 8 | Authority << 8 | Path << 8 | 0x10000,

        NoError = 0
    };
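
    /*
        Illustrative sketch, not compiled into QUrl: each ErrorCode above keeps
        the failing Section in bits 8-15, and bit 16 (0x10000) marks the
        combination errors that validityError() derives from several sections.
        The standalone C++ below copies those constants to show how a section
        could be recovered from a code. sectionOf() and isCombinationError()
        are hypothetical helpers; QUrlPrivate does not declare them.

```cpp
// Values copied from QUrlPrivate::Section.
enum Section : unsigned {
    Scheme = 0x01,
    UserName = 0x02,
    Password = 0x04,
    Host = 0x08,
    Port = 0x10,
    Authority = UserName | Password | Host | Port,
    Path = 0x20
};

// Hypothetical helper: the Section bits live in bits 8..15 of an error code.
constexpr unsigned sectionOf(unsigned errorCode) { return (errorCode >> 8) & 0xFF; }

// Hypothetical helper: bit 16 flags errors that involve several sections.
constexpr bool isCombinationError(unsigned errorCode) { return (errorCode & 0x10000) != 0; }

// A few codes rebuilt with the same arithmetic as QUrlPrivate::ErrorCode.
constexpr unsigned InvalidRegNameError = Host << 8;
constexpr unsigned InvalidIPv6AddressError = InvalidRegNameError + 2;
constexpr unsigned AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000;
```

        For example, AuthorityPresentAndPathIsRelative evaluates to 0x13E00.
        Its bits 8-15 (0x3E) are exactly Authority | Path.
    */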

    struct Error {
        QString source;
        qsizetype position;
        ErrorCode code;
    };

    QUrlPrivate();
    QUrlPrivate(const QUrlPrivate &copy);
    ~QUrlPrivate();

    void parse(const QString &url, QUrl::ParsingMode parsingMode);
    bool isEmpty() const
    { return sectionIsPresent == 0 && port == -1 && path.isEmpty(); }

    std::unique_ptr<Error> cloneError() const;
    void clearError();
    void setError(ErrorCode errorCode, const QString &source, qsizetype supplement = -1);
    ErrorCode validityError(QString *source = nullptr, qsizetype *position = nullptr) const;
    bool validateComponent(Section section, const QString &input, qsizetype begin, qsizetype end);
    bool validateComponent(Section section, const QString &input)
    { return validateComponent(section, input, 0, input.size()); }

    // no QString scheme() const;
    void appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserName(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPassword(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendHost(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;

    // the "end" parameters are like STL iterators: they point to one past the last valid element
    bool setScheme(const QString &value, qsizetype len, bool doSetError);
    void setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
    void setUserInfo(const QString &userInfo, qsizetype from, qsizetype end);
    void setUserName(const QString &value, qsizetype from, qsizetype end);
    void setPassword(const QString &value, qsizetype from, qsizetype end);
    bool setHost(const QString &value, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
    void setPath(const QString &value, qsizetype from, qsizetype end);
    void setQuery(const QString &value, qsizetype from, qsizetype end);
    void setFragment(const QString &value, qsizetype from, qsizetype end);

    inline bool hasScheme() const { return sectionIsPresent & Scheme; }
    inline bool hasAuthority() const { return sectionIsPresent & Authority; }
    inline bool hasUserInfo() const { return sectionIsPresent & UserInfo; }
    inline bool hasUserName() const { return sectionIsPresent & UserName; }
    inline bool hasPassword() const { return sectionIsPresent & Password; }
    inline bool hasHost() const { return sectionIsPresent & Host; }
    inline bool hasPort() const { return port != -1; }
    inline bool hasPath() const { return !path.isEmpty(); }
    inline bool hasQuery() const { return sectionIsPresent & Query; }
    inline bool hasFragment() const { return sectionIsPresent & Fragment; }

    inline bool isLocalFile() const { return flags & IsLocalFile; }
    QString toLocalFile(QUrl::FormattingOptions options) const;

    QString mergePaths(const QString &relativePath) const;

    QAtomicInt ref;
    int port;

    QString scheme;
    QString userName;
    QString password;
    QString host;
    QString path;
    QString query;
    QString fragment;

    std::unique_ptr<Error> error;

    // not used for:
    //  - Port (port == -1 means absence)
    //  - Path (there's no path delimiter, so we optimize its use out of existence)
    // Schemes are never supposed to be empty, but we keep the flag anyway
    uchar sectionIsPresent;
    uchar flags;

    // 32-bit: 2 bytes tail padding available
    // 64-bit: 6 bytes tail padding available
};

inline QUrlPrivate::QUrlPrivate()
    : ref(1), port(-1),
      sectionIsPresent(0),
      flags(0)
{
}

inline QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
    : ref(1), port(copy.port),
      scheme(copy.scheme),
      userName(copy.userName),
      password(copy.password),
      host(copy.host),
      path(copy.path),
      query(copy.query),
      fragment(copy.fragment),
      error(copy.cloneError()),
      sectionIsPresent(copy.sectionIsPresent),
      flags(copy.flags)
{
}

inline QUrlPrivate::~QUrlPrivate()
    = default;

std::unique_ptr<QUrlPrivate::Error> QUrlPrivate::cloneError() const
{
    return error ? std::make_unique<Error>(*error) : nullptr;
}

inline void QUrlPrivate::clearError()
{
    error.reset();
}

inline void QUrlPrivate::setError(ErrorCode errorCode, const QString &source, qsizetype supplement)
{
    if (error) {
        // don't overwrite an error set in a previous section during parsing
        return;
    }
    error = std::make_unique<Error>();
    error->code = errorCode;
    error->source = source;
    error->position = supplement;
}

// From RFC 3986, Appendix A Collected ABNF for URI
//    URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
//[...]
//    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
//
//    authority     = [ userinfo "@" ] host [ ":" port ]
//    userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
//    host          = IP-literal / IPv4address / reg-name
//    port          = *DIGIT
//[...]
//    reg-name      = *( unreserved / pct-encoded / sub-delims )
//[..]
//    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
//
//    query         = *( pchar / "/" / "?" )
//
//    fragment      = *( pchar / "/" / "?" )
//
//    pct-encoded   = "%" HEXDIG HEXDIG
//
//    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
//    reserved      = gen-delims / sub-delims
//    gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
//    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
//                  / "*" / "+" / "," / ";" / "="
// the path component has a complex ABNF that basically boils down to
// slash-separated segments of "pchar"

// The above is the strict definition of the URL components and we mostly
// adhere to it, with few exceptions. QUrl obeys the following behavior:
//  - percent-encoding sequences always use uppercase HEXDIG;
//  - unreserved characters are *always* decoded, no exceptions;
//  - the space character and bytes with the high bit set are controlled by
//    the EncodeSpaces and EncodeUnicode bits;
//  - control characters, the percent sign itself, and bytes with the high
//    bit set that don't form valid UTF-8 sequences are always encoded,
//    except in FullyDecoded mode;
//  - sub-delims are always left alone, except in FullyDecoded mode;
//  - gen-delims change behavior depending on which section of the URL (or
//    the entire URL) we're looking at; see below;
//  - characters not mentioned above, like "<" and ">", are usually
//    decoded in individual sections of the URL, but encoded when the full
//    URL is put together (this may change depending on the subjective
//    definition of "pretty").
//
// The behavior for the delimiters bears some explanation. The spec says in
// section 2.2:
//     URIs that differ in the replacement of a reserved character with its
//     corresponding percent-encoded octet are not equivalent.
// (note: QUrl API mistakenly uses the "reserved" term, so we will refer to
// them here as "delimiters").
//
// For that reason, we cannot encode delimiters found in decoded form and we
// cannot decode the ones found in encoded form if that would change the
// interpretation. Conversely, we *can* perform the transformation if it would
// not change the interpretation. From the last component of a URL to the first,
// here are the gen-delims we can unambiguously transform when the field is
// taken in isolation:
//  - fragment: none, since it's the last
//  - query: "#" is unambiguous
//  - path: "#" and "?" are unambiguous
//  - host: completely special but never ambiguous, see setHost() below.
//  - password: the "#", "?", "/", "[", "]" and "@" characters are unambiguous
//  - username: the "#", "?", "/", "[", "]", "@", and ":" characters are unambiguous
//  - scheme: doesn't accept any delimiter, see setScheme() below.
//
// Internally, QUrl stores each component in the format that corresponds to the
// default mode (PrettyDecoded). It deviates from the "strict" FullyEncoded
// mode in the following way:
//  - spaces are decoded
//  - valid UTF-8 sequences are decoded
//  - gen-delims that can be unambiguously transformed are decoded
//  - characters controlled by DecodeReserved are often decoded, though this behavior
//    can change depending on the subjective definition of "pretty"
//
// Note that the list of gen-delims that we can transform is different for the
// user info (user name + password) and the authority (user info + host +
// port).
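
/*
    Illustrative sketch, not part of QUrl: this applies the first two rules
    listed above (uppercase HEXDIG, and unreserved characters always decoded)
    to an already percent-encoded string. It is plain C++.
    normalizePercentEncoding() is a hypothetical name. The sketch deliberately
    ignores the space, UTF-8 and delimiter rules, so non-ASCII bytes and
    gen-delims stay encoded.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decodes "%XX" escapes of unreserved characters and
// rewrites every other escape with uppercase hex digits. Malformed escapes
// are copied unchanged.
std::string normalizePercentEncoding(const std::string &in)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    static const char hexDigits[] = "0123456789ABCDEF";

    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const int hi = (in[i] == '%' && i + 2 < in.size()) ? hexValue(in[i + 1]) : -1;
        const int lo = hi >= 0 ? hexValue(in[i + 2]) : -1;
        if (lo < 0) {           // not a well-formed escape: copy the byte
            out += in[i];
            continue;
        }
        const unsigned char c = static_cast<unsigned char>(hi * 16 + lo);
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += char(c);
        } else {
            out += '%';
            out += hexDigits[c >> 4];
            out += hexDigits[c & 0x0F];
        }
        i += 2;                 // skip the two hex digits just consumed
    }
    return out;
}
```

    For instance, "%7e%2f%41" becomes "~%2FA". The "~" and "A" are
    unreserved, while "/" is a gen-delim and keeps its (now uppercase) escape.
*/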


// list the recoding table modifications to be used with the recodeFromUser and
// appendToUser functions, according to the rules above. Spaces and UTF-8
// sequences are handled outside the tables.

// the encodedXXX tables are run with the delimiters set to "leave" by default;
// the decodedXXX tables are run with the delimiters set to "decode" by default
// (except for the query, which doesn't use these functions)

namespace {
template <typename T> constexpr ushort decode(T x) noexcept { return ushort(x); }
template <typename T> constexpr ushort leave(T x) noexcept { return ushort(0x100 | x); }
template <typename T> constexpr ushort encode(T x) noexcept { return ushort(0x200 | x); }
}
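
/*
    Illustrative sketch, not part of QUrl: decode(), leave() and encode()
    store the requested action in bits 8-9 of a ushort and the character in
    the low byte, and each recoding table ends with a 0 entry. The
    standalone C++ that follows models one way to query such a table.
    actionFor() and the Action enum are hypothetical; the way qt_urlRecode()
    actually consumes the tables is not reproduced here.

```cpp
#include <cassert>

// Same bit layout as decode()/leave()/encode(): action in bits 8-9,
// character in the low byte (ASCII only in this sketch).
constexpr unsigned short decodeAction(unsigned char x) { return x; }
constexpr unsigned short leaveAction(unsigned char x) { return static_cast<unsigned short>(0x100 | x); }
constexpr unsigned short encodeAction(unsigned char x) { return static_cast<unsigned short>(0x200 | x); }

enum class Action { Decode, Leave, Encode, NotListed };

// Hypothetical lookup: scans a zero-terminated table for the character c.
Action actionFor(const unsigned short *table, unsigned char c)
{
    for (; *table != 0; ++table) {
        if ((*table & 0xFF) != c)
            continue;
        switch (*table & 0x300) {
        case 0x100: return Action::Leave;
        case 0x200: return Action::Encode;
        default:    return Action::Decode;
        }
    }
    return Action::NotListed;
}

// Modeled on the first two entries of userNameInUserInfo.
const unsigned short sampleTable[] = { encodeAction(':'), decodeAction('@'), 0 };
```

    Characters that a table does not list fall through to NotListed, where
    the caller's default for that table applies.
*/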

static const ushort userNameInIsolation[] = {
    decode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInIsolation = userNameInIsolation + 1;
static const ushort * const pathInIsolation = userNameInIsolation + 5;
static const ushort * const queryInIsolation = userNameInIsolation + 6;
static const ushort * const fragmentInIsolation = userNameInIsolation + 7;
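
/*
    Illustrative sketch, not part of QUrl: the password, path, query and
    fragment pointers above are suffixes of the single zero-terminated
    userNameInIsolation table. Starting one entry later drops ':' for the
    password. Starting at entry 5 or 6 keeps only "?#" for the path or "#"
    for the query. This matches the per-component list of unambiguous
    gen-delims in the comment block on delimiters. The char array below is
    a simplified stand-in (hypothetical name genDelims) that keeps only the
    seven gen-delims.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the gen-delim part of userNameInIsolation, as plain chars.
// (The real table continues with more characters before its 0 terminator.)
const char genDelims[] = { ':', '@', ']', '[', '/', '?', '#', 0 };

// Each component's view is a suffix of the same zero-terminated array.
const char *const userNameView = genDelims;      // all seven gen-delims
const char *const passwordView = genDelims + 1;  // everything except ':'
const char *const pathView     = genDelims + 5;  // "?#"
const char *const queryView    = genDelims + 6;  // "#"
```

    Because the views share storage, the order of the table encodes the rule
    that each later component can decode fewer gen-delims than the one
    before it.
*/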

static const ushort userNameInUserInfo[] = {
    encode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInUserInfo = userNameInUserInfo + 1;

static const ushort userNameInAuthority[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInAuthority = userNameInAuthority + 1;

static const ushort userNameInUrl[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    encode('/'), // 4
    encode('?'), // 5
    encode('#'), // 6

    // no need to list encode(x) for the other characters
    0
};
static const ushort * const passwordInUrl = userNameInUrl + 1;
static const ushort * const pathInUrl = userNameInUrl + 5;
static const ushort * const queryInUrl = userNameInUrl + 6;
static const ushort * const fragmentInUrl = userNameInUrl + 6;

static inline void parseDecodedComponent(QString &data)
{
    data.replace(u'%', "%25"_L1);
}

static inline QString
recodeFromUser(const QString &input, const ushort *actions, qsizetype from, qsizetype to)
{
    QString output;
    const QChar *begin = input.constData() + from;
    const QChar *end = input.constData() + to;
    if (qt_urlRecode(output, QStringView{begin, end}, {}, actions))
        return output;

    return input.mid(from, to - from);
}

// appendXXXX functions: copy from the internal form to the external, user form.
// the internal value is stored in its PrettyDecoded form, so that case is easy.
static inline void appendToUser(QString &appendTo, QStringView value, QUrl::FormattingOptions options,
                                const ushort *actions)
{
    // The stored value is already QUrl::PrettyDecoded, so there's nothing to
    // do if that's what the user asked for (test only
    // ComponentFormattingOptions, ignore FormattingOptions).
    if ((options & 0xFFFF0000) == QUrl::PrettyDecoded ||
            !qt_urlRecode(appendTo, value, options, actions))
        appendTo += value;

    // copy nullness, if necessary, because QString::operator+=(QStringView) doesn't
    if (appendTo.isNull() && !value.isNull())
        appendTo.detach();
}
836
837inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
838{
839 if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
840 appendUserInfo(appendTo, options, appendingTo);
841
842 // add '@' only if we added anything
843 if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
844 appendTo += u'@';
845 }
846 appendHost(appendTo, options);
847 if (!(options & QUrl::RemovePort) && port != -1)
848 appendTo += u':' + QString::number(port);
849}
850
851inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
852{
853 if (Q_LIKELY(!hasUserInfo()))
854 return;
855
856 const ushort *userNameActions;
857 const ushort *passwordActions;
858 if (options & QUrl::EncodeDelimiters) {
859 userNameActions = userNameInUrl;
860 passwordActions = passwordInUrl;
861 } else {
862 switch (appendingTo) {
863 case UserInfo:
864 userNameActions = userNameInUserInfo;
865 passwordActions = passwordInUserInfo;
866 break;
867
868 case Authority:
869 userNameActions = userNameInAuthority;
870 passwordActions = passwordInAuthority;
871 break;
872
873 case FullUrl:
874 userNameActions = userNameInUrl;
875 passwordActions = passwordInUrl;
876 break;
877
878 default:
879 // can't happen
880 Q_UNREACHABLE();
881 break;
882 }
883 }
884
885 if (!qt_urlRecode(appendTo, url: userName, encoding: options, tableModifications: userNameActions))
886 appendTo += userName;
887 if (options & QUrl::RemovePassword || !hasPassword()) {
888 return;
889 } else {
890 appendTo += u':';
891 if (!qt_urlRecode(appendTo, url: password, encoding: options, tableModifications: passwordActions))
892 appendTo += password;
893 }
894}
895
896inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
897{
898 // only called from QUrl::userName()
899 appendToUser(appendTo, value: userName, options,
900 actions: options & QUrl::EncodeDelimiters ? userNameInUrl : userNameInIsolation);
901}
902
903inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
904{
905 // only called from QUrl::password()
906 appendToUser(appendTo, value: password, options,
907 actions: options & QUrl::EncodeDelimiters ? passwordInUrl : passwordInIsolation);
908}
909
910inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
911{
912 QString thePath = path;
913 if (options & QUrl::NormalizePathSegments) {
914 qt_normalizePathSegments(
915 path: &thePath,
916 flags: isLocalFile() ? QDirPrivate::KeepLocalTrailingSlash : QDirPrivate::RemotePath);
917 }
918
919 QStringView thePathView(thePath);
920 if (options & QUrl::RemoveFilename) {
921 const qsizetype slash = thePathView.lastIndexOf(c: u'/');
922 if (slash == -1)
923 return;
924 thePathView = thePathView.left(n: slash + 1);
925 }
926 // check if we need to remove trailing slashes
927 if (options & QUrl::StripTrailingSlash) {
928 while (thePathView.size() > 1 && thePathView.endsWith(c: u'/'))
929 thePathView.chop(n: 1);
930 }
931
932 appendToUser(appendTo, value: thePathView, options,
933 actions: appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? pathInUrl : pathInIsolation);
934}
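/*
    A standalone sketch, not part of QUrl: appendPath() applies RemoveFilename
    before StripTrailingSlash and never strips a lone "/". The code below
    performs those two steps on std::string_view. The options are reduced to
    two bool parameters of a hypothetical formatPath() function, and path
    normalization and recoding are left out. As in appendPath(), a path with
    no '/' produces nothing when the file name is removed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// RemoveFilename first, then StripTrailingSlash, in the same order as appendPath().
std::string_view formatPath(std::string_view path, bool removeFilename, bool stripTrailingSlash)
{
    if (removeFilename) {
        const std::size_t slash = path.rfind('/');
        if (slash == std::string_view::npos)
            return {};                    // no directory part: nothing to output
        path = path.substr(0, slash + 1); // keep the directory, including its '/'
    }
    if (stripTrailingSlash) {
        // a lone "/" is kept
        while (path.size() > 1 && path.back() == '/')
            path.remove_suffix(1);
    }
    return path;
}
```
*/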

inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, fragment, options,
                 options & QUrl::EncodeDelimiters ? fragmentInUrl :
                 appendingTo == FullUrl ? nullptr : fragmentInIsolation);
}

inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, query, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? queryInUrl : queryInIsolation);
}

// setXXX functions

inline bool QUrlPrivate::setScheme(const QString &value, qsizetype len, bool doSetError)
{
    // schemes are strictly RFC-compliant:
    //    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    // we also lowercase the scheme

    // schemes in URLs are not allowed to be empty, but they can be in
    // "Relative URIs" which QUrl also supports. QUrl::setScheme does
    // not call us with len == 0, so this can only be from parse()
    scheme.clear();
    if (len == 0)
        return false;

    sectionIsPresent |= Scheme;

    // validate it:
    qsizetype needsLowercasing = -1;
    const ushort *p = reinterpret_cast<const ushort *>(value.data());
    for (qsizetype i = 0; i < len; ++i) {
        if (isAsciiLower(p[i]))
            continue;
        if (isAsciiUpper(p[i])) {
            needsLowercasing = i;
            continue;
        }
        if (i) {
            if (isAsciiDigit(p[i]))
                continue;
            if (p[i] == '+' || p[i] == '-' || p[i] == '.')
                continue;
        }

        // found something else
        // don't call setError needlessly:
        // if we've been called from parse(), it will try to recover
        if (doSetError)
            setError(InvalidSchemeError, value, i);
        return false;
    }

    scheme = value.left(len);

    if (needsLowercasing != -1) {
        // schemes are ASCII only, so we don't need the full Unicode toLower
        QChar *schemeData = scheme.data(); // force detaching here
        for (qsizetype i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
            if (isAsciiUpper(c))
                schemeData[i] = QChar(c + 0x20);
        }
    }

    // did we set to the file protocol?
    if (scheme == fileScheme()
#ifdef Q_OS_WIN
        || scheme == webDavScheme()
#endif
       ) {
        flags |= IsLocalFile;
    } else {
        flags &= ~IsLocalFile;
    }
    return true;
}
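/*
    A standalone sketch, not part of QUrl: setScheme() enforces the RFC 3986
    scheme grammar quoted in its comment and lowercases ASCII letters. The code
    below performs the same check in standard C++ and uses explicit ASCII
    ranges so the result does not depend on the C locale. normalizeScheme() is
    a hypothetical name. Unlike setScheme(), it builds the lowercased copy in
    the same pass instead of remembering the last uppercase position, and it
    reports failure with std::nullopt instead of setError().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), returned in lowercase.
std::optional<std::string> normalizeScheme(std::string_view in)
{
    if (in.empty())
        return std::nullopt; // only relative references have no scheme
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c >= 'a' && c <= 'z') {
            out += c;
        } else if (c >= 'A' && c <= 'Z') {
            out += char(c + 0x20); // ASCII-only lowercasing, as in setScheme()
        } else if (i > 0 && ((c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.')) {
            out += c;              // allowed everywhere except the first position
        } else {
            return std::nullopt;   // any other character makes the scheme invalid
        }
    }
    return out;
}
```
*/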

inline void QUrlPrivate::setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode)
{
    sectionIsPresent &= ~Authority;
    port = -1;
    if (from == end && !auth.isNull())
        sectionIsPresent |= Host;   // empty but not null authority implies host

    // we never actually _loop_
    while (from != end) {
        qsizetype userInfoIndex = auth.indexOf(u'@', from);
        if (size_t(userInfoIndex) < size_t(end)) {
            setUserInfo(auth, from, userInfoIndex);
            if (mode == QUrl::StrictMode && !validateComponent(UserInfo, auth, from, userInfoIndex))
                break;
            from = userInfoIndex + 1;
        }

        qsizetype colonIndex = auth.lastIndexOf(u':', end - 1);
        if (colonIndex < from)
            colonIndex = -1;

        if (size_t(colonIndex) < size_t(end)) {
            if (auth.at(from).unicode() == '[') {
                // check if colonIndex isn't inside the "[...]" part
                qsizetype closingBracket = auth.indexOf(u']', from);
                if (size_t(closingBracket) > size_t(colonIndex))
                    colonIndex = -1;
            }
        }

        if (size_t(colonIndex) < size_t(end) - 1) {
            // found a colon with digits after it
            unsigned long x = 0;
            for (qsizetype i = colonIndex + 1; i < end; ++i) {
                ushort c = auth.at(i).unicode();
                if (isAsciiDigit(c)) {
                    x *= 10;
                    x += c - '0';
                } else {
                    x = ulong(-1); // x != ushort(x)
                    break;
                }
            }
            if (x == ushort(x)) {
                port = ushort(x);
            } else {
                setError(InvalidPortError, auth, colonIndex + 1);
                if (mode == QUrl::StrictMode)
                    break;
            }
        }

        setHost(auth, from, qMin<size_t>(end, colonIndex), mode);
        if (mode == QUrl::StrictMode && !validateComponent(Host, auth, from, qMin<size_t>(end, colonIndex))) {
            // clear host too
            sectionIsPresent &= ~Authority;
            break;
        }

        // success
        return;
    }
    // clear all sections but host
    sectionIsPresent &= ~Authority | Host;
    userName.clear();
    password.clear();
    host.clear();
    port = -1;
}
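/*
    A standalone sketch, not part of QUrl: setAuthority() takes the user info
    up to the first '@' and looks for the port after the last ':' that is not
    inside a "[...]" IPv6 literal. The port is accepted only if it fits in 16
    bits. The code below performs that splitting over std::string_view.
    AuthorityParts and splitAuthority are hypothetical. Where setAuthority()
    records InvalidPortError (and keeps going in TolerantMode), the sketch
    simply returns std::nullopt. It does not validate the host.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct AuthorityParts {
    std::string_view userInfo; // empty when there is no '@'
    std::string_view host;
    int port = -1;             // -1 means "no port", as in QUrlPrivate
};

// Hypothetical helper following the order of setAuthority().
std::optional<AuthorityParts> splitAuthority(std::string_view auth)
{
    AuthorityParts parts;
    // user info ends at the first '@'
    if (const std::size_t at = auth.find('@'); at != std::string_view::npos) {
        parts.userInfo = auth.substr(0, at);
        auth.remove_prefix(at + 1);
    }

    // the port starts after the last ':', unless that ':' is inside "[...]"
    std::size_t colon = auth.rfind(':');
    if (colon != std::string_view::npos && auth.front() == '[') {
        const std::size_t bracket = auth.find(']');
        if (bracket == std::string_view::npos || bracket > colon)
            colon = std::string_view::npos;
    }

    if (colon != std::string_view::npos && colon + 1 < auth.size()) {
        unsigned long value = 0;
        for (char c : auth.substr(colon + 1)) {
            if (c < '0' || c > '9')
                return std::nullopt; // not a number
            value = value * 10 + static_cast<unsigned long>(c - '0');
            if (value > 65535)
                return std::nullopt; // does not fit in 16 bits
        }
        parts.port = static_cast<int>(value);
    }
    // substr(0, npos) keeps the whole remaining string as the host
    parts.host = auth.substr(0, colon);
    return parts;
}
```

    A trailing ':' with no digits is not an error. As in setAuthority(), it is
    dropped from the host and the port stays -1.
*/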

inline void QUrlPrivate::setUserInfo(const QString &userInfo, qsizetype from, qsizetype end)
{
    qsizetype delimIndex = userInfo.indexOf(u':', from);
    setUserName(userInfo, from, qMin<size_t>(delimIndex, end));

    if (size_t(delimIndex) >= size_t(end)) {
        password.clear();
        sectionIsPresent &= ~Password;
    } else {
        setPassword(userInfo, delimIndex + 1, end);
    }
}

inline void QUrlPrivate::setUserName(const QString &value, qsizetype from, qsizetype end)
{
    sectionIsPresent |= UserName;
    userName = recodeFromUser(value, userNameInIsolation, from, end);
}

inline void QUrlPrivate::setPassword(const QString &value, qsizetype from, qsizetype end)
{
    sectionIsPresent |= Password;
    password = recodeFromUser(value, passwordInIsolation, from, end);
}

inline void QUrlPrivate::setPath(const QString &value, qsizetype from, qsizetype end)
{
    // sectionIsPresent |= Path; // not used, save some cycles
    path = recodeFromUser(value, pathInIsolation, from, end);
}

inline void QUrlPrivate::setFragment(const QString &value, qsizetype from, qsizetype end)
{
    sectionIsPresent |= Fragment;
    fragment = recodeFromUser(value, fragmentInIsolation, from, end);
}

inline void QUrlPrivate::setQuery(const QString &value, qsizetype from, qsizetype iend)
{
    sectionIsPresent |= Query;
    query = recodeFromUser(value, queryInIsolation, from, iend);
}

// Host handling
// The RFC says the host is:
//    host          = IP-literal / IPv4address / reg-name
//    IP-literal    = "[" ( IPv6address / IPvFuture ) "]"
//    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
//    [a strict definition of IPv6Address and IPv4Address]
//    reg-name      = *( unreserved / pct-encoded / sub-delims )
//
// We deviate from the standard in all but IPvFuture. For IPvFuture we accept
// and store only exactly what the RFC says we should. No percent-encoding is
// permitted in this field, so Unicode characters and space aren't either.
//
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, we comply with the RFC and
// it's exclusively composed of unreserved characters.
//
// For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
// addresses, the so-called v4-compat and v4-mapped addresses. We also store
// those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. It also
// requires no transformation in any way.
//
// As for registered names, it's the other way around: we accept only valid
// hostnames as specified by STD 3 and IDNA. That means everything we accept is
// valid in the RFC definition above, but there are many valid reg-names
// according to the RFC that we do not accept in the name of security. Since we
// do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
// specified by the DecodeUnicode flag. The hostname is stored in its Unicode form.
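/*
    A standalone sketch, not part of QUrl: the comment above says that IPv4
    addresses with fewer than three dots are accepted, as inet_aton() does, and
    stored in dotted-quad form. Under inet_aton()'s rules, every component
    except the last is one byte, and the last component fills all remaining
    low-order bytes, so "127.1" means 127.0.0.1. The code below normalizes
    decimal components only. inet_aton() also accepts octal and hexadecimal,
    and this is not QIPAddressUtils::parseIp4() itself. normalizeIPv4 is a
    hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Accepts the a, a.b, a.b.c and a.b.c.d forms with decimal components only.
std::optional<std::string> normalizeIPv4(std::string_view s)
{
    std::vector<std::uint64_t> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t dot = s.find('.', start);
        const std::size_t end = (dot == std::string_view::npos) ? s.size() : dot;
        const std::string_view piece = s.substr(start, end - start);
        if (piece.empty() || piece.size() > 10)
            return std::nullopt; // empty component, or certainly out of range
        std::uint64_t value = 0;
        for (char c : piece) {
            if (c < '0' || c > '9')
                return std::nullopt;
            value = value * 10 + static_cast<std::uint64_t>(c - '0');
        }
        parts.push_back(value);
        if (dot == std::string_view::npos)
            break;
        start = dot + 1;
    }
    if (parts.size() > 4)
        return std::nullopt;

    // Every component but the last is a single byte.
    const std::size_t leading = parts.size() - 1;
    std::uint32_t address = 0;
    for (std::size_t i = 0; i < leading; ++i) {
        if (parts[i] > 0xFF)
            return std::nullopt;
        address |= static_cast<std::uint32_t>(parts[i]) << (24 - 8 * i);
    }
    // The last component fills all the remaining low-order bytes.
    const unsigned remainingBits = 32 - 8 * static_cast<unsigned>(leading);
    if ((parts.back() >> remainingBits) != 0)
        return std::nullopt;
    address |= static_cast<std::uint32_t>(parts.back());

    return std::to_string(address >> 24) + '.' + std::to_string((address >> 16) & 0xFF) + '.'
           + std::to_string((address >> 8) & 0xFF) + '.' + std::to_string(address & 0xFF);
}
```
*/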

inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
{
    if (host.isEmpty()) {
        if ((sectionIsPresent & Host) && appendTo.isNull())
            appendTo.detach();
        return;
    }
    if (host.at(0).unicode() == '[') {
        // IPv6 addresses might contain a zone-id which needs to be recoded
        if (options != 0)
            if (qt_urlRecode(appendTo, host, options, nullptr))
                return;
        appendTo += host;
    } else {
        // this is either an IPv4Address or a reg-name
        // if it is a reg-name, it is already stored in Unicode form
        if (options & QUrl::EncodeUnicode && !(options & 0x4000000))
            appendTo += qt_ACE_do(host, ToAceOnly, AllowLeadingDot, {});
        else
            appendTo += host;
    }
}

// the whole IPvFuture is passed and parsed here, including brackets;
// returns null if the parsing was successful, or a pointer to the QChar of the first failure
static const QChar *parseIpFuture(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    //    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    static const char acceptable[] =
            "!$&'()*+,;=" // sub-delims
            ":"           // ":"
            "-._~";       // unreserved

    // the brackets and the "v" have been checked
    const QChar *const origBegin = begin;
    if (begin[3].unicode() != '.')
        return &begin[3];
    if (isHexDigit(begin[2].unicode())) {
        // this is so unlikely that we'll just go down the slow path
        // decode the whole string, skipping the "[vH." and "]" which we already know to be there
        host += QStringView(begin, 4);

        // uppercase the version, if necessary
        if (begin[2].unicode() >= 'a')
            host[host.size() - 2] = QChar{begin[2].unicode() - 0x20};

        begin += 4;
        --end;

        QString decoded;
        if (mode == QUrl::TolerantMode && qt_urlRecode(decoded, QStringView{begin, end}, QUrl::FullyDecoded, nullptr)) {
            begin = decoded.constBegin();
            end = decoded.constEnd();
        }

        for ( ; begin != end; ++begin) {
            if (isAsciiLetterOrNumber(begin->unicode()))
                host += *begin;
            else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != nullptr)
                host += *begin;
            else
                return decoded.isEmpty() ? begin : &origBegin[2];
        }
        host += u']';
        return nullptr;
    }
    return &origBegin[2];
}

// ONLY the IPv6 address is parsed here, WITHOUT the brackets
static const QChar *parseIp6(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    QStringView decoded(begin, end);
    QString decodedBuffer;
    if (mode == QUrl::TolerantMode) {
        // this array is kept in automatic storage because it's only 4 bytes
        const ushort decodeColon[] = { decode(':'), 0 };
        if (qt_urlRecode(decodedBuffer, decoded, QUrl::ComponentFormattingOption::PrettyDecoded, decodeColon))
            decoded = decodedBuffer;
    }

    const QStringView zoneIdIdentifier(u"%25");
    QIPAddressUtils::IPv6Address address;
    QStringView zoneId;

    qsizetype zoneIdPosition = decoded.indexOf(zoneIdIdentifier);
    if ((zoneIdPosition != -1) && (decoded.lastIndexOf(zoneIdIdentifier) == zoneIdPosition)) {
        zoneId = decoded.mid(zoneIdPosition + zoneIdIdentifier.size());
        decoded.truncate(zoneIdPosition);

        // was there anything after the zone ID separator?
        if (zoneId.isEmpty())
            return end;
    }

    // did the address become empty after removing the zone ID?
    // (it might have always been empty)
    if (decoded.isEmpty())
        return end;

    const QChar *ret = QIPAddressUtils::parseIp6(address, decoded.constBegin(), decoded.constEnd());
    if (ret)
        return begin + (ret - decoded.constBegin());

    host.reserve(host.size() + (end - begin) + 2);  // +2 for the brackets
    host += u'[';
    QIPAddressUtils::toString(host, address);

    if (!zoneId.isEmpty()) {
        host += zoneIdIdentifier;
        host += zoneId;
    }
    host += u']';
    return nullptr;
}
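/*
    A standalone sketch, not part of QUrl: parseIp6() treats the text after
    "%25" (the encoded '%' that RFC 6874 uses to introduce a zone identifier)
    as a zone ID only if that separator appears exactly once. It fails early if
    either the zone ID or the remaining address is empty. The code below
    performs only that splitting step. ZoneSplit and splitZoneId are
    hypothetical, and the address itself is not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct ZoneSplit {
    std::string_view address;
    std::string_view zoneId; // empty when no zone ID was given
};

std::optional<ZoneSplit> splitZoneId(std::string_view decoded)
{
    constexpr std::string_view separator = "%25";
    ZoneSplit out{decoded, {}};
    const std::size_t pos = decoded.find(separator);
    // split only if the separator occurs exactly once
    if (pos != std::string_view::npos && decoded.rfind(separator) == pos) {
        out.address = decoded.substr(0, pos);
        out.zoneId = decoded.substr(pos + separator.size());
        if (out.zoneId.empty())
            return std::nullopt; // nothing after the separator
    }
    if (out.address.empty())
        return std::nullopt;     // nothing left to parse as an address
    return out;
}
```

    With two separators nothing is split, so the '%' stays in the address text.
    In parseIp6(), the IPv6 parser then rejects it.
*/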

inline bool
QUrlPrivate::setHost(const QString &value, qsizetype from, qsizetype iend, QUrl::ParsingMode mode)
{
    const QChar *begin = value.constData() + from;
    const QChar *end = value.constData() + iend;

    const qsizetype len = end - begin;
    host.clear();
    sectionIsPresent &= ~Host;
    if (!value.isNull() || (sectionIsPresent & Authority))
        sectionIsPresent |= Host;
    if (len == 0)
        return true;

    if (begin[0].unicode() == '[') {
        // IPv6Address or IPvFuture
        // smallest IPv6 address is      "[::]"   (len = 4)
        // smallest IPvFuture address is "[v7.X]" (len = 6)
        if (end[-1].unicode() != ']') {
            setError(HostMissingEndBracket, value);
            return false;
        }

        if (len > 5 && begin[1].unicode() == 'v') {
            const QChar *c = parseIpFuture(host, begin, end, mode);
            if (c)
                setError(InvalidIPvFutureError, value, c - value.constData());
            return !c;
        } else if (begin[1].unicode() == 'v') {
            setError(InvalidIPvFutureError, value, from);
        }

        const QChar *c = parseIp6(host, begin + 1, end - 1, mode);
        if (!c)
            return true;

        if (c == end - 1)
            setError(InvalidIPv6AddressError, value, from);
        else
            setError(InvalidCharacterInIPv6Error, value, c - value.constData());
        return false;
    }

    // check if it's an IPv4 address
    QIPAddressUtils::IPv4Address ip4;
    if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
        // yes, it was
        QIPAddressUtils::toString(host, ip4);
        return true;
    }

    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded, becomes one
    // of the types above.
    //
    // Two types of encoding are possible:
    //  percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
    //  Unicode encoding (some non-ASCII characters case-fold to digits
    //                    when nameprepping is done)
    //
    // The qt_ACE_do function below does IDNA normalization and the STD3 check.
    // That means a Unicode string may become an IPv4 address, but it cannot
    // produce a '[' or a '%'.

    // check for percent-encoding first
    QString s;
    if (mode == QUrl::TolerantMode && qt_urlRecode(s, QStringView{begin, end}, { }, nullptr)) {
        // something was decoded
        // anything encoded left?
        qsizetype pos = s.indexOf(QChar(0x25)); // '%'
        if (pos != -1) {
            setError(InvalidRegNameError, s, pos);
            return false;
        }

        // recurse
        return setHost(s, 0, s.size(), QUrl::StrictMode);
    }

    s = qt_ACE_do(value.mid(from, iend - from), NormalizeAce, ForbidLeadingDot, {});
    if (s.isEmpty()) {
        setError(InvalidRegNameError, value);
        return false;
    }

    // check IPv4 again
    if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
        QIPAddressUtils::toString(host, ip4);
    } else {
        host = s;
    }
    return true;
}
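/*
    A standalone sketch, not part of QUrl: in TolerantMode, setHost() first
    runs the host through qt_urlRecode(). If anything was decoded, it rejects
    the result when a '%' is still present, and otherwise parses it again in
    StrictMode. The comment's example is easy to check by hand: %31 is '1',
    %30 is '0' and %2E is '.', so "%31%30%2E%30%2E%30%2E%31" becomes
    "10.0.0.1". The decoder below is much simpler than qt_urlRecode(). It
    decodes every valid %XX sequence into a single byte and copies malformed
    ones unchanged. hexValue, percentDecode and decodeHost are hypothetical
    names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes every valid "%XX" sequence; malformed ones are copied unchanged.
std::string percentDecode(std::string_view in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += char(hi * 16 + lo);
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// Mirrors the check in setHost(): a '%' left after decoding makes the host invalid.
std::optional<std::string> decodeHost(std::string_view host)
{
    std::string decoded = percentDecode(host);
    if (decoded.find('%') != std::string::npos)
        return std::nullopt;
    return decoded;
}
```
*/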

inline void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
{
    //   URI-reference = URI / relative-ref
    //   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    //   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
    //   hier-part     = "//" authority path-abempty
    //                 / other path types
    //   relative-part = "//" authority path-abempty
    //                 / other path types here

    sectionIsPresent = 0;
    flags = 0;
    clearError();

    // find the important delimiters
    qsizetype colon = -1;
    qsizetype question = -1;
    qsizetype hash = -1;
    const qsizetype len = url.size();
    const QChar *const begin = url.constData();
    const ushort *const data = reinterpret_cast<const ushort *>(begin);

    for (qsizetype i = 0; i < len; ++i) {
        size_t uc = data[i];
        if (uc == '#' && hash == -1) {
            hash = i;

            // nothing more to be found
            break;
        }

        if (question == -1) {
            if (uc == ':' && colon == -1)
                colon = i;
            else if (uc == '?')
                question = i;
        }
    }

    // check if we have a scheme
    qsizetype hierStart;
    if (colon != -1 && setScheme(url, colon, /* don't set error */ false)) {
        hierStart = colon + 1;
    } else {
        // recover from a failed scheme: it might not have been a scheme at all
        scheme.clear();
        sectionIsPresent = 0;
        hierStart = 0;
    }

    qsizetype pathStart;
    qsizetype hierEnd = qMin<size_t>(qMin<size_t>(question, hash), len);
    if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
        // we have an authority, it ends at the first slash after these
        qsizetype authorityEnd = hierEnd;
        for (qsizetype i = hierStart + 2; i < authorityEnd ; ++i) {
            if (data[i] == '/') {
                authorityEnd = i;
                break;
            }
        }

        setAuthority(url, hierStart + 2, authorityEnd, parsingMode);

        // even if we failed to set the authority properly, let's try to recover
        pathStart = authorityEnd;
        setPath(url, pathStart, hierEnd);
    } else {
        userName.clear();
        password.clear();
        host.clear();
        port = -1;
        pathStart = hierStart;

        if (hierStart < hierEnd)
            setPath(url, hierStart, hierEnd);
        else
            path.clear();
    }

    if (size_t(question) < size_t(hash))
        setQuery(url, question + 1, qMin<size_t>(hash, len));

    if (hash != -1)
        setFragment(url, hash + 1, len);

    if (error || parsingMode == QUrl::TolerantMode)
        return;

    // The parsing so far was partially tolerant of errors, except for the
    // scheme parser (which is always strict) and the authority (which was
    // executed in strict mode).
    // If we haven't found any errors so far, continue the strict-mode parsing
    // from the path component onwards.

    if (!validateComponent(Path, url, pathStart, hierEnd))
        return;
    if (size_t(question) < size_t(hash) && !validateComponent(Query, url, question + 1, qMin<size_t>(hash, len)))
        return;
    if (hash != -1)
        validateComponent(Fragment, url, hash + 1, len);
}
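/*
    A standalone sketch, not part of QUrl: parse() locates its delimiters in
    one pass. The first '#' ends the scan, while ':' and '?' are recorded only
    until the first '?'. A candidate scheme that fails validation is not an
    error; the input is then read as a relative reference. The code below
    performs that splitting and returns views into the input. UrlParts,
    isValidScheme and splitUrl are hypothetical. Unlike parse(), the sketch
    neither decodes nor validates anything beyond the scheme.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct UrlParts {
    std::string_view scheme;
    bool hasAuthority = false;
    std::string_view authority;
    std::string_view path;
    std::optional<std::string_view> query;
    std::optional<std::string_view> fragment;
};

// Same grammar as setScheme(), without lowercasing.
bool isValidScheme(std::string_view s)
{
    if (s.empty())
        return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool later = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
        if (!alpha && !(i > 0 && later))
            return false;
    }
    return true;
}

UrlParts splitUrl(std::string_view url)
{
    constexpr std::size_t npos = std::string_view::npos;
    std::size_t colon = npos, question = npos, hash = npos;
    for (std::size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '#') {
            hash = i;
            break; // nothing more to be found
        }
        if (question == npos) {
            if (url[i] == ':' && colon == npos)
                colon = i;
            else if (url[i] == '?')
                question = i;
        }
    }

    UrlParts parts;
    std::size_t hierStart = 0;
    if (colon != npos && isValidScheme(url.substr(0, colon))) {
        parts.scheme = url.substr(0, colon);
        hierStart = colon + 1;
    } // otherwise there is no scheme and the input is a relative reference

    const std::size_t hierEnd = std::min({question, hash, url.size()});
    const std::string_view hier = url.substr(hierStart, hierEnd - hierStart);
    if (hier.size() >= 2 && hier[0] == '/' && hier[1] == '/') {
        // the authority ends at the first slash after the two leading ones
        const std::size_t slash = hier.find('/', 2);
        parts.hasAuthority = true;
        parts.authority = hier.substr(2, slash == npos ? npos : slash - 2);
        if (slash != npos)
            parts.path = hier.substr(slash);
    } else {
        parts.path = hier;
    }

    if (question != npos)
        parts.query = url.substr(question + 1, std::min(hash, url.size()) - question - 1);
    if (hash != npos)
        parts.fragment = url.substr(hash + 1);
    return parts;
}
```

    For instance, "dir/file:1" has no scheme because "dir/file" contains a '/',
    and in "a?b:c" the ':' is never considered because it follows the '?'.
*/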

QString QUrlPrivate::toLocalFile(QUrl::FormattingOptions options) const
{
    QString tmp;
    QString ourPath;
    appendPath(ourPath, options, QUrlPrivate::Path);

    // magic for shared drive on windows
    if (!host.isEmpty()) {
        tmp = "//"_L1 + host;
#ifdef Q_OS_WIN // QTBUG-42346, WebDAV is visible as local file on Windows only.
        if (scheme == webDavScheme())
            tmp += webDavSslTag();
#endif
        if (!ourPath.isEmpty() && !ourPath.startsWith(u'/'))
            tmp += u'/';
        tmp += ourPath;
    } else {
        tmp = ourPath;
#ifdef Q_OS_WIN
        // magic for drives on windows
        if (ourPath.length() > 2 && ourPath.at(0) == u'/' && ourPath.at(2) == u':')
            tmp.remove(0, 1);
#endif
    }
    return tmp;
}

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths

    Returns a merge of the current path with the relative path passed
    as argument.

    Note: \a relativePath is relative (does not start with '/').
*/
inline QString QUrlPrivate::mergePaths(const QString &relativePath) const
{
    // If the base URI has a defined authority component and an empty
    // path, then return a string consisting of "/" concatenated with
    // the reference's path; otherwise,
    if (!host.isEmpty() && path.isEmpty())
        return u'/' + relativePath;

    // Return a string consisting of the reference's path component
    // appended to all but the last segment of the base URI's path
    // (i.e., excluding any characters after the right-most "/" in the
    // base URI path, or excluding the entire base URI path if it does
    // not contain any "/" characters).
    QString newPath;
    if (!path.contains(u'/'))
        newPath = relativePath;
    else
        newPath = QStringView{path}.left(path.lastIndexOf(u'/') + 1) + relativePath;

    return newPath;
}
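/*
    A standalone sketch, not part of QUrl: the merge rule of RFC 3986, section
    5.2.3, in standard C++. The RFC bases the first case on the base URI having
    a defined authority, whereas mergePaths() tests !host.isEmpty(). The sketch
    takes that test as a hypothetical hasHost parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// RFC 3986, section 5.2.3; relativePath does not start with '/'.
std::string mergePaths(bool hasHost, std::string_view basePath, std::string_view relativePath)
{
    if (hasHost && basePath.empty())
        return "/" + std::string(relativePath);

    // keep everything up to and including the right-most '/' of the base path
    const std::size_t slash = basePath.rfind('/');
    if (slash == std::string_view::npos)
        return std::string(relativePath);
    return std::string(basePath.substr(0, slash + 1)) + std::string(relativePath);
}
```

    Dot segments such as "../" survive the merge. RFC 3986 removes them in a
    separate step (section 5.2.4).
*/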

// Authority-less URLs cannot have paths starting with double slashes (see
// QUrlPrivate::validityError). We refuse to turn a valid URL into an invalid
// one by way of QUrl::resolved().
static void fixupNonAuthorityPath(QString *path)
{
    if (path->isEmpty() || path->at(0) != u'/')
        return;

    // Find the first non-slash character, because its position is equal to the
    // number of slashes. We'll remove all but one of them.
    qsizetype i = 0;
    while (i + 1 < path->size() && path->at(i + 1) == u'/')
        ++i;
    if (i)
        path->remove(0, i);
}
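/*
    A standalone sketch, not part of QUrl: without an authority, a path such as
    "//foo" would be read back as the authority "foo". This is why the run of
    leading slashes is reduced to one. The code below runs the same loop on a
    std::string; collapseLeadingSlashes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapses a run of leading slashes to a single one, like fixupNonAuthorityPath().
void collapseLeadingSlashes(std::string &path)
{
    if (path.empty() || path[0] != '/')
        return;
    std::size_t extra = 0; // slashes after the first one
    while (extra + 1 < path.size() && path[extra + 1] == '/')
        ++extra;
    path.erase(0, extra);
}
```
*/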

inline QUrlPrivate::ErrorCode QUrlPrivate::validityError(QString *source, qsizetype *position) const
{
    Q_ASSERT(!source == !position);
    if (error) {
        if (source) {
            *source = error->source;
            *position = error->position;
        }
        return error->code;
    }

    // There are three more cases of invalid URLs that QUrl recognizes and they
    // are only possible with constructed URLs (setXXX methods), not with
    // parsing. Therefore, they are tested here.
    //
    // Two cases are a non-empty path that doesn't start with a slash and:
    //  - with an authority
    //  - without an authority, without scheme but the path with a colon before
    //    the first slash
    // The third case is an empty authority and a non-empty path that starts
    // with "//".
    // Those cases are considered invalid because toString() would produce a URL
    // that wouldn't be parsed back to the same QUrl.

    if (path.isEmpty())
        return NoError;
    if (path.at(0) == u'/') {
        if (hasAuthority() || path.size() == 1 || path.at(1) != u'/')
            return NoError;
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityAbsentAndPathIsDoubleSlash;
    }

    if (sectionIsPresent & QUrlPrivate::Host) {
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityPresentAndPathIsRelative;
    }
    if (sectionIsPresent & QUrlPrivate::Scheme)
        return NoError;

    // check for a path of "text:text/"
    for (qsizetype i = 0; i < path.size(); ++i) {
        ushort c = path.at(i).unicode();
        if (c == '/') {
            // found the slash before the colon
            return NoError;
        }
        if (c == ':') {
            // found the colon before the slash, it's invalid
            if (source) {
                *source = path;
                *position = i;
            }
            return RelativeUrlPathContainsColonBeforeSlash;
        }
    }
    return NoError;
}

bool QUrlPrivate::validateComponent(QUrlPrivate::Section section, const QString &input,
                                    qsizetype begin, qsizetype end)
{
    // What we need to look out for, that the regular parser tolerates:
    //  - percent signs not followed by two hex digits
    //  - forbidden characters, which should always appear encoded
    //    '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL (0x7F)
    //    control characters
    //  - delimiters not allowed in certain positions
    //    . scheme: parser is already strict
    //    . user info: gen-delims except ":" disallowed ("/" / "?" / "#" / "[" / "]" / "@")
    //    . host: parser is stricter than the standard
    //    . port: parser is stricter than the standard
    //    . path: all delimiters allowed
    //    . fragment: all delimiters allowed
    //    . query: all delimiters allowed
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    static const char forbiddenUserInfo[] = ":/?#[]@";

    Q_ASSERT(section != Authority && section != Hierarchy && section != FullUrl);

    const ushort *const data = reinterpret_cast<const ushort *>(input.constData());
    for (size_t i = size_t(begin); i < size_t(end); ++i) {
        uint uc = data[i];
        if (uc >= 0x80)
            continue;

        bool error = false;
        if ((uc == '%' && (size_t(end) < i + 2 || !isHex(data[i + 1]) || !isHex(data[i + 2])))
                || uc <= 0x20 || strchr(forbidden, uc)) {
            // found an error
            error = true;
        } else if (section & UserInfo) {
            if (section == UserInfo && strchr(forbiddenUserInfo + 1, uc))
                error = true;
            else if (section != UserInfo && strchr(forbiddenUserInfo, uc))
                error = true;
        }

        if (!error)
            continue;

        ErrorCode errorCode = ErrorCode(int(section) << 8);
        if (section == UserInfo) {
            // is it the user name or the password?
            errorCode = InvalidUserNameError;
            for (size_t j = size_t(begin); j < i; ++j)
                if (data[j] == ':') {
                    errorCode = InvalidPasswordError;
                    break;
                }
        }

        setError(errorCode, input, i);
        return false;
    }

    // no errors
    return true;
}
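/*
    A standalone sketch, not part of QUrl: the strict-mode checks above reduce
    to one scan. It looks for malformed percent sequences, space and control
    characters, the always-forbidden characters, and (in user info) the
    gen-delims. The code below works on 8-bit std::string_view input and skips
    any byte >= 0x80, just as validateComponent() skips code units >= 0x80. The
    Section enum, isHexDigit and firstStrictError are hypothetical, and the
    sketch requires both hex digits of a percent sequence to lie inside the
    string. As in validateComponent(), a lone user name or password may not
    contain ':', while the combined user info may.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

enum class Section { UserInfo, UserName, Password, Path, Query, Fragment };

bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Index of the first character StrictMode would reject, or npos if there is none.
std::size_t firstStrictError(Section section, std::string_view s)
{
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char uc = static_cast<unsigned char>(s[i]);
        if (uc >= 0x80)
            continue; // non-ASCII is not checked here
        if (uc == '%') {
            // a percent sign must be followed by two hex digits
            if (i + 2 >= s.size() || !isHexDigit(s[i + 1]) || !isHexDigit(s[i + 2]))
                return i;
            continue;
        }
        if (uc <= 0x20 || std::strchr(forbidden, uc) != nullptr)
            return i; // control character, space, or always-forbidden character
        if (section == Section::UserInfo && std::strchr("/?#[]@", uc) != nullptr)
            return i; // ':' is allowed: it separates user name and password
        if ((section == Section::UserName || section == Section::Password)
            && std::strchr(":/?#[]@", uc) != nullptr)
            return i;
    }
    return std::string_view::npos;
}
```
*/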
1671
1672#if 0
1673inline void QUrlPrivate::validate() const
1674{
1675 QUrlPrivate *that = (QUrlPrivate *)this;
1676 that->encodedOriginal = that->toEncoded(); // may detach
1677 parse(ParseOnly);
1678
1679 QURL_SETFLAG(that->stateFlags, Validated);
1680
1681 if (!isValid)
1682 return;
1683
1684 QString auth = authority(); // causes the non-encoded forms to be valid
1685
1686 // authority() calls canonicalHost() which sets this
1687 if (!isHostValid)
1688 return;
1689
1690 if (scheme == "mailto"_L1) {
1691 if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
1692 that->isValid = false;
1693 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username,"
1694 "port and password"),
1695 0, 0);
1696 }
1697 } else if (scheme == ftpScheme() || scheme == httpScheme()) {
1698 if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
1699 that->isValid = false;
1700 that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
1701 0, 0);
1702 }
1703 }
1704}
1705#endif
1706
1707/*!
1708 \macro QT_NO_URL_CAST_FROM_STRING
1709 \relates QUrl
1710
1711 Disables automatic conversions from QString (or char *) to QUrl.
1712
1713 Compiling your code with this define is useful when you have a lot of
1714 code that uses QString for file names and you wish to convert it to
1715 use QUrl for network transparency. In any code that uses QUrl, it can
1716 help avoid missing QUrl::resolved() calls, and other misuses of
1717 QString to QUrl conversions.
1718
1719 For example, if you have code like
1720
1721 \code
1722 url = filename; // probably not what you want
1723 \endcode
1724
1725 you can rewrite it as
1726
1727 \code
1728 url = QUrl::fromLocalFile(filename);
1729 url = baseurl.resolved(QUrl(filename));
1730 \endcode
1731
1732 \sa QT_NO_CAST_FROM_ASCII
1733*/


/*!
    Constructs a URL by parsing \a url. Note that this constructor expects a
    proper URL or URL-Reference and will not attempt to guess intent. For
    example, the following declaration:

    \snippet code/src_corelib_io_qurl.cpp constructor-url-reference

    will construct a valid URL, but it may not be what one expects, as the
    scheme() part of the input is missing. For a string like the above,
    applications may want to use fromUserInput(). For this constructor or
    setUrl(), the following is probably what was intended:

    \snippet code/src_corelib_io_qurl.cpp constructor-url

    QUrl will automatically percent-encode all characters that are not
    allowed in a URL and decode the percent-encoded sequences that represent
    an unreserved character (letters, digits, hyphens, underscores, dots and
    tildes). All other characters are left in their original forms.
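
    The decoding half of that rule is the RFC 3986 normalization of
    unreserved characters. The standalone sketch below illustrates it; the
    function names are hypothetical, it is not QUrl's internal code, and the
    encoding half (escaping disallowed characters) is omitted.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the value of a hexadecimal digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
static bool isUnreserved(unsigned char c)
{
    return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

// Decodes %XX only when it encodes an unreserved character; every other
// escape, and every other character, is copied through unchanged.
std::string decodeUnreserved(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                unsigned char decoded = static_cast<unsigned char>(hi * 16 + lo);
                if (isUnreserved(decoded)) {
                    out += static_cast<char>(decoded);
                    i += 2;
                    continue;
                }
            }
        }
        out += in[i];
    }
    return out;
}
```

    For example, "%41%2Fb%7E" becomes "A%2Fb~": the escapes for 'A' and '~'
    are decoded, while "%2F" (a reserved '/') stays encoded.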

    Parses \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return false. The parsing mode DecodedMode is not
    permitted in this context and will produce a run-time warning.
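
    One plausible way to apply the tolerant correction is to escape each
    stray '%' as "%25" and leave well-formed escapes alone, as in the sketch
    below. This repair strategy is an assumption made for illustration, not
    a copy of QUrl's parser, and the function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if c is an ASCII hexadecimal digit.
static bool isHex(char c)
{
    return std::isxdigit(static_cast<unsigned char>(c)) != 0;
}

// Escapes every '%' that does not start a valid "%XX" sequence as "%25",
// leaving well-formed escapes untouched.
std::string fixStrayPercents(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool wellFormed = in[i] == '%' && i + 2 < in.size()
                && isHex(in[i + 1]) && isHex(in[i + 2]);
        if (in[i] == '%' && !wellFormed)
            out += "%25";
        else
            out += in[i];
    }
    return out;
}

// A strict parser would reject such input instead of repairing it.
bool hasStrayPercent(const std::string &in)
{
    return fixStrayPercents(in) != in;
}
```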

    Example:

    \snippet code/src_corelib_io_qurl.cpp 0

    To construct a URL from an encoded string, you can also use fromEncoded():

    \snippet code/src_corelib_io_qurl.cpp 1

    Both functions are equivalent and, since Qt 5, both functions accept
    encoded data. Usually, the choice of the QUrl constructor or setUrl() versus
    fromEncoded() will depend on the source data: the constructor and setUrl()
    take a QString, whereas fromEncoded() takes a QByteArray.

    \sa setUrl(), fromEncoded(), TolerantMode
*/
QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(nullptr)
{
    setUrl(url, parsingMode);
}

/*!
    Constructs an empty QUrl object.
*/
QUrl::QUrl() : d(nullptr)
{
}

/*!
    Constructs a copy of \a other.
*/
QUrl::QUrl(const QUrl &other) noexcept : d(other.d)
{
    if (d)
        d->ref.ref();
}

/*!
    Destructor; called immediately before the object is deleted.
*/
QUrl::~QUrl()
{
    if (d && !d->ref.deref())
        delete d;
}

/*!
    Returns \c true if the URL is non-empty and valid; otherwise returns \c false.

    The URL is run through a conformance test. Every part of the URL
    must conform to the standard encoding rules of the URI standard
    for the URL to be reported as valid.

    \snippet code/src_corelib_io_qurl.cpp 2
*/
bool QUrl::isValid() const
{
    if (isEmpty()) {
        // also catches d == nullptr
        return false;
    }
    return d->validityError() == QUrlPrivate::NoError;
}

/*!
    Returns \c true if the URL has no data; otherwise returns \c false.

    \sa clear()
*/
bool QUrl::isEmpty() const
{
    if (!d) return true;
    return d->isEmpty();
}

/*!
    Resets the content of the QUrl. After calling this function, the
    QUrl is equal to one that has been constructed with the default
    empty constructor.

    \sa isEmpty()
*/
void QUrl::clear()
{
    if (d && !d->ref.deref())
        delete d;
    d = nullptr;
}

/*!
    Parses \a url and sets this object to that value. QUrl will automatically
    percent-encode all characters that are not allowed in a URL and decode the
    percent-encoded sequences that represent an unreserved character (letters,
    digits, hyphens, underscores, dots and tildes). All other characters are
    left in their original forms.

    Parses \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return false. The parsing mode DecodedMode is
    not permitted in this context and will produce a run-time warning.

    \sa url(), toString()
*/
void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
{
    if (parsingMode == DecodedMode) {
        qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
    } else {
        detach();
        d->parse(url, parsingMode);
    }
}

/*!
    Sets the scheme of the URL to \a scheme. As a scheme can only
    contain ASCII characters, no conversion or decoding is done on the
    input. It must also start with an ASCII letter.

    The scheme describes the type (or protocol) of the URL. It's
    represented by one or more ASCII characters at the start of the URL.

    A scheme is strictly \l {RFC 3986}-compliant:
        \tt {scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}
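
    The grammar above translates directly into a character check. The
    standalone sketch below uses hypothetical function names and is not
    QUrl's implementation; it also folds the scheme to lowercase, in line
    with scheme() returning schemes in lowercase form.

```cpp
#include <cctype>
#include <string>

// Checks the RFC 3986 grammar:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string &s)
{
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}

// Schemes are case-insensitive; lowercase is the canonical form.
std::string canonicalScheme(std::string s)
{
    for (char &ch : s)
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}
```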

    The following example shows a URL where the scheme is "ftp":

    \image qurl-authority2.png

    To set the scheme, the following call is used:
    \snippet code/src_corelib_io_qurl.cpp 11

    The scheme can also be empty, in which case the URL is interpreted
    as relative.

    \sa scheme(), isRelative()
*/
void QUrl::setScheme(const QString &scheme)
{
    detach();
    d->clearError();
    if (scheme.isEmpty()) {
        // schemes are not allowed to be empty
        d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        d->flags &= ~QUrlPrivate::IsLocalFile;
        d->scheme.clear();
    } else {
        d->setScheme(scheme, scheme.size(), /* do set error */ true);
    }
}

/*!
    Returns the scheme of the URL. If an empty string is returned,
    this means the scheme is undefined and the URL is then relative.

    The scheme can only contain US-ASCII letters, digits, and the characters
    '+', '-' and '.', which means it cannot contain any character that would
    otherwise require encoding. Additionally, schemes are always returned in
    lowercase form.

    \sa setScheme(), isRelative()
*/
QString QUrl::scheme() const
{
    if (!d) return QString();

    return d->scheme;
}

/*!
    Sets the authority of the URL to \a authority.

    The authority of a URL is the combination of user info, a host
    name and a port. All of these elements are optional; an empty
    authority is therefore valid.

    The user info and host are separated by a '@', and the host and
    port are separated by a ':'. If the user info is empty, the '@'
    must be omitted, although a stray ':' is permitted if the port is
    empty.
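
    The separator rules above can be sketched as a small splitter. This is a
    standalone illustration with a hypothetical AuthorityParts type, not
    QUrl's parser. It assumes the host may be a bracketed IPv6 literal,
    whose internal ':' characters must not be mistaken for the port
    separator.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type for illustration.
struct AuthorityParts {
    std::optional<std::string> userInfo; // absent when there is no '@'
    std::string host;
    std::optional<std::string> port;     // absent when there is no ':'
};

AuthorityParts splitAuthority(const std::string &auth)
{
    AuthorityParts parts;
    std::string hostPort = auth;

    // User info ends at the last '@' (host names cannot contain '@').
    std::size_t at = auth.rfind('@');
    if (at != std::string::npos) {
        parts.userInfo = auth.substr(0, at);
        hostPort = auth.substr(at + 1);
    }

    // A bracketed IPv6 literal may contain ':', so only look for the
    // port separator after the closing ']'.
    std::size_t searchFrom = 0;
    if (!hostPort.empty() && hostPort[0] == '[') {
        std::size_t close = hostPort.find(']');
        searchFrom = (close == std::string::npos) ? hostPort.size() : close;
    }
    std::size_t colon = hostPort.find(':', searchFrom);
    if (colon != std::string::npos) {
        parts.host = hostPort.substr(0, colon);
        parts.port = hostPort.substr(colon + 1); // may be empty: stray ':'
    } else {
        parts.host = hostPort;
    }
    return parts;
}
```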

    The following example shows a valid authority string:

    \image qurl-authority.png

    The \a authority data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName(), setPassword(), setHost() and setPort()
    individually.

    \sa setUserInfo(), setHost(), setPort()
*/
void QUrl::setAuthority(const QString &authority, ParsingMode mode)
{
    detach();
    d->clearError();

    if (mode == DecodedMode) {
        qWarning("QUrl::setAuthority(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setAuthority(authority, 0, authority.size(), mode);
}

/*!
    Returns the authority of the URL if it is defined; otherwise
    an empty string is returned.

    This function returns an unambiguous value, which may contain
    characters that are still percent-encoded, plus some control sequences
    not representable in decoded form in QString.

    The \a options argument controls how to format the authority component.
    The value of QUrl::FullyDecoded is not permitted in this function. If you
    need to obtain fully decoded data, call userName(), password(), host() and
    port() individually.

    \sa setAuthority(), userInfo(), userName(), password(), host(), port()
*/
QString QUrl::authority(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::authority(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendAuthority(result, options, QUrlPrivate::Authority);
    return result;
}

/*!
    Sets the user info of the URL to \a userInfo. The user info is an
    optional part of the authority of the URL, as described in
    setAuthority().

    The user info consists of a user name and optionally a password,
    separated by a ':'. If the password is empty, the colon must be
    omitted. The following example shows a valid user info string:

    \image qurl-authority3.png
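
    Because a user name cannot contain an unencoded ':', the first colon is
    the separator. A standalone sketch of that split follows; the function
    is hypothetical and says nothing about how QUrl stores the two parts.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Splits user info at the first ':'. The password is absent (not merely
// empty) when there is no ':' at all.
std::pair<std::string, std::optional<std::string>>
splitUserInfo(const std::string &userInfo)
{
    std::size_t colon = userInfo.find(':');
    if (colon == std::string::npos)
        return {userInfo, std::nullopt};
    return {userInfo.substr(0, colon), userInfo.substr(colon + 1)};
}
```

    Any later ':' characters, as in "alice:s3cr:et", belong to the password.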

    The \a userInfo data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName() and setPassword() individually.

    \sa userInfo(), setUserName(), setPassword(), setAuthority()
*/
void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
{
    detach();
    d->clearError();
    QString trimmed = userInfo.trimmed();
    if (mode == DecodedMode) {
        qWarning("QUrl::setUserInfo(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setUserInfo(trimmed, 0, trimmed.size());
    if (userInfo.isNull()) {
        // QUrlPrivate::setUserInfo cleared almost everything
        // but it leaves the UserName bit set
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
    } else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserInfo, userInfo)) {
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
        d->userName.clear();
        d->password.clear();
    }
}

/*!
    Returns the user info of the URL, or an empty string if the user
    info is undefined.

    This function returns an unambiguous value, which may contain
    characters that are still percent-encoded, plus some control sequences
    not representable in decoded form in QString.

    The \a options argument controls how to format the user info component. The
    value of QUrl::FullyDecoded is not permitted in this function. If you need
    to obtain fully decoded data, call userName() and password() individually.

    \sa setUserInfo(), userName(), password(), authority()
*/
QString QUrl::userInfo(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::userInfo(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
    return result;
}

/*!
    Sets the URL's user name to \a userName. The \a userName is part
    of the user info element in the authority of the URL, as described
    in setUserInfo().

    The \a userName data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and encoded
    characters are not possible.

    QUrl::DecodedMode should be used when setting the user name from a data
    source which is not a URL, such as a password dialog shown to the user or
    with a user name obtained by calling userName() with the QUrl::FullyDecoded
    formatting option.
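
    In decoded input a '%' is data, not the start of an escape. A minimal
    sketch of the idea, assuming the decoded text is escaped before the
    normal tolerant parsing runs (the helper is hypothetical and is not
    QUrl's implementation):

```cpp
#include <string>

// In decoded input every '%' is literal data, so it is escaped as "%25"
// before the text is handed to a parser that understands percent-encoding.
std::string escapeLiteralPercents(const std::string &decoded)
{
    std::string out;
    out.reserve(decoded.size());
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}
```

    With this escaping, a user name typed as "%41" keeps its three
    characters instead of being decoded to "A".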

    \sa userName(), setUserInfo()
*/
void QUrl::setUserName(const QString &userName, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = userName;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setUserName(data, 0, data.size());
    if (userName.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::UserName;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserName, userName))
        d->userName.clear();
}

/*!
    Returns the user name of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the user name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as setting it in QAuthenticator or
    negotiating a login.

    \sa setUserName(), userInfo()
*/
QString QUrl::userName(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendUserName(result, options);
    return result;
}

/*!
    Sets the URL's password to \a password. The \a password is part of
    the user info element in the authority of the URL, as described in
    setUserInfo().

    The \a password data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    QUrl::DecodedMode should be used when setting the password from a data
    source which is not a URL, such as a password dialog shown to the user or
    with a password obtained by calling password() with the QUrl::FullyDecoded
    formatting option.

    \sa password(), setUserInfo()
*/
void QUrl::setPassword(const QString &password, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = password;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setPassword(data, 0, data.size());
    if (password.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Password;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Password, password))
        d->password.clear();
}

/*!
    Returns the password of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the password component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as setting it in QAuthenticator or
    negotiating a login.

    \sa setPassword()
*/
QString QUrl::password(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPassword(result, options);
    return result;
}

/*!
    Sets the host of the URL to \a host. The host is part of the
    authority.

    The \a host data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    Note that, in all cases, the result of the parsing must be a valid hostname
    according to STD 3 rules, as modified by the Internationalized Resource
    Identifiers specification (RFC 3987). Invalid hostnames are not permitted
    and will cause isValid() to become false.
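
    For plain ASCII names, the STD 3 rules reduce to a label check. The
    sketch below is a simplified, hypothetical validator. It ignores IDNA
    processing, IP address literals and trailing-dot names, so it does not
    match exactly what QUrl accepts. The 253-character limit is the commonly
    used text-form equivalent of the 255-octet DNS limit.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Checks an ASCII host name against the classic STD 3 "LDH" rule:
// dot-separated labels of 1-63 letters, digits or hyphens, where a label
// neither starts nor ends with '-', and the whole name is at most 253
// characters. A trailing dot is rejected in this simplified version.
bool isLdhHostName(const std::string &host)
{
    if (host.empty() || host.size() > 253)
        return false;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = host.find('.', start);
        std::size_t end = (dot == std::string::npos) ? host.size() : dot;
        std::size_t len = end - start;
        if (len == 0 || len > 63)
            return false;
        if (host[start] == '-' || host[end - 1] == '-')
            return false;
        for (std::size_t i = start; i < end; ++i) {
            unsigned char c = static_cast<unsigned char>(host[i]);
            if (!std::isalnum(c) && c != '-')
                return false;
        }
        if (dot == std::string::npos)
            return true;
        start = dot + 1;
    }
}
```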

    \sa host(), setAuthority()
*/
void QUrl::setHost(const QString &host, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = host;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    if (d->setHost(data, 0, data.size(), mode)) {
        return;
    } else if (!data.startsWith(u'[')) {
        // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
        Q_ASSERT(d->error);

        data.prepend(u'[');
        data.append(u']');
        if (!d->setHost(data, 0, data.size(), mode)) {
            // failed again
            if (data.contains(u':')) {
                // source data contains ':', so it's an IPv6 error
                d->error->code = QUrlPrivate::InvalidIPv6AddressError;
            }
            d->sectionIsPresent &= ~QUrlPrivate::Host;
        } else {
            // succeeded
            d->clearError();
        }
    }
}

/*!
    Returns the host of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how the hostname will be formatted. The
    QUrl::EncodeUnicode option will cause this function to return the hostname
    in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
    channels that are not 8-bit clean or that require the legacy hostname (such
    as DNS requests or HTTP request headers). If that flag is not present,
    this function returns the Internationalized Domain Name (IDN) in Unicode
    form, according to the list of permissible top-level domains (see
    idnWhitelist()).

    All other flags are ignored. Host names cannot contain control or percent
    characters, so the returned value can be considered fully decoded. If the
    host is an IPv6 address, it is returned without the enclosing square
    brackets.

    \sa setHost(), idnWhitelist(), setIdnWhitelist(), authority()
*/
QString QUrl::host(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendHost(result, options);
        if (result.startsWith(u'['))
            result = result.mid(1, result.size() - 2);
    }
    return result;
}

/*!
    Sets the port of the URL to \a port. The port is part of the
    authority of the URL, as described in setAuthority().

    \a port must be between 0 and 65535 inclusive. Setting the
    port to -1 indicates that the port is unspecified. Values outside
    that range are rejected: an invalid-port error is recorded and the
    port is left unspecified.
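
    When the port comes from text, such as the part after ':' in an
    authority, the same range rule applies. The sketch below is a
    hypothetical helper, not a QUrl function; it keeps the -1 convention
    for "unspecified" and reports invalid input as std::nullopt.

```cpp
#include <optional>
#include <string>

// Returns -1 (port unspecified) for an empty string, the port number for
// a decimal string in 0..65535, and std::nullopt for anything else.
std::optional<int> parsePort(const std::string &text)
{
    if (text.empty())
        return -1;
    if (text.size() > 5) // also keeps the accumulation below from overflowing
        return std::nullopt;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + (c - '0');
    }
    if (value > 65535)
        return std::nullopt;
    return value;
}
```
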
*/
void QUrl::setPort(int port)
{
    detach();
    d->clearError();

    if (port < -1 || port > 65535) {
        d->setError(QUrlPrivate::InvalidPortError, QString::number(port), 0);
        port = -1;
    }

    d->port = port;
    if (port != -1)
        d->sectionIsPresent |= QUrlPrivate::Host;
}

/*!
    \since 4.1

    Returns the port of the URL, or \a defaultPort if the port is
    unspecified.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 3
*/
int QUrl::port(int defaultPort) const
{
    if (!d) return defaultPort;
    return d->port == -1 ? defaultPort : d->port;
}

/*!
    Sets the path of the URL to \a path. The path is the part of the
    URL that comes after the authority but before the query string.

    \image qurl-ftppath.png

    For non-hierarchical schemes, the path will be everything
    following the scheme declaration, as in the following example:

    \image qurl-mailtopath.png

    The \a path data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    QUrl::DecodedMode should be used when setting the path from a data source
    which is not a URL, such as a dialog shown to the user or with a path
    obtained by calling path() with the QUrl::FullyDecoded formatting option.

    \sa path()
*/
void QUrl::setPath(const QString &path, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = path;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setPath(data, 0, data.size());

    // optimized out, since there is no path delimiter
//    if (path.isNull())
//        d->sectionIsPresent &= ~QUrlPrivate::Path;
//    else
    if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Path, path))
        d->path.clear();
}

/*!
    Returns the path of the URL.

    \snippet code/src_corelib_io_qurl.cpp 12

    The \a options argument controls how to format the path component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as sending to an FTP server.

    An example of data loss is when you have percent-encoded sequences that do
    not form valid UTF-8 and use FullyDecoded (the default):

    \snippet code/src_corelib_io_qurl.cpp 13

    In this example, there will be some level of data loss because \c %FF
    cannot be converted: the byte 0xFF never occurs in valid UTF-8.
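
    The loss happens because decoding produces raw bytes that must then be
    read as UTF-8 to become text. The standalone sketch below uses
    hypothetical helpers and only a structural UTF-8 check, not a full
    validator. It shows that "%C3%A9" decodes to a valid sequence, while
    \c %FF decodes to a byte that cannot start any UTF-8 sequence.

```cpp
#include <cstddef>
#include <string>

// Decodes every well-formed %XX escape into the raw byte it denotes.
std::string percentDecodeBytes(const std::string &in)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
                && hex(in[i + 1]) >= 0 && hex(in[i + 2]) >= 0) {
            out += static_cast<char>(hex(in[i + 1]) * 16 + hex(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Structural UTF-8 check: lead bytes and continuation bytes only. It does
// not reject overlong forms or encoded surrogates.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false; // 0x80..0xBF and 0xF8..0xFF cannot start a sequence
        if (bytes.size() - i <= extra)
            return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}
```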

    Data loss can also occur when the path contains sub-delimiters (such as \c +):

    \snippet code/src_corelib_io_qurl.cpp 14

    Other decoding examples:

    \snippet code/src_corelib_io_qurl.cpp 15

    \sa setPath()
*/
QString QUrl::path(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPath(result, options, QUrlPrivate::Path);
    return result;
}

/*!
    \since 5.2

    Returns the name of the file, excluding the directory path.

    Note that if this QUrl object is given a path ending in a slash, the name
    of the file is considered empty.

    If the path doesn't contain any slash, the whole path is returned as the
    file name.
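
    The implementation below applies this rule with lastIndexOf(). Restated
    in standard C++ with a hypothetical function name, the rule is:

```cpp
#include <cstddef>
#include <string>

// Returns everything after the last '/', or the whole path if it has none.
// A path ending in '/' therefore yields an empty file name.
std::string fileNameOf(const std::string &path)
{
    std::size_t slash = path.rfind('/');
    if (slash == std::string::npos)
        return path;
    return path.substr(slash + 1);
}
```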

    Example:

    \snippet code/src_corelib_io_qurl.cpp 7

    The \a options argument controls how to format the file name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    \sa path()
*/
QString QUrl::fileName(ComponentFormattingOptions options) const
{
    const QString ourPath = path(options);
    const qsizetype slash = ourPath.lastIndexOf(u'/');
    if (slash == -1)
        return ourPath;
    return ourPath.mid(slash + 1);
}

/*!
    \since 4.2

    Returns \c true if this URL contains a query (that is, if a '?' was seen
    in it), even if the query itself is empty.

    \sa setQuery(), query(), hasFragment()
*/
bool QUrl::hasQuery() const
{
    if (!d) return false;
    return d->hasQuery();
}

/*!
    Sets the query string of the URL to \a query.

    This function is useful if you need to pass a query string that
    does not fit into the key-value pattern, or that uses a different
    scheme for encoding special characters than what is suggested by
    QUrl.

    Passing a value of QString() to \a query (a null QString) unsets
    the query completely. However, passing a value of QString("")
    will set the query to an empty value, as if the original URL
    had a lone "?".
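
    The distinction between a null and an empty QString works like the
    difference between an absent and an empty optional value. Here is a
    standalone sketch with std::optional and a hypothetical withQuery()
    helper (not a QUrl function):

```cpp
#include <optional>
#include <string>

// An absent query (std::nullopt) emits nothing, while a present but empty
// query still emits the '?' delimiter.
std::string withQuery(const std::string &base, const std::optional<std::string> &query)
{
    if (!query)
        return base;
    return base + "?" + *query;
}
```

    The same convention applies to the fragment and its '#' delimiter.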

    The \a query data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    Query strings often contain percent-encoded sequences, so use of
    DecodedMode is discouraged. One special sequence to be aware of is that of
    the plus character ('+'). QUrl does not convert spaces to plus characters,
    even though HTML forms posted by web browsers do. In order to represent an
    actual plus character in a query, the sequence "%2B" is usually used. This
    function will leave "%2B" sequences untouched in TolerantMode or
    StrictMode.
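
    The difference matters when you decode a query value yourself. The
    sketch below is a hypothetical helper, not part of QUrl or QUrlQuery. It
    contrasts the HTML-form reading of '+' with the literal reading
    described above, and shows why "%2B" is the unambiguous way to carry a
    plus sign.

```cpp
#include <cstddef>
#include <string>

static int hexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Percent-decodes a query value. When formEncoded is true, '+' is also
// read as a space, following the HTML form (application/x-www-form-urlencoded)
// convention; otherwise '+' is kept as a literal plus sign.
std::string decodeQueryValue(const std::string &in, bool formEncoded)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '+' && formEncoded) {
            out += ' ';
        } else if (c == '%' && i + 2 < in.size()
                   && hexDigit(in[i + 1]) >= 0 && hexDigit(in[i + 2]) >= 0) {
            out += static_cast<char>(hexDigit(in[i + 1]) * 16 + hexDigit(in[i + 2]));
            i += 2;
        } else {
            out += c;
        }
    }
    return out;
}
```

    For "a+b%2Bc", the form reading yields "a b+c" and the literal reading
    yields "a+b+c". Only the "%2B" decodes to '+' in both cases.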

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QString &query, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = query;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setQuery(data, 0, data.size());
    if (query.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Query, query))
        d->query.clear();
}

/*!
    \overload
    \since 5.0
    Sets the query string of the URL to \a query.

    This function reconstructs the query string from the QUrlQuery object and
    sets it on this QUrl object. This function does not have parsing parameters
    because the QUrlQuery contains data that is already parsed.

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QUrlQuery &query)
{
    detach();
    d->clearError();

    // we know the data is in the right format
    d->query = query.toString();
    if (query.isEmpty())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else
        d->sectionIsPresent |= QUrlPrivate::Query;
}

/*!
    Returns the query string of the URL if there's a query string, or an empty
    result if not. To determine if the parsed URL contained a query string, use
    hasQuery().

    The \a options argument controls how to format the query component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
    often contain data that is supposed to remain percent-encoded, including
    the use of the "%2B" sequence to represent a plus character ('+').

    \sa setQuery(), hasQuery()
*/
QString QUrl::query(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendQuery(result, options, QUrlPrivate::Query);
        if (d->hasQuery() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    Sets the fragment of the URL to \a fragment. The fragment is the
    last part of the URL, represented by a '#' followed by a string of
    characters. It is typically used in HTTP for referring to a
    certain link or point on a page:

    \image qurl-fragment.png

    The fragment is sometimes also referred to as the URL "reference".

    Passing an argument of QString() (a null QString) will unset the fragment.
    Passing an argument of QString("") (an empty but not null QString) will set the
    fragment to an empty string (as if the original URL had a lone "#").

    The \a fragment data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    QUrl::DecodedMode should be used when setting the fragment from a data
    source which is not a URL or with a fragment obtained by calling
    fragment() with the QUrl::FullyDecoded formatting option.

    \sa fragment(), hasFragment()
*/
void QUrl::setFragment(const QString &fragment, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = fragment;
    if (mode == DecodedMode) {
        parseDecodedComponent(data);
        mode = TolerantMode;
    }

    d->setFragment(data, 0, data.size());
    if (fragment.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Fragment;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Fragment, fragment))
        d->fragment.clear();
}

/*!
    Returns the fragment of the URL. To determine if the parsed URL contained a
    fragment, use hasFragment().

    The \a options argument controls how to format the fragment component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context.

    \sa setFragment(), hasFragment()
*/
QString QUrl::fragment(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendFragment(result, options, QUrlPrivate::Fragment);
        if (d->hasFragment() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    \since 4.2

    Returns \c true if this URL contains a fragment (that is, if a '#' was
    present in it).

    \sa fragment(), setFragment()
*/
bool QUrl::hasFragment() const
{
    if (!d) return false;
    return d->hasFragment();
}

/*!
    Returns the result of the merge of this URL with \a relative. This
    URL is used as a base to convert \a relative to an absolute URL.

    If \a relative is not a relative URL, this function will return
    \a relative directly. Otherwise, the paths of the two URLs are
    merged, and the new URL returned has the scheme and authority of
    the base URL, but with the merged path, as in the following
    example:

    \snippet code/src_corelib_io_qurl.cpp 5

    Calling resolved() with ".." returns a QUrl whose directory is
    one level higher than the original. Similarly, calling resolved()
    with "../.." removes two levels from the path. If \a relative is
    "/", the path becomes "/".
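
    As an illustration of the path handling described above, the following
    standalone sketch implements the "merge" and "remove_dot_segments" steps
    of RFC 3986 section 5.2 on plain std::string paths. It is not QUrl's
    implementation, which uses QUrlPrivate::mergePaths() and
    qt_normalizePathSegments(). The helpers mergeRfcPaths() and
    removeDotSegments() are hypothetical, and query and fragment handling
    are left out.

```cpp
#include <string>

// Hypothetical helper: RFC 3986 section 5.2.3 "merge". basePath and
// baseHasAuthority describe the base URL; relPath is the reference's path.
std::string mergeRfcPaths(const std::string &basePath, bool baseHasAuthority,
                          const std::string &relPath)
{
    if (baseHasAuthority && basePath.empty())
        return "/" + relPath;
    // Keep the base path up to and including its last '/'.
    const std::string::size_type slash = basePath.rfind('/');
    if (slash == std::string::npos)
        return relPath;
    return basePath.substr(0, slash + 1) + relPath;
}

// Hypothetical helper: RFC 3986 section 5.2.4 "remove_dot_segments".
std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (in.compare(0, 3, "../") == 0) {                          // rule A
            in.erase(0, 3);
        } else if (in.compare(0, 2, "./") == 0) {                    // rule A
            in.erase(0, 2);
        } else if (in.compare(0, 3, "/./") == 0) {                   // rule B
            in.erase(0, 2);
        } else if (in == "/.") {                                     // rule B
            in = "/";
        } else if (in.compare(0, 4, "/../") == 0 || in == "/..") {   // rule C
            in = in.size() == 3 ? std::string("/") : in.substr(3);
            // Drop the last segment (and its preceding '/') from the output.
            const std::string::size_type slash = out.rfind('/');
            out.erase(slash == std::string::npos ? 0 : slash);
        } else if (in == "." || in == "..") {                        // rule D
            in.clear();
        } else {                                                     // rule E
            // Move the first segment, with its leading '/' if any, to out.
            std::string::size_type next = in.find('/', 1);
            if (next == std::string::npos)
                next = in.size();
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}
```

    For example, resolving "../g" against the base path "/b/c/d;p" merges to
    "/b/c/../g", which removeDotSegments() reduces to "/b/g".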

    \sa isRelative()
*/
QUrl QUrl::resolved(const QUrl &relative) const
{
    if (!d) return relative;
    if (!relative.d) return *this;

    QUrl t;
    if (!relative.d->scheme.isEmpty()) {
        t = relative;
        t.detach();
    } else {
        if (relative.d->hasAuthority()) {
            t = relative;
            t.detach();
        } else {
            t.d = new QUrlPrivate;

            // copy the authority
            t.d->userName = d->userName;
            t.d->password = d->password;
            t.d->host = d->host;
            t.d->port = d->port;
            t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;

            if (relative.d->path.isEmpty()) {
                t.d->path = d->path;
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                } else if (d->hasQuery()) {
                    t.d->query = d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            } else {
                t.d->path = relative.d->path.startsWith(u'/')
                        ? relative.d->path
                        : d->mergePaths(relative.d->path);
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            }
        }
        t.d->scheme = d->scheme;
        if (d->hasScheme())
            t.d->sectionIsPresent |= QUrlPrivate::Scheme;
        else
            t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        t.d->flags |= d->flags & QUrlPrivate::IsLocalFile;
    }
    t.d->fragment = relative.d->fragment;
    if (relative.d->hasFragment())
        t.d->sectionIsPresent |= QUrlPrivate::Fragment;
    else
        t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;

    qt_normalizePathSegments(
            &t.d->path,
            isLocalFile() ? QDirPrivate::KeepLocalTrailingSlash : QDirPrivate::RemotePath);
    if (!t.d->hasAuthority())
        fixupNonAuthorityPath(&t.d->path);

#if defined(QURL_DEBUG)
    qDebug("QUrl(\"%ls\").resolved(\"%ls\") = \"%ls\"",
           qUtf16Printable(url()),
           qUtf16Printable(relative.url()),
           qUtf16Printable(t.url()));
#endif
    return t;
}

/*!
    Returns \c true if the URL is relative; otherwise returns \c false. A URL is
    a relative reference if its scheme is undefined; this function is therefore
    equivalent to calling scheme().isEmpty().

    Relative references are defined in RFC 3986 section 4.2.
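
    A rough sketch of the underlying rule, applied to a raw string rather
    than a parsed QUrl: RFC 3986 section 3.1 defines a scheme as a letter
    followed by letters, digits, '+', '-' or '.', terminated by ':'. The
    hypothetical helper hasScheme() returns false for relative references
    such as "//example.com/path" or "docs/qurl.html". QUrl's own parser may
    differ in details.

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper: true if `url` starts with an RFC 3986 scheme,
// i.e. ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ':'.
bool hasScheme(std::string_view url)
{
    if (url.empty() || !std::isalpha(static_cast<unsigned char>(url.front())))
        return false;
    for (std::string_view::size_type i = 1; i < url.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(url[i]);
        if (c == ':')
            return true;   // "scheme:" prefix found
        if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
            return false;  // e.g. '/', '?' or '#' before any ':'
    }
    return false;          // no ':' at all
}
```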

    \sa {Relative URLs vs Relative Paths}
*/
bool QUrl::isRelative() const
{
    if (!d) return true;
    return !d->hasScheme();
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The resulting QString can be passed back to a QUrl later on.

    Synonym for toString(options).

    \sa FormattingOptions, toEncoded(), toString()
*/
QString QUrl::url(FormattingOptions options) const
{
    return toString(options);
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.

    \sa FormattingOptions, url(), setUrl()
*/
QString QUrl::toString(FormattingOptions options) const
{
    QString url;
    if (!isValid()) {
        // also catches isEmpty()
        return url;
    }
    if ((options & QUrl::FullyDecoded) == QUrl::FullyDecoded) {
        qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
        options &= ~QUrl::FullyDecoded;
        //options |= QUrl::PrettyDecoded; // no-op, value is 0
    }

    // return just the path if:
    //  - QUrl::PreferLocalFile is passed
    //  - QUrl::RemovePath isn't passed (rather stupid if the user did...)
    //  - there's no query or fragment to return
    //    that is, either they aren't present, or we're removing them
    //  - it's a local file
    if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
            && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
            && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
            && isLocalFile()) {
        url = d->toLocalFile(options | QUrl::FullyDecoded);
        return url;
    }

    // for the full URL, we consider that the reserved characters are prettier if encoded
    if (options & DecodeReserved)
        options &= ~EncodeReserved;
    else
        options |= EncodeReserved;

    if (!(options & QUrl::RemoveScheme) && d->hasScheme())
        url += d->scheme + u':';

    bool pathIsAbsolute = d->path.startsWith(u'/');
    if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
        url += "//"_L1;
        d->appendAuthority(url, options, QUrlPrivate::FullUrl);
    } else if (isLocalFile() && pathIsAbsolute) {
        // Comply with the XDG file URI spec, which requires triple slashes.
        url += "//"_L1;
    }

    if (!(options & QUrl::RemovePath))
        d->appendPath(url, options, QUrlPrivate::FullUrl);

    if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
        url += u'?';
        d->appendQuery(url, options, QUrlPrivate::FullUrl);
    }
    if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
        url += u'#';
        d->appendFragment(url, options, QUrlPrivate::FullUrl);
    }

    return url;
}

/*!
    \since 5.0

    Returns a human-displayable string representation of the URL.
    The output can be customized by passing flags with \a options.
    The option RemovePassword is always enabled, since passwords
    should never be shown back to users.

    With the default options, the resulting QString can be passed back
    to a QUrl later on, but any password that was present initially will
    be lost.

    \sa FormattingOptions, toEncoded(), toString()
*/

QString QUrl::toDisplayString(FormattingOptions options) const
{
    return toString(options | RemovePassword);
}

/*!
    \since 5.2

    Returns an adjusted version of the URL.
    The output can be customized by passing flags with \a options.

    The encoding options from QUrl::ComponentFormattingOption have no
    meaningful effect on this function, and neither does QUrl::PreferLocalFile.

    This is always equivalent to QUrl(url.toString(options)).

    \sa FormattingOptions, toEncoded(), toString()
*/
QUrl QUrl::adjusted(QUrl::FormattingOptions options) const
{
    if (!isValid()) {
        // also catches isEmpty()
        return QUrl();
    }
    QUrl that = *this;
    if (options & RemoveScheme)
        that.setScheme(QString());
    if ((options & RemoveAuthority) == RemoveAuthority) {
        that.setAuthority(QString());
    } else {
        if ((options & RemoveUserInfo) == RemoveUserInfo)
            that.setUserInfo(QString());
        else if (options & RemovePassword)
            that.setPassword(QString());
        if (options & RemovePort)
            that.setPort(-1);
    }
    if (options & RemoveQuery)
        that.setQuery(QString());
    if (options & RemoveFragment)
        that.setFragment(QString());
    if (options & RemovePath) {
        that.setPath(QString());
    } else if (options & (StripTrailingSlash | RemoveFilename | NormalizePathSegments)) {
        that.detach();
        QString path;
        d->appendPath(path, options | FullyEncoded, QUrlPrivate::Path);
        that.d->setPath(path, 0, path.size());
    }
    return that;
}

/*!
    Returns the encoded representation of the URL if it's valid;
    otherwise an empty QByteArray is returned. The output can be
    customized by passing flags with \a options.

    The user info, path, query and fragment are all converted to UTF-8, and
    all non-ASCII characters are then percent encoded. The host name
    is encoded using Punycode.
*/
QByteArray QUrl::toEncoded(FormattingOptions options) const
{
    options &= ~(FullyDecoded | FullyEncoded);
    return toString(options | FullyEncoded).toLatin1();
}

/*!
    Parses \a input and returns the corresponding QUrl. \a input is
    assumed to be in encoded form, containing only ASCII characters.

    Parses the URL using \a mode. See setUrl() for more information on
    this parameter. QUrl::DecodedMode is not permitted in this context.

    \note In Qt versions prior to 6.7, this function took a QByteArray, not
    QByteArrayView. If you experience compile errors, it's because your code
    is passing objects that are implicitly convertible to QByteArray, but not
    QByteArrayView. Wrap the corresponding argument in \c{QByteArray{~~~}} to
    make the cast explicit. This is backwards-compatible with old Qt versions.

    \sa toEncoded(), setUrl()
*/
QUrl QUrl::fromEncoded(QByteArrayView input, ParsingMode mode)
{
    return QUrl(QString::fromUtf8(input), mode);
}

/*!
    Returns a decoded copy of \a input. \a input is first decoded from
    percent encoding, then converted from UTF-8 to Unicode.

    \note Given invalid input (such as a string containing the sequence "%G5",
    which is not a valid hexadecimal number), the output will be invalid as
    well. For example, the sequence "%G5" could be decoded to 'W'.
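
    The percent-decoding step can be sketched in standard C++ as follows.
    The helper percentDecode() is hypothetical and only performs the decoding
    half; converting the resulting UTF-8 bytes to a QString is left to
    QString::fromUtf8(). Unlike the behavior described in the note above, this
    sketch copies malformed sequences such as "%G5" through unchanged; that
    policy is an assumption of the sketch, not QUrl's documented behavior.

```cpp
#include <string>

// Hypothetical helper: decodes "%XY" sequences into raw bytes. A '%' that is
// not followed by two hexadecimal digits is copied through unchanged.
std::string percentDecode(const std::string &input)
{
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;  // not a hex digit
    };
    std::string out;
    out.reserve(input.size());
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        if (input[i] == '%' && i + 2 < input.size()) {
            const int hi = hexValue(input[i + 1]);
            const int lo = hexValue(input[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits
                continue;
            }
        }
        out += input[i];
    }
    return out;
}
```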
*/
QString QUrl::fromPercentEncoding(const QByteArray &input)
{
    QByteArray ba = QByteArray::fromPercentEncoding(input);
    return QString::fromUtf8(ba, ba.size());
}

/*!
    Returns an encoded copy of \a input. \a input is first converted
    to UTF-8, and every byte that is not in the unreserved group
    is then percent encoded. To prevent characters from being percent
    encoded, pass them in \a exclude. To force characters to be percent
    encoded, pass them in \a include.

    Unreserved is defined as:
    \tt {ALPHA / DIGIT / "-" / "." / "_" / "~"}

    \snippet code/src_corelib_io_qurl.cpp 6
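
    A standalone sketch of these rules, operating on a UTF-8 std::string
    rather than a QString. The helper percentEncode() is hypothetical. In
    this sketch, bytes listed in \a include are encoded even if they are
    unreserved or also listed in \a exclude.

```cpp
#include <string>

// Hypothetical helper: percent-encodes every byte of `utf8` except the
// unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~") and bytes in
// `exclude`; bytes in `include` are always encoded.
std::string percentEncode(const std::string &utf8, const std::string &exclude = {},
                          const std::string &include = {})
{
    static const char hexDigits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
        const bool excluded = exclude.find(static_cast<char>(c)) != std::string::npos;
        const bool included = include.find(static_cast<char>(c)) != std::string::npos;
        if ((unreserved || excluded) && !included) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hexDigits[c >> 4];    // high nibble
            out += hexDigits[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

    With the string "{a fishy string?}", exclude set to "{}" and include set to
    "s", this sketch yields "{a%20fi%73hy%20%73tring%3F}".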
*/
QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
{
    return input.toUtf8().toPercentEncoding(exclude, include);
}

/*!
    \since 6.3

    Returns the Unicode form of the given domain name
    \a domain, which is encoded in the ASCII-Compatible Encoding (ACE).
    The output can be customized by passing flags with \a options.
    The result of this function is considered equivalent to \a domain.

    If the value in \a domain cannot be decoded, it will be converted
    to QString and returned.

    The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
    of the Internationalizing Domain Names in Applications (IDNA) specification,
    which allows for domain names (like \c "example.com") to be written using
    non-US-ASCII characters.
*/
QString QUrl::fromAce(const QByteArray &domain, QUrl::AceProcessingOptions options)
{
    return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce,
                     ForbidLeadingDot /*FIXME: make configurable*/, options);
}

/*!
    \since 6.3

    Returns the ASCII-Compatible Encoding of the given domain name \a domain.
    The output can be customized by passing flags with \a options.
    The result of this function is considered equivalent to \a domain.

    The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
    of the Internationalizing Domain Names in Applications (IDNA) specification,
    which allows for domain names (like \c "example.com") to be written using
    non-US-ASCII characters.

    This function returns an empty QByteArray if \a domain is not a valid
    hostname. Note, in particular, that IPv6 literals are not valid domain
    names.
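
    Punycode (RFC 3492) is the step of this encoding that turns a non-ASCII
    label into ASCII. The sketch below encodes a single label given as
    UTF-32. It is a hypothetical illustration only: it performs none of the
    UTS #46 mapping, normalization or validation that toAce() applies, does
    not split the domain at dots, and omits the RFC's overflow checks. The
    "xn--" prefix is added by IDNA on top of the Punycode output.

```cpp
#include <cstdint>
#include <string>

// Sketch of the RFC 3492 Punycode encoder for ONE domain label.
namespace punycode_sketch {
constexpr std::uint32_t base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700;
constexpr std::uint32_t initialBias = 72, initialN = 128;

inline char encodeDigit(std::uint32_t d)
{
    // 0..25 -> 'a'..'z', 26..35 -> '0'..'9'
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

inline std::uint32_t adapt(std::uint32_t delta, std::uint32_t numPoints, bool firstTime)
{
    delta = firstTime ? delta / damp : delta / 2;
    delta += delta / numPoints;
    std::uint32_t k = 0;
    while (delta > ((base - tMin) * tMax) / 2) {
        delta /= base - tMin;
        k += base;
    }
    return k + ((base - tMin + 1) * delta) / (delta + skew);
}

// Returns the Punycode form of `label`, without the "xn--" ACE prefix.
inline std::string encode(const std::u32string &label)
{
    std::string out;
    for (char32_t c : label)
        if (c < 0x80)
            out += static_cast<char>(c);  // copy basic (ASCII) code points
    const std::uint32_t b = static_cast<std::uint32_t>(out.size());
    std::uint32_t h = b;
    if (b > 0)
        out += '-';

    std::uint32_t n = initialN, delta = 0, bias = initialBias;
    while (h < label.size()) {
        // Smallest code point >= n that still has to be inserted.
        std::uint32_t m = UINT32_MAX;
        for (char32_t c : label)
            if (c >= n && c < m)
                m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : label) {
            if (c < n)
                ++delta;
            if (c == n) {
                // Emit delta as a generalized variable-length integer.
                std::uint32_t q = delta;
                for (std::uint32_t k = base;; k += base) {
                    const std::uint32_t t = k <= bias ? tMin
                            : (k >= bias + tMax ? tMax : k - bias);
                    if (q < t)
                        break;
                    out += encodeDigit(t + (q - t) % (base - t));
                    q = (q - t) / (base - t);
                }
                out += encodeDigit(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return out;
}
} // namespace punycode_sketch
```

    For example, encoding the label "bühler" with this sketch yields
    "bhler-kva", which IDNA turns into the label "xn--bhler-kva".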
*/
QByteArray QUrl::toAce(const QString &domain, AceProcessingOptions options)
{
    return qt_ACE_do(domain, ToAceOnly, ForbidLeadingDot /*FIXME: make configurable*/, options)
            .toLatin1();
}

/*!
    \internal

    \fn bool QUrl::operator<(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if URL \a lhs is "less than" URL \a rhs. This
    provides a means of ordering URLs.
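
    The ordering implemented by compareThreeWay() is lexicographic over the
    URL components. A minimal standalone sketch of the same idea uses a
    hypothetical UrlParts struct and std::tie. Unlike QUrl, it compares
    std::string bytes rather than UTF-16 code units, and it treats only a
    UrlParts flagged as empty as the "empty URL".

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-in for the parsed components of a URL.
struct UrlParts {
    bool empty = true;
    std::string scheme, userName, password, host;
    int port = -1;
    std::string path;
    bool hasQuery = false;
    std::string query;
    bool hasFragment = false;
    std::string fragment;
};

// Strict weak ordering in the same component order as compareThreeWay():
// empty URLs first, then scheme, user name, password, host, port, path,
// query presence, query, fragment presence and fragment.
bool lessThan(const UrlParts &a, const UrlParts &b)
{
    if (a.empty || b.empty)
        return a.empty && !b.empty;
    return std::tie(a.scheme, a.userName, a.password, a.host, a.port, a.path,
                    a.hasQuery, a.query, a.hasFragment, a.fragment)
         < std::tie(b.scheme, b.userName, b.password, b.host, b.port, b.path,
                    b.hasQuery, b.query, b.hasFragment, b.fragment);
}
```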
*/

Qt::weak_ordering compareThreeWay(const QUrl &lhs, const QUrl &rhs)
{
    if (!lhs.d || !rhs.d) {
        bool thisIsEmpty = !lhs.d || lhs.d->isEmpty();
        bool thatIsEmpty = !rhs.d || rhs.d->isEmpty();

        // sort an empty URL first
        if (thisIsEmpty) {
            if (!thatIsEmpty)
                return Qt::weak_ordering::less;
            else
                return Qt::weak_ordering::equivalent;
        } else {
            return Qt::weak_ordering::greater;
        }
    }

    int cmp;
    cmp = lhs.d->scheme.compare(rhs.d->scheme);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->userName.compare(rhs.d->userName);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->password.compare(rhs.d->password);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->host.compare(rhs.d->host);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->port != rhs.d->port)
        return Qt::compareThreeWay(lhs.d->port, rhs.d->port);

    cmp = lhs.d->path.compare(rhs.d->path);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->hasQuery() != rhs.d->hasQuery())
        return rhs.d->hasQuery() ? Qt::weak_ordering::less : Qt::weak_ordering::greater;

    cmp = lhs.d->query.compare(rhs.d->query);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->hasFragment() != rhs.d->hasFragment())
        return rhs.d->hasFragment() ? Qt::weak_ordering::less : Qt::weak_ordering::greater;

    cmp = lhs.d->fragment.compare(rhs.d->fragment);
    return Qt::compareThreeWay(cmp, 0);
}

/*!
    \fn bool QUrl::operator==(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if \a lhs and \a rhs URLs are equivalent;
    otherwise returns \c false.

    \sa matches()
*/

bool comparesEqual(const QUrl &lhs, const QUrl &rhs)
{
    if (!lhs.d && !rhs.d)
        return true;
    if (!lhs.d)
        return rhs.d->isEmpty();
    if (!rhs.d)
        return lhs.d->isEmpty();

    // First, compare which sections are present, since it speeds up the
    // processing considerably. We just have to ignore the host-is-present flag
    // for local files (the "file" protocol), due to the requirements of the
    // XDG file URI specification.
    int mask = QUrlPrivate::FullUrl;
    if (lhs.isLocalFile())
        mask &= ~QUrlPrivate::Host;
    return (lhs.d->sectionIsPresent & mask) == (rhs.d->sectionIsPresent & mask) &&
            lhs.d->scheme == rhs.d->scheme &&
            lhs.d->userName == rhs.d->userName &&
            lhs.d->password == rhs.d->password &&
            lhs.d->host == rhs.d->host &&
            lhs.d->port == rhs.d->port &&
            lhs.d->path == rhs.d->path &&
            lhs.d->query == rhs.d->query &&
            lhs.d->fragment == rhs.d->fragment;
}

/*!
    \since 5.2

    Returns \c true if this URL and the given \a url are equal after
    applying \a options to both; otherwise returns \c false.

    This is equivalent to calling adjusted(options) on both URLs
    and comparing the resulting URLs, but faster.
*/
bool QUrl::matches(const QUrl &url, FormattingOptions options) const
{
    if (!d && !url.d)
        return true;
    if (!d)
        return url.d->isEmpty();
    if (!url.d)
        return d->isEmpty();

    // First, compare which sections are present, since it speeds up the
    // processing considerably. We just have to ignore the host-is-present flag
    // for local files (the "file" protocol), due to the requirements of the
    // XDG file URI specification.
    int mask = QUrlPrivate::FullUrl;
    if (isLocalFile())
        mask &= ~QUrlPrivate::Host;

    if (options.testFlag(QUrl::RemoveScheme))
        mask &= ~QUrlPrivate::Scheme;
    else if (d->scheme != url.d->scheme)
        return false;

    if (options.testFlag(QUrl::RemovePassword))
        mask &= ~QUrlPrivate::Password;
    else if (d->password != url.d->password)
        return false;

    if (options.testFlag(QUrl::RemoveUserInfo))
        mask &= ~QUrlPrivate::UserName;
    else if (d->userName != url.d->userName)
        return false;

    if (options.testFlag(QUrl::RemovePort))
        mask &= ~QUrlPrivate::Port;
    else if (d->port != url.d->port)
        return false;

    if (options.testFlag(QUrl::RemoveAuthority))
        mask &= ~QUrlPrivate::Host;
    else if (d->host != url.d->host)
        return false;

    if (options.testFlag(QUrl::RemoveQuery))
        mask &= ~QUrlPrivate::Query;
    else if (d->query != url.d->query)
        return false;

    if (options.testFlag(QUrl::RemoveFragment))
        mask &= ~QUrlPrivate::Fragment;
    else if (d->fragment != url.d->fragment)
        return false;

    if ((d->sectionIsPresent & mask) != (url.d->sectionIsPresent & mask))
        return false;

    if (options.testFlag(QUrl::RemovePath))
        return true;

    // Compare paths, after applying path-related options
    QString path1;
    d->appendPath(path1, options, QUrlPrivate::Path);
    QString path2;
    url.d->appendPath(path2, options, QUrlPrivate::Path);
    return path1 == path2;
}

/*!
    \fn bool QUrl::operator !=(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if \a lhs and \a rhs URLs are not equal;
    otherwise returns \c false.

    \sa matches()
*/

/*!
    Assigns the specified \a url to this object.
*/
QUrl &QUrl::operator =(const QUrl &url) noexcept
{
    if (!d) {
        if (url.d) {
            url.d->ref.ref();
            d = url.d;
        }
    } else {
        if (url.d)
            qAtomicAssign(d, url.d);
        else
            clear();
    }
    return *this;
}

/*!
    Parses \a url in QUrl::TolerantMode and assigns the result to this
    object. If \a url is empty, this URL is cleared instead.
*/
QUrl &QUrl::operator =(const QString &url)
{
    if (url.isEmpty()) {
        clear();
    } else {
        detach();
        d->parse(url, TolerantMode);
    }
    return *this;
}

/*!
    \fn void QUrl::swap(QUrl &other)
    \since 4.8

    Swaps URL \a other with this URL. This operation is very
    fast and never fails.
*/

/*!
    \internal

    Forces a detach.
*/
void QUrl::detach()
{
    if (!d)
        d = new QUrlPrivate;
    else
        qAtomicDetach(d);
}

/*!
    \internal
*/
bool QUrl::isDetached() const
{
    return !d || d->ref.loadRelaxed() == 1;
}

static QString fromNativeSeparators(const QString &pathName)
{
#if defined(Q_OS_WIN)
    QString result(pathName);
    const QChar nativeSeparator = u'\\';
    auto i = result.indexOf(nativeSeparator);
    if (i != -1) {
        QChar * const data = result.data();
        const auto length = result.length();
        for (; i < length; ++i) {
            if (data[i] == nativeSeparator)
                data[i] = u'/';
        }
    }
    return result;
#else
    return pathName;
#endif
}

/*!
    Returns a QUrl representation of \a localFile, interpreted as a local
    file. This function accepts paths separated by slashes as well as the
    native separator for this platform.

    This function also accepts paths with a doubled leading slash (or
    backslash) to indicate a remote file, as in
    "//servername/path/to/file.txt". Note that only certain platforms can
    actually open this file using QFile::open().

    An empty \a localFile leads to an empty URL (since Qt 5.4).

    \snippet code/src_corelib_io_qurl.cpp 16

    In the first line of the snippet above, a file URL is constructed from a
    local, relative path. A file URL with a relative path only makes sense
    if there is a base URL to resolve it against. For example:

    \snippet code/src_corelib_io_qurl.cpp 17

    To resolve such a URL, it's necessary to remove the scheme beforehand:

    \snippet code/src_corelib_io_qurl.cpp 18

    For this reason, it is better to use a relative URL (that is, no scheme)
    for relative file paths:

    \snippet code/src_corelib_io_qurl.cpp 19
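
    The separator and prefix handling described above can be sketched as a
    pure string transformation. The helper splitLocalFile() is hypothetical
    and returns the host and path that a file URL would carry. It always
    treats backslashes as separators, which QUrl does only on Windows. It
    also omits the host validation and the Windows WebDAV "@SSL" handling
    that QUrl::fromLocalFile() performs.

```cpp
#include <algorithm>
#include <string>

// Hypothetical result type: the host and path a file URL would carry.
struct FileUrlParts {
    std::string host;
    std::string path;
};

// Sketch, assuming a non-empty input: backslashes become '/', "C:/dir"
// gains a leading '/', and "//server/share/file" puts "server" in the host.
FileUrlParts splitLocalFile(std::string p)
{
    std::replace(p.begin(), p.end(), '\\', '/');
    FileUrlParts parts;
    if (p.size() > 1 && p[1] == ':' && p[0] != '/') {
        parts.path = "/" + p;  // drive letter, e.g. "/C:/dir"
    } else if (p.compare(0, 2, "//") == 0) {
        const std::string::size_type slash = p.find('/', 2);
        parts.host = p.substr(2, slash == std::string::npos ? std::string::npos : slash - 2);
        parts.path = slash == std::string::npos ? std::string() : p.substr(slash);
    } else {
        parts.path = p;
    }
    return parts;
}
```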

    \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
*/
QUrl QUrl::fromLocalFile(const QString &localFile)
{
    QUrl url;
    if (localFile.isEmpty())
        return url;
    QString scheme = fileScheme();
    QString deslashified = fromNativeSeparators(localFile);

    // magic for drives on windows
    if (deslashified.size() > 1 && deslashified.at(1) == u':' && deslashified.at(0) != u'/') {
        deslashified.prepend(u'/');
    } else if (deslashified.startsWith("//"_L1)) {
        // magic for shared drive on windows
        qsizetype indexOfPath = deslashified.indexOf(u'/', 2);
        QStringView hostSpec = QStringView{deslashified}.mid(2, indexOfPath - 2);
        // Check for Windows-specific WebDAV specification: "//host@SSL/path".
        if (hostSpec.endsWith(webDavSslTag(), Qt::CaseInsensitive)) {
            hostSpec.truncate(hostSpec.size() - 4);
            scheme = webDavScheme();
        }

        // hosts can't be IPv6 addresses without [], so we can use QUrlPrivate::setHost
        url.detach();
        if (!url.d->setHost(hostSpec.toString(), 0, hostSpec.size(), StrictMode)) {
            if (url.d->error->code != QUrlPrivate::InvalidRegNameError)
                return url;

            // Path hostname is not a valid URL host, so set it entirely in the path
            // (by leaving deslashified unchanged)
        } else if (indexOfPath > 2) {
            deslashified = deslashified.right(deslashified.size() - indexOfPath);
        } else {
            deslashified.clear();
        }
    }

    url.setScheme(scheme);
    url.setPath(deslashified, DecodedMode);
    return url;
}

/*!
    Returns the path of this URL formatted as a local file path. The path
    returned will use forward slashes, even if it was originally created
    from one with backslashes.

    If this URL contains a non-empty hostname, it will be encoded in the
    returned value in the form found on SMB networks (for example,
    "//servername/path/to/file.txt").

    \snippet code/src_corelib_io_qurl.cpp 20

    Note: if the path component of this URL contains a non-UTF-8 binary
    sequence (such as %80), the behaviour of this function is undefined.

    \sa fromLocalFile(), isLocalFile()
*/
QString QUrl::toLocalFile() const
{
    // the call to isLocalFile() also ensures that we're parsed
    if (!isLocalFile())
        return QString();

    return d->toLocalFile(QUrl::FullyDecoded);
}

/*!
    \since 4.8
    Returns \c true if this URL is pointing to a local file path. A URL is a
    local file path if the scheme is "file".

    Note that this function considers URLs with hostnames to be local file
    paths, even if the eventual file path cannot be opened with
    QFile::open().

    \sa fromLocalFile(), toLocalFile()
*/
bool QUrl::isLocalFile() const
{
    return d && d->isLocalFile();
}

/*!
    Returns \c true if this URL is a parent of \a childUrl. \a childUrl is a child
    of this URL if the two URLs share the same scheme and authority,
    and this URL's path is a parent of the path of \a childUrl.
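
    Setting the scheme and authority checks aside, the path test can be
    sketched as follows. The helper isParentPath() is hypothetical and works
    on plain std::string paths. A path is a parent if the child path strictly
    extends it, either right after a trailing '/' or at a '/' boundary, so
    "/a/b" is a parent of "/a/b/c" but not of "/a/bc".

```cpp
#include <string>

// Hypothetical helper reproducing only the path test of isParentOf():
// `child` must strictly extend `parent`, continuing after a trailing '/'
// in `parent` or at a '/' that immediately follows it.
bool isParentPath(const std::string &parent, const std::string &child)
{
    if (child.size() <= parent.size() || child.compare(0, parent.size(), parent) != 0)
        return false;
    return (!parent.empty() && parent.back() == '/') || child[parent.size()] == '/';
}
```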
*/
bool QUrl::isParentOf(const QUrl &childUrl) const
{
    QString childPath = childUrl.path();

    if (!d)
        return ((childUrl.scheme().isEmpty())
            && (childUrl.authority().isEmpty())
            && childPath.size() > 0 && childPath.at(0) == u'/');

    QString ourPath = path();

    return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
            && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
            && childPath.startsWith(ourPath)
            && ((ourPath.endsWith(u'/') && childPath.size() > ourPath.size())
                || (!ourPath.endsWith(u'/') && childPath.size() > ourPath.size()
                    && childPath.at(ourPath.size()) == u'/')));
}


#ifndef QT_NO_DATASTREAM
/*! \relates QUrl

    Writes the URL \a url to the stream \a out and returns a reference
    to the stream. An invalid URL is written as an empty QByteArray.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator<<(QDataStream &out, const QUrl &url)
{
    QByteArray u;
    if (url.isValid())
        u = url.toEncoded();
    out << u;
    return out;
}

/*! \relates QUrl

    Reads a URL into \a url from the stream \a in and returns a
    reference to the stream.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator>>(QDataStream &in, QUrl &url)
{
    QByteArray u;
    in >> u;
    url.setUrl(QString::fromLatin1(u));
    return in;
}
#endif // QT_NO_DATASTREAM

#ifndef QT_NO_DEBUG_STREAM
QDebug operator<<(QDebug d, const QUrl &url)
{
    QDebugStateSaver saver(d);
    d.nospace() << "QUrl(" << url.toDisplayString() << ')';
    return d;
}
#endif

static QString errorMessage(QUrlPrivate::ErrorCode errorCode, const QString &errorSource, qsizetype errorPosition)
{
    QChar c = size_t(errorPosition) < size_t(errorSource.size()) ?
                errorSource.at(errorPosition) : QChar(QChar::Null);

    switch (errorCode) {
    case QUrlPrivate::NoError:
        Q_UNREACHABLE_RETURN(QString()); // QUrl::errorString should have treated this condition

    case QUrlPrivate::InvalidSchemeError: {
        auto msg = "Invalid scheme (character '%1' not permitted)"_L1;
        return msg.arg(c);
    }

    case QUrlPrivate::InvalidUserNameError:
        return "Invalid user name (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidPasswordError:
        return "Invalid password (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidRegNameError:
        if (errorPosition >= 0)
            return "Invalid hostname (character '%1' not permitted)"_L1
                    .arg(c);
        else
            return QStringLiteral("Invalid hostname (contains invalid characters)");
    case QUrlPrivate::InvalidIPv4AddressError:
        return QString(); // doesn't happen yet
    case QUrlPrivate::InvalidIPv6AddressError:
        return QStringLiteral("Invalid IPv6 address");
    case QUrlPrivate::InvalidCharacterInIPv6Error:
        return "Invalid IPv6 address (character '%1' not permitted)"_L1.arg(c);
    case QUrlPrivate::InvalidIPvFutureError:
        return "Invalid IPvFuture address (character '%1' not permitted)"_L1.arg(c);
    case QUrlPrivate::HostMissingEndBracket:
        return QStringLiteral("Expected ']' to match '[' in hostname");

    case QUrlPrivate::InvalidPortError:
        return QStringLiteral("Invalid port or port number out of range");
    case QUrlPrivate::PortEmptyError:
        return QStringLiteral("Port field was empty");

    case QUrlPrivate::InvalidPathError:
        return "Invalid path (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidQueryError:
        return "Invalid query (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidFragmentError:
        return "Invalid fragment (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::AuthorityPresentAndPathIsRelative:
        return QStringLiteral("Path component is relative and authority is present");
    case QUrlPrivate::AuthorityAbsentAndPathIsDoubleSlash:
        return QStringLiteral("Path component starts with '//' and authority is absent");
    case QUrlPrivate::RelativeUrlPathContainsColonBeforeSlash:
        return QStringLiteral("Relative URL's path component contains ':' before any '/'");
    }

    Q_UNREACHABLE_RETURN(QString());
}

static inline void appendComponentIfPresent(QString &msg, bool present, const char *componentName,
                                            const QString &component)
{
    if (present)
        msg += QLatin1StringView(componentName) % u'"' % component % "\","_L1;
}

/*!
    \since 4.2

    Returns an error message if the last operation that modified this QUrl
    object ran into a parsing error. If no error was detected, this function
    returns an empty string and isValid() returns \c true.

    The error message returned by this function is technical in nature and may
    not be understood by end users. It is mostly useful to developers trying to
    understand why QUrl will not accept some input.

    \sa QUrl::ParsingMode
*/
QString QUrl::errorString() const
{
    QString msg;
    if (!d)
        return msg;

    QString errorSource;
    qsizetype errorPosition = 0;
    QUrlPrivate::ErrorCode errorCode = d->validityError(&errorSource, &errorPosition);
    if (errorCode == QUrlPrivate::NoError)
        return msg;

    msg += errorMessage(errorCode, errorSource, errorPosition);
    msg += "; source was \""_L1;
    msg += errorSource;
    msg += "\";"_L1;
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Scheme,
                             " scheme = ", d->scheme);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::UserInfo,
                             " userinfo = ", userInfo());
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Host,
                             " host = ", d->host);
    appendComponentIfPresent(msg, d->port != -1,
                             " port = ", QString::number(d->port));
    appendComponentIfPresent(msg, !d->path.isEmpty(),
                             " path = ", d->path);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Query,
                             " query = ", d->query);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Fragment,
                             " fragment = ", d->fragment);
    if (msg.endsWith(u','))
        msg.chop(1);
    return msg;
}

/*!
    \since 5.1

    Converts a list of \a urls into a list of QString objects, using toString(\a options).
*/
QStringList QUrl::toStringList(const QList<QUrl> &urls, FormattingOptions options)
{
    QStringList lst;
    lst.reserve(urls.size());
    for (const QUrl &url : urls)
        lst.append(url.toString(options));
    return lst;
}

/*!
    \since 5.1

    Converts a list of strings representing \a urls into a list of URLs, using QUrl(str, \a mode).
    Note that this means all strings must be URLs, not, for instance, local paths.
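
    The restriction follows from the generic URI syntax of RFC 3986, in which a
    scheme is a letter followed by letters, digits, '+', '-' or '.', terminated
    by ':'. A Windows path such as C:/data/file.txt therefore parses as a URL
    with a one-letter scheme, while a Unix path has no scheme and becomes a
    relative URL. The standalone sketch below shows only that scheme rule;
    \c schemeOf is a hypothetical helper, not QUrl's parser. Local paths are
    better converted with QUrl::fromLocalFile().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: returns the scheme of `text` following RFC 3986,
// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), ended by ':'.
// Returns std::nullopt when there is no scheme (a relative reference).
std::optional<std::string> schemeOf(const std::string &text)
{
    const auto colon = text.find(':');
    if (colon == std::string::npos || colon == 0)
        return std::nullopt;
    for (std::size_t i = 0; i < colon; ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        const bool allowed = std::isalpha(c)
            || (i > 0 && (std::isdigit(c) || c == '+' || c == '-' || c == '.'));
        if (!allowed)
            return std::nullopt;
    }
    return text.substr(0, colon);
}
```

    With this rule, "C:/data/file.txt" yields the scheme "C", which is why a
    list of local paths should not be passed to fromStringList().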
*/
QList<QUrl> QUrl::fromStringList(const QStringList &urls, ParsingMode mode)
{
    QList<QUrl> lst;
    lst.reserve(urls.size());
    for (const QString &str : urls)
        lst.append(QUrl(str, mode));
    return lst;
}

/*!
    \typedef QUrl::DataPtr
    \internal
*/

/*!
    \fn DataPtr &QUrl::data_ptr()
    \internal
*/

/*!
    Returns the hash value for the \a url. If specified, \a seed is used to
    initialize the hash.
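
    As the implementation below shows, the components are hashed individually
    and the results are combined with XOR, with \a seed passed only to the hash
    of the port. The standalone sketch below mirrors that structure with
    std::hash; \c UrlParts and \c hashUrlParts are hypothetical, and XORing the
    seed into the port term is a simplification of how qHash() uses a seed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for the components a URL is hashed over.
struct UrlParts {
    std::string scheme, userName, password, host;
    int port = -1; // -1 means "no port", as in QUrl
    std::string path, query, fragment;
};

// Each component is hashed on its own and the results are XORed.
// The seed enters only through the port term (simplified seeding).
std::size_t hashUrlParts(const UrlParts &u, std::size_t seed = 0)
{
    const std::hash<std::string> hs{};
    const std::size_t portHash = std::hash<int>{}(u.port) ^ seed;
    return hs(u.scheme) ^ hs(u.userName) ^ hs(u.password) ^ hs(u.host)
         ^ portHash ^ hs(u.path) ^ hs(u.query) ^ hs(u.fragment);
}
```

    Because XOR is commutative, swapping, say, the user name and the password
    yields the same value in this sketch. That is a collision, not an error:
    equal inputs still hash equally, which is what a hash table requires.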

    \relates QHash
    \since 5.0
*/
size_t qHash(const QUrl &url, size_t seed) noexcept
{
    if (!url.d)
        return qHash(-1, seed); // the hash of an unset port (-1)

    return qHash(url.d->scheme) ^
            qHash(url.d->userName) ^
            qHash(url.d->password) ^
            qHash(url.d->host) ^
            qHash(url.d->port, seed) ^
            qHash(url.d->path) ^
            qHash(url.d->query) ^
            qHash(url.d->fragment);
}

static QUrl adjustFtpPath(QUrl url)
{
    if (url.scheme() == ftpScheme()) {
        QString path = url.path(QUrl::PrettyDecoded);
        if (path.startsWith("//"_L1))
            url.setPath("/%2F"_L1 + QStringView{path}.mid(2), QUrl::TolerantMode);
    }
    return url;
}

static bool isIp6(const QString &text)
{
    QIPAddressUtils::IPv6Address address;
    return !text.isEmpty() && QIPAddressUtils::parseIp6(address, text.begin(), text.end()) == nullptr;
}

/*!
    Returns a valid URL from a user-supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    This allows the user to input a URL or a local file path in the form of a plain
    string. This string can be manually typed into a location bar, obtained from
    the clipboard, or passed in via command-line arguments.

    When the string is not already a valid URL, a best guess is performed,
    making various assumptions.

    If the string is an absolute local file path, a file:// URL is constructed
    using QUrl::fromLocalFile().

    Otherwise, an attempt is made to turn the string into an http:// or ftp://
    URL, the latter when the text before the first '.' is 'ftp' (compared
    case-insensitively). The result is then passed through QUrl's tolerant
    parser; if parsing succeeds, a valid QUrl is returned, otherwise an
    invalid QUrl().

    \section1 Examples

    \list
    \li qt-project.org becomes http://qt-project.org
    \li ftp.qt-project.org becomes ftp://ftp.qt-project.org
    \li hostname becomes http://hostname
    \li /home/user/test.html becomes file:///home/user/test.html
    \endlist
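
    The examples above can be reproduced with a deliberately reduced model of
    the heuristic. The sketch below is standalone C++ and only an
    approximation: \c guessUrl is a hypothetical function that works on plain
    strings and skips IPv6 detection, working-directory handling, validation by
    the tolerant parser, and Windows drive letters.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical, simplified model of the decision order; not Qt's implementation.
std::string guessUrl(std::string input)
{
    // Trim surrounding whitespace.
    const auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    input.erase(input.begin(), std::find_if(input.begin(), input.end(), notSpace));
    input.erase(std::find_if(input.rbegin(), input.rend(), notSpace).base(), input.end());
    if (input.empty())
        return {}; // stands in for an invalid QUrl()

    // Absolute local path (Unix-style only in this sketch): build a file URL.
    if (input.front() == '/')
        return "file://" + input;

    // Already carries an explicit scheme: keep it as is.
    if (input.find("://") != std::string::npos)
        return input;

    // Otherwise prepend a scheme: "ftp" when the text before the first '.'
    // is "ftp" (case-insensitively), "http" in every other case.
    std::string label = input.substr(0, input.find('.'));
    std::transform(label.begin(), label.end(), label.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return (label == "ftp" ? "ftp://" : "http://") + input;
}
```

    Checking for an absolute path before looking for a scheme matters on
    Windows, where a drive letter such as C: would otherwise be read as a
    scheme; the implementation below makes that check first as well.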

    To handle relative paths, this function takes an optional
    \a workingDirectory path. This is especially useful when handling
    command-line arguments. If \a workingDirectory is empty, relative paths
    are not handled.

    By default, an input string that looks like a relative path is only treated
    as such if the file actually exists in the given working directory.
    If the application can handle files that don't exist yet, it should pass the
    AssumeLocalFile flag in \a options.
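
    A standalone sketch of the working-directory rules, under stated
    assumptions: \c resolveRelative is hypothetical, \c fileExists stands in
    for the existence check that QFileInfo performs, \c hasScheme stands in for
    "the input already parses as a URL with a scheme", and \c assumeLocalFile
    mirrors the AssumeLocalFile option. A result of std::nullopt means the
    input falls through to the URL heuristics described above.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical model of the working-directory step. Returns the local path
// to turn into a file URL, or std::nullopt to fall through.
std::optional<fs::path> resolveRelative(const std::string &input, const fs::path &workingDirectory,
                                        bool assumeLocalFile, bool hasScheme,
                                        const std::function<bool(const fs::path &)> &fileExists)
{
    if (workingDirectory.empty())
        return std::nullopt; // no relative-path handling at all

    const fs::path candidate = workingDirectory / input;
    if (fileExists(candidate))
        return candidate;

    // A file that does not exist is only accepted with the flag, and only
    // when the input is neither a full URL nor an absolute path.
    if (assumeLocalFile && !hasScheme && !fs::path(input).is_absolute())
        return candidate;

    return std::nullopt;
}
```

    Passing the existence check as a function keeps the sketch deterministic;
    real code would query the file system instead.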

    \since 5.4
*/
QUrl QUrl::fromUserInput(const QString &userInput, const QString &workingDirectory,
                         UserInputResolutionOptions options)
{
    QString trimmedString = userInput.trimmed();

    if (trimmedString.isEmpty())
        return QUrl();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    const QUrl url = QUrl(trimmedString, QUrl::TolerantMode);

    // Check for a relative path
    if (!workingDirectory.isEmpty()) {
        const QFileInfo fileInfo(QDir(workingDirectory), userInput);
        if (fileInfo.exists())
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());

        // Check both QUrl::isRelative() (to detect full URLs) and QDir::isAbsolutePath()
        // (since on Windows, drive letters can be interpreted as schemes)
        if ((options & AssumeLocalFile) && url.isRelative() && !QDir::isAbsolutePath(userInput))
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    // Check for files first, since on Windows, drive letters can be interpreted as schemes
    if (QDir::isAbsolutePath(trimmedString))
        return QUrl::fromLocalFile(trimmedString);

    QUrl urlPrepended = QUrl("http://"_L1 + trimmedString, QUrl::TolerantMode);

    // Check the most common case: a valid URL with a scheme.
    // Prepending a scheme tells us whether the input is really host:port,
    // in which case the host would otherwise be interpreted as the scheme.
    if (url.isValid()
        && !url.scheme().isEmpty()
        && urlPrepended.port() == -1)
        return adjustFtpPath(url);

    // Otherwise, try the prepended URL and adjust the scheme from the host name
    if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty())) {
        qsizetype dotIndex = trimmedString.indexOf(u'.');
        const QStringView hostscheme = QStringView{trimmedString}.left(dotIndex);
        if (hostscheme.compare(ftpScheme(), Qt::CaseInsensitive) == 0)
            urlPrepended.setScheme(ftpScheme());
        return adjustFtpPath(urlPrepended);
    }

    return QUrl();
}

QT_END_NAMESPACE
