// Copyright (C) 2016 The Qt Company Ltd.
// Copyright (C) 2016 Intel Corporation.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR LGPL-3.0-only OR GPL-2.0-only OR GPL-3.0-only
// Qt-Security score:critical reason:data-parser

/*!
    \class QUrl
    \inmodule QtCore

    \brief The QUrl class provides a convenient interface for working
    with URLs.

    \reentrant
    \ingroup io
    \ingroup network
    \ingroup shared

    \compares weak

    It can parse and construct URLs in both encoded and unencoded
    form. QUrl also has support for internationalized domain names
    (IDNs).

    The most common way to use QUrl is to initialize it via the constructor by
    passing a QString containing a full URL. QUrl objects can also be created
    from a QByteArray containing a full URL using QUrl::fromEncoded(), or
    heuristically from incomplete URLs using QUrl::fromUserInput(). The URL
    representation can be obtained from a QUrl using either QUrl::toString() or
    QUrl::toEncoded().

    URLs can be represented in two forms: encoded or unencoded. The
    unencoded representation is suitable for showing to users, but
    the encoded representation is typically what you would send to
    a web server. For example, the unencoded URL
    "http://bühler.example.com/List of applicants.xml"
    would be sent to the server as
    "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".

    A URL can also be constructed piece by piece by calling
    setScheme(), setUserName(), setPassword(), setHost(), setPort(),
    setPath(), setQuery() and setFragment(). Some convenience
    functions are also available: setAuthority() sets the user name,
    password, host and port. setUserInfo() sets the user name and
    password at once.
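
    As a minimal sketch (the helper name \c buildExampleUrl() is hypothetical,
    chosen only for this illustration), the setters can be combined as follows.
    toString() then yields the assembled URL
    "https://example.com:8080/docs/index.html?lang=en#intro":

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: builds a URL one component at a time.
    QUrl buildExampleUrl()
    {
        QUrl url;
        url.setScheme(QStringLiteral("https"));
        url.setHost(QStringLiteral("example.com"));
        url.setPort(8080);
        url.setPath(QStringLiteral("/docs/index.html"));
        url.setQuery(QStringLiteral("lang=en"));
        url.setFragment(QStringLiteral("intro"));
        return url;   // isValid() returns true for the finished URL
    }
    \endcode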

    Call isValid() to check if the URL is valid. This can be done at any point
    during the construction of a URL. If isValid() returns \c false, you should
    clear() the URL before proceeding, or start over by parsing a new URL with
    setUrl().

    Constructing a query is particularly convenient through the use of the \l
    QUrlQuery class and its methods QUrlQuery::setQueryItems(),
    QUrlQuery::addQueryItem() and QUrlQuery::removeQueryItem(). Use
    QUrlQuery::setQueryDelimiters() to customize the delimiters used for
    generating the query string.
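
    A minimal sketch of building a query with QUrlQuery (the helper name
    \c searchUrl() and the parameter names are hypothetical):

    \code
    #include <QString>
    #include <QUrl>
    #include <QUrlQuery>

    // Hypothetical helper: attaches two query items to a base URL.
    QUrl searchUrl(const QString &term, int page)
    {
        QUrl url(QStringLiteral("https://example.com/search"));
        QUrlQuery query;
        query.addQueryItem(QStringLiteral("q"), term);
        query.addQueryItem(QStringLiteral("page"), QString::number(page));
        url.setQuery(query);   // QUrl::setQuery() has an overload taking a QUrlQuery
        return url;
    }
    \endcode

    With the default delimiters, \c{searchUrl("qt", 2).toString()} yields
    "https://example.com/search?q=qt&page=2".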

    For the convenience of generating encoded URL strings or query
    strings, there are two static functions: toPercentEncoding(), which
    percent-encodes a QString into a QByteArray, and fromPercentEncoding(),
    which decodes a percent-encoded QByteArray back into a QString.
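
    For example, the file name used earlier in this section can be encoded as
    follows (the helper name \c encodeFileName() is hypothetical):

    \code
    #include <QByteArray>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: percent-encodes a file name for use in a URL.
    QByteArray encodeFileName(const QString &name)
    {
        // Everything except unreserved characters is encoded by default,
        // so each space becomes "%20".
        return QUrl::toPercentEncoding(name);
    }
    \endcode

    Here \c{encodeFileName("List of applicants.xml")} returns
    "List%20of%20applicants.xml". Passing that result to fromPercentEncoding()
    restores the original string.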

    fromLocalFile() constructs a QUrl by parsing a local
    file path. toLocalFile() converts a URL to a local file path.
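
    A short sketch, assuming a Unix-style absolute path (the helper name
    \c roundTripLocalPath() is hypothetical):

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: converts a local path to a URL and back.
    QString roundTripLocalPath(const QString &path)
    {
        // "/home/user/notes.txt" becomes "file:///home/user/notes.txt".
        const QUrl url = QUrl::fromLocalFile(path);
        return url.isLocalFile() ? url.toLocalFile() : QString();
    }
    \endcode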

    The human-readable representation of the URL is fetched with
    toString(). This representation is appropriate for displaying a
    URL to a user in unencoded form. The encoded form, however, as
    returned by toEncoded(), is for internal use, passing to web
    servers, mail clients and so on. Both forms are technically correct
    and represent the same URL unambiguously -- in fact, passing either
    form to QUrl's constructor or to setUrl() will yield the same QUrl
    object.
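
    Using the example URL from above, here is a sketch of obtaining the encoded
    form. The helper name \c encodedApplicantsUrl() is hypothetical, and the IDN
    host is written with a universal character name so the source stays
    ASCII-only:

    \code
    #include <QByteArray>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: returns the encoded form of the example URL.
    QByteArray encodedApplicantsUrl()
    {
        const QUrl url(QStringLiteral("http://b\u00FChler.example.com/List of applicants.xml"));
        // toString() is meant for display; toEncoded() is what is sent to a server.
        return url.toEncoded();
    }
    \endcode

    The result is "http://xn--bhler-kva.example.com/List%20of%20applicants.xml".
    Calling QUrl::fromEncoded() on that byte array gives a QUrl that compares
    equal to the original one.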

    QUrl conforms to the URI specification from
    \l{RFC 3986} (Uniform Resource Identifier: Generic Syntax), and includes
    scheme extensions from \l{RFC 1738} (Uniform Resource Locators). Case
    folding rules in QUrl conform to \l{RFC 3491} (Nameprep: A Stringprep
    Profile for Internationalized Domain Names (IDN)). It is also compatible with the
    \l{http://freedesktop.org/wiki/Specifications/file-uri-spec/}{file URI specification}
    from freedesktop.org, provided that the locale encodes file names using
    UTF-8 (required by IDN).

    \section2 Relative URLs vs Relative Paths

    Calling isRelative() will return whether or not the URL is relative.
    A relative URL has no \l {scheme}. For example:

    \snippet code/src_corelib_io_qurl.cpp 8

    Notice that a URL can be absolute while containing a relative path, and
    vice versa:

    \snippet code/src_corelib_io_qurl.cpp 9

    A relative URL can be resolved by passing it as an argument to resolved(),
    which returns an absolute URL. isParentOf() is used for determining whether
    one URL is a parent of another.
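
    A minimal sketch of resolving a relative reference against a base URL
    (the helper name \c resolveDownloadPage() and the URLs are illustrative):

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: resolves "../download/index.html" against a base page.
    QUrl resolveDownloadPage()
    {
        const QUrl base(QStringLiteral("http://qt-project.org/support/file.html"));
        const QUrl relative(QStringLiteral("../download/index.html"));
        // relative.isRelative() is true because it has no scheme.
        return base.resolved(relative);   // ".." segments are removed per RFC 3986
    }
    \endcode

    The resolved URL is "http://qt-project.org/download/index.html".
    \c{QUrl("http://qt-project.org/").isParentOf()} returns \c true for it.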

    \section2 Error checking

    QUrl is capable of detecting many errors in URLs while parsing them or when
    components of the URL are set with individual setter methods (like
    setScheme(), setHost() or setPath()). If the parsing or setter function is
    successful, any previously recorded error conditions will be discarded.

    By default, QUrl setter methods operate in QUrl::TolerantMode, which means
    they accept some common mistakes and misrepresentation of data. An
    alternate method of parsing is QUrl::StrictMode, which applies further
    checks. See QUrl::ParsingMode for a description of the differences between
    the parsing modes.

    QUrl only checks for conformance with the URL specification. It does not
    try to verify that high-level protocol URLs are in the format they are
    expected to be by handlers elsewhere. For example, the following URIs are
    all considered valid by QUrl, even if they do not make sense when used:

    \list
    \li "http:/filename.html"
    \li "mailto://example.com"
    \endlist

    When the parser encounters an error, it signals the event by making
    isValid() return \c false and toString() / toEncoded() return an empty string.
    If it is necessary to show the user the reason why the URL failed to parse,
    the error condition can be obtained from QUrl by calling errorString().
    Note that this message is highly technical and may not make sense to
    end-users.

    QUrl is capable of recording only one error condition. If more than one
    error is found, it is undefined which error is reported.
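
    For example, here is a sketch of reporting a parse failure caused by a
    non-numeric port (the helper name \c parseProblem() is hypothetical):

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: returns an empty string for a valid URL, otherwise
    // the (technical) error message recorded by QUrl.
    QString parseProblem(const QString &text)
    {
        const QUrl url(text, QUrl::StrictMode);
        if (url.isValid())
            return QString();
        // toString() and toEncoded() would return empty results here.
        return url.errorString();
    }
    \endcode

    With this helper, \c{parseProblem("http://example.com:80x/")} returns a
    non-empty message, while \c{parseProblem("http://example.com:80/")} returns
    an empty string.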

    \section2 Character Conversions

    Follow these rules to avoid erroneous character conversion when
    dealing with URLs and strings:

    \list
    \li When creating a QString to contain a URL from a QByteArray or a
        char*, always use QString::fromUtf8().
    \endlist
*/

/*!
    \enum QUrl::ParsingMode

    The parsing mode controls the way QUrl parses strings.

    \value TolerantMode QUrl will try to correct some common errors in URLs.
                        This mode is useful for parsing URLs coming from sources
                        not known to be strictly standards-conforming.

    \value StrictMode Only valid URLs are accepted. This mode is useful for
                      general URL validation.

    \value DecodedMode QUrl will interpret the URL component in the fully-decoded form,
                       where percent characters stand for themselves, not as the beginning
                       of a percent-encoded sequence. This mode is only valid for the
                       setters setting components of a URL; it is not permitted in
                       the QUrl constructor, in fromEncoded() or in setUrl().
                       For more information on this mode, see the documentation for
                       \l {QUrl::ComponentFormattingOption}{QUrl::FullyDecoded}.

    In TolerantMode, the parser has the following behavior:

    \list

    \li Spaces and "%20": unencoded space characters will be accepted and will
        be treated as equivalent to "%20".

    \li Single "%" characters: Any occurrences of a percent character "%" not
        followed by exactly two hexadecimal characters (e.g., "13% coverage.html")
        will be replaced by "%25". Note that one lone "%" character will trigger
        the correction mode for all percent characters.

    \li Reserved and unreserved characters: An encoded URL should only
        contain a few characters as literals; all other characters should
        be percent-encoded. In TolerantMode, these characters will be
        accepted if they are found in the URL:
                space / double-quote / "<" / ">" / "\" /
                "^" / "`" / "{" / "|" / "}"
        Those same characters can be decoded again by passing QUrl::DecodeReserved
        to toString() or toEncoded(). In the getters of individual components,
        those characters are often returned in decoded form.

    \endlist

    When in StrictMode, if a parsing error is found, isValid() will return
    \c false and errorString() will return a message describing the error.
    If more than one error is detected, it is undefined which error gets
    reported.

    Note that TolerantMode is not usually enough for parsing user input, which
    often contains more errors and expectations than the parser can deal with.
    When dealing with data coming directly from the user -- as opposed to data
    coming from data-transfer sources, such as other programs -- it is
    recommended to use fromUserInput().
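
    The following sketch applies TolerantMode to the "13% coverage.html" case
    described above (the helper name \c tolerantEncoding() is hypothetical):

    \code
    #include <QByteArray>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: parses a slightly malformed URL in TolerantMode.
    QByteArray tolerantEncoding()
    {
        // The unencoded space becomes "%20" and the lone "%" becomes "%25".
        const QUrl url(QStringLiteral("http://example.com/13% coverage.html"),
                       QUrl::TolerantMode);
        return url.toEncoded();
    }
    \endcode

    The result is "http://example.com/13%25%20coverage.html". Parsing the same
    string with QUrl::StrictMode yields a URL for which isValid() returns
    \c false.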

    \sa fromUserInput(), setUrl(), toString(), toEncoded(), QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UrlFormattingOption

    The formatting options define how the URL is formatted when written out
    as text.

    \value None The format of the URL is unchanged.
    \value RemoveScheme The scheme is removed from the URL.
    \value RemovePassword Any password in the URL is removed.
    \value RemoveUserInfo Any user information in the URL is removed.
    \value RemovePort Any specified port is removed from the URL.
    \value RemoveAuthority The user name, password, host and port are removed.
    \value RemovePath The URL's path is removed, leaving only the scheme,
           host address, and port (if present).
    \value RemoveQuery The query part of the URL (following a '?' character)
           is removed.
    \value RemoveFragment The fragment part of the URL (including the '#' character) is removed.
    \value RemoveFilename The filename (i.e. everything after the last '/' in the path) is removed.
           The trailing '/' is kept, unless StripTrailingSlash is set.
           Only valid if RemovePath is not set.
    \value PreferLocalFile If the URL is a local file according to isLocalFile()
           and contains no query or fragment, a local file path is returned.
    \value StripTrailingSlash The trailing slash is removed from the path, if one is present.
    \value NormalizePathSegments Modifies the path to remove redundant directory separators,
           and to resolve "." and ".." segments (as far as possible). For non-local paths,
           adjacent slashes are preserved.

    Note that the case folding rules in \l{RFC 3491}{Nameprep}, which QUrl
    conforms to, require host names to always be converted to lower case,
    regardless of the QUrl::FormattingOptions used.

    The options from QUrl::ComponentFormattingOptions are also possible.
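
    For example, here is a sketch that strips credentials and other components
    before displaying a URL (the helper name \c displayDirectory() and the URL
    are illustrative):

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: drops user info, port, file name, query and fragment.
    QString displayDirectory(const QUrl &url)
    {
        return url.toString(QUrl::RemoveUserInfo | QUrl::RemovePort
                            | QUrl::RemoveFilename | QUrl::RemoveQuery
                            | QUrl::RemoveFragment);
    }
    \endcode

    For "https://user:secret@example.com:8080/docs/index.html?lang=en#top",
    \c displayDirectory() returns "https://example.com/docs/". By contrast,
    \c{toString(QUrl::RemovePassword)} returns
    "https://user@example.com:8080/docs/index.html?lang=en#top".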

    \sa QUrl::ComponentFormattingOptions
*/

/*!
    \enum QUrl::ComponentFormattingOption
    \since 5.0

    The component formatting options define how the components of a URL will
    be formatted when written out as text. They can be combined with the
    options from QUrl::FormattingOptions when used in toString() and
    toEncoded().

    \value PrettyDecoded    The component is returned in a "pretty form", with
                            most percent-encoded characters decoded. The exact
                            behavior of PrettyDecoded varies from component to
                            component and may also change from Qt release to Qt
                            release. This is the default.

    \value EncodeSpaces     Leave space characters in their encoded form ("%20").

    \value EncodeUnicode    Leave non-US-ASCII characters encoded in their UTF-8
                            percent-encoded form (e.g., "%C3%A9" for the U+00E9
                            code point, LATIN SMALL LETTER E WITH ACUTE).

    \value EncodeDelimiters Leave certain delimiters in their encoded form, as
                            they would appear in the URL when the full URL is
                            represented as text. The delimiters affected
                            by this option change from component to component.
                            This flag has no effect in toString() or toEncoded().

    \value EncodeReserved   Leave US-ASCII characters not permitted in the URL by
                            the specification in their encoded form. This is the
                            default on toString() and toEncoded().

    \value DecodeReserved   Decode the US-ASCII characters that the URL specification
                            does not allow to appear in the URL. This is the
                            default on the getters of individual components.

    \value FullyEncoded     Leave all characters in their properly-encoded form,
                            as this component would appear as part of a URL. When
                            used with toString(), this produces a fully-compliant
                            URL in QString form, exactly equal to the result of
                            toEncoded().

    \value FullyDecoded     Attempt to decode as much as possible. For individual
                            components of the URL, this decodes every percent
                            encoding sequence, including control characters (U+0000
                            to U+001F) and UTF-8 sequences found in percent-encoded form.
                            Use of this mode may cause data loss; see below for more information.

    The values of EncodeReserved and DecodeReserved should not be used together
    in one call. The behavior is undefined if that happens. They are provided
    as separate values because the behavior of the "pretty mode" with regard
    to reserved characters is different on certain components and especially on
    the full URL.

    \section2 Full decoding

    The FullyDecoded mode is similar to the behavior of the functions returning
    QString in Qt 4.x, in that every character represents itself and never has
    any special meaning. This is true even for the percent character ('%'),
    which should be interpreted to mean a literal percent, not the beginning of
    a percent-encoded sequence. The same actual character, in all other
    decoding modes, is represented by the sequence "%25".

    Whenever re-applying data obtained with QUrl::FullyDecoded into a QUrl,
    care must be taken to use the QUrl::DecodedMode parameter to the setters
    (like setPath() and setUserName()). Failure to do so may cause
    re-interpretation of the percent character ('%') as the beginning of a
    percent-encoded sequence.
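
    Here is a minimal sketch of that round trip on a path containing an encoded
    percent sign and an encoded space (the helper name \c copyPathDecoded() is
    hypothetical):

    \code
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: copies the path from one URL to another via its
    // fully-decoded form.
    QUrl copyPathDecoded(const QUrl &source, QUrl target)
    {
        // For "http://example.com/100%25%20done", path(QUrl::PrettyDecoded)
        // returns "/100%25 done", while path(QUrl::FullyDecoded) returns
        // "/100% done".
        const QString decoded = source.path(QUrl::FullyDecoded);
        // DecodedMode makes setPath() treat '%' as a literal character.
        target.setPath(decoded, QUrl::DecodedMode);
        return target;
    }
    \endcode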

    This mode is quite useful when portions of a URL are used in a non-URL
    context. For example, to extract the username, password or file paths in an
    FTP client application, the FullyDecoded mode should be used.

    This mode should be used with care, since there are two conditions that
    cannot be reliably represented in the returned QString. They are:

    \list
    \li \b{Non-UTF-8 sequences:} URLs may contain sequences of
        percent-encoded characters that do not form valid UTF-8 sequences. Since
        URLs need to be decoded using UTF-8, any decoder failure will result in
        the QString containing one or more replacement characters where the
        sequence existed.

    \li \b{Encoded delimiters:} URLs are also allowed to make a distinction
        between a delimiter found in its literal form and its equivalent in
        percent-encoded form. This is most commonly found in the query, but is
        permitted in most parts of the URL.
    \endlist

    The following example illustrates the problem:

    \snippet code/src_corelib_io_qurl.cpp 10

    If the two URLs were used via HTTP GET, the interpretation by the web
    server would probably be different. In the first case, it would interpret
    the query as one parameter, with a key of "q" and value "a+=b&c". In the
    second case, it would probably interpret it as two parameters, one with a
    key of "q" and value "a =b", and the second with a key "c" and no value.

    \sa QUrl::FormattingOptions
*/

/*!
    \enum QUrl::UserInputResolutionOption
    \since 5.4

    The user input resolution options define how fromUserInput() should
    interpret strings that could either be a relative path or the short
    form of an HTTP URL. For instance, \c{file.pl} can be either a local file
    or the URL \c{http://file.pl}.

    \value DefaultResolution The default resolution mechanism is to check
                             whether a local file exists, in the working
                             directory given to fromUserInput(), and only
                             return a local path in that case. Otherwise a URL
                             is assumed.
    \value AssumeLocalFile   This option makes fromUserInput() always return
                             a local path unless the input contains a scheme, such as
                             \c{http://file.pl}. This is useful for applications
                             such as text editors, which are able to create
                             the file if it doesn't exist.
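
    Here is a sketch of AssumeLocalFile in action, using the temporary directory
    as the working directory (the helper name \c resolveAsLocalFile() is
    hypothetical):

    \code
    #include <QDir>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: resolves input relative to the temporary directory.
    QUrl resolveAsLocalFile(const QString &input)
    {
        // With AssumeLocalFile, input without a scheme is treated as a path
        // relative to the working directory, whether or not the file exists.
        return QUrl::fromUserInput(input, QDir::tempPath(), QUrl::AssumeLocalFile);
    }
    \endcode

    \c{resolveAsLocalFile("file.pl")} returns a local file URL whose fileName()
    is "file.pl". \c{resolveAsLocalFile("http://file.pl")} keeps the "http"
    scheme.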

    \sa fromUserInput()
*/

/*!
    \enum QUrl::AceProcessingOption
    \since 6.3

    The ACE processing options control the way URLs are transformed to and from
    ASCII-Compatible Encoding.

    \value IgnoreIDNWhitelist        Ignore the IDN whitelist when converting URLs
                                     to Unicode.
    \value AceTransitionalProcessing Use transitional processing described in UTS #46.
                                     This allows better compatibility with the IDNA 2003
                                     specification.

    The default is to use nontransitional processing and to allow non-ASCII
    characters only inside URLs whose top-level domains are listed in the IDN whitelist.
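
    Here is a minimal sketch of the conversions. The helper name \c aceFor() is
    hypothetical, and the domains are common illustrations of UTS #46
    processing:

    \code
    #include <QByteArray>
    #include <QString>
    #include <QUrl>

    // Hypothetical helper: converts a domain to ACE with either processing mode.
    QByteArray aceFor(const QString &domain, bool transitional)
    {
        return transitional ? QUrl::toAce(domain, QUrl::AceTransitionalProcessing)
                            : QUrl::toAce(domain);
    }
    \endcode

    With nontransitional processing (the default), "faß.de" becomes
    "xn--fa-hia.de". With QUrl::AceTransitionalProcessing, the sharp s is mapped
    to "ss", giving "fass.de". Converting back with
    \c{QUrl::fromAce("xn--bhler-kva.example.com", QUrl::IgnoreIDNWhitelist)}
    yields "bühler.example.com" regardless of the whitelist.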

    \sa toAce(), fromAce(), idnWhitelist()
*/

/*!
    \fn QUrl::QUrl(QUrl &&other)

    Move-constructs a QUrl instance, making it point at the same
    object that \a other was pointing to.

    \since 5.2
*/

/*!
    \fn QUrl &QUrl::operator=(QUrl &&other)

    Move-assigns \a other to this QUrl instance.

    \since 5.2
*/

#include "qurl.h"
#include "qurl_p.h"
#include "qplatformdefs.h"
#include "qstring.h"
#include "qstringlist.h"
#include "qdebug.h"
#include "qhash.h"
#include "qdatastream.h"
#include "private/qipaddress_p.h"
#include "qurlquery.h"
#include "private/qdir_p.h"
#include <private/qtools_p.h>

QT_BEGIN_NAMESPACE

using namespace Qt::StringLiterals;
using namespace QtMiscUtils;

inline static bool isHex(char c)
{
    c |= 0x20;
    return isAsciiDigit(c) || (c >= 'a' && c <= 'f');
}

static inline QString ftpScheme()
{
    return QStringLiteral("ftp");
}

static inline QString fileScheme()
{
    return QStringLiteral("file");
}

static inline QString webDavScheme()
{
    return QStringLiteral("webdavs");
}

static inline QString webDavSslTag()
{
    return QStringLiteral("@SSL");
}

class QUrlPrivate
{
public:
    enum Section : uchar {
        Scheme = 0x01,
        UserName = 0x02,
        Password = 0x04,
        UserInfo = UserName | Password,
        Host = 0x08,
        Port = 0x10,
        Authority = UserInfo | Host | Port,
        Path = 0x20,
        Hierarchy = Authority | Path,
        Query = 0x40,
        Fragment = 0x80,
        FullUrl = 0xff
    };

    enum Flags : uchar {
        IsLocalFile = 0x01
    };

    enum ErrorCode {
        // the high byte of the error code matches the Section
        // the first item in each value must be the generic "Invalid xxx Error"
        InvalidSchemeError = Scheme << 8,

        InvalidUserNameError = UserName << 8,

        InvalidPasswordError = Password << 8,

        InvalidRegNameError = Host << 8,
        InvalidIPv4AddressError,
        InvalidIPv6AddressError,
        InvalidCharacterInIPv6Error,
        InvalidIPvFutureError,
        HostMissingEndBracket,

        InvalidPortError = Port << 8,
        PortEmptyError,

        InvalidPathError = Path << 8,

        InvalidQueryError = Query << 8,

        InvalidFragmentError = Fragment << 8,

        // the following three cases are only possible in combination with
        // presence/absence of the path, authority and scheme. See validityError().
        AuthorityPresentAndPathIsRelative = Authority << 8 | Path << 8 | 0x10000,
        AuthorityAbsentAndPathIsDoubleSlash,
        RelativeUrlPathContainsColonBeforeSlash = Scheme << 8 | Authority << 8 | Path << 8 | 0x10000,

        NoError = 0
    };

    struct Error {
        QString source;
        qsizetype position;
        ErrorCode code;
    };

    QUrlPrivate();
    QUrlPrivate(const QUrlPrivate &copy);
    ~QUrlPrivate();

    void parse(const QString &url, QUrl::ParsingMode parsingMode);
    bool isEmpty() const
    { return sectionIsPresent == 0 && port == -1 && path.isEmpty(); }

    std::unique_ptr<Error> cloneError() const;
    void clearError();
    void setError(ErrorCode errorCode, const QString &source, qsizetype supplement = -1);
    ErrorCode validityError(QString *source = nullptr, qsizetype *position = nullptr) const;
    bool validateComponent(Section section, const QString &input, qsizetype begin, qsizetype end);
    bool validateComponent(Section section, const QString &input)
    { return validateComponent(section, input, 0, input.size()); }

    // no QString scheme() const;
    void appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendUserName(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPassword(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendHost(QString &appendTo, QUrl::FormattingOptions options) const;
    void appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;
    void appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const;

    // the "end" parameters are like STL iterators: they point to one past the last valid element
    bool setScheme(const QString &value, qsizetype len, bool doSetError);
    void setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
    template <typename String> void setUserInfo(String &&value, QUrl::ParsingMode mode);
    template <typename String> void setUserName(String &&value, QUrl::ParsingMode mode);
    template <typename String> void setPassword(String &&value, QUrl::ParsingMode mode);
    bool setHost(const QString &value, qsizetype from, qsizetype end, QUrl::ParsingMode mode);
    template <typename String> void setPath(String &&value, QUrl::ParsingMode mode);
    template <typename String> void setQuery(String &&value, QUrl::ParsingMode mode);
    template <typename String> void setFragment(String &&value, QUrl::ParsingMode mode);

    uint presentSections() const noexcept
    {
        uint s = sectionIsPresent;

        // We have to ignore the host-is-present flag for local files (the
        // "file" protocol), due to the requirements of the XDG file URI
        // specification.
        if (isLocalFile())
            s &= ~Host;

        // If the password was set, we must have a username too
        if (s & Password)
            s |= UserName;

        return s;
    }

    inline bool hasScheme() const { return sectionIsPresent & Scheme; }
    inline bool hasAuthority() const { return sectionIsPresent & Authority; }
    inline bool hasUserInfo() const { return sectionIsPresent & UserInfo; }
    inline bool hasUserName() const { return sectionIsPresent & UserName; }
    inline bool hasPassword() const { return sectionIsPresent & Password; }
    inline bool hasHost() const { return sectionIsPresent & Host; }
    inline bool hasPort() const { return port != -1; }
    inline bool hasPath() const { return !path.isEmpty(); }
    inline bool hasQuery() const { return sectionIsPresent & Query; }
    inline bool hasFragment() const { return sectionIsPresent & Fragment; }

    inline bool isLocalFile() const { return flags & IsLocalFile; }
    QString toLocalFile(QUrl::FormattingOptions options) const;

    bool normalizePathSegments(QString *path) const
    {
        QDirPrivate::PathNormalizations mode = QDirPrivate::UrlNormalizationMode;
        if (!isLocalFile())
            mode |= QDirPrivate::RemotePath;
        return qt_normalizePathSegments(path, mode);
    }
    QString mergePaths(const QString &relativePath) const;

    void clear()
    {
        clearError();
        scheme = userName = password = host = path = query = fragment = QString();
        port = -1;
        sectionIsPresent = 0;
        flags = 0;
    }

    QAtomicInt ref;
    int port;

    QString scheme;
    QString userName;
    QString password;
    QString host;
    QString path;
    QString query;
    QString fragment;

    std::unique_ptr<Error> error;

    // not used for:
    //  - Port (port == -1 means absence)
    //  - Path (there's no path delimiter, so we optimize its use out of existence)
    // Schemes are never supposed to be empty, but we keep the flag anyway
    uchar sectionIsPresent;
    uchar flags;

    // 32-bit: 2 bytes tail padding available
    // 64-bit: 6 bytes tail padding available
};

inline QUrlPrivate::QUrlPrivate()
    : ref(1), port(-1),
      sectionIsPresent(0),
      flags(0)
{
}

inline QUrlPrivate::QUrlPrivate(const QUrlPrivate &copy)
    : ref(1), port(copy.port),
      scheme(copy.scheme),
      userName(copy.userName),
      password(copy.password),
      host(copy.host),
      path(copy.path),
      query(copy.query),
      fragment(copy.fragment),
      error(copy.cloneError()),
      sectionIsPresent(copy.sectionIsPresent),
      flags(copy.flags)
{
}

inline QUrlPrivate::~QUrlPrivate()
    = default;

std::unique_ptr<QUrlPrivate::Error> QUrlPrivate::cloneError() const
{
    return error ? std::make_unique<Error>(*error) : nullptr;
}

inline void QUrlPrivate::clearError()
{
    error.reset();
}

inline void QUrlPrivate::setError(ErrorCode errorCode, const QString &source, qsizetype supplement)
{
    if (error) {
        // don't overwrite an error set in a previous section during parsing
        return;
    }
    error = std::make_unique<Error>();
    error->code = errorCode;
    error->source = source;
    error->position = supplement;
}

// From RFC 3986, Appendix A Collected ABNF for URI
//    URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
//[...]
//    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
//
//    authority     = [ userinfo "@" ] host [ ":" port ]
//    userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
//    host          = IP-literal / IPv4address / reg-name
//    port          = *DIGIT
//[...]
//    reg-name      = *( unreserved / pct-encoded / sub-delims )
//[..]
//    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
//
//    query         = *( pchar / "/" / "?" )
//
//    fragment      = *( pchar / "/" / "?" )
//
//    pct-encoded   = "%" HEXDIG HEXDIG
//
//    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
//    reserved      = gen-delims / sub-delims
//    gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
//    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
//                  / "*" / "+" / "," / ";" / "="
// the path component has a complex ABNF that basically boils down to
// slash-separated segments of "pchar"

// The above is the strict definition of the URL components and we mostly
// adhere to it, with few exceptions. QUrl obeys the following behavior:
//  - percent-encoding sequences always use uppercase HEXDIG;
//  - unreserved characters are *always* decoded, no exceptions;
//  - the space character and bytes with the high bit set are controlled by
//    the EncodeSpaces and EncodeUnicode bits;
//  - control characters, the percent sign itself, and bytes with the high
//    bit set that don't form valid UTF-8 sequences are always encoded,
//    except in FullyDecoded mode;
//  - sub-delims are always left alone, except in FullyDecoded mode;
//  - gen-delims change behavior depending on which section of the URL (or
//    the entire URL) we're looking at; see below;
//  - characters not mentioned above, like "<" and ">", are usually
//    decoded in individual sections of the URL, but encoded when the full
//    URL is put together (this can change depending on the subjective
//    definition of "pretty").
//
// The behavior for the delimiters bears some explanation. The spec says in
// section 2.2:
//     URIs that differ in the replacement of a reserved character with its
//     corresponding percent-encoded octet are not equivalent.
// (note: QUrl API mistakenly uses the "reserved" term, so we will refer to
// them here as "delimiters").
//
// For that reason, we cannot encode delimiters found in decoded form and we
// cannot decode the ones found in encoded form if that would change the
// interpretation. Conversely, we *can* perform the transformation if it would
// not change the interpretation. From the last component of a URL to the first,
// here are the gen-delims we can unambiguously transform when the field is
// taken in isolation:
//  - fragment: none, since it's the last
//  - query: "#" is unambiguous
//  - path: "#" and "?" are unambiguous
//  - host: completely special but never ambiguous, see setHost() below.
//  - password: the "#", "?", "/", "[", "]" and "@" characters are unambiguous
//  - username: the "#", "?", "/", "[", "]", "@", and ":" characters are unambiguous
//  - scheme: doesn't accept any delimiter, see setScheme() below.
//
// Internally, QUrl stores each component in the format that corresponds to the
// default mode (PrettyDecoded). It deviates from the "strict" FullyEncoded
// mode in the following ways:
//  - spaces are decoded
//  - valid UTF-8 sequences are decoded
//  - gen-delims that can be unambiguously transformed are decoded (exception:
//    square brackets in path, query and fragment are left as they were)
//  - characters controlled by DecodeReserved are often decoded, though this behavior
//    can change depending on the subjective definition of "pretty"
//
// Note that the list of gen-delims that we can transform is different for the
// user info (user name + password) and the authority (user info + host +
// port).


// list the recoding table modifications to be used with the recodeFromUser and
// appendToUser functions, according to the rules above. Spaces and UTF-8
// sequences are handled outside the tables.

// the encodedXXX tables are run with the delimiters set to "leave" by default;
// the decodedXXX tables are run with the delimiters set to "decode" by default
// (except for the query, which doesn't use these functions)

namespace {
template <typename T> constexpr ushort decode(T x) noexcept { return ushort(x); }
template <typename T> constexpr ushort leave(T x) noexcept { return ushort(0x100 | x); }
template <typename T> constexpr ushort encode(T x) noexcept { return ushort(0x200 | x); }
}

static const ushort userNameInIsolation[] = {
    decode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInIsolation = userNameInIsolation + 1;
static const ushort * const pathInIsolation = userNameInIsolation + 5;
static const ushort * const queryInIsolation = userNameInIsolation + 6;
static const ushort * const fragmentInIsolation = userNameInIsolation + 7;

static const ushort userNameInUserInfo[] = {
    encode(':'), // 0
    decode('@'), // 1
    decode(']'), // 2
    decode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInUserInfo = userNameInUserInfo + 1;

static const ushort userNameInAuthority[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    decode('/'), // 4
    decode('?'), // 5
    decode('#'), // 6

    decode('"'), // 7
    decode('<'),
    decode('>'),
    decode('^'),
    decode('\\'),
    decode('|'),
    decode('{'),
    decode('}'),
    0
};
static const ushort * const passwordInAuthority = userNameInAuthority + 1;

static const ushort userNameInUrl[] = {
    encode(':'), // 0
    encode('@'), // 1
    encode(']'), // 2
    encode('['), // 3
    encode('/'), // 4
    encode('?'), // 5
    encode('#'), // 6

    // no need to list encode(x) for the other characters
    0
};
static const ushort * const passwordInUrl = userNameInUrl + 1;
static const ushort * const pathInUrl = userNameInUrl + 5;
static const ushort * const queryInUrl = userNameInUrl + 6;
static const ushort * const fragmentInUrl = userNameInUrl + 6;

static void
recodeFromUser(QString &output, const QString &input, const ushort *actions, QUrl::ParsingMode mode)
{
    output.resize(0);
    qsizetype appended;
    if (mode == QUrl::DecodedMode)
        appended = qt_encodeFromUser(output, input, actions);
    else
        appended = qt_urlRecode(output, input, {}, actions);
    if (!appended)
        output = input;
}

static void
recodeFromUser(QString &output, QStringView input, const ushort *actions, QUrl::ParsingMode mode)
{
    Q_ASSERT_X(mode != QUrl::DecodedMode, "recodeFromUser",
               "This function should only be called when parsing encoded components");
    Q_UNUSED(mode);
    output.resize(0);
    if (qt_urlRecode(output, input, {}, actions))
        return;
    output.append(input);
}

// appendXXXX functions: copy from the internal form to the external, user form.
// the internal value is stored in its PrettyDecoded form, so that case is easy.
static inline void appendToUser(QString &appendTo, QStringView value, QUrl::FormattingOptions options,
                                const ushort *actions)
{
    // The stored value is already QUrl::PrettyDecoded, so there's nothing to
    // do if that's what the user asked for (test only
    // ComponentFormattingOptions, ignore FormattingOptions).
    if ((options & 0xFFFF0000) == QUrl::PrettyDecoded ||
            !qt_urlRecode(appendTo, value, options, actions))
        appendTo += value;

    // copy nullness, if necessary, because QString::operator+=(QStringView) doesn't
    if (appendTo.isNull() && !value.isNull())
        appendTo.detach();
}

inline void QUrlPrivate::appendAuthority(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if ((options & QUrl::RemoveUserInfo) != QUrl::RemoveUserInfo) {
        appendUserInfo(appendTo, options, appendingTo);

        // add '@' only if we added anything
        if (hasUserName() || (hasPassword() && (options & QUrl::RemovePassword) == 0))
            appendTo += u'@';
    }
    appendHost(appendTo, options);
    if (!(options & QUrl::RemovePort) && port != -1)
        appendTo += u':' + QString::number(port);
}

inline void QUrlPrivate::appendUserInfo(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    if (Q_LIKELY(!hasUserInfo()))
        return;

    const ushort *userNameActions;
    const ushort *passwordActions;
    if (options & QUrl::EncodeDelimiters) {
        userNameActions = userNameInUrl;
        passwordActions = passwordInUrl;
    } else {
        switch (appendingTo) {
        case UserInfo:
            userNameActions = userNameInUserInfo;
            passwordActions = passwordInUserInfo;
            break;

        case Authority:
            userNameActions = userNameInAuthority;
            passwordActions = passwordInAuthority;
            break;

        case FullUrl:
            userNameActions = userNameInUrl;
            passwordActions = passwordInUrl;
            break;

        default:
            // can't happen
            Q_UNREACHABLE();
            break;
        }
    }

    if (!qt_urlRecode(appendTo, userName, options, userNameActions))
        appendTo += userName;
    if (options & QUrl::RemovePassword || !hasPassword()) {
        return;
    } else {
        appendTo += u':';
        if (!qt_urlRecode(appendTo, password, options, passwordActions))
            appendTo += password;
    }
}

inline void QUrlPrivate::appendUserName(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::userName()
    appendToUser(appendTo, userName, options,
                 options & QUrl::EncodeDelimiters ? userNameInUrl : userNameInIsolation);
    if (appendTo.isNull() && hasPassword())
        appendTo.detach();     // the presence of password implies presence of username
}

inline void QUrlPrivate::appendPassword(QString &appendTo, QUrl::FormattingOptions options) const
{
    // only called from QUrl::password()
    appendToUser(appendTo, password, options,
                 options & QUrl::EncodeDelimiters ? passwordInUrl : passwordInIsolation);
}

inline void QUrlPrivate::appendPath(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    QString thePath = path;
    if (options & QUrl::NormalizePathSegments)
        normalizePathSegments(&thePath);

    QStringView thePathView(thePath);
    if (options & QUrl::RemoveFilename) {
        const qsizetype slash = thePathView.lastIndexOf(u'/');
        if (slash == -1)
            return;
        thePathView = thePathView.left(slash + 1);
    }
    // check if we need to remove trailing slashes
    if (options & QUrl::StripTrailingSlash) {
        while (thePathView.size() > 1 && thePathView.endsWith(u'/'))
            thePathView.chop(1);
    }

    appendToUser(appendTo, thePathView, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? pathInUrl : pathInIsolation);
}

inline void QUrlPrivate::appendFragment(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, fragment, options,
                 options & QUrl::EncodeDelimiters ? fragmentInUrl :
                 appendingTo == FullUrl ? nullptr : fragmentInIsolation);
}

inline void QUrlPrivate::appendQuery(QString &appendTo, QUrl::FormattingOptions options, Section appendingTo) const
{
    appendToUser(appendTo, query, options,
                 appendingTo == FullUrl || options & QUrl::EncodeDelimiters ? queryInUrl : queryInIsolation);
}

// setXXX functions

inline bool QUrlPrivate::setScheme(const QString &value, qsizetype len, bool doSetError)
{
    // schemes are strictly RFC-compliant:
    //    scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
    // we also lowercase the scheme

    // schemes in URLs are not allowed to be empty, but they can be in
    // "Relative URIs" which QUrl also supports. QUrl::setScheme does
    // not call us with len == 0, so this can only be from parse()
    scheme.clear();
    if (len == 0)
        return false;

    sectionIsPresent |= Scheme;

    // validate it:
    qsizetype needsLowercasing = -1;
    const ushort *p = reinterpret_cast<const ushort *>(value.data());
    for (qsizetype i = 0; i < len; ++i) {
        if (isAsciiLower(p[i]))
            continue;
        if (isAsciiUpper(p[i])) {
            needsLowercasing = i;
            continue;
        }
        if (i) {
            if (isAsciiDigit(p[i]))
                continue;
            if (p[i] == '+' || p[i] == '-' || p[i] == '.')
                continue;
        }

        // found something else
        // don't call setError needlessly:
        // if we've been called from parse(), it will try to recover
        if (doSetError)
            setError(InvalidSchemeError, value, i);
        return false;
    }

    scheme = value.left(len);

    if (needsLowercasing != -1) {
        // schemes are ASCII only, so we don't need the full Unicode toLower
        QChar *schemeData = scheme.data(); // force detaching here
        for (qsizetype i = needsLowercasing; i >= 0; --i) {
            ushort c = schemeData[i].unicode();
            if (isAsciiUpper(c))
                schemeData[i] = QChar(c + 0x20);
        }
    }

    // did we set to the file protocol?
    if (scheme == fileScheme()
#ifdef Q_OS_WIN
        || scheme == webDavScheme()
#endif
       ) {
        flags |= IsLocalFile;
    } else {
        flags &= ~IsLocalFile;
    }
    return true;
}
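/*
    Illustrative sketch (not part of QUrl): the validation and lowercasing
    done by setScheme(), restated over std::string_view. normalizeScheme() is
    a hypothetical name. Like setScheme(), it rejects an empty scheme,
    requires a letter first, allows digits, '+', '-' and '.' afterwards, and
    lowercases ASCII letters only. Unlike setScheme(), it does not record
    error positions or set the IsLocalFile flag.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: validate a scheme per RFC 3986
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// and return it lowercased, or std::nullopt if it is empty or invalid.
std::optional<std::string> normalizeScheme(std::string_view s)
{
    if (s.empty())
        return std::nullopt;
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool upper = c >= 'A' && c <= 'Z';
        const bool lower = c >= 'a' && c <= 'z';
        const bool tailOnly = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
        if (upper)
            out += char(c + 0x20);      // ASCII-only lowercasing, as in setScheme()
        else if (lower || (i > 0 && tailOnly))
            out += c;
        else
            return std::nullopt;        // invalid character, or digit/+/-/. in first position
    }
    return out;
}
```

    For example, "svn+SSH" normalizes to "svn+ssh", while "1abc" is
    rejected because a scheme cannot start with a digit.
*/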

inline void QUrlPrivate::setAuthority(const QString &auth, qsizetype from, qsizetype end, QUrl::ParsingMode mode)
{
    Q_ASSERT_X(mode != QUrl::DecodedMode, "setAuthority",
               "This function should only be called when parsing encoded components");
    sectionIsPresent &= ~Authority;
    port = -1;
    if (from == end && !auth.isNull())
        sectionIsPresent |= Host;  // empty but not null authority implies host

    // we never actually _loop_
    while (from != end) {
        qsizetype userInfoIndex = auth.indexOf(u'@', from);
        if (size_t(userInfoIndex) < size_t(end)) {
            setUserInfo(QStringView(auth).sliced(from, userInfoIndex - from), mode);
            if (mode == QUrl::StrictMode && !validateComponent(UserInfo, auth, from, userInfoIndex))
                break;
            from = userInfoIndex + 1;
        }

        qsizetype colonIndex = auth.lastIndexOf(u':', end - 1);
        if (colonIndex < from)
            colonIndex = -1;

        if (size_t(colonIndex) < size_t(end)) {
            if (auth.at(from).unicode() == '[') {
                // check if colonIndex isn't inside the "[...]" part
                qsizetype closingBracket = auth.indexOf(u']', from);
                if (size_t(closingBracket) > size_t(colonIndex))
                    colonIndex = -1;
            }
        }

        if (size_t(colonIndex) < size_t(end) - 1) {
            // found a colon with digits after it
            unsigned long x = 0;
            for (qsizetype i = colonIndex + 1; i < end; ++i) {
                ushort c = auth.at(i).unicode();
                if (isAsciiDigit(c)) {
                    x *= 10;
                    x += c - '0';
                } else {
                    x = ulong(-1); // x != ushort(x)
                    break;
                }
            }
            if (x == ushort(x)) {
                port = ushort(x);
            } else {
                setError(InvalidPortError, auth, colonIndex + 1);
                if (mode == QUrl::StrictMode)
                    break;
            }
        }

        setHost(auth, from, qMin<size_t>(end, colonIndex), mode);
        if (mode == QUrl::StrictMode && !validateComponent(Host, auth, from, qMin<size_t>(end, colonIndex))) {
            // clear host too
            sectionIsPresent &= ~Authority;
            break;
        }

        // success
        return;
    }
    // clear all sections but host
    sectionIsPresent &= ~Authority | Host;
    userName.clear();
    password.clear();
    host.clear();
    port = -1;
}
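/*
    Illustrative sketch (not part of QUrl): the host/port split that
    setAuthority() performs once the user info has been removed.
    splitHostPort() and HostPort are hypothetical names. As in the loop
    above, the last ':' is used unless the host starts with '[' and that colon
    precedes the closing ']'. A trailing ':' with nothing after it yields no
    port, and anything other than digits, or a value above 65535, is
    rejected. Qt accumulates the whole number and then compares it against
    ushort(x); this sketch instead stops as soon as the value exceeds 65535.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct HostPort {
    std::string_view host;
    int port = -1;                        // -1 means "no port", as in QUrlPrivate
};

// Hypothetical sketch: split "host[:port]", honouring "[...]" IP literals.
std::optional<HostPort> splitHostPort(std::string_view hp)
{
    std::size_t colon = hp.rfind(':');
    if (colon != std::string_view::npos && hp.front() == '[') {
        const std::size_t bracket = hp.find(']');
        if (bracket == std::string_view::npos || bracket > colon)
            colon = std::string_view::npos;   // the colon belongs to the IPv6 address
    }
    HostPort result;
    result.host = hp.substr(0, colon);        // npos keeps the whole string
    if (colon == std::string_view::npos || colon + 1 == hp.size())
        return result;                        // no port, or nothing after ':'
    unsigned long value = 0;
    for (char c : hp.substr(colon + 1)) {
        if (c < '0' || c > '9')
            return std::nullopt;              // QUrl reports InvalidPortError here
        value = value * 10 + unsigned(c - '0');
        if (value > 65535)
            return std::nullopt;              // does not fit in a ushort
    }
    result.port = int(value);
    return result;
}
```

    For instance, "[::1]" keeps its colons as part of the host, while
    "[::1]:443" yields host "[::1]" and port 443.
*/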

template <typename String> void QUrlPrivate::setUserInfo(String &&value, QUrl::ParsingMode mode)
{
    Q_ASSERT_X(mode != QUrl::DecodedMode, "setUserInfo",
               "This function should only be called when parsing encoded components");
    qsizetype delimIndex = value.indexOf(u':');
    if (delimIndex < 0) {
        // no password
        setUserName(std::move(value), mode);
        password.clear();
        sectionIsPresent &= ~Password;
    } else {
        setUserName(value.first(delimIndex), mode);
        setPassword(value.sliced(delimIndex + 1), mode);
    }
}

template <typename String> inline void QUrlPrivate::setUserName(String &&value, QUrl::ParsingMode mode)
{
    sectionIsPresent |= UserName;
    recodeFromUser(userName, value, userNameInIsolation, mode);
}

template <typename String> inline void QUrlPrivate::setPassword(String &&value, QUrl::ParsingMode mode)
{
    sectionIsPresent |= Password;
    recodeFromUser(password, value, passwordInIsolation, mode);
}

template <typename String> inline void QUrlPrivate::setPath(String &&value, QUrl::ParsingMode mode)
{
    // sectionIsPresent |= Path; // not used, save some cycles
    recodeFromUser(path, value, pathInIsolation, mode);
}

template <typename String> inline void QUrlPrivate::setFragment(String &&value, QUrl::ParsingMode mode)
{
    sectionIsPresent |= Fragment;
    recodeFromUser(fragment, value, fragmentInIsolation, mode);
}

template <typename String> inline void QUrlPrivate::setQuery(String &&value, QUrl::ParsingMode mode)
{
    sectionIsPresent |= Query;
    recodeFromUser(query, value, queryInIsolation, mode);
}

// Host handling
// The RFC says the host is:
//    host          = IP-literal / IPv4address / reg-name
//    IP-literal    = "[" ( IPv6address / IPvFuture ) "]"
//    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
//    [a strict definition of IPv6Address and IPv4Address]
//    reg-name      = *( unreserved / pct-encoded / sub-delims )
//
// We deviate from the standard in all but IPvFuture. For IPvFuture we accept
// and store only exactly what the RFC says we should. No percent-encoding is
// permitted in this field, so Unicode characters and space aren't either.
//
// For IPv4 addresses, we accept broken addresses like inet_aton does (that is,
// fewer than three dots). However, we correct the address to the proper form
// and store the corrected address. After correction, the address complies
// with the RFC and is composed exclusively of unreserved characters.
//
// For IPv6 addresses, we accept addresses including trailing (embedded) IPv4
// addresses, the so-called v4-compat and v4-mapped addresses. We also store
// those addresses like that in the hostname field, which violates the spec.
// IPv6 hosts are stored with the square brackets in the QString. It also
// requires no transformation in any way.
//
// As for registered names, it's the other way around: we accept only valid
// hostnames as specified by STD 3 and IDNA. That means everything we accept is
// valid in the RFC definition above, but there are many valid reg-names
// according to the RFC that we do not accept in the name of security. Since we
// do accept IDNA, reg-names are subject to ACE encoding and decoding, which is
// controlled by the EncodeUnicode flag. The hostname is stored in its Unicode form.
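/*
    Illustrative sketch (not part of QUrl): the inet_aton-style shorthand
    mentioned above, where "127.1" means 127.0.0.1. In that convention every
    component except the last is one byte and the last fills the remaining
    bytes, so in "a.b.c" the c part is 16 bits and in "a.b" the b part is
    24 bits. normalizeIPv4() is a hypothetical name. It accepts decimal
    components only and is not QIPAddressUtils::parseIp4(), whose exact
    rules are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, decimal-only sketch: returns the dotted-quad form or nullopt.
std::optional<std::string> normalizeIPv4(std::string_view s)
{
    std::vector<std::uint64_t> parts;
    std::uint64_t cur = 0;
    bool haveDigit = false;
    for (char c : s) {
        if (c == '.') {
            if (!haveDigit)
                return std::nullopt;         // empty component
            parts.push_back(cur);
            cur = 0;
            haveDigit = false;
        } else if (c >= '0' && c <= '9') {
            cur = cur * 10 + std::uint64_t(c - '0');
            if (cur > 0xFFFFFFFFu)
                return std::nullopt;         // larger than any 32-bit value
            haveDigit = true;
        } else {
            return std::nullopt;             // not a decimal IPv4 shorthand
        }
    }
    if (!haveDigit)
        return std::nullopt;
    parts.push_back(cur);
    if (parts.size() > 4)
        return std::nullopt;

    // every part but the last is one byte; the last fills the remaining bytes
    std::uint32_t addr = 0;
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
        if (parts[i] > 0xFF)
            return std::nullopt;
        addr |= std::uint32_t(parts[i]) << (24 - 8 * i);
    }
    const unsigned lastBits = 32 - 8 * unsigned(parts.size() - 1);
    if (lastBits < 32 && parts.back() >= (std::uint64_t(1) << lastBits))
        return std::nullopt;
    addr |= std::uint32_t(parts.back());

    return std::to_string(addr >> 24) + '.' + std::to_string((addr >> 16) & 0xFF) + '.'
         + std::to_string((addr >> 8) & 0xFF) + '.' + std::to_string(addr & 0xFF);
}
```

    Storing the corrected dotted-quad form, as the comment above describes,
    means the shorthand never reaches the serialized URL.
*/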

inline void QUrlPrivate::appendHost(QString &appendTo, QUrl::FormattingOptions options) const
{
    if (host.isEmpty()) {
        if ((sectionIsPresent & Host) && appendTo.isNull())
            appendTo.detach();
        return;
    }
    if (host.at(0).unicode() == '[') {
        // IPv6 addresses might contain a zone-id which needs to be recoded
        if (options != 0)
            if (qt_urlRecode(appendTo, host, options, nullptr))
                return;
        appendTo += host;
    } else {
        // this is either an IPv4Address or a reg-name
        // if it is a reg-name, it is already stored in Unicode form
        if (options & QUrl::EncodeUnicode && !(options & 0x4000000))
            appendTo += qt_ACE_do(host, ToAceOnly, AllowLeadingDot, {});
        else
            appendTo += host;
    }
}

// the whole IPvFuture is passed and parsed here, including brackets;
// returns nullptr if parsing succeeded, or a pointer to the QChar of the first failure
static const QChar *parseIpFuture(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    //    IPvFuture     = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    static const char acceptable[] =
            "!$&'()*+,;=" // sub-delims
            ":"           // ":"
            "-._~";       // unreserved

    // the brackets and the "v" have been checked
    const QChar *const origBegin = begin;
    if (begin[3].unicode() != '.')
        return &begin[3];
    if (isHexDigit(begin[2].unicode())) {
        // this is so unlikely that we'll just go down the slow path
        // decode the whole string, skipping the "[vH." and "]" which we already know to be there
        host += QStringView(begin, 4);

        // uppercase the version, if necessary
        if (begin[2].unicode() >= 'a')
            host[host.size() - 2] = QChar{begin[2].unicode() - 0x20};

        begin += 4;
        --end;

        QString decoded;
        if (mode == QUrl::TolerantMode && qt_urlRecode(decoded, QStringView{begin, end}, QUrl::FullyDecoded, nullptr)) {
            begin = decoded.constBegin();
            end = decoded.constEnd();
        }

        for ( ; begin != end; ++begin) {
            if (isAsciiLetterOrNumber(begin->unicode()))
                host += *begin;
            else if (begin->unicode() < 0x80 && strchr(acceptable, begin->unicode()) != nullptr)
                host += *begin;
            else
                return decoded.isEmpty() ? begin : &origBegin[2];
        }
        host += u']';
        return nullptr;
    }
    return &origBegin[2];
}
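/*
    Illustrative sketch (not part of QUrl): a check of the full RFC 3986
    IPvFuture grammar, applied to the text between the brackets. isIpFuture()
    is a hypothetical name. parseIpFuture() above is narrower: it accepts only
    a single hex digit as the version (it tests begin[3] for the '.'),
    uppercases that digit, and in TolerantMode percent-decodes the remainder
    first. The sketch accepts any number of version digits and does no
    decoding. ABNF literal strings are case-insensitive, so it also accepts
    'V'.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// Hypothetical check (brackets already stripped):
//   IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
bool isIpFuture(std::string_view s)
{
    auto isHex = [](char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    };
    auto isAlnum = [](char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    static const char acceptable[] = "!$&'()*+,;=" ":" "-._~";

    if (s.empty() || (s[0] != 'v' && s[0] != 'V'))
        return false;
    std::size_t i = 1;
    while (i < s.size() && isHex(s[i]))
        ++i;
    if (i == 1 || i >= s.size() || s[i] != '.')
        return false;                      // need 1*HEXDIG followed by "."
    ++i;
    if (i == s.size())
        return false;                      // need at least one character after "."
    for (; i < s.size(); ++i) {
        const char c = s[i];
        // '\0' is excluded explicitly because strchr() would match the terminator
        if (!isAlnum(c) && (c == '\0' || std::strchr(acceptable, c) == nullptr))
            return false;
    }
    return true;
}
```
*/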

// ONLY the IPv6 address is parsed here, WITHOUT the brackets
static const QChar *parseIp6(QString &host, const QChar *begin, const QChar *end, QUrl::ParsingMode mode)
{
    QStringView decoded(begin, end);
    QString decodedBuffer;
    if (mode == QUrl::TolerantMode) {
        // this array is kept in automatic storage because it's only 4 bytes
        const ushort decodeColon[] = { decode(':'), 0 };
        if (qt_urlRecode(decodedBuffer, decoded, QUrl::ComponentFormattingOption::PrettyDecoded, decodeColon))
            decoded = decodedBuffer;
    }

    const QStringView zoneIdIdentifier(u"%25");
    QIPAddressUtils::IPv6Address address;
    QStringView zoneId;

    qsizetype zoneIdPosition = decoded.indexOf(zoneIdIdentifier);
    if ((zoneIdPosition != -1) && (decoded.lastIndexOf(zoneIdIdentifier) == zoneIdPosition)) {
        zoneId = decoded.mid(zoneIdPosition + zoneIdIdentifier.size());
        decoded.truncate(zoneIdPosition);

        // was there anything after the zone ID separator?
        if (zoneId.isEmpty())
            return end;
    }

    // did the address become empty after removing the zone ID?
    // (it might have always been empty)
    if (decoded.isEmpty())
        return end;

    const QChar *ret = QIPAddressUtils::parseIp6(address, decoded.constBegin(), decoded.constEnd());
    if (ret)
        return begin + (ret - decoded.constBegin());

    host.reserve(host.size() + (end - begin) + 2);  // +2 for the brackets
    host += u'[';
    QIPAddressUtils::toString(host, address);

    if (!zoneId.isEmpty()) {
        host += zoneIdIdentifier;
        host += zoneId;
    }
    host += u']';
    return nullptr;
}
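/*
    Illustrative sketch (not part of QUrl): the zone-ID split from parseIp6().
    The separator is the percent-encoded "%25". It is honoured only when it
    occurs exactly once, and in that case neither the address nor the zone
    ID may be empty. splitZoneId() and Ip6Parts are hypothetical names, and
    the address part is returned unparsed. When "%25" occurs more than once,
    the sketch, like parseIp6(), leaves the string unsplit and lets the
    address parser reject it.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct Ip6Parts {
    std::string_view address;
    std::string_view zoneId;              // empty if there was no "%25"
};

// Hypothetical sketch of the "%25" zone-ID split (brackets already removed).
std::optional<Ip6Parts> splitZoneId(std::string_view s)
{
    constexpr std::string_view sep = "%25";
    Ip6Parts parts{s, {}};
    const std::size_t pos = s.find(sep);
    if (pos != std::string_view::npos && s.rfind(sep) == pos) {
        parts.zoneId = s.substr(pos + sep.size());
        parts.address = s.substr(0, pos);
        if (parts.zoneId.empty())
            return std::nullopt;          // nothing after the separator
    }
    if (parts.address.empty())
        return std::nullopt;              // no address before the separator
    return parts;
}
```
*/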

inline bool
QUrlPrivate::setHost(const QString &value, qsizetype from, qsizetype iend, QUrl::ParsingMode mode)
{
    Q_ASSERT_X(mode != QUrl::DecodedMode, "setHost",
               "This function should only be called when parsing encoded components");
    const QChar *begin = value.constData() + from;
    const QChar *end = value.constData() + iend;

    const qsizetype len = end - begin;
    host.clear();
    sectionIsPresent &= ~Host;
    if (!value.isNull() || (sectionIsPresent & Authority))
        sectionIsPresent |= Host;
    if (len == 0)
        return true;

    if (begin[0].unicode() == '[') {
        // IPv6Address or IPvFuture
        // smallest IPv6 address is      "[::]"   (len = 4)
        // smallest IPvFuture address is "[v7.X]" (len = 6)
        if (end[-1].unicode() != ']') {
            setError(HostMissingEndBracket, value);
            return false;
        }

        if (len > 5 && begin[1].unicode() == 'v') {
            const QChar *c = parseIpFuture(host, begin, end, mode);
            if (c)
                setError(InvalidIPvFutureError, value, c - value.constData());
            return !c;
        } else if (begin[1].unicode() == 'v') {
            setError(InvalidIPvFutureError, value, from);
        }

        const QChar *c = parseIp6(host, begin + 1, end - 1, mode);
        if (!c)
            return true;

        if (c == end - 1)
            setError(InvalidIPv6AddressError, value, from);
        else
            setError(InvalidCharacterInIPv6Error, value, c - value.constData());
        return false;
    }

    // check if it's an IPv4 address
    QIPAddressUtils::IPv4Address ip4;
    if (QIPAddressUtils::parseIp4(ip4, begin, end)) {
        // yes, it was
        QIPAddressUtils::toString(host, ip4);
        return true;
    }

    // This is probably a reg-name.
    // But it can also be an encoded string that, when decoded, becomes one
    // of the types above.
    //
    // Two types of encoding are possible:
    //  percent encoding (e.g., "%31%30%2E%30%2E%30%2E%31" -> "10.0.0.1")
    //  Unicode encoding (some non-ASCII characters case-fold to digits
    //                    when nameprepping is done)
    //
    // The qt_ACE_do function below does IDNA normalization and the STD3 check.
    // That means a Unicode string may become an IPv4 address, but it cannot
    // produce a '[' or a '%'.

    // check for percent-encoding first
    QString s;
    if (mode == QUrl::TolerantMode && qt_urlRecode(s, QStringView{begin, end}, { }, nullptr)) {
        // something was decoded
        // anything encoded left?
        qsizetype pos = s.indexOf(QChar(0x25)); // '%'
        if (pos != -1) {
            setError(InvalidRegNameError, s, pos);
            return false;
        }

        // recurse
        return setHost(s, 0, s.size(), QUrl::StrictMode);
    }

    s = qt_ACE_do(value.mid(from, iend - from), NormalizeAce, ForbidLeadingDot, {});
    if (s.isEmpty()) {
        setError(InvalidRegNameError, value);
        return false;
    }

    // check IPv4 again
    if (QIPAddressUtils::parseIp4(ip4, s.constBegin(), s.constEnd())) {
        QIPAddressUtils::toString(host, ip4);
    } else {
        host = s;
    }
    return true;
}

inline void QUrlPrivate::parse(const QString &url, QUrl::ParsingMode parsingMode)
{
    //  URI-reference = URI / relative-ref
    //  URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    //  relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
    //  hier-part     = "//" authority path-abempty
    //                / other path types
    //  relative-part = "//" authority path-abempty
    //                / other path types here

    Q_ASSERT_X(parsingMode != QUrl::DecodedMode, "parse",
               "This function should only be called when parsing encoded URLs");
    Q_ASSERT(sectionIsPresent == 0);
    Q_ASSERT(!error);

    // find the important delimiters
    qsizetype colon = -1;
    qsizetype question = -1;
    qsizetype hash = -1;
    const qsizetype len = url.size();
    const QChar *const begin = url.constData();
    const ushort *const data = reinterpret_cast<const ushort *>(begin);

    for (qsizetype i = 0; i < len; ++i) {
        size_t uc = data[i];
        if (uc == '#' && hash == -1) {
            hash = i;

            // nothing more to be found
            break;
        }

        if (question == -1) {
            if (uc == ':' && colon == -1)
                colon = i;
            else if (uc == '?')
                question = i;
        }
    }

    // check if we have a scheme
    qsizetype hierStart;
    if (colon != -1 && setScheme(url, colon, /* don't set error */ false)) {
        hierStart = colon + 1;
    } else {
        // recover from a failed scheme: it might not have been a scheme at all
        scheme.clear();
        sectionIsPresent = 0;
        hierStart = 0;
    }

    qsizetype pathStart;
    qsizetype hierEnd = qMin<size_t>(qMin<size_t>(question, hash), len);
    if (hierEnd - hierStart >= 2 && data[hierStart] == '/' && data[hierStart + 1] == '/') {
        // we have an authority, it ends at the first slash after these
        qsizetype authorityEnd = hierEnd;
        for (qsizetype i = hierStart + 2; i < authorityEnd; ++i) {
            if (data[i] == '/') {
                authorityEnd = i;
                break;
            }
        }

        setAuthority(url, hierStart + 2, authorityEnd, parsingMode);

        // even if we failed to set the authority properly, let's try to recover
        pathStart = authorityEnd;
        setPath(QStringView(url).sliced(pathStart, hierEnd - pathStart), parsingMode);
    } else {
        Q_ASSERT(userName.isNull());
        Q_ASSERT(password.isNull());
        Q_ASSERT(host.isNull());
        Q_ASSERT(port == -1);
        pathStart = hierStart;

        if (hierStart < hierEnd)
            setPath(QStringView(url).sliced(hierStart, hierEnd - hierStart), parsingMode);
        else
            path.clear();
    }

    Q_ASSERT(query.isNull());
    if (size_t(question) < size_t(hash))
        setQuery(QStringView(url).sliced(question + 1, qMin<size_t>(hash, len) - question - 1),
                 parsingMode);

    Q_ASSERT(fragment.isNull());
    if (hash != -1)
        setFragment(QStringView(url).sliced(hash + 1, len - hash - 1), parsingMode);

    if (error || parsingMode == QUrl::TolerantMode)
        return;

    // The parsing so far was partially tolerant of errors, except for the
    // scheme parser (which is always strict) and the authority (which was
    // executed in strict mode).
    // If we haven't found any errors so far, continue the strict-mode parsing
    // from the path component onwards.

    if (!validateComponent(Path, url, pathStart, hierEnd))
        return;
    if (size_t(question) < size_t(hash) && !validateComponent(Query, url, question + 1, qMin<size_t>(hash, len)))
        return;
    if (hash != -1)
        validateComponent(Fragment, url, hash + 1, len);
}
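/*
    Illustrative sketch (not part of QUrl): the single delimiter scan at the
    start of parse(). The first '#' ends the scan. ':' and '?' are looked for
    only until a '?' is found, so a colon inside the query or the fragment
    never counts as a scheme separator. findDelimiters() and Delimiters are
    hypothetical names, and positions are -1 when absent, as in parse().

```cpp
#include <cassert>
#include <string_view>

struct Delimiters {
    long colon = -1;     // first ':' before any '?' or '#'
    long question = -1;  // first '?' before any '#'
    long hash = -1;      // first '#'
};

// Hypothetical re-statement of the single pass in QUrlPrivate::parse().
Delimiters findDelimiters(std::string_view url)
{
    Delimiters d;
    for (std::size_t i = 0; i < url.size(); ++i) {
        const char c = url[i];
        if (c == '#') {
            d.hash = long(i);
            break;                       // nothing more to be found
        }
        if (d.question == -1) {
            if (c == ':' && d.colon == -1)
                d.colon = long(i);
            else if (c == '?')
                d.question = long(i);
        }
    }
    return d;
}
```

    parse() then ends the hierarchical part at the smallest of question,
    hash and the string length.
*/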

QString QUrlPrivate::toLocalFile(QUrl::FormattingOptions options) const
{
    QString tmp;
    QString ourPath;
    appendPath(ourPath, options, QUrlPrivate::Path);

    // magic for shared drive on windows
    if (!host.isEmpty()) {
        tmp = "//"_L1 + host;
#ifdef Q_OS_WIN // QTBUG-42346, WebDAV is visible as local file on Windows only.
        if (scheme == webDavScheme())
            tmp += webDavSslTag();
#endif
        if (!ourPath.isEmpty() && !ourPath.startsWith(u'/'))
            tmp += u'/';
        tmp += ourPath;
    } else {
        tmp = ourPath;
#ifdef Q_OS_WIN
        // magic for drives on windows
        if (ourPath.length() > 2 && ourPath.at(0) == u'/' && ourPath.at(2) == u':')
            tmp.remove(0, 1);
#endif
    }
    return tmp;
}

/*
    From http://www.ietf.org/rfc/rfc3986.txt, 5.2.3: Merge paths

    Returns a merge of the current path with the relative path passed
    as argument.

    Note: \a relativePath is relative (does not start with '/').
*/
inline QString QUrlPrivate::mergePaths(const QString &relativePath) const
{
    // If the base URI has a defined authority component and an empty
    // path, then return a string consisting of "/" concatenated with
    // the reference's path; otherwise,
    if (!host.isEmpty() && path.isEmpty())
        return u'/' + relativePath;

    // Return a string consisting of the reference's path component
    // appended to all but the last segment of the base URI's path
    // (i.e., excluding any characters after the right-most "/" in the
    // base URI path, or excluding the entire base URI path if it does
    // not contain any "/" characters).
    QString newPath;
    if (!path.contains(u'/'))
        newPath = relativePath;
    else
        newPath = QStringView{path}.left(path.lastIndexOf(u'/') + 1) + relativePath;

    return newPath;
}
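/*
    Illustrative sketch (not part of QUrl): RFC 3986 section 5.2.3 as a free
    function. mergePaths() here is a hypothetical standalone counterpart of
    QUrlPrivate::mergePaths() that takes the "base has an authority"
    condition as a parameter, whereas the member function tests
    !host.isEmpty(). Dot segments are not removed at this step; RFC 3986
    handles them separately in section 5.2.4.

```cpp
#include <cassert>
#include <string>

// Hypothetical standalone merge of a base path with a relative reference path.
std::string mergePaths(bool baseHasAuthority, const std::string &basePath,
                       const std::string &relativePath)
{
    // base has an authority and an empty path: "/" + reference path
    if (baseHasAuthority && basePath.empty())
        return "/" + relativePath;
    // otherwise drop everything after the right-most '/' of the base path
    const std::size_t slash = basePath.rfind('/');
    if (slash == std::string::npos)
        return relativePath;
    return basePath.substr(0, slash + 1) + relativePath;
}
```

    With the RFC's example base path "/b/c/d;p", the reference "g" merges to
    "/b/c/g".
*/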

// Authority-less URLs cannot have paths starting with double slashes (see
// QUrlPrivate::validityError). We refuse to turn a valid URL into invalid by
// way of QUrl::resolved().
static void fixupNonAuthorityPath(QString *path)
{
    if (path->isEmpty() || path->at(0) != u'/')
        return;

    // Find the first non-slash character, because its position is equal to the
    // number of slashes. We'll remove all but one of them.
    qsizetype i = 0;
    while (i + 1 < path->size() && path->at(i + 1) == u'/')
        ++i;
    if (i)
        path->remove(0, i);
}
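/*
    Illustrative sketch (not part of QUrl): the same fix-up over std::string.
    collapseLeadingSlashes() is a hypothetical name. Without an authority, a
    path such as "//evil.example/x" would be serialized so that the text
    after "//" is read back as a host, so all but one leading slash is
    dropped.

```cpp
#include <cassert>
#include <string>

// Hypothetical equivalent of fixupNonAuthorityPath(): keep exactly one
// leading '/' when the path starts with a run of slashes.
void collapseLeadingSlashes(std::string &path)
{
    if (path.empty() || path[0] != '/')
        return;
    std::size_t firstNonSlash = path.find_first_not_of('/');
    if (firstNonSlash == std::string::npos)
        firstNonSlash = path.size();     // the path is made of slashes only
    path.erase(0, firstNonSlash - 1);    // leaves a single '/'
}
```
*/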

inline QUrlPrivate::ErrorCode QUrlPrivate::validityError(QString *source, qsizetype *position) const
{
    Q_ASSERT(!source == !position);
    if (error) {
        if (source) {
            *source = error->source;
            *position = error->position;
        }
        return error->code;
    }

    // There are three more cases of invalid URLs that QUrl recognizes and they
    // are only possible with constructed URLs (setXXX methods), not with
    // parsing. Therefore, they are tested here.
    //
    // Two cases are a non-empty path that doesn't start with a slash and:
    //  - with an authority
    //  - without an authority, without scheme but the path with a colon before
    //    the first slash
    // The third case is an empty authority and a non-empty path that starts
    // with "//".
    // Those cases are considered invalid because toString() would produce a URL
    // that wouldn't be parsed back to the same QUrl.

    if (path.isEmpty())
        return NoError;
    if (path.at(0) == u'/') {
        if (hasAuthority() || path.size() == 1 || path.at(1) != u'/')
            return NoError;
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityAbsentAndPathIsDoubleSlash;
    }

    if (sectionIsPresent & QUrlPrivate::Host) {
        if (source) {
            *source = path;
            *position = 0;
        }
        return AuthorityPresentAndPathIsRelative;
    }
    if (sectionIsPresent & QUrlPrivate::Scheme)
        return NoError;

    // check for a path of "text:text/"
    for (qsizetype i = 0; i < path.size(); ++i) {
        ushort c = path.at(i).unicode();
        if (c == '/') {
            // found the slash before the colon
            return NoError;
        }
        if (c == ':') {
            // found the colon before the slash, it's invalid
            if (source) {
                *source = path;
                *position = i;
            }
            return RelativeUrlPathContainsColonBeforeSlash;
        }
    }
    return NoError;
}
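/*
    Illustrative sketch (not part of QUrl): the final "text:text/" test from
    validityError(). In a URL with neither scheme nor authority, a colon in
    the first path segment would be re-parsed as a scheme delimiter, so such
    a path is reported along with the colon's position.
    colonBeforeFirstSlash() is a hypothetical name.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: position of a ':' that appears before the first '/',
// or -1 if the path is safe to serialize without a scheme.
long colonBeforeFirstSlash(std::string_view path)
{
    for (std::size_t i = 0; i < path.size(); ++i) {
        if (path[i] == '/')
            return -1;                 // found the slash before any colon
        if (path[i] == ':')
            return long(i);            // the colon comes first: ambiguous
    }
    return -1;
}
```

    RFC 3986 (section 4.2) recommends prefixing such a path with a "./" dot
    segment, as in "./a:b". The sketch accepts that form because the slash
    comes first.
*/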

bool QUrlPrivate::validateComponent(QUrlPrivate::Section section, const QString &input,
                                    qsizetype begin, qsizetype end)
{
    // What we need to look out for, that the regular parser tolerates:
    //  - percent signs not followed by two hex digits
    //  - forbidden characters, which should always appear encoded
    //    '"' / '<' / '>' / '\' / '^' / '`' / '{' / '|' / '}' / DEL (0x7F)
    //    space and control characters
    //  - delimiters not allowed in certain positions
    //    . scheme: parser is already strict
    //    . user info: gen-delims except ":" disallowed ("/" / "?" / "#" / "[" / "]" / "@")
    //    . host: parser is stricter than the standard
    //    . port: parser is stricter than the standard
    //    . path: all delimiters allowed
    //    . fragment: all delimiters allowed
    //    . query: all delimiters allowed
    static const char forbidden[] = "\"<>\\^`{|}\x7F";
    static const char forbiddenUserInfo[] = ":/?#[]@";

    Q_ASSERT(section != Authority && section != Hierarchy && section != FullUrl);

    const ushort *const data = reinterpret_cast<const ushort *>(input.constData());
    for (size_t i = size_t(begin); i < size_t(end); ++i) {
        uint uc = data[i];
        if (uc >= 0x80)
            continue;

        bool error = false;
        if ((uc == '%' && (size_t(end) < i + 2 || !isHex(data[i + 1]) || !isHex(data[i + 2])))
                || uc <= 0x20 || strchr(forbidden, uc)) {
            // found an error
            error = true;
        } else if (section & UserInfo) {
            if (section == UserInfo && strchr(forbiddenUserInfo + 1, uc))
                error = true;
            else if (section != UserInfo && strchr(forbiddenUserInfo, uc))
                error = true;
        }

        if (!error)
            continue;

        ErrorCode errorCode = ErrorCode(int(section) << 8);
        if (section == UserInfo) {
            // is it the user name or the password?
            errorCode = InvalidUserNameError;
            for (size_t j = size_t(begin); j < i; ++j)
                if (data[j] == ':') {
                    errorCode = InvalidPasswordError;
                    break;
                }
        }

        setError(errorCode, input, i);
        return false;
    }

    // no errors
    return true;
}
1724
#if 0
inline void QUrlPrivate::validate() const
{
    QUrlPrivate *that = (QUrlPrivate *)this;
    that->encodedOriginal = that->toEncoded(); // may detach
    parse(ParseOnly);

    QURL_SETFLAG(that->stateFlags, Validated);

    if (!isValid)
        return;

    QString auth = authority(); // causes the non-encoded forms to be valid

    // authority() calls canonicalHost() which sets this
    if (!isHostValid)
        return;

    if (scheme == "mailto"_L1) {
        if (!host.isEmpty() || port != -1 || !userName.isEmpty() || !password.isEmpty()) {
            that->isValid = false;
            that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "expected empty host, username, "
                                                           "port and password"),
                                      0, 0);
        }
    } else if (scheme == ftpScheme() || scheme == httpScheme()) {
        if (host.isEmpty() && !(path.isEmpty() && encodedPath.isEmpty())) {
            that->isValid = false;
            that->errorInfo.setParams(0, QT_TRANSLATE_NOOP(QUrl, "the host is empty, but not the path"),
                                      0, 0);
        }
    }
}
#endif

/*!
    \macro QT_NO_URL_CAST_FROM_STRING
    \relates QUrl

    Disables automatic conversions from QString (or char *) to QUrl.

    Compiling your code with this define is useful when you have a lot of
    code that uses QString for file names and you wish to convert it to
    use QUrl for network transparency. In any code that uses QUrl, it can
    help avoid missing QUrl::resolved() calls, and other misuses of
    QString to QUrl conversions.

    For example, if you have code like

    \code
        url = filename; // probably not what you want
    \endcode

    you can rewrite it as

    \code
        url = QUrl::fromLocalFile(filename);    // if filename is a local file path
        url = baseurl.resolved(QUrl(filename)); // if filename is relative to baseurl
    \endcode

    \sa QT_NO_CAST_FROM_ASCII
*/


/*!
    Constructs a URL by parsing \a url. Note that this constructor expects a
    proper URL or URL-Reference and will not attempt to guess intent. For
    example, the following declaration:

    \snippet code/src_corelib_io_qurl.cpp constructor-url-reference

    will construct a valid URL, but it may not be what one expects, as the
    scheme() part of the input is missing. For a string like the above,
    applications may want to use fromUserInput(). For this constructor or
    setUrl(), the following is probably what was intended:

    \snippet code/src_corelib_io_qurl.cpp constructor-url

    QUrl will automatically percent-encode all characters that are not
    allowed in a URL and decode the percent-encoded sequences that represent
    an unreserved character (letters, digits, hyphens, underscores, dots and
    tildes). All other characters are left in their original forms.
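
    The decoding half of this normalization can be illustrated with a small
    standalone sketch. It is not QUrl's implementation: it works on the bytes
    of a std::string, and the helper names hexValue(), isUnreserved() and
    decodeUnreserved() are hypothetical. It decodes only "%XX" sequences whose
    byte is an unreserved ASCII character (RFC 3986: ALPHA / DIGIT / "-" /
    "." / "_" / "~") and leaves every other sequence encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
static bool isUnreserved(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
}

// Decodes "%XX" only when XX encodes an unreserved character; all other
// percent-encoded sequences (and all other characters) are copied unchanged.
std::string decodeUnreserved(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0 && isUnreserved(hi * 16 + lo)) {
                out += char(hi * 16 + lo);
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

    For example, "%41b%7e%2F" becomes "Ab~%2F": 'A' and '~' are unreserved,
    while '/' is a reserved delimiter and stays encoded.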

    Parses the \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return \c false. The parsing mode DecodedMode is
    not permitted in this context.
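
    As a rough illustration of the StrictMode checks, the following standalone
    sketch mirrors the rules applied by QUrlPrivate::validateComponent() to a
    path, query or fragment: a '%' must be followed by two hex digits, and
    ASCII control characters, space, DEL and the characters '"', '<', '>',
    backslash, '^', '`', '{', '|' and '}' must not appear unencoded. The
    function name firstStrictModeError() is hypothetical, and it operates on
    the bytes of a std::string rather than on QString.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// Returns the index of the first character that StrictMode would reject in a
// path, query or fragment, or std::string::npos if the input is acceptable.
// Bytes >= 0x80 (non-ASCII) are accepted, as in validateComponent().
std::size_t firstStrictModeError(const std::string &in)
{
    // '"' '<' '>' backslash '^' '`' '{' '|' '}' DEL, written without escapes
    static const char forbidden[] = { '"', '<', '>', 0x5C, '^', '`', '{', '|', '}', 0x7F, 0 };
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char uc = static_cast<unsigned char>(in[i]);
        if (uc >= 0x80)
            continue;
        if (uc == '%') {
            // both hex digits must lie inside the input
            if (i + 2 >= in.size() || !isHexDigit(in[i + 1]) || !isHexDigit(in[i + 2]))
                return i;
            continue;
        }
        // uc <= 0x20 also rejects NUL before strchr() could match the terminator
        if (uc <= 0x20 || std::strchr(forbidden, uc))
            return i;
    }
    return std::string::npos;
}
```

    Unlike validateComponent(), this sketch does not apply the additional
    delimiter rules for the user name, password and user info.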

    Example:

    \snippet code/src_corelib_io_qurl.cpp 0

    To construct a URL from an encoded string, you can also use fromEncoded():

    \snippet code/src_corelib_io_qurl.cpp 1

    Both functions are equivalent and, in Qt 5, both functions accept encoded
    data. Usually, the choice of the QUrl constructor or setUrl() versus
    fromEncoded() will depend on the source data: the constructor and setUrl()
    take a QString, whereas fromEncoded() takes a QByteArray.

    \sa setUrl(), fromEncoded(), TolerantMode
*/
QUrl::QUrl(const QString &url, ParsingMode parsingMode) : d(nullptr)
{
    setUrl(url, parsingMode);
}

/*!
    Constructs an empty QUrl object.
*/
QUrl::QUrl() : d(nullptr)
{
}

/*!
    Constructs a copy of \a other.
*/
QUrl::QUrl(const QUrl &other) noexcept : d(other.d)
{
    if (d)
        d->ref.ref();
}

/*!
    Destructor; called immediately before the object is deleted.
*/
QUrl::~QUrl()
{
    if (d && !d->ref.deref())
        delete d;
}

/*!
    Returns \c true if the URL is non-empty and valid; otherwise returns \c false.

    The URL is run through a conformance test. Every part of the URL
    must conform to the standard encoding rules of the URI standard
    for the URL to be reported as valid.

    \snippet code/src_corelib_io_qurl.cpp 2
*/
bool QUrl::isValid() const
{
    if (isEmpty()) {
        // also catches d == nullptr
        return false;
    }
    return d->validityError() == QUrlPrivate::NoError;
}

/*!
    Returns \c true if the URL has no data; otherwise returns \c false.

    \sa clear()
*/
bool QUrl::isEmpty() const
{
    if (!d) return true;
    return d->isEmpty();
}

/*!
    Resets the content of the QUrl. After calling this function, the
    QUrl is equal to one that has been constructed with the default
    empty constructor.

    \sa isEmpty()
*/
void QUrl::clear()
{
    if (d && !d->ref.deref())
        delete d;
    d = nullptr;
}

/*!
    Parses \a url and sets this object to that value. QUrl will automatically
    percent-encode all characters that are not allowed in a URL and decode the
    percent-encoded sequences that represent an unreserved character (letters,
    digits, hyphens, underscores, dots and tildes). All other characters are
    left in their original forms.

    Parses the \a url using the parser mode \a parsingMode. In TolerantMode
    (the default), QUrl will correct certain mistakes, notably the presence of
    a percent character ('%') not followed by two hexadecimal digits, and it
    will accept any character in any position. In StrictMode, encoding mistakes
    will not be tolerated and QUrl will also check that certain forbidden
    characters are not present in unencoded form. If an error is detected in
    StrictMode, isValid() will return \c false. The parsing mode DecodedMode is
    not permitted in this context and will produce a run-time warning.

    \sa url(), toString()
*/
void QUrl::setUrl(const QString &url, ParsingMode parsingMode)
{
    if (parsingMode == DecodedMode) {
        qWarning("QUrl: QUrl::DecodedMode is not permitted when parsing a full URL");
    } else {
        detachToClear();
        d->parse(url, parsingMode);
    }
}

/*!
    Sets the scheme of the URL to \a scheme. As a scheme can only
    contain ASCII characters, no conversion or decoding is done on the
    input. It must also start with an ASCII letter.

    The scheme describes the type (or protocol) of the URL. It's
    represented by one or more ASCII characters at the start of the URL.

    A scheme is strictly \l {RFC 3986}-compliant:
        \tt {scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )}
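
    The grammar above translates directly into a character check. The
    following standalone sketch (the functions isValidScheme() and
    lowercaseScheme() are hypothetical and are not the code QUrl uses) accepts
    exactly the strings matched by that production, and also shows the ASCII
    lowercasing applied because scheme() always returns lowercase schemes.

```cpp
#include <cassert>
#include <string>

static bool isAsciiAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
static bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
bool isValidScheme(const std::string &s)
{
    if (s.empty() || !isAsciiAlpha(s[0]))
        return false;
    for (char c : s) {
        if (!isAsciiAlpha(c) && !isAsciiDigit(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}

// Schemes are case-insensitive; normalize to lowercase (ASCII only).
std::string lowercaseScheme(std::string s)
{
    for (char &c : s) {
        if (c >= 'A' && c <= 'Z')
            c = char(c - 'A' + 'a');
    }
    return s;
}
```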

    The following example shows a URL where the scheme is "ftp":

    \image qurl-authority2.png

    To set the scheme, the following call is used:
    \snippet code/src_corelib_io_qurl.cpp 11

    The scheme can also be empty, in which case the URL is interpreted
    as relative.

    \sa scheme(), isRelative()
*/
void QUrl::setScheme(const QString &scheme)
{
    detach();
    d->clearError();
    if (scheme.isEmpty()) {
        // an empty scheme is not allowed: treat it as removing the scheme
        d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        d->flags &= ~QUrlPrivate::IsLocalFile;
        d->scheme.clear();
    } else {
        d->setScheme(scheme, scheme.size(), /* do set error */ true);
    }
}

/*!
    Returns the scheme of the URL. If an empty string is returned,
    this means the scheme is undefined and the URL is then relative.

    The scheme can only contain US-ASCII letters, digits and the characters
    '+', '-' and '.', which means it cannot contain any character that would
    otherwise require encoding. Additionally, schemes are always returned in
    lowercase form.

    \sa setScheme(), isRelative()
*/
QString QUrl::scheme() const
{
    if (!d) return QString();

    return d->scheme;
}

/*!
    Sets the authority of the URL to \a authority.

    The authority of a URL is the combination of user info, a host
    name and a port. All of these elements are optional; an empty
    authority is therefore valid.

    The user info and host are separated by a '@', and the host and
    port are separated by a ':'. If the user info is empty, the '@'
    must be omitted; however, a stray ':' is permitted if the port is
    empty.

    The following example shows a valid authority string:

    \image qurl-authority.png
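
    The separator rules can be sketched as a simple split of an authority
    string into its user info, host and port. This is a simplified standalone
    illustration, not QUrl's parser: the struct AuthorityParts and the
    function splitAuthority() are hypothetical, nothing is validated or
    decoded, and a bracketed IPv6 host such as "[::1]" is only handled to the
    extent that colons inside the brackets are not treated as the port
    separator. A non-numeric port makes std::stoi() throw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct AuthorityParts {
    bool hasUserInfo = false;
    std::string userInfo;
    std::string host;
    int port = -1; // -1 means "unspecified", as with QUrl::port()
};

AuthorityParts splitAuthority(const std::string &authority)
{
    AuthorityParts parts;
    std::string rest = authority;

    // in this sketch, the user info ends at the last '@'
    std::size_t at = rest.rfind('@');
    if (at != std::string::npos) {
        parts.hasUserInfo = true;
        parts.userInfo = rest.substr(0, at);
        rest = rest.substr(at + 1);
    }

    // the port follows the last ':' that is not inside an IPv6 "[...]" literal
    std::size_t colon = rest.rfind(':');
    std::size_t bracket = rest.rfind(']');
    if (colon != std::string::npos && (bracket == std::string::npos || colon > bracket)) {
        std::string portText = rest.substr(colon + 1);
        rest = rest.substr(0, colon);
        // a stray ':' with an empty port leaves the port unspecified
        if (!portText.empty())
            parts.port = std::stoi(portText);
    }
    parts.host = rest;
    return parts;
}
```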

    The \a authority data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName(), setPassword(), setHost() and setPort()
    individually.

    \sa setUserInfo(), setHost(), setPort()
*/
void QUrl::setAuthority(const QString &authority, ParsingMode mode)
{
    detach();
    d->clearError();

    if (mode == DecodedMode) {
        qWarning("QUrl::setAuthority(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setAuthority(authority, 0, authority.size(), mode);
}

/*!
    Returns the authority of the URL if it is defined; otherwise
    an empty string is returned.

    This function returns an unambiguous value, which may contain some
    characters still percent-encoded, plus some control sequences not
    representable in decoded form in QString.

    The \a options argument controls how to format the authority component. The
    value of QUrl::FullyDecoded is not permitted in this function. If you need
    to obtain fully decoded data, call userName(), password(), host() and
    port() individually.

    \sa setAuthority(), userInfo(), userName(), password(), host(), port()
*/
QString QUrl::authority(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::authority(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendAuthority(result, options, QUrlPrivate::Authority);
    return result;
}

/*!
    Sets the user info of the URL to \a userInfo. The user info is an
    optional part of the authority of the URL, as described in
    setAuthority().

    The user info consists of a user name and optionally a password,
    separated by a ':'. If the password is empty, the colon must be
    omitted. The following example shows a valid user info string:

    \image qurl-authority3.png
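
    Splitting user info at the first ':' yields the user name and the
    password; validateComponent() uses the same rule to decide whether an
    error belongs to the user name or to the password. The following
    standalone sketch uses std::optional to distinguish "no password" from an
    empty one; the function splitUserInfo() is hypothetical and performs no
    decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Splits "name[:password]" at the first ':'. The password part may itself
// contain ':' because only the first colon acts as the separator.
std::pair<std::string, std::optional<std::string>> splitUserInfo(const std::string &userInfo)
{
    std::size_t colon = userInfo.find(':');
    if (colon == std::string::npos)
        return { userInfo, std::nullopt };
    return { userInfo.substr(0, colon), userInfo.substr(colon + 1) };
}
```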

    The \a userInfo data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters.

    This function does not allow \a mode to be QUrl::DecodedMode. To set fully
    decoded data, call setUserName() and setPassword() individually.

    \sa userInfo(), setUserName(), setPassword(), setAuthority()
*/
void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
{
    detach();
    d->clearError();
    QString trimmed = userInfo.trimmed();
    if (mode == DecodedMode) {
        qWarning("QUrl::setUserInfo(): QUrl::DecodedMode is not permitted in this function");
        return;
    }

    d->setUserInfo(std::move(trimmed), mode);
    if (userInfo.isNull()) {
        // QUrlPrivate::setUserInfo cleared almost everything
        // but it leaves the UserName bit set
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
    } else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserInfo, userInfo)) {
        d->sectionIsPresent &= ~QUrlPrivate::UserInfo;
        d->userName.clear();
        d->password.clear();
    }
}

/*!
    Returns the user info of the URL, or an empty string if the user
    info is undefined.

    This function returns an unambiguous value, which may contain some
    characters still percent-encoded, plus some control sequences not
    representable in decoded form in QString.

    The \a options argument controls how to format the user info component. The
    value of QUrl::FullyDecoded is not permitted in this function. If you need
    to obtain fully decoded data, call userName() and password() individually.

    \sa setUserInfo(), userName(), password(), authority()
*/
QString QUrl::userInfo(ComponentFormattingOptions options) const
{
    QString result;
    if (!d)
        return result;

    if (options == QUrl::FullyDecoded) {
        qWarning("QUrl::userInfo(): QUrl::FullyDecoded is not permitted in this function");
        return result;
    }

    d->appendUserInfo(result, options, QUrlPrivate::UserInfo);
    return result;
}

/*!
    Sets the URL's user name to \a userName. The \a userName is part
    of the user info element in the authority of the URL, as described
    in setUserInfo().

    The \a userName data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode (the default), all characters are accepted in undecoded form
    and the tolerant parser will correct stray '%' not followed by two hex
    characters. In DecodedMode, '%' characters stand for themselves and
    encoded characters are not possible.

    QUrl::DecodedMode should be used when setting the user name from a data
    source which is not a URL, such as a password dialog shown to the user or
    with a user name obtained by calling userName() with the QUrl::FullyDecoded
    formatting option.
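
    Conceptually, decoded text can be embedded in an encoded URL component by
    encoding each '%' as "%25", so that a later decode restores the original
    text; setHost() in this file performs that replacement before parsing a
    DecodedMode host. The following standalone sketch shows the step on a
    std::string; the functions escapePercent() and unescapePercent() are
    hypothetical, and the decoder handles only "%25".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encodes every '%' as "%25" so that no '%' in the decoded text can be
// mistaken for the start of a percent-encoded sequence.
std::string escapePercent(const std::string &decoded)
{
    std::string out;
    for (char c : decoded) {
        if (c == '%')
            out += "%25";
        else
            out += c;
    }
    return out;
}

// Minimal inverse for this sketch: turns "%25" back into '%' and leaves
// everything else alone.
std::string unescapePercent(const std::string &encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        if (encoded.compare(i, 3, "%25") == 0) {
            out += '%';
            i += 2; // skip "25"
        } else {
            out += encoded[i];
        }
    }
    return out;
}
```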

    \sa userName(), setUserInfo()
*/
void QUrl::setUserName(const QString &userName, ParsingMode mode)
{
    detach();
    d->clearError();

    d->setUserName(userName, mode);
    if (userName.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::UserName;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::UserName, userName))
        d->userName.clear();
}

/*!
    Returns the user name of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the user name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as setting it in QAuthenticator or
    negotiating a login.

    \sa setUserName(), userInfo()
*/
QString QUrl::userName(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendUserName(result, options);
    return result;
}

/*!
    Sets the URL's password to \a password. The \a password is part of
    the user info element in the authority of the URL, as described in
    setUserInfo().

    The \a password data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    QUrl::DecodedMode should be used when setting the password from a data
    source which is not a URL, such as a password dialog shown to the user or
    with a password obtained by calling password() with the QUrl::FullyDecoded
    formatting option.

    \sa password(), setUserInfo()
*/
void QUrl::setPassword(const QString &password, ParsingMode mode)
{
    detach();
    d->clearError();

    d->setPassword(password, mode);
    if (password.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Password;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Password, password))
        d->password.clear();
}

/*!
    Returns the password of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how to format the password component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as setting it in QAuthenticator or
    negotiating a login.

    \sa setPassword()
*/
QString QUrl::password(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPassword(result, options);
    return result;
}

/*!
    Sets the host of the URL to \a host. The host is part of the
    authority.

    The \a host data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    Note that, in all cases, the result of the parsing must be a valid hostname
    according to STD 3 rules, as modified by the Internationalized Resource
    Identifiers specification (RFC 3987). Invalid hostnames are not permitted
    and will cause isValid() to return \c false.
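
    For plain ASCII host names, the STD 3 rules amount to the classic
    "letters, digits, hyphen" (LDH) check on each dot-separated label. The
    following standalone sketch implements only that ASCII check; it does not
    handle IP address literals, internationalized labels or a trailing dot,
    and the function name isLdhHostName() is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static bool isLdhChar(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-';
}

// Each dot-separated label must be 1..63 characters of letters, digits and
// hyphens, and must not begin or end with a hyphen.
bool isLdhHostName(const std::string &host)
{
    if (host.empty())
        return false;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = host.find('.', start);
        std::size_t end = (dot == std::string::npos) ? host.size() : dot;
        std::size_t len = end - start;
        if (len == 0 || len > 63)
            return false;
        if (host[start] == '-' || host[end - 1] == '-')
            return false;
        for (std::size_t i = start; i < end; ++i) {
            if (!isLdhChar(host[i]))
                return false;
        }
        if (dot == std::string::npos)
            return true;
        start = dot + 1;
    }
}
```

    The ACE form of an internationalized name, such as
    "xn--bhler-kva.example.com", passes this check, because ACE labels use
    only LDH characters.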

    \sa host(), setAuthority()
*/
void QUrl::setHost(const QString &host, ParsingMode mode)
{
    detach();
    d->clearError();

    QString data = host;
    if (mode == DecodedMode) {
        data.replace(u'%', "%25"_L1);
        mode = TolerantMode;
    }

    if (d->setHost(data, 0, data.size(), mode)) {
        return;
    } else if (!data.startsWith(u'[')) {
        // setHost failed, it might be IPv6 or IPvFuture in need of bracketing
        Q_ASSERT(d->error);

        data.prepend(u'[');
        data.append(u']');
        if (!d->setHost(data, 0, data.size(), mode)) {
            // failed again
            if (data.contains(u':')) {
                // source data contains ':', so it's an IPv6 error
                d->error->code = QUrlPrivate::InvalidIPv6AddressError;
            }
            d->sectionIsPresent &= ~QUrlPrivate::Host;
        } else {
            // succeeded
            d->clearError();
        }
    }
}

/*!
    Returns the host of the URL if it is defined; otherwise
    an empty string is returned.

    The \a options argument controls how the hostname will be formatted. The
    QUrl::EncodeUnicode option will cause this function to return the hostname
    in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
    channels that are not 8-bit clean or that require the legacy hostname (such
    as in DNS requests or HTTP request headers). If that flag is not present,
    this function returns the Internationalized Domain Name (IDN) in Unicode
    form, according to the list of permissible top-level domains (see
    idnWhitelist()).

    All other flags are ignored. Host names cannot contain control or percent
    characters, so the returned value can be considered fully decoded.

    \sa setHost(), idnWhitelist(), setIdnWhitelist(), authority()
*/
QString QUrl::host(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendHost(result, options);
        if (result.startsWith(u'['))
            result = result.mid(1, result.size() - 2);
    }
    return result;
}

/*!
    Sets the port of the URL to \a port. The port is part of the
    authority of the URL, as described in setAuthority().

    \a port must be between 0 and 65535 inclusive. Setting the
    port to -1 indicates that the port is unspecified.
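
    The range rule and the fallback used by port() can be sketched in
    isolation. In the following standalone sketch, the functions
    normalizePort() and effectivePort() are hypothetical: the first returns
    the value that would be stored (an out-of-range value becomes -1, which is
    also what setPort() falls back to after recording an error), and the
    second mirrors how port() substitutes a default for an unspecified port.

```cpp
#include <cassert>

// Returns port if it is -1 (unspecified) or within 0..65535; otherwise -1.
// 'valid' reports whether the input was acceptable.
int normalizePort(int port, bool &valid)
{
    valid = (port >= -1 && port <= 65535);
    return valid ? port : -1;
}

// An unspecified (-1) port yields the caller-supplied default.
int effectivePort(int storedPort, int defaultPort)
{
    return storedPort == -1 ? defaultPort : storedPort;
}
```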
*/
void QUrl::setPort(int port)
{
    detach();
    d->clearError();

    if (port < -1 || port > 65535) {
        d->setError(QUrlPrivate::InvalidPortError, QString::number(port), 0);
        port = -1;
    }

    d->port = port;
    if (port != -1)
        d->sectionIsPresent |= QUrlPrivate::Host;
}

/*!
    \since 4.1

    Returns the port of the URL, or \a defaultPort if the port is
    unspecified.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 3
*/
int QUrl::port(int defaultPort) const
{
    if (!d) return defaultPort;
    return d->port == -1 ? defaultPort : d->port;
}

/*!
    Sets the path of the URL to \a path. The path is the part of the
    URL that comes after the authority but before the query string.

    \image qurl-ftppath.png

    For non-hierarchical schemes, the path will be everything
    following the scheme declaration, as in the following example:

    \image qurl-mailtopath.png

    The \a path data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    QUrl::DecodedMode should be used when setting the path from a data source
    which is not a URL, such as a dialog shown to the user or with a path
    obtained by calling path() with the QUrl::FullyDecoded formatting option.

    \sa path()
*/
void QUrl::setPath(const QString &path, ParsingMode mode)
{
    detach();
    d->clearError();

    d->setPath(path, mode);

    // optimized out, since there is no path delimiter
//    if (path.isNull())
//        d->sectionIsPresent &= ~QUrlPrivate::Path;
//    else
    if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Path, path))
        d->path.clear();
}

/*!
    Returns the path of the URL.

    \snippet code/src_corelib_io_qurl.cpp 12

    The \a options argument controls how to format the path component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context, such as sending it to an FTP server.

    An example of data loss is when you have percent-encoded sequences that do
    not form valid UTF-8 and use FullyDecoded (the default):

    \snippet code/src_corelib_io_qurl.cpp 13

    In this example, there will be some level of data loss because the \c %FF
    cannot be converted.

    Data loss can also occur when the path contains sub-delimiters (such as \c +):

    \snippet code/src_corelib_io_qurl.cpp 14

    Other decoding examples:

    \snippet code/src_corelib_io_qurl.cpp 15

    \sa setPath()
*/
QString QUrl::path(ComponentFormattingOptions options) const
{
    QString result;
    if (d)
        d->appendPath(result, options, QUrlPrivate::Path);
    return result;
}

/*!
    \since 5.2

    Returns the name of the file, excluding the directory path.

    Note that, if this QUrl object is given a path ending in a slash, the name
    of the file is considered empty.

    If the path doesn't contain any slash, the whole path is returned as the
    file name.

    Example:

    \snippet code/src_corelib_io_qurl.cpp 7
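
    The rule is "everything after the last '/'". The following standalone
    sketch applies it to a std::string path; the function name fileNameOf()
    is hypothetical, and it performs none of the decoding controlled by
    \a options.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text after the last '/', the whole path if there is no '/',
// and an empty string if the path ends with '/'.
std::string fileNameOf(const std::string &path)
{
    std::size_t slash = path.rfind('/');
    if (slash == std::string::npos)
        return path;
    return path.substr(slash + 1);
}
```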

    The \a options argument controls how to format the file name component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    \sa path()
*/
QString QUrl::fileName(ComponentFormattingOptions options) const
{
    const QString ourPath = path(options);
    const qsizetype slash = ourPath.lastIndexOf(u'/');
    if (slash == -1)
        return ourPath;
    return ourPath.mid(slash + 1);
}

/*!
    \since 4.2

    Returns \c true if this URL contains a query (that is, if a '?' was seen
    in it).

    \sa setQuery(), query(), hasFragment()
*/
bool QUrl::hasQuery() const
{
    if (!d) return false;
    return d->hasQuery();
}

/*!
    Sets the query string of the URL to \a query.

    This function is useful if you need to pass a query string that
    does not fit into the key-value pattern, or that uses a different
    scheme for encoding special characters than what is suggested by
    QUrl.

    Passing a value of QString() to \a query (a null QString) unsets
    the query completely. However, passing a value of QString("")
    will set the query to an empty value, as if the original URL
    had a lone "?".
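
    The difference between a null and an empty query can be modelled with
    std::optional. In the following standalone sketch (the function
    withQuery() is hypothetical and not part of QUrl), std::nullopt plays the
    role of QString() and produces no '?', while an empty string plays the
    role of QString("") and produces a lone '?'.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Builds "base[?query]": no '?' when the query is absent, a lone '?' when
// the query is present but empty.
std::string withQuery(const std::string &base, const std::optional<std::string> &query)
{
    if (!query)
        return base;
    return base + "?" + *query;
}
```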

    The \a query data is interpreted according to \a mode: in StrictMode,
    any '%' characters must be followed by exactly two hexadecimal characters
    and some characters (including space) are not allowed in undecoded form. In
    TolerantMode, all characters are accepted in undecoded form and the
    tolerant parser will correct stray '%' not followed by two hex characters.
    In DecodedMode, '%' characters stand for themselves and encoded characters
    are not possible.

    Query strings often contain percent-encoded sequences, so use of
    DecodedMode is discouraged. One special sequence to be aware of is that of
    the plus character ('+'). QUrl does not convert spaces to plus characters,
    even though HTML forms posted by web browsers do. In order to represent an
    actual plus character in a query, the sequence "%2B" is usually used. This
    function will leave "%2B" sequences untouched in TolerantMode or
    StrictMode.
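
    To make the difference concrete, the following standalone sketch decodes
    a query value the way HTML forms encode it
    (application/x-www-form-urlencoded), where '+' means a space and "%2B"
    means a literal '+'. QUrl itself does not apply the '+'-to-space rule; the
    function decodeFormValue() is hypothetical and copies malformed "%"
    sequences through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Form-style decoding: '+' becomes a space, "%XX" becomes the byte 0xXX.
// Malformed escapes are copied through unchanged.
std::string decodeFormValue(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < in.size()
                   && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out += char(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            i += 2; // skip the two hex digits
        } else {
            out += c;
        }
    }
    return out;
}
```

    With this convention, "a+b%2Bc" decodes to "a b+c"; a URL-level decoder
    that leaves '+' alone would instead produce "a+b+c", losing the
    distinction.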

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QString &query, ParsingMode mode)
{
    detach();
    d->clearError();

    d->setQuery(query, mode);
    if (query.isNull())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else if (mode == StrictMode && !d->validateComponent(QUrlPrivate::Query, query))
        d->query.clear();
}

/*!
    \overload
    \since 5.0
    Sets the query string of the URL to \a query.

    This function reconstructs the query string from the QUrlQuery object and
    sets it on this QUrl object. This function does not have parsing parameters
    because the QUrlQuery contains data that is already parsed.

    \sa query(), hasQuery()
*/
void QUrl::setQuery(const QUrlQuery &query)
{
    detach();
    d->clearError();

    // we know the data is in the right format
    d->query = query.toString();
    if (query.isEmpty())
        d->sectionIsPresent &= ~QUrlPrivate::Query;
    else
        d->sectionIsPresent |= QUrlPrivate::Query;
}

/*!
    Returns the query string of the URL if there's a query string, or an empty
    result if not. To determine if the parsed URL contained a query string, use
    hasQuery().

    The \a options argument controls how to format the query component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
    often contain data that is supposed to remain percent-encoded, including
    the use of the "%2B" sequence to represent a plus character ('+').

    \sa setQuery(), hasQuery()
*/
QString QUrl::query(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendQuery(result, options, QUrlPrivate::Query);
        // a present but empty query is returned as an empty, non-null string
        if (d->hasQuery() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    Sets the fragment of the URL to \a fragment. The fragment is the
    last part of the URL, represented by a '#' followed by a string of
    characters. It is typically used in HTTP for referring to a
    certain link or point on a page:

    \image qurl-fragment.png

    The fragment is sometimes also referred to as the URL "reference".

    Passing an argument of QString() (a null QString) will unset the fragment.
    Passing an argument of QString("") (an empty but not null QString) will set the
    fragment to an empty string (as if the original URL had a lone "#").
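
    Because the fragment starts at the first '#', splitting it off is
    straightforward. The following standalone sketch (the function
    splitFragment() is hypothetical) returns the URL without its fragment
    together with the fragment itself, using std::nullopt when there is no
    '#' and an empty string for a lone '#', mirroring the null/empty
    distinction described above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Splits "rest#fragment" at the first '#'.
std::pair<std::string, std::optional<std::string>> splitFragment(const std::string &url)
{
    std::size_t hash = url.find('#');
    if (hash == std::string::npos)
        return { url, std::nullopt };
    return { url.substr(0, hash), url.substr(hash + 1) };
}
```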
2599
2600 The \a fragment data is interpreted according to \a mode: in StrictMode,
2601 any '%' characters must be followed by exactly two hexadecimal characters
2602 and some characters (including space) are not allowed in undecoded form. In
2603 TolerantMode, all characters are accepted in undecoded form and the
2604 tolerant parser will correct stray '%' not followed by two hex characters.
2605 In DecodedMode, '%' stand for themselves and encoded characters are not
2606 possible.
2607
2608 QUrl::DecodedMode should be used when setting the fragment from a data
2609 source which is not a URL or with a fragment obtained by calling
2610 fragment() with the QUrl::FullyDecoded formatting option.
2611
2612 \sa fragment(), hasFragment()
2613*/
2614void QUrl::setFragment(const QString &fragment, ParsingMode mode)
2615{
2616 detach();
2617 d->clearError();
2618
2619 d->setFragment(value: fragment, mode);
2620 if (fragment.isNull())
2621 d->sectionIsPresent &= ~QUrlPrivate::Fragment;
2622 else if (mode == StrictMode && !d->validateComponent(section: QUrlPrivate::Fragment, input: fragment))
2623 d->fragment.clear();
2624}
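
/*
    Illustrative sketch (standalone C++17, not part of Qt) of the '%' rule
    described above for StrictMode and TolerantMode. percentSignsAreValid()
    and fixStrayPercents() are hypothetical names. Only the '%' rule is
    modelled, so the other characters StrictMode rejects in undecoded form
    (such as space) are not checked. Repairing a stray '%' as "%25" is this
    sketch's assumption about what "correct" means.

```cpp
#include <cstddef>
#include <string>

// True if c is an ASCII hexadecimal digit.
static bool isHexDigit(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

// True if the '%' at position i starts a complete "%XX" escape.
static bool isEscapeAt(const std::string &s, std::size_t i)
{
    return i + 2 < s.size() && isHexDigit(s[i + 1]) && isHexDigit(s[i + 2]);
}

// StrictMode-like check, limited to the '%' rule.
bool percentSignsAreValid(const std::string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == '%' && !isEscapeAt(s, i))
            return false;
    return true;
}

// TolerantMode-like repair, limited to the '%' rule: a stray '%' becomes "%25".
std::string fixStrayPercents(const std::string &s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        out += s[i];
        if (s[i] == '%' && !isEscapeAt(s, i))
            out += "25";
    }
    return out;
}
```

    For example, fixStrayPercents("100%") returns "100%25", while an
    already valid escape such as "a%41" is left untouched.
*/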

/*!
    Returns the fragment of the URL. To determine if the parsed URL contained a
    fragment, use hasFragment().

    The \a options argument controls how to format the fragment component. All
    values produce an unambiguous result. With QUrl::FullyDecoded, all
    percent-encoded sequences are decoded; otherwise, the returned value may
    contain some percent-encoded sequences for some control sequences not
    representable in decoded form in QString.

    Note that QUrl::FullyDecoded may cause data loss if those non-representable
    sequences are present. It is recommended to use that value when the result
    will be used in a non-URL context.

    \sa setFragment(), hasFragment()
*/
QString QUrl::fragment(ComponentFormattingOptions options) const
{
    QString result;
    if (d) {
        d->appendFragment(result, options, QUrlPrivate::Fragment);
        if (d->hasFragment() && result.isNull())
            result.detach();
    }
    return result;
}

/*!
    \since 4.2

    Returns \c true if this URL contains a fragment (that is, if a '#' was
    present in it).

    \sa fragment(), setFragment()
*/
bool QUrl::hasFragment() const
{
    if (!d) return false;
    return d->hasFragment();
}

/*!
    Returns the result of the merge of this URL with \a relative. This
    URL is used as a base to convert \a relative to an absolute URL.

    If \a relative is not a relative URL, this function will return \a
    relative directly. Otherwise, the paths of the two URLs are
    merged, and the new URL returned has the scheme and authority of
    the base URL, but with the merged path, as in the following
    example:

    \snippet code/src_corelib_io_qurl.cpp 5

    Calling resolved() with ".." returns a QUrl whose directory is
    one level higher than the original. Similarly, calling resolved()
    with "../.." removes two levels from the path. If \a relative is
    "/", the path becomes "/".

    \sa isRelative()
*/
QUrl QUrl::resolved(const QUrl &relative) const
{
    if (!d) return relative;
    if (!relative.d) return *this;

    QUrl t;
    if (!relative.d->scheme.isEmpty()) {
        t = relative;
        t.detach();
    } else {
        if (relative.d->hasAuthority()) {
            t = relative;
            t.detach();
        } else {
            t.d = new QUrlPrivate;

            // copy the authority
            t.d->userName = d->userName;
            t.d->password = d->password;
            t.d->host = d->host;
            t.d->port = d->port;
            t.d->sectionIsPresent = d->sectionIsPresent & QUrlPrivate::Authority;

            if (relative.d->path.isEmpty()) {
                t.d->path = d->path;
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                } else if (d->hasQuery()) {
                    t.d->query = d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            } else {
                t.d->path = relative.d->path.startsWith(u'/')
                        ? relative.d->path
                        : d->mergePaths(relative.d->path);
                if (relative.d->hasQuery()) {
                    t.d->query = relative.d->query;
                    t.d->sectionIsPresent |= QUrlPrivate::Query;
                }
            }
        }
        t.d->scheme = d->scheme;
        if (d->hasScheme())
            t.d->sectionIsPresent |= QUrlPrivate::Scheme;
        else
            t.d->sectionIsPresent &= ~QUrlPrivate::Scheme;
        t.d->flags |= d->flags & QUrlPrivate::IsLocalFile;
    }
    t.d->fragment = relative.d->fragment;
    if (relative.d->hasFragment())
        t.d->sectionIsPresent |= QUrlPrivate::Fragment;
    else
        t.d->sectionIsPresent &= ~QUrlPrivate::Fragment;

    t.d->normalizePathSegments(&t.d->path);
    if (!t.d->hasAuthority()) {
        if (t.d->isLocalFile() && t.d->path.startsWith(u'/'))
            t.d->sectionIsPresent |= QUrlPrivate::Host;
        else
            fixupNonAuthorityPath(&t.d->path);
    }

#if defined(QURL_DEBUG)
    qDebug("QUrl(\"%ls\").resolved(\"%ls\") = \"%ls\"",
           qUtf16Printable(url()),
           qUtf16Printable(relative.url()),
           qUtf16Printable(t.url()));
#endif
    return t;
}
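
/*
    Illustrative sketch (standalone C++17, not Qt's implementation). The merge
    performed by resolved() corresponds to the reference-resolution algorithm
    of RFC 3986 section 5.2. This sketch transcribes the RFC's pseudocode,
    including remove_dot_segments from section 5.2.4, over a hypothetical
    Parts struct of already-split components. It omits parsing,
    percent-encoding, and the file-URL adjustments that resolved() makes after
    normalizing the path.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical container for an already-split URL. std::nullopt means
// "component absent", which differs from "present but empty".
struct Parts {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

// RFC 3986, 5.2.4: removes "." and ".." segments from a path.
std::string removeDotSegments(std::string in)
{
    std::string out;
    while (!in.empty()) {
        if (in.rfind("../", 0) == 0) {
            in.erase(0, 3);
        } else if (in.rfind("./", 0) == 0) {
            in.erase(0, 2);
        } else if (in.rfind("/./", 0) == 0) {
            in.replace(0, 3, "/");
        } else if (in == "/.") {
            in = "/";
        } else if (in.rfind("/../", 0) == 0 || in == "/..") {
            in.replace(0, in == "/.." ? 3 : 4, "/");
            // Drop the last output segment together with its leading '/'.
            const std::size_t cut = out.rfind('/');
            out.erase(cut == std::string::npos ? 0 : cut);
        } else if (in == "." || in == "..") {
            in.clear();
        } else {
            // Move the first segment (with its leading '/', if any) to the output.
            std::size_t next = in.find('/', 1);
            if (next == std::string::npos)
                next = in.size();
            out += in.substr(0, next);
            in.erase(0, next);
        }
    }
    return out;
}

// RFC 3986, 5.2.3: merges a relative-path reference with the base path.
std::string mergePaths(const Parts &base, const std::string &refPath)
{
    if (base.authority && base.path.empty())
        return "/" + refPath;
    const std::size_t slash = base.path.rfind('/');
    return slash == std::string::npos ? refPath : base.path.substr(0, slash + 1) + refPath;
}

// RFC 3986, 5.2.2: resolves reference r against base (strict variant).
Parts resolve(const Parts &base, const Parts &r)
{
    Parts t;
    if (r.scheme) {
        t.scheme = r.scheme;
        t.authority = r.authority;
        t.path = removeDotSegments(r.path);
        t.query = r.query;
    } else {
        if (r.authority) {
            t.authority = r.authority;
            t.path = removeDotSegments(r.path);
            t.query = r.query;
        } else {
            if (r.path.empty()) {
                t.path = base.path;
                t.query = r.query ? r.query : base.query;
            } else {
                t.path = removeDotSegments(r.path[0] == '/' ? r.path
                                                            : mergePaths(base, r.path));
                t.query = r.query;
            }
            t.authority = base.authority;
        }
        t.scheme = base.scheme;
    }
    t.fragment = r.fragment;
    return t;
}
```

    Against the RFC's example base "http://a/b/c/d;p?q", the reference
    "../g" resolves to the path "/b/g" and "../../../g" resolves to "/g".
    Surplus ".." segments never climb above the root.
*/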

/*!
    Returns \c true if the URL is relative; otherwise returns \c false. A URL is
    a relative reference if its scheme is undefined; this function is therefore
    equivalent to calling scheme().isEmpty().

    Relative references are defined in RFC 3986 section 4.2.

    \sa {Relative URLs vs Relative Paths}
*/
bool QUrl::isRelative() const
{
    if (!d) return true;
    return !d->hasScheme();
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The resulting QString can be passed back to a QUrl later on.

    Synonym for toString(options).

    \sa FormattingOptions, toEncoded(), toString()
*/
QString QUrl::url(FormattingOptions options) const
{
    return toString(options);
}

/*!
    Returns a string representation of the URL. The output can be customized by
    passing flags with \a options. The option QUrl::FullyDecoded is not
    permitted in this function since it would generate ambiguous data.

    The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.

    \sa FormattingOptions, url(), setUrl()
*/
QString QUrl::toString(FormattingOptions options) const
{
    QString url;
    if (!isValid()) {
        // also catches isEmpty()
        return url;
    }
    if ((options & QUrl::FullyDecoded) == QUrl::FullyDecoded) {
        qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
        options &= ~QUrl::FullyDecoded;
        //options |= QUrl::PrettyDecoded; // no-op, value is 0
    }

    // return just the path if:
    //  - QUrl::PreferLocalFile is passed
    //  - QUrl::RemovePath isn't passed (combining it with PreferLocalFile makes little sense)
    //  - there's no query or fragment to return
    //    that is, either they aren't present, or we're removing them
    //  - it's a local file
    if (options.testFlag(QUrl::PreferLocalFile) && !options.testFlag(QUrl::RemovePath)
            && (!d->hasQuery() || options.testFlag(QUrl::RemoveQuery))
            && (!d->hasFragment() || options.testFlag(QUrl::RemoveFragment))
            && isLocalFile()) {
        url = d->toLocalFile(options | QUrl::FullyDecoded);
        return url;
    }

    // for the full URL, we consider that the reserved characters are prettier if encoded
    if (options & DecodeReserved)
        options &= ~EncodeReserved;
    else
        options |= EncodeReserved;

    if (!(options & QUrl::RemoveScheme) && d->hasScheme())
        url += d->scheme + u':';

    bool pathIsAbsolute = d->path.startsWith(u'/');
    if (!((options & QUrl::RemoveAuthority) == QUrl::RemoveAuthority) && d->hasAuthority()) {
        url += "//"_L1;
        d->appendAuthority(url, options, QUrlPrivate::FullUrl);
    } else if (isLocalFile() && pathIsAbsolute) {
        // Comply with the XDG file URI spec, which requires triple slashes.
        url += "//"_L1;
    }

    if (!(options & QUrl::RemovePath))
        d->appendPath(url, options, QUrlPrivate::FullUrl);

    if (!(options & QUrl::RemoveQuery) && d->hasQuery()) {
        url += u'?';
        d->appendQuery(url, options, QUrlPrivate::FullUrl);
    }
    if (!(options & QUrl::RemoveFragment) && d->hasFragment()) {
        url += u'#';
        d->appendFragment(url, options, QUrlPrivate::FullUrl);
    }

    return url;
}
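
/*
    Illustrative sketch (standalone C++17, not Qt code). toString() appends
    the components in the order given by RFC 3986 section 5.3: the scheme and
    ':', then "//" and the authority, the path, '?' and the query, and '#' and
    the fragment. The sketch assumes a hypothetical Components struct of
    already-encoded strings. It models only the RemoveQuery/RemoveFragment
    options, as booleans, plus the XDG rule that adds "//" for an absolute
    file path without an authority.

```cpp
#include <optional>
#include <string>

// Hypothetical, already-encoded components; std::nullopt means "absent".
struct Components {
    std::optional<std::string> scheme, authority, query, fragment;
    std::string path;
};

std::string recompose(const Components &c, bool removeQuery = false,
                      bool removeFragment = false)
{
    std::string url;
    if (c.scheme)
        url += *c.scheme + ':';
    if (c.authority)
        url += "//" + *c.authority;
    else if (c.scheme == "file" && !c.path.empty() && c.path[0] == '/')
        url += "//";                  // "file:///path" form for absolute local paths
    url += c.path;
    if (c.query && !removeQuery)
        url += '?' + *c.query;        // present but empty still emits '?'
    if (c.fragment && !removeFragment)
        url += '#' + *c.fragment;
    return url;
}
```

    Because presence is tracked separately from content, a present but empty
    query still produces a '?', as in "https://example.com/p?#top".
*/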

/*!
    \since 5.0

    Returns a human-displayable string representation of the URL.
    The output can be customized by passing flags with \a options.
    The option RemovePassword is always enabled, since passwords
    should never be shown back to users.

    With the default options, the resulting QString can be passed back
    to a QUrl later on, but any password that was present initially will
    be lost.

    \sa FormattingOptions, toEncoded(), toString()
*/

QString QUrl::toDisplayString(FormattingOptions options) const
{
    return toString(options | RemovePassword);
}

/*!
    \since 5.2

    Returns an adjusted version of the URL.
    The output can be customized by passing flags with \a options.

    The encoding options from QUrl::ComponentFormattingOption don't make
    much sense for this method, nor does QUrl::PreferLocalFile.

    This is always equivalent to QUrl(url.toString(options)).

    \sa FormattingOptions, toEncoded(), toString()
*/
QUrl QUrl::adjusted(QUrl::FormattingOptions options) const
{
    if (!isValid()) {
        // also catches isEmpty()
        return QUrl();
    }
    QUrl that = *this;
    if (options & RemoveScheme)
        that.setScheme(QString());
    if ((options & RemoveAuthority) == RemoveAuthority) {
        that.setAuthority(QString());
    } else {
        if ((options & RemoveUserInfo) == RemoveUserInfo)
            that.setUserInfo(QString());
        else if (options & RemovePassword)
            that.setPassword(QString());
        if (options & RemovePort)
            that.setPort(-1);
    }
    if (options & RemoveQuery)
        that.setQuery(QString());
    if (options & RemoveFragment)
        that.setFragment(QString());
    if (options & RemovePath) {
        that.setPath(QString());
    } else if (auto pathOpts = options & (StripTrailingSlash | RemoveFilename | NormalizePathSegments)) {
        that.detach();
        that.d->path.resize(0);
        d->appendPath(that.d->path, pathOpts, QUrlPrivate::Path);
    }
    if (that.d->isLocalFile() && that.d->path.startsWith(u'/')) {
        // ensure absolute file URLs have an empty authority to comply with the
        // XDG file spec (note this may undo a RemoveAuthority)
        that.d->sectionIsPresent |= QUrlPrivate::Host;
    }
    return that;
}

/*!
    Returns the encoded representation of the URL if it's valid;
    otherwise an empty QByteArray is returned. The output can be
    customized by passing flags with \a options.

    The user info, path, query and fragment are all converted to UTF-8, and
    all non-ASCII characters are then percent-encoded. The host name
    is encoded using Punycode.
*/
QByteArray QUrl::toEncoded(FormattingOptions options) const
{
    options &= ~(FullyDecoded | FullyEncoded);
    return toString(options | FullyEncoded).toLatin1();
}

/*!
    Parses \a input and returns the corresponding QUrl. \a input is
    assumed to be in encoded form, containing only ASCII characters.

    Parses the URL using \a mode. See setUrl() for more information on
    this parameter. QUrl::DecodedMode is not permitted in this context.

    \note In Qt versions prior to 6.7, this function took a QByteArray, not
    QByteArrayView. If you experience compile errors, it's because your code
    is passing objects that are implicitly convertible to QByteArray, but not
    QByteArrayView. Wrap the corresponding argument in \c{QByteArray{~~~}} to
    make the cast explicit. This is backwards-compatible with old Qt versions.

    \sa toEncoded(), setUrl()
*/
QUrl QUrl::fromEncoded(QByteArrayView input, ParsingMode mode)
{
    return QUrl(QString::fromUtf8(input), mode);
}

/*!
    Returns a decoded copy of \a input. \a input is first decoded from
    percent encoding, then converted from UTF-8 to Unicode.

    \note Given invalid input (such as a string containing the sequence "%G5",
    which is not a valid hexadecimal number) the output will be invalid as
    well. As an example: the sequence "%G5" could be decoded to 'W'.
*/
QString QUrl::fromPercentEncoding(const QByteArray &input)
{
    QByteArray ba = QByteArray::fromPercentEncoding(input);
    return QString::fromUtf8(ba);
}

/*!
    Returns an encoded copy of \a input. \a input is first converted
    to UTF-8, and all bytes that are not in the unreserved group
    are then percent-encoded. To prevent characters from being percent-encoded,
    pass them to \a exclude. To force characters to be percent-encoded, pass
    them to \a include.

    Unreserved is defined as:
    \tt {ALPHA / DIGIT / "-" / "." / "_" / "~"}

    \snippet code/src_corelib_io_qurl.cpp 6
*/
QByteArray QUrl::toPercentEncoding(const QString &input, const QByteArray &exclude, const QByteArray &include)
{
    return input.toUtf8().toPercentEncoding(exclude, include);
}
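
/*
    Illustrative sketch (standalone C++17, not Qt's implementation) of the
    rule just described, applied to a std::string that already holds UTF-8.
    Unreserved ASCII bytes and bytes listed in exclude are copied. Every other
    byte is written as '%' plus two uppercase hex digits, and bytes listed in
    include are always encoded. The function name is hypothetical. Letting
    include take precedence when a byte appears in both lists is an assumption
    of this sketch.

```cpp
#include <string>

std::string toPercentEncodingSketch(const std::string &utf8,
                                    const std::string &exclude = {},
                                    const std::string &include = {})
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        // ALPHA / DIGIT / "-" / "." / "_" / "~"
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                                || (c >= '0' && c <= '9') || c == '-' || c == '.'
                                || c == '_' || c == '~';
        bool keep = unreserved || exclude.find(static_cast<char>(c)) != std::string::npos;
        if (include.find(static_cast<char>(c)) != std::string::npos)
            keep = false;                 // include wins over both rules above
        if (keep) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

    For instance, the two UTF-8 bytes of "é" become "%C3%A9". The string
    "{a fishy string?}" with exclude "{}" and include "s" becomes
    "{a%20fi%73hy%20%73tring%3F}".
*/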

/*!
    \since 6.3

    Returns the Unicode form of the given domain name
    \a domain, which is encoded in the ASCII Compatible Encoding (ACE).
    The output can be customized by passing flags with \a options.
    The result of this function is considered equivalent to \a domain.

    If the value in \a domain cannot be decoded, it will be converted
    to QString and returned.

    The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
    of the Internationalizing Domain Names in Applications (IDNA) specification,
    which allows for domain names (like \c "example.com") to be written using
    non-US-ASCII characters.
*/
QString QUrl::fromAce(const QByteArray &domain, QUrl::AceProcessingOptions options)
{
    return qt_ACE_do(QString::fromLatin1(domain), NormalizeAce,
                     ForbidLeadingDot /*FIXME: make configurable*/, options);
}

/*!
    \since 6.3

    Returns the ASCII Compatible Encoding of the given domain name \a domain.
    The output can be customized by passing flags with \a options.
    The result of this function is considered equivalent to \a domain.

    The ASCII-Compatible Encoding (ACE) is defined by RFC 3490, RFC 3491
    and RFC 3492 and updated by the Unicode Technical Standard #46. It is part
    of the Internationalizing Domain Names in Applications (IDNA) specification,
    which allows for domain names (like \c "example.com") to be written using
    non-US-ASCII characters.

    This function returns an empty QByteArray if \a domain is not a valid
    hostname. Note, in particular, that IPv6 literals are not valid domain
    names.
*/
QByteArray QUrl::toAce(const QString &domain, AceProcessingOptions options)
{
    return qt_ACE_do(domain, ToAceOnly, ForbidLeadingDot /*FIXME: make configurable*/, options)
            .toLatin1();
}
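
/*
    Illustrative sketch (standalone C++17, not Qt's qt_ACE_do()). For a single
    label, the ACE form is "xn--" followed by the Punycode encoding from
    RFC 3492. This sketch implements only that encoding step, for one label
    given as UTF-32. It skips the UTS #46 mapping, normalization and
    validation that toAce() performs, does no overflow checking, and returns
    all-ASCII labels unchanged. toAceLabel() and the punycode_sketch
    namespace are hypothetical.

```cpp
#include <cstdint>
#include <string>

namespace punycode_sketch {

// Parameter values from RFC 3492, section 5.
constexpr std::uint32_t base = 36, tmin = 1, tmax = 26, skew = 38, damp = 700;
constexpr std::uint32_t initialBias = 72, initialN = 0x80;

// RFC 3492, 6.1: bias adaptation after each encoded delta.
std::uint32_t adapt(std::uint32_t delta, std::uint32_t numPoints, bool firstTime)
{
    delta = firstTime ? delta / damp : delta / 2;
    delta += delta / numPoints;
    std::uint32_t k = 0;
    while (delta > ((base - tmin) * tmax) / 2) {
        delta /= base - tmin;
        k += base;
    }
    return k + (base - tmin + 1) * delta / (delta + skew);
}

// Maps a digit value 0..35 to 'a'..'z' or '0'..'9'.
char encodeDigit(std::uint32_t d)
{
    return static_cast<char>(d < 26 ? 'a' + d : '0' + (d - 26));
}

// RFC 3492, 6.3: encodes one label (without the "xn--" prefix).
std::string encode(const std::u32string &input)
{
    std::string output;
    for (char32_t c : input)
        if (c < 0x80)
            output += static_cast<char>(c);          // basic code points first
    const std::uint32_t b = static_cast<std::uint32_t>(output.size());
    std::uint32_t h = b;
    if (b > 0)
        output += '-';

    std::uint32_t n = initialN, delta = 0, bias = initialBias;
    while (h < input.size()) {
        // Smallest code point >= n that still has to be handled.
        std::uint32_t m = UINT32_MAX;
        for (char32_t c : input)
            if (c >= n && c < m)
                m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : input) {
            if (c < n)
                ++delta;
            if (c == n) {
                // Emit delta as a generalized variable-length integer.
                std::uint32_t q = delta;
                for (std::uint32_t k = base;; k += base) {
                    const std::uint32_t t = k <= bias ? tmin : (k >= bias + tmax ? tmax : k - bias);
                    if (q < t)
                        break;
                    output += encodeDigit(t + (q - t) % (base - t));
                    q = (q - t) / (base - t);
                }
                output += encodeDigit(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                ++h;
            }
        }
        ++delta;
        ++n;
    }
    return output;
}

} // namespace punycode_sketch

// Hypothetical per-label conversion: ASCII labels pass through unchanged,
// others become "xn--" followed by their Punycode form.
std::string toAceLabel(const std::u32string &label)
{
    for (char32_t c : label)
        if (c >= 0x80)
            return "xn--" + punycode_sketch::encode(label);
    std::string ascii;
    for (char32_t c : label)
        ascii += static_cast<char>(c);
    return ascii;
}
```

    With it, the "bühler" label from the class overview becomes
    "xn--bhler-kva", matching the encoded host shown there.
*/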

/*!
    \internal

    \fn bool QUrl::operator<(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if URL \a lhs is "less than" URL \a rhs. This
    provides a means of ordering URLs.
*/

Qt::weak_ordering compareThreeWay(const QUrl &lhs, const QUrl &rhs)
{
    if (!lhs.d || !rhs.d) {
        bool thisIsEmpty = !lhs.d || lhs.d->isEmpty();
        bool thatIsEmpty = !rhs.d || rhs.d->isEmpty();

        // sort an empty URL first
        if (thisIsEmpty) {
            if (!thatIsEmpty)
                return Qt::weak_ordering::less;
            else
                return Qt::weak_ordering::equivalent;
        } else {
            return Qt::weak_ordering::greater;
        }
    }

    int cmp;
    cmp = lhs.d->scheme.compare(rhs.d->scheme);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->userName.compare(rhs.d->userName);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->password.compare(rhs.d->password);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    cmp = lhs.d->host.compare(rhs.d->host);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->port != rhs.d->port)
        return Qt::compareThreeWay(lhs.d->port, rhs.d->port);

    cmp = lhs.d->path.compare(rhs.d->path);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->hasQuery() != rhs.d->hasQuery())
        return rhs.d->hasQuery() ? Qt::weak_ordering::less : Qt::weak_ordering::greater;

    cmp = lhs.d->query.compare(rhs.d->query);
    if (cmp != 0)
        return Qt::compareThreeWay(cmp, 0);

    if (lhs.d->hasFragment() != rhs.d->hasFragment())
        return rhs.d->hasFragment() ? Qt::weak_ordering::less : Qt::weak_ordering::greater;

    cmp = lhs.d->fragment.compare(rhs.d->fragment);
    return Qt::compareThreeWay(cmp, 0);
}
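
/*
    Illustrative sketch (standalone C++17, not Qt code). compareThreeWay()
    orders URLs lexicographically by component: scheme, user name, password,
    host, port, path, query presence, query, fragment presence, fragment. This
    sketch reproduces that key order using std::tie on a hypothetical Fields
    struct. It compares std::string bytes rather than QString UTF-16 code
    units, so non-ASCII text may sort differently. It needs no special case
    for empty URLs, because an all-default Fields already sorts first (empty
    strings, port -1 and false are minimal).

```cpp
#include <string>
#include <tuple>

// Hypothetical flattened URL; port -1 means "no port".
struct Fields {
    std::string scheme, userName, password, host;
    int port = -1;
    std::string path;
    bool hasQuery = false;
    std::string query;
    bool hasFragment = false;
    std::string fragment;
};

// Same key order as compareThreeWay(). Because false < true, a URL without a
// query sorts before an otherwise identical URL with one.
bool lessThan(const Fields &a, const Fields &b)
{
    return std::tie(a.scheme, a.userName, a.password, a.host, a.port, a.path,
                    a.hasQuery, a.query, a.hasFragment, a.fragment)
         < std::tie(b.scheme, b.userName, b.password, b.host, b.port, b.path,
                    b.hasQuery, b.query, b.hasFragment, b.fragment);
}
```
*/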

/*!
    \fn bool QUrl::operator==(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if \a lhs and \a rhs URLs are equivalent;
    otherwise returns \c false.

    \sa matches()
*/

bool comparesEqual(const QUrl &lhs, const QUrl &rhs)
{
    if (!lhs.d && !rhs.d)
        return true;
    if (!lhs.d)
        return rhs.d->isEmpty();
    if (!rhs.d)
        return lhs.d->isEmpty();

    return (lhs.d->presentSections() == rhs.d->presentSections()) &&
            lhs.d->scheme == rhs.d->scheme &&
            lhs.d->userName == rhs.d->userName &&
            lhs.d->password == rhs.d->password &&
            lhs.d->host == rhs.d->host &&
            lhs.d->port == rhs.d->port &&
            lhs.d->path == rhs.d->path &&
            lhs.d->query == rhs.d->query &&
            lhs.d->fragment == rhs.d->fragment;
}

/*!
    \since 5.2

    Returns \c true if this URL and the given \a url are equal after
    applying \a options to both; otherwise returns \c false.

    This is equivalent to calling \l{adjusted()}{adjusted}(options) on both URLs
    and comparing the resulting URLs, but faster.

*/
bool QUrl::matches(const QUrl &url, FormattingOptions options) const
{
    if (!d && !url.d)
        return true;
    if (!d)
        return url.d->isEmpty();
    if (!url.d)
        return d->isEmpty();

    uint mask = d->presentSections();

    if (options.testFlag(QUrl::RemoveScheme))
        mask &= ~QUrlPrivate::Scheme;
    else if (d->scheme != url.d->scheme)
        return false;

    if (options.testFlag(QUrl::RemovePassword))
        mask &= ~QUrlPrivate::Password;
    else if (d->password != url.d->password)
        return false;

    if (options.testFlag(QUrl::RemoveUserInfo))
        mask &= ~QUrlPrivate::UserName;
    else if (d->userName != url.d->userName)
        return false;

    if (options.testFlag(QUrl::RemovePort))
        mask &= ~QUrlPrivate::Port;
    else if (d->port != url.d->port)
        return false;

    if (options.testFlag(QUrl::RemoveAuthority))
        mask &= ~QUrlPrivate::Host;
    else if (d->host != url.d->host)
        return false;

    if (options.testFlag(QUrl::RemoveQuery))
        mask &= ~QUrlPrivate::Query;
    else if (d->query != url.d->query)
        return false;

    if (options.testFlag(QUrl::RemoveFragment))
        mask &= ~QUrlPrivate::Fragment;
    else if (d->fragment != url.d->fragment)
        return false;

    if ((d->sectionIsPresent & mask) != (url.d->sectionIsPresent & mask))
        return false;

    if (options.testFlag(QUrl::RemovePath))
        return true;

    // Compare paths, after applying path-related options
    QString path1;
    d->appendPath(path1, options, QUrlPrivate::Path);
    QString path2;
    url.d->appendPath(path2, options, QUrlPrivate::Path);
    return path1 == path2;
}

/*!
    \fn bool QUrl::operator !=(const QUrl &lhs, const QUrl &rhs)

    Returns \c true if \a lhs and \a rhs URLs are not equal;
    otherwise returns \c false.

    \sa matches()
*/

/*!
    Assigns the specified \a url to this object.
*/
QUrl &QUrl::operator =(const QUrl &url) noexcept
{
    if (!d) {
        if (url.d) {
            url.d->ref.ref();
            d = url.d;
        }
    } else {
        if (url.d)
            qAtomicAssign(d, url.d);
        else
            clear();
    }
    return *this;
}

/*!
    Assigns the specified \a url to this object.
*/
QUrl &QUrl::operator =(const QString &url)
{
    detachToClear();
    if (!url.isEmpty())
        d->parse(url, TolerantMode);
    return *this;
}

/*!
    \fn void QUrl::swap(QUrl &other)
    \since 4.8
    \memberswap{URL}
*/

/*!
    \internal

    Forces a detach.
*/
void QUrl::detach()
{
    if (!d)
        d = new QUrlPrivate;
    else
        qAtomicDetach(d);
}

/*!
    \internal

    Forces a detach resulting in a clear state.
*/
void QUrl::detachToClear()
{
    if (d && (d->ref.loadAcquire() == 1 || !d->ref.deref())) {
        // we had the only copy
        d->ref.storeRelaxed(1);
        d->clear();
    } else {
        d = new QUrlPrivate;
    }
}

/*!
    \internal
*/
bool QUrl::isDetached() const
{
    return !d || d->ref.loadRelaxed() == 1;
}

static QString fromNativeSeparators(const QString &pathName)
{
#if defined(Q_OS_WIN)
    QString result(pathName);
    const QChar nativeSeparator = u'\\';
    auto i = result.indexOf(nativeSeparator);
    if (i != -1) {
        QChar * const data = result.data();
        const auto length = result.length();
        for (; i < length; ++i) {
            if (data[i] == nativeSeparator)
                data[i] = u'/';
        }
    }
    return result;
#else
    return pathName;
#endif
}

/*!
    Returns a QUrl representation of \a localFile, interpreted as a local
    file. This function accepts paths separated by slashes as well as the
    native separator for this platform.

    This function also accepts paths with a doubled leading slash (or
    backslash) to indicate a remote file, as in
    "//servername/path/to/file.txt". Note that only certain platforms can
    actually open this file using QFile::open().

    An empty \a localFile leads to an empty URL (since Qt 5.4).

    \snippet code/src_corelib_io_qurl.cpp 16

    In the first line of the snippet above, a file URL is constructed from a
    local, relative path. A file URL with a relative path only makes sense
    if there is a base URL to resolve it against. For example:

    \snippet code/src_corelib_io_qurl.cpp 17

    To resolve such a URL, it's necessary to remove the scheme beforehand:

    \snippet code/src_corelib_io_qurl.cpp 18

    For this reason, it is better to use a relative URL (that is, no scheme)
    for relative file paths:

    \snippet code/src_corelib_io_qurl.cpp 19

    \sa toLocalFile(), isLocalFile(), QDir::toNativeSeparators()
*/
QUrl QUrl::fromLocalFile(const QString &localFile)
{
    QUrl url;
    QString deslashified = fromNativeSeparators(localFile);
    if (deslashified.isEmpty())
        return url;
    QString scheme = fileScheme();
    char16_t firstChar = deslashified.at(0).unicode();
    char16_t secondChar = deslashified.size() > 1 ? deslashified.at(1).unicode() : u'\0';

    // magic for drives on windows
    if (firstChar != u'/' && secondChar == u':') {
        deslashified.prepend(u'/');
        firstChar = u'/';
    } else if (firstChar == u'/' && secondChar == u'/') {
        // magic for shared drive on windows
        qsizetype indexOfPath = deslashified.indexOf(u'/', 2);
        QStringView hostSpec = QStringView{deslashified}.mid(2, indexOfPath - 2);
        // Check for Windows-specific WebDAV specification: "//host@SSL/path".
        if (hostSpec.endsWith(webDavSslTag(), Qt::CaseInsensitive)) {
            hostSpec.truncate(hostSpec.size() - 4);
            scheme = webDavScheme();
        }

        // hosts can't be IPv6 addresses without [], so we can use QUrlPrivate::setHost
        url.detach();
        if (!url.d->setHost(hostSpec.toString(), 0, hostSpec.size(), StrictMode)) {
            if (url.d->error->code != QUrlPrivate::InvalidRegNameError)
                return url;

            // Path hostname is not a valid URL host, so set it entirely in the path
            // (by leaving deslashified unchanged)
        } else if (indexOfPath > 2) {
            deslashified = deslashified.right(deslashified.size() - indexOfPath);
        } else {
            deslashified.clear();
        }
    }
    if (firstChar == u'/') {
        // ensure absolute file URLs have an empty authority to comply with the XDG file spec
        url.detach();
        url.d->sectionIsPresent |= QUrlPrivate::Host;
    }

    url.setScheme(scheme);
    url.setPath(deslashified, DecodedMode);

    return url;
}
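
/*
    Illustrative sketch (standalone C++17, not Qt's implementation) of the
    path handling in fromLocalFile(). It works on std::string and produces the
    URL text directly. It assumes Windows-style input, since Qt converts
    backslashes only on Windows. It does not percent-encode, validate the
    host, or recognize the "@SSL" WebDAV form. localFileToUrlSketch() is a
    hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

std::string localFileToUrlSketch(std::string path)
{
    if (path.empty())
        return std::string();                     // empty input, empty URL
    std::replace(path.begin(), path.end(), '\\', '/');

    std::string host;
    if (path.size() > 1 && path[0] != '/' && path[1] == ':') {
        path.insert(0, 1, '/');                   // "C:/x" becomes "/C:/x"
    } else if (path.rfind("//", 0) == 0) {
        // "//server/share/f" puts "server" in the host, "/share/f" in the path.
        const std::size_t slash = path.find('/', 2);
        host = path.substr(2, slash == std::string::npos ? std::string::npos : slash - 2);
        path = slash == std::string::npos ? std::string() : path.substr(slash);
    }

    std::string url = "file:";
    if (!host.empty() || (!path.empty() && path[0] == '/'))
        url += "//" + host;                       // authority, possibly empty
    return url + path;
}
```

    The drive-letter case yields "file:///C:/Users/me/a.txt", a UNC path
    yields "file://server/share/f.txt", and a relative path yields
    "file:rel/a.txt". That last form is why the documentation above
    recommends a scheme-less relative URL for relative paths.
*/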

/*!
    Returns the path of this URL formatted as a local file path. The path
    returned will use forward slashes, even if it was originally created
    from one with backslashes.

    If this URL contains a non-empty hostname, it will be encoded in the
    returned value in the form found on SMB networks (for example,
    "//servername/path/to/file.txt").

    \snippet code/src_corelib_io_qurl.cpp 20

    Note: if the path component of this URL contains a non-UTF-8 binary
    sequence (such as %80), the behaviour of this function is undefined.

    \sa fromLocalFile(), isLocalFile()
*/
QString QUrl::toLocalFile() const
{
    // the call to isLocalFile() also ensures that we're parsed
    if (!isLocalFile())
        return QString();

    return d->toLocalFile(QUrl::FullyDecoded);
}

/*!
    \since 4.8
    Returns \c true if this URL is pointing to a local file path. A URL is a
    local file path if the scheme is "file".

    Note that this function considers URLs with hostnames to be local file
    paths, even if the eventual file path cannot be opened with
    QFile::open().

    \sa fromLocalFile(), toLocalFile()
*/
bool QUrl::isLocalFile() const
{
    return d && d->isLocalFile();
}

/*!
    Returns \c true if this URL is a parent of \a childUrl. \a childUrl is a child
    of this URL if the two URLs share the same scheme and authority,
    and this URL's path is a parent of the path of \a childUrl.
*/
bool QUrl::isParentOf(const QUrl &childUrl) const
{
    QString childPath = childUrl.path();

    if (!d)
        return ((childUrl.scheme().isEmpty())
            && (childUrl.authority().isEmpty())
            && childPath.size() > 0 && childPath.at(0) == u'/');

    QString ourPath = path();

    return ((childUrl.scheme().isEmpty() || d->scheme == childUrl.scheme())
            && (childUrl.authority().isEmpty() || authority() == childUrl.authority())
            && childPath.startsWith(ourPath)
            && ((ourPath.endsWith(u'/') && childPath.size() > ourPath.size())
                || (!ourPath.endsWith(u'/') && childPath.size() > ourPath.size()
                    && childPath.at(ourPath.size()) == u'/')));
}
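
/*
    Illustrative sketch (standalone C++17, not Qt code). The final path test
    in isParentOf() accepts the child only if its path strictly extends this
    URL's path, either directly after a trailing '/' or at a '/' boundary.
    This sketch models just that rule. The scheme and authority checks are
    omitted, and isParentPath() is a hypothetical name.

```cpp
#include <string>

bool isParentPath(const std::string &parent, const std::string &child)
{
    // The child path must start with the parent path and be strictly longer.
    if (child.size() <= parent.size() || child.compare(0, parent.size(), parent) != 0)
        return false;
    const bool parentEndsWithSlash = !parent.empty() && parent.back() == '/';
    // Without a trailing '/', the extension must begin at a segment boundary.
    return parentEndsWithSlash || child[parent.size()] == '/';
}
```

    Under this rule "/a/b" is a parent of "/a/b/c". It is not a parent of
    "/a/bc", nor of "/a/b" itself.
*/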


#ifndef QT_NO_DATASTREAM
/*! \relates QUrl

    Writes the URL \a url to the stream \a out and returns a reference
    to the stream.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator<<(QDataStream &out, const QUrl &url)
{
    QByteArray u;
    if (url.isValid())
        u = url.toEncoded();
    out << u;
    return out;
}

/*! \relates QUrl

    Reads a URL into \a url from the stream \a in and returns a
    reference to the stream.

    \sa{Serializing Qt Data Types}{Format of the QDataStream operators}
*/
QDataStream &operator>>(QDataStream &in, QUrl &url)
{
    QByteArray u;
    in >> u;
    url.setUrl(QString::fromLatin1(u));
    return in;
}
#endif // QT_NO_DATASTREAM

#ifndef QT_NO_DEBUG_STREAM
QDebug operator<<(QDebug d, const QUrl &url)
{
    QDebugStateSaver saver(d);
    d.nospace() << "QUrl(" << url.toDisplayString() << ')';
    return d;
}
#endif

static QString errorMessage(QUrlPrivate::ErrorCode errorCode, const QString &errorSource, qsizetype errorPosition)
{
    QChar c = size_t(errorPosition) < size_t(errorSource.size()) ?
                errorSource.at(errorPosition) : QChar(QChar::Null);

    switch (errorCode) {
    case QUrlPrivate::NoError:
        Q_UNREACHABLE_RETURN(QString()); // QUrl::errorString should have treated this condition

    case QUrlPrivate::InvalidSchemeError: {
        auto msg = "Invalid scheme (character '%1' not permitted)"_L1;
        return msg.arg(c);
    }

    case QUrlPrivate::InvalidUserNameError:
        return "Invalid user name (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidPasswordError:
        return "Invalid password (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidRegNameError:
        if (errorPosition >= 0)
            return "Invalid hostname (character '%1' not permitted)"_L1
                    .arg(c);
        else
            return QStringLiteral("Invalid hostname (contains invalid characters)");
    case QUrlPrivate::InvalidIPv4AddressError:
        return QString(); // doesn't happen yet
    case QUrlPrivate::InvalidIPv6AddressError:
        return QStringLiteral("Invalid IPv6 address");
    case QUrlPrivate::InvalidCharacterInIPv6Error:
        return "Invalid IPv6 address (character '%1' not permitted)"_L1.arg(c);
    case QUrlPrivate::InvalidIPvFutureError:
        return "Invalid IPvFuture address (character '%1' not permitted)"_L1.arg(c);
    case QUrlPrivate::HostMissingEndBracket:
        return QStringLiteral("Expected ']' to match '[' in hostname");

    case QUrlPrivate::InvalidPortError:
        return QStringLiteral("Invalid port or port number out of range");
    case QUrlPrivate::PortEmptyError:
        return QStringLiteral("Port field was empty");

    case QUrlPrivate::InvalidPathError:
        return "Invalid path (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidQueryError:
        return "Invalid query (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::InvalidFragmentError:
        return "Invalid fragment (character '%1' not permitted)"_L1
                .arg(c);

    case QUrlPrivate::AuthorityPresentAndPathIsRelative:
        return QStringLiteral("Path component is relative and authority is present");
    case QUrlPrivate::AuthorityAbsentAndPathIsDoubleSlash:
        return QStringLiteral("Path component starts with '//' and authority is absent");
    case QUrlPrivate::RelativeUrlPathContainsColonBeforeSlash:
        return QStringLiteral("Relative URL's path component contains ':' before any '/'");
    }

    Q_UNREACHABLE_RETURN(QString());
}
3560
3561static inline void appendComponentIfPresent(QString &msg, bool present, const char *componentName,
3562 const QString &component)
3563{
3564 if (present)
3565 msg += QLatin1StringView(componentName) % u'"' % component % "\","_L1;
3566}
3567
3568/*!
3569 \since 4.2
3570
3571 Returns an error message if the last operation that modified this QUrl
3572 object ran into a parsing error. If no error was detected, this function
3573 returns an empty string and isValid() returns \c true.
3574
3575 The error message returned by this function is technical in nature and may
3576 not be understood by end users. It is mostly useful to developers trying to
3577 understand why QUrl will not accept some input.

    \sa QUrl::ParsingMode
*/
QString QUrl::errorString() const
{
    QString msg;
    if (!d)
        return msg;

    QString errorSource;
    qsizetype errorPosition = 0;
    QUrlPrivate::ErrorCode errorCode = d->validityError(&errorSource, &errorPosition);
    if (errorCode == QUrlPrivate::NoError)
        return msg;

    msg += errorMessage(errorCode, errorSource, errorPosition);
    msg += "; source was \""_L1;
    msg += errorSource;
    msg += "\";"_L1;
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Scheme,
                             " scheme = ", d->scheme);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::UserInfo,
                             " userinfo = ", userInfo());
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Host,
                             " host = ", d->host);
    appendComponentIfPresent(msg, d->port != -1,
                             " port = ", QString::number(d->port));
    appendComponentIfPresent(msg, !d->path.isEmpty(),
                             " path = ", d->path);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Query,
                             " query = ", d->query);
    appendComponentIfPresent(msg, d->sectionIsPresent & QUrlPrivate::Fragment,
                             " fragment = ", d->fragment);
    if (msg.endsWith(u','))
        msg.chop(1);
    return msg;
}

/*!
    \since 5.1

    Converts a list of \a urls into a list of QString objects, using toString(\a options).
*/
QStringList QUrl::toStringList(const QList<QUrl> &urls, FormattingOptions options)
{
    QStringList lst;
    lst.reserve(urls.size());
    for (const QUrl &url : urls)
        lst.append(url.toString(options));
    return lst;
}

/*!
    \since 5.1

    Converts a list of strings representing \a urls into a list of URLs, using QUrl(str, \a mode).
    Note that this means all strings must be URLs: a local file path such as
    "/home/user/file.txt" is parsed as a relative URL, not converted into a
    file:// URL. Use fromLocalFile() for local paths.
*/
QList<QUrl> QUrl::fromStringList(const QStringList &urls, ParsingMode mode)
{
    QList<QUrl> lst;
    lst.reserve(urls.size());
    for (const QString &str : urls)
        lst.append(QUrl(str, mode));
    return lst;
}

/*!
    \typedef QUrl::DataPtr
    \internal
*/

/*!
    \fn DataPtr &QUrl::data_ptr()
    \internal
*/

/*!
    \fn size_t qHash(const QUrl &key, size_t seed)
    \qhashold{QHash}
    \since 5.0
*/
size_t qHash(const QUrl &url, size_t seed) noexcept
{
    QtPrivate::QHashCombineWithSeed hasher(seed);

    // non-commutative, we must hash the port first
    if (!url.d)
        return hasher(0, -1);
    size_t state = hasher(0, url.d->port);

    if (url.d->hasScheme())
        state = hasher(state, url.d->scheme);
    if (url.d->hasUserInfo()) {
        // see presentSections(), appendUserName(), etc.
        state = hasher(state, url.d->userName);
        state = hasher(state, url.d->password);
    }
    if (url.d->hasHost() || url.d->isLocalFile()) // for XDG compatibility
        state = hasher(state, url.d->host);
    if (url.d->hasPath())
        state = hasher(state, url.d->path);
    if (url.d->hasQuery())
        state = hasher(state, url.d->query);
    if (url.d->hasFragment())
        state = hasher(state, url.d->fragment);
    return state;
}

static QUrl adjustFtpPath(QUrl url)
{
    if (url.scheme() == ftpScheme()) {
        QString path = url.path(QUrl::PrettyDecoded);
        if (path.startsWith("//"_L1))
            url.setPath("/%2F"_L1 + QStringView{path}.mid(2), QUrl::TolerantMode);
    }
    return url;
}

static bool isIp6(const QString &text)
{
    QIPAddressUtils::IPv6Address address;
    return !text.isEmpty() && QIPAddressUtils::parseIp6(address, text.begin(), text.end()) == nullptr;
}

/*!
    Returns a valid URL from a user-supplied \a userInput string if one can be
    deduced. If that is not possible, an invalid QUrl() is returned.

    This allows the user to input a URL or a local file path in the form of a plain
    string. This string can be manually typed into a location bar, obtained from
    the clipboard, or passed in via command-line arguments.

    When the string is not already a valid URL, a best guess is performed,
    making various assumptions.

    If the string is an absolute local file path, a file:// URL is constructed
    using QUrl::fromLocalFile().

    Otherwise, an attempt is made to turn the string into an http:// or ftp://
    URL, the latter if the part of the string before the first '.' is "ftp"
    (compared case-insensitively). The result is parsed with QUrl's tolerant
    parser; if that succeeds, a valid QUrl is returned, otherwise QUrl().

    \section1 Examples:

    \list
    \li qt-project.org becomes http://qt-project.org
    \li ftp.qt-project.org becomes ftp://ftp.qt-project.org
    \li hostname becomes http://hostname
    \li /home/user/test.html becomes file:///home/user/test.html
    \endlist

    To handle relative paths, this function takes an optional
    \a workingDirectory path. This is especially useful when handling
    command-line arguments.
    If \a workingDirectory is empty, no handling of relative paths will be done.

    By default, an input string that looks like a relative path will only be treated
    as such if the file actually exists in the given working directory.
    If the application can handle files that don't exist yet, it should pass the
    flag AssumeLocalFile in \a options.

    \since 5.4
*/
QUrl QUrl::fromUserInput(const QString &userInput, const QString &workingDirectory,
                         UserInputResolutionOptions options)
{
    QString trimmedString = userInput.trimmed();

    if (trimmedString.isEmpty())
        return QUrl();

    // Check for IPv6 addresses, since a path starting with ":" is absolute (a resource)
    // and IPv6 addresses can start with "c:" too
    if (isIp6(trimmedString)) {
        QUrl url;
        url.setHost(trimmedString);
        url.setScheme(QStringLiteral("http"));
        return url;
    }

    const QUrl url = QUrl(trimmedString, QUrl::TolerantMode);

    // Check for a relative path
    if (!workingDirectory.isEmpty()) {
        const QFileInfo fileInfo(QDir(workingDirectory), userInput);
        if (fileInfo.exists())
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());

        // Check both QUrl::isRelative (to detect full URLs) and QDir::isAbsolutePath
        // (since on Windows drive letters can be interpreted as schemes)
        if ((options & AssumeLocalFile) && url.isRelative() && !QDir::isAbsolutePath(userInput))
            return QUrl::fromLocalFile(fileInfo.absoluteFilePath());
    }

    // Check first for files, since on Windows drive letters can be interpreted as schemes
    if (QDir::isAbsolutePath(trimmedString))
        return QUrl::fromLocalFile(trimmedString);

    QUrl urlPrepended = QUrl("http://"_L1 + trimmedString, QUrl::TolerantMode);

    // Check the most common case of a valid url with a scheme
    // We check if the port would be valid by adding the scheme to handle the case host:port
    // where the host would be interpreted as the scheme
    if (url.isValid()
        && !url.scheme().isEmpty()
        && urlPrepended.port() == -1)
        return adjustFtpPath(url);

    // Else, try the prepended one and adjust the scheme from the host name
    if (urlPrepended.isValid() && (!urlPrepended.host().isEmpty() || !urlPrepended.path().isEmpty())) {
        qsizetype dotIndex = trimmedString.indexOf(u'.');
        const QStringView hostscheme = QStringView{trimmedString}.left(dotIndex);
        if (hostscheme.compare(ftpScheme(), Qt::CaseInsensitive) == 0)
            urlPrepended.setScheme(ftpScheme());
        return adjustFtpPath(urlPrepended);
    }

    return QUrl();
}

QT_END_NAMESPACE


// qtbase/src/corelib/io/qurl.cpp