Uniform Resource Locator (URL) Functions

The URL functions manipulate and access URLs. They are task-oriented: they do not verify the content or format of the URL they are given, so the calling application should track how each function has been used on a URL to ensure that the data is in the intended format. For example, when called with no flags, InternetCanonicalizeUrl converts the unsafe character "%" into the escape sequence "%25". If InternetCanonicalizeUrl is then called again on the canonicalized URL, the escape sequence "%25" is converted into "%2525", which no longer represents the original URL.
InternetCanonicalizeUrl
InternetCombineUrl
InternetCrackUrl
InternetCreateUrl
InternetOpenUrl

InternetCanonicalizeUrl

BOOL InternetCanonicalizeUrl(
    IN LPCSTR lpszUrl,
    OUT LPSTR lpszBuffer,
    IN OUT LPDWORD lpdwBufferLength,
    IN DWORD dwFlags
);

Canonicalizes a URL, which includes converting unsafe characters and spaces into escape sequences.

lpszUrl
Address of the input URL to canonicalize.
lpszBuffer
Address of the buffer that receives the resulting canonicalized URL.
lpdwBufferLength
Length, in bytes, of the lpszBuffer buffer. If the function succeeds, this parameter receives the length of the canonicalized URL written to lpszBuffer, not including the terminating null. If the function fails, this parameter receives the required length, in bytes, of the lpszBuffer buffer, including the terminating null.
dwFlags
Flags that control canonicalization. Can be zero or a combination of the following values:
ICU_BROWSER_MODE Does not encode or decode characters after "#" or "?", and does not remove trailing white space after "?". If this value is not specified, the entire URL is encoded, and trailing white space is removed.
ICU_DECODE Converts all %XX sequences to characters, including escape sequences, before the URL is parsed.
ICU_ENCODE_SPACES_ONLY Encodes spaces only.
ICU_NO_ENCODE Does not convert unsafe characters to escape sequences.
ICU_NO_META Does not remove meta sequences (such as "." and "..") from the URL.

If no flags are specified (dwFlags = 0), the function converts all unsafe characters and meta sequences (such as "\.", "\..", and "\...") to escape sequences.

InternetCanonicalizeUrl encodes by default, even if the ICU_DECODE flag is specified. If ICU_DECODE is used without ICU_NO_ENCODE, the URL is decoded before it is parsed, and unsafe characters are then re-encoded after parsing. To decode without re-encoding, use ICU_DECODE | ICU_NO_ENCODE. The function handles arbitrary protocol schemes, but to do so it must make inferences from the unsafe character set.

The application calling InternetCanonicalizeUrl should track the usage of this function on a particular URL. If unsafe characters in a URL have already been converted to escape sequences, calling InternetCanonicalizeUrl on the URL again (with no flags) escapes the escape sequences themselves. For example, a blank space in a URL is converted to the escape sequence "%20". Calling InternetCanonicalizeUrl again converts "%20" to "%2520", because the "%" sign is an unsafe character that is reserved for escape sequences, and the function replaces it with "%25".
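
The double-encoding hazard described above can be reproduced without the Win32 Internet functions. The following is a minimal, portable C sketch and not the InternetCanonicalizeUrl implementation: escape_unsafe is a hypothetical helper that escapes only spaces and "%", which is enough to show why canonicalizing twice changes the URL. It also imitates the in/out length convention described for lpdwBufferLength (on success the length excludes the terminating null; on failure the required size includes it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an encoder; NOT the Win32 implementation.
 * Escapes ' ' as "%20" and '%' as "%25"; copies every other byte as-is.
 * *len is in/out: on entry, the size of dst in bytes; on success, the
 * length of the result without the terminating null; on failure, the
 * required size including the terminating null.
 * Returns 1 on success, 0 if dst is NULL or too small. */
int escape_unsafe(const char *src, char *dst, size_t *len)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t needed = 1; /* room for the terminating null */
    size_t i = 0;
    const char *p;

    assert(src != NULL && len != NULL);

    /* First pass: compute the required buffer size. */
    for (p = src; *p != '\0'; ++p)
        needed += (*p == ' ' || *p == '%') ? 3 : 1;

    if (dst == NULL || *len < needed) {
        *len = needed;
        return 0;
    }

    /* Second pass: write the escaped string. */
    for (p = src; *p != '\0'; ++p) {
        if (*p == ' ' || *p == '%') {
            unsigned char c = (unsigned char)*p;
            dst[i++] = '%';
            dst[i++] = hex[c >> 4];
            dst[i++] = hex[c & 0x0F];
        } else {
            dst[i++] = *p;
        }
    }
    dst[i] = '\0';
    *len = i;
    return 1;
}
```

With this helper, escaping "http://host/a b" gives "http://host/a%20b", and escaping that result a second time gives "http://host/a%2520b". An application can avoid this by recording whether a URL has already been canonicalized before passing it on.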

InternetCombineUrl

BOOL InternetCombineUrl(
    IN LPCSTR lpszBaseUrl,
    IN LPCSTR lpszRelativeUrl,
    OUT LPSTR lpszBuffer,
    IN OUT LPDWORD lpdwBufferLength,
    IN DWORD dwFlags
);

Combines a base URL and a relative URL into a single URL. The resulting URL is canonicalized (see InternetCanonicalizeUrl).

lpszBaseUrl
Address of the base URL to be combined.
lpszRelativeUrl
Address of the relative URL to be combined.
lpszBuffer
Address of a buffer that receives the resulting URL.
lpdwBufferLength
Size, in bytes, of the lpszBuffer buffer. If the function succeeds, this parameter receives the length, in bytes, of the combined URL, not including the null terminator. If the function fails, this parameter receives the required buffer size, in bytes, including the null terminator.
dwFlags
Flags controlling the operation of the function. Can be one of the following values:
ICU_BROWSER_MODE Does not encode or decode characters after "#" or "?", and does not remove trailing white space after "?". If this value is not specified, the entire URL is encoded and trailing white space is removed.
ICU_DECODE Converts all %XX sequences to characters, including escape sequences, before the URL is parsed.
ICU_ENCODE_SPACES_ONLY Encodes spaces only.
ICU_NO_ENCODE Does not convert unsafe characters to escape sequences.
ICU_NO_META Does not remove meta sequences (such as "." and "..") from the URL.

InternetCrackUrl

BOOL InternetCrackUrl(
    IN LPCSTR lpszUrl,
    IN DWORD dwUrlLength,
    IN DWORD dwFlags,
    IN OUT LPURL_COMPONENTS lpUrlComponents
);

Cracks a URL into its component parts.

lpszUrl
Address of a string that contains the canonical URL to crack.
dwUrlLength
Length of the lpszUrl string, or zero if lpszUrl is a null-terminated (ASCIIZ) string.
dwFlags
Flags controlling the operation. Can be one of the following values:
ICU_DECODE Converts encoded characters back to their normal form. This can be used only if the user provides buffers in the URL_COMPONENTS structure to copy the components into.
ICU_ESCAPE Converts all escape sequences (%xx) to their corresponding characters. This can be used only if the user provides buffers in the URL_COMPONENTS structure to copy the components into.
lpUrlComponents
Address of a URL_COMPONENTS structure that receives the URL components.

The required components are indicated by members of the URL_COMPONENTS structure. Each component has a pointer member for the value and a member that stores the length of the value. If both the pointer and the length for a component are zero, that component is not returned. If the pointer is NULL and the corresponding length member is nonzero, the function stores in the pointer the address of the first character of that component within the lpszUrl string, and stores the length of the component in the length member.

If the pointer contains the address of a user-supplied buffer, the length member must contain the size of the buffer. InternetCrackUrl copies the component into the buffer and sets the length member to the length of the copied component, not counting the trailing string terminator.

For InternetCrackUrl to work properly, the size of the URL_COMPONENTS structure must be stored in the dwStructSize member.
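
The pointer/length convention described above can be illustrated with a reduced, portable model. This is only a sketch under stated assumptions: struct component, return_component, and crack_scheme_host are hypothetical names, the real URL_COMPONENTS structure has more members (including dwStructSize), and the parsing here handles only the "scheme://host" prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced model of one URL_COMPONENTS pointer/length pair. */
struct component {
    char  *ptr;  /* NULL, or the address of a caller-supplied buffer */
    size_t len;  /* request marker or buffer size on entry; length on return */
};

/* Returns the n-character component beginning at start:
 *   ptr == NULL, len == 0 : not requested; left untouched
 *   ptr == NULL, len != 0 : ptr is pointed into the source string
 *   ptr != NULL           : len is the buffer size; the text is copied
 * Returns 1 on success, 0 if a caller-supplied buffer is too small. */
int return_component(struct component *c, const char *start, size_t n)
{
    assert(c != NULL && start != NULL);
    if (c->ptr == NULL) {
        if (c->len != 0) {
            c->ptr = (char *)start; /* points into the URL; not terminated */
            c->len = n;
        }
        return 1;
    }
    if (c->len < n + 1)
        return 0;
    memcpy(c->ptr, start, n);
    c->ptr[n] = '\0';
    c->len = n; /* excludes the string terminator */
    return 1;
}

/* Splits "scheme://host..." into its scheme and host components only. */
int crack_scheme_host(const char *url, struct component *scheme,
                      struct component *host)
{
    const char *sep, *h;

    assert(url != NULL);
    sep = strstr(url, "://");
    if (sep == NULL)
        return 0;
    h = sep + 3;
    return return_component(scheme, url, (size_t)(sep - url))
        && return_component(host, h, strcspn(h, "/:?#"));
}
```

For "http://example.com/index.html", a scheme component initialized to { NULL, 1 } comes back pointing at the start of the URL with a length of 4, while a host component initialized with a 32-byte buffer receives a copy of "example.com" and a length of 11. The pointer-into-source mode avoids copying, but the caller must use the returned length because the component is not null-terminated.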

See also FtpOpenFile, InternetCloseHandle, InternetFindNextFile, InternetSetStatusCallback

InternetCreateUrl

BOOL InternetCreateUrl( 
    IN LPURL_COMPONENTS lpUrlComponents,
    IN DWORD dwFlags,
    OUT LPSTR lpszUrl,
    IN OUT LPDWORD lpdwUrlLength
);

Creates a URL from its component parts.

lpUrlComponents
Address of a URL_COMPONENTS structure that contains the components from which to create the URL.
dwFlags
Flags that control the operation of this function. Can be a combination of these values:
ICU_ESCAPE Converts all unsafe characters in the URL components to their corresponding escape sequences.
ICU_USERNAME When adding the user name, uses the name that was specified at logon time.
lpszUrl
Address of a buffer that receives the URL.
lpdwUrlLength
Length, in bytes, of the lpszUrl buffer. When the function returns successfully, this parameter receives the length, in bytes, of the URL string, not counting the terminating null character. If the function fails and GetLastError returns ERROR_INSUFFICIENT_BUFFER, this parameter receives the number of bytes required to hold the created URL.

InternetOpenUrl

HINTERNET InternetOpenUrl(
    IN HINTERNET hInternetSession, 
    IN LPCSTR lpszUrl,
    IN LPCSTR lpszHeaders,
    IN DWORD dwHeadersLength,
    IN DWORD dwFlags,
    IN DWORD dwContext
);

Begins reading a complete FTP, Gopher, or HTTP URL. Use InternetCanonicalizeUrl first if the URL being used contains a relative URL and a base URL separated by blank spaces.

hInternetSession
Handle to the current Internet session. The handle must have been returned by a previous call to InternetOpen.
lpszUrl
Address of a string that contains the URL to begin reading. Only URLs beginning with ftp:, gopher:, http:, or https: are supported.
lpszHeaders
Address of a string that contains the headers to be sent to the HTTP server. (For more information, see the description of the lpszHeaders parameter in the HttpSendRequest function.)
dwHeadersLength
Length, in characters, of the additional headers. If this parameter is -1L and lpszHeaders is not NULL, lpszHeaders is assumed to be zero-terminated (ASCIIZ) and the length is calculated.
dwFlags
Action flags. Can be a combination of the following values:
INTERNET_FLAG_DONT_CACHE
Does not cache the data, either locally or in any gateways. Identical to the preferred value, INTERNET_FLAG_NO_CACHE_WRITE.
INTERNET_FLAG_EXISTING_CONNECT
If possible, reuses the existing connections to the server for new requests generated by InternetOpenUrl instead of creating a new session for each request. This flag is useful only for FTP connections, since FTP is the only protocol that typically performs multiple operations during the same session. The Win32 Internet API caches a single connection handle for each HINTERNET handle generated by InternetOpen.
INTERNET_FLAG_HYPERLINK
When determining whether to reload the item from the network, forces a reload if the server returned no Expires time and no Last-Modified time.
INTERNET_FLAG_IGNORE_CERT_CN_INVALID
Disables Win32 Internet function checking of SSL/PCT-based certificates that are returned from the server against the host name given in the request. Win32 Internet functions use a simple check against certificates by comparing for matching host names and simple wildcarding rules for HTTP requests.
INTERNET_FLAG_IGNORE_CERT_DATE_INVALID
Disables Win32 Internet function checking of SSL/PCT-based certificates for proper validity dates for HTTP requests.
INTERNET_FLAG_IGNORE_REDIRECT_TO_HTTP
Disables detection of redirects from HTTPS to HTTP URLs. When this flag is used, Win32 Internet functions transparently allow redirects from HTTPS to HTTP URLs.
INTERNET_FLAG_IGNORE_REDIRECT_TO_HTTPS
Disables detection of redirects from HTTP to HTTPS URLs. When this flag is used, Win32 Internet functions transparently allow redirects from HTTP to HTTPS URLs.
INTERNET_FLAG_KEEP_CONNECTION
Uses keep-alive semantics, if available, for the connection for HTTP requests. This flag is required for Microsoft Network (MSN), NT LAN Manager (NTLM), and other types of authentication.
INTERNET_FLAG_MAKE_PERSISTENT
No longer supported.
INTERNET_FLAG_MUST_CACHE_REQUEST
Causes a temporary file to be created if the file cannot be cached. Identical to the preferred value, INTERNET_FLAG_NEED_FILE.
INTERNET_FLAG_NEED_FILE
Causes a temporary file to be created if the file cannot be cached.
INTERNET_FLAG_NO_AUTH
Does not attempt authentication automatically for HTTP requests.
INTERNET_FLAG_NO_AUTO_REDIRECT
Does not automatically handle redirection. This flag applies to HTTP requests only.
INTERNET_FLAG_NO_CACHE_WRITE
Does not cache the data, either locally or in any gateways.
INTERNET_FLAG_NO_COOKIES
Does not automatically add cookie headers to requests, and does not automatically add returned cookies to the cookie database for HTTP requests.
INTERNET_FLAG_NO_UI
Disables the cookie dialog box.
INTERNET_FLAG_PASSIVE
Uses passive FTP semantics for FTP files and directories.
INTERNET_FLAG_RAW_DATA
Returns the data as a GOPHER_FIND_DATA structure when retrieving Gopher directory information, or as a WIN32_FIND_DATA structure when retrieving FTP directory information. If this flag is not specified or if the call was made through a CERN proxy, InternetOpenUrl returns an HTML version of the directory.
INTERNET_FLAG_PRAGMA_NOCACHE
Forces the request to be resolved by the origin server, even if a cached copy exists on the proxy.
INTERNET_FLAG_READ_PREFETCH
This flag is currently disabled.
INTERNET_FLAG_RELOAD
Gets the data from the wire even if it is locally cached.
INTERNET_FLAG_RESYNCHRONIZE
Reloads HTTP resources if the resource has been modified since the last time it was downloaded. All FTP and Gopher resources are reloaded.
INTERNET_FLAG_SECURE
Requests secure transactions on the wire with SSL or PCT. This flag applies to HTTP requests only.
dwContext
Application-defined value that is passed, along with the returned handle, to any callback functions.

This is a general function that an application can use to retrieve data over any of the protocols that the Win32 Internet functions support. This function is particularly useful when the application does not need to access the particulars of a protocol, but only requires the data corresponding to a URL. The InternetOpenUrl function parses the URL string, establishes a connection to the server, and prepares to download the data identified by the URL. The application can then use InternetReadFile (for files) or InternetFindNextFile (for directories) to retrieve the URL data. It is not necessary to call InternetConnect before InternetOpenUrl.

InternetOpenUrl disables Gopher on ports less than 1024, except for port 70 (the standard Gopher port) and port 105 (typically used for Central Services Organization [CSO] name searches).

Use InternetCloseHandle to close the handle returned from InternetOpenUrl. However, note that closing the handle before all the URL data has been read results in the connection being terminated.
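
The sequence described above (InternetOpen, InternetOpenUrl, InternetReadFile, InternetCloseHandle) might look roughly like the following C sketch. Several details are assumptions rather than part of this reference: fetch_url and the agent name "UrlSample" are hypothetical, the ANSI entry points are called by their explicit names InternetOpenA and InternetOpenUrlA as declared in current versions of wininet.h, and INTERNET_OPEN_TYPE_PRECONFIG is used to pick up the system's configured access settings. The non-Windows stub exists only so that the file compiles on other platforms.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <wininet.h>   /* link with wininet.lib */

/* Reads the resource at url and writes its bytes to out.
 * Returns the number of bytes read, or -1 on failure. */
long fetch_url(const char *url, FILE *out)
{
    HINTERNET session, request;
    char buf[4096];
    DWORD got = 0;
    long total = 0;

    assert(url != NULL && out != NULL);

    /* "UrlSample" is an arbitrary agent name chosen for this sketch. */
    session = InternetOpenA("UrlSample", INTERNET_OPEN_TYPE_PRECONFIG,
                            NULL, NULL, 0);
    if (session == NULL)
        return -1;

    /* No additional headers; fetch from the wire, not the local cache. */
    request = InternetOpenUrlA(session, url, NULL, 0,
                               INTERNET_FLAG_RELOAD, 0);
    if (request == NULL) {
        InternetCloseHandle(session);
        return -1;
    }

    /* End of data is reported as success with zero bytes read. */
    for (;;) {
        if (!InternetReadFile(request, buf, sizeof buf, &got)) {
            total = -1;
            break;
        }
        if (got == 0)
            break;
        fwrite(buf, 1, got, out);
        total += (long)got;
    }

    /* Close the URL handle only after reading is finished. */
    InternetCloseHandle(request);
    InternetCloseHandle(session);
    return total;
}
#else
/* The Win32 Internet functions exist only on Windows; this stub keeps
 * the sketch compilable elsewhere and always reports failure. */
long fetch_url(const char *url, FILE *out)
{
    assert(url != NULL && out != NULL);
    return -1;
}
#endif
```

A caller could invoke fetch_url("http://example.com/", stdout), where the URL is only a placeholder. Note that the sketch closes the handle returned by InternetOpenUrl only after InternetReadFile has reported the end of the data, in line with the remark above about closing the handle early.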

© 1997 Microsoft Corporation. All rights reserved.