Chinaunix

Title: How to write a function in C/C++ that checks whether a URL is valid

Author: developing_T    Time: 2006-01-06 18:59
Is it possible to write a function that takes a link (http://www.aaa.com/aa/aa/aa) and returns a bool: true if the link is valid, false otherwise?
Author: apen    Time: 2006-01-06 21:16
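What counts as "valid"? A URL may return the result you expect, or it may return an error message (for example, a notice that the page does not exist). In the second case, do you consider the URL valid or invalid?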
Author: vulgate    Time: 2006-01-08 21:34
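This is an ACM practice problem. I solved it once, but the code has since been lost.
Note that your question is not defined very precisely.
If what you mean is the problem below, a finite state machine is enough.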

http://acm.zju.edu.cn/show_problem.php?pid=1243




--------------------------------------------------------------------------------

URLs

--------------------------------------------------------------------------------

Time limit: 1 Seconds   Memory limit: 32768K   
Total Submit: 341   Accepted Submit: 145   

--------------------------------------------------------------------------------

In the early nineties, the World Wide Web (WWW) was invented. Nowadays, most people think that the WWW simply consists of all the pretty (or not so pretty) HTML-pages that you can read with your WWW browser. But back then, one of the main intentions behind the design of the WWW was to unify several existing communication protocols.

Then (and even now), information on the Internet was available via a multitude of channels: FTP, HTTP, E-Mail, News, Gopher, and many more. Thanks to the WWW, all these services can now be uniformly addressed via URLs (Uniform Resource Locators). The syntax of URLs is defined in the Internet standard RFC 1738. For our problem, we consider a simplified version of the syntax, which is as follows:

<protocol> "://" <host> [ ":" <port> ] [ "/" <path> ]

The square brackets [] mean that the enclosed string is optional and may or may not appear. Examples of URLs are the following:

http://www.informatik.uni-ulm.de/acm
ftp://acm.baylor.edu:1234/pub/staff/mr-p
gopher://veryold.edu

More specifically,

<protocol> is always one of http, ftp or gopher.

<host> is a string consisting of alphabetic (a-z, A-Z) or numeric (0-9) characters and points (.).

<port> is a positive integer, smaller than 65536.

<path> is a string that contains no spaces.

You are to write a program that parses a URL into its components.


Input

The input starts with a line containing a single integer n, the number of URLs in the input. The following n lines contain one URL each, in the format described above. The URLs will consist of at most 60 characters each.


Output

For each URL in the input first print the number of the URL, as shown in the sample output. Then print four lines, stating the protocol, host, port and path specified by the URL. If the port and/or path are not given in the URL, print the string <default> instead. Adhere to the format shown in the sample output.

Print a blank line after each test case.


Sample Input

3
ftp://acm.baylor.edu:1234/pub/staff/mr-p
http://www.informatik.uni-ulm.de/acm
gopher://veryold.edu


Sample Output

URL #1
Protocol = ftp
Host     = acm.baylor.edu
Port     = 1234
Path     = pub/staff/mr-p

URL #2
Protocol = http
Host     = www.informatik.uni-ulm.de
Port     = <default>
Path     = acm

URL #3
Protocol = gopher
Host     = veryold.edu
Port     = <default>
Path     = <default>



--------------------------------------------------------------------------------
Problem Source: Southwestern Europe 1997, Practice
--------------------------------------------------------------------------------


Zhejiang University Online Judge V1.0
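vulgate's finite-state-machine suggestion could look roughly like the following C sketch for the problem quoted above. It is a hypothetical reconstruction, not vulgate's lost code, and the names `Url`, `parse_url`, `print_url` and `solve` are made up for illustration. The machine has one state per grammar component (protocol, host, port, path) and switches state on the separators "://", ":" and "/". One assumption to flag: the sample host www.informatik.uni-ulm.de contains a hyphen, which the stated host rule (letters, digits and points) does not allow, so the sketch also accepts `-` in hosts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Components of <protocol> "://" <host> [ ":" <port> ] [ "/" <path> ].
 * An empty port or path string means the part was not given. */
typedef struct {
    char protocol[64], host[64], port[64], path[64];
} Url;

/* One state per component of the grammar. */
typedef enum { S_PROTOCOL, S_HOST, S_PORT, S_PATH } State;

/* Appends c to the NUL-terminated buf of capacity cap; returns 0 on overflow. */
static int append(char *buf, size_t cap, char c) {
    size_t n = strlen(buf);
    if (n + 1 >= cap) return 0;
    buf[n] = c;
    buf[n + 1] = '\0';
    return 1;
}

/* Runs the state machine over s. Returns 1 and fills *u if s matches
 * the grammar, 0 otherwise. */
int parse_url(const char *s, Url *u) {
    State st = S_PROTOCOL;
    int has_port = 0;
    memset(u, 0, sizeof *u);
    for (const char *p = s; *p; ++p) {
        unsigned char c = (unsigned char)*p;
        switch (st) {
        case S_PROTOCOL:                  /* everything up to "://" */
            if (c == ':') {
                if (strncmp(p, "://", 3) != 0) return 0;
                p += 2;                   /* the loop's ++p skips the final '/' */
                st = S_HOST;
            } else if (!append(u->protocol, sizeof u->protocol, (char)c)) {
                return 0;
            }
            break;
        case S_HOST:                      /* letters, digits, '.', '-' */
            if (c == ':') { st = S_PORT; has_port = 1; }
            else if (c == '/') st = S_PATH;
            else if (!(isalnum(c) || c == '.' || c == '-')) return 0;
            else if (!append(u->host, sizeof u->host, (char)c)) return 0;
            break;
        case S_PORT:                      /* digits until an optional '/' */
            if (c == '/') st = S_PATH;
            else if (!isdigit(c)) return 0;
            else if (!append(u->port, sizeof u->port, (char)c)) return 0;
            break;
        case S_PATH:                      /* anything except a space */
            if (c == ' ' || !append(u->path, sizeof u->path, (char)c)) return 0;
            break;
        }
    }
    if (st == S_PROTOCOL || u->host[0] == '\0') return 0;
    if (strcmp(u->protocol, "http") != 0 && strcmp(u->protocol, "ftp") != 0 &&
        strcmp(u->protocol, "gopher") != 0)
        return 0;
    if (has_port) {                       /* positive and smaller than 65536 */
        long v = strtol(u->port, NULL, 10);
        if (u->port[0] == '\0' || v < 1 || v > 65535) return 0;
    }
    return 1;
}

/* Prints one parsed URL in the judge's output format. */
void print_url(int n, const Url *u) {
    printf("URL #%d\n", n);
    printf("Protocol = %s\n", u->protocol);
    printf("Host     = %s\n", u->host);
    printf("Port     = %s\n", u->port[0] ? u->port : "<default>");
    printf("Path     = %s\n\n", u->path[0] ? u->path : "<default>");
}

/* Reads the judge input (n, then n URLs) from in and prints each one. */
void solve(FILE *in) {
    char line[128];
    int n;
    assert(in != NULL);
    if (fscanf(in, "%d", &n) != 1 || !fgets(line, sizeof line, in)) return;
    for (int i = 1; i <= n && fgets(line, sizeof line, in); ++i) {
        Url u;
        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */
        if (parse_url(line, &u)) print_url(i, &u);
    }
}
```

A complete judge program then only needs `int main(void) { solve(stdin); return 0; }`. Lines that fail to parse are skipped silently; that choice is an assumption, since the problem guarantees well-formed input. The same `parse_url` also answers the original bool question in its narrowest, purely syntactic sense.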
Author: developing_T    Time: 2006-01-09 08:51
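Sorry, I did not state my question clearly.
What I mean is a function that takes a well-formed URL and returns a bool: if entering the URL into IE opens the page normally, it returns true; if IE reports that "the page does not exist", it returns false.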
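For this clarified requirement, one common approach is to ask the server directly: send an HTTP request and look at the status code in its reply, where 200 means OK and 404 means Not Found. The C sketch below assumes POSIX sockets (Linux/Unix), plain HTTP only (no HTTPS), and no timeouts; `url_is_reachable` and `parse_status_code` are hypothetical names, and the caller is assumed to have already split the URL into host, port and path. It sends a HEAD request and treats any 2xx status as "valid". Redirects (3xx) are not followed, so they count as invalid here, although IE would follow them.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Extracts the status code from an HTTP status line such as
 * "HTTP/1.1 404 Not Found". Returns -1 if the line is malformed. */
int parse_status_code(const char *line) {
    int major, minor, code;
    if (sscanf(line, "HTTP/%d.%d %3d", &major, &minor, &code) != 3) return -1;
    return code;
}

/* Sends "HEAD path HTTP/1.0" to host:port and returns 1 if the reply has
 * a 2xx status, 0 otherwise (DNS, connect and send failures included).
 * path must start with '/'. No redirects, timeouts or HTTPS. */
int url_is_reachable(const char *host, const char *port, const char *path) {
    struct addrinfo hints, *res, *ai;
    char req[512], buf[256];
    size_t got = 0;
    ssize_t n;
    int fd = -1, len;

    assert(host && port && path && path[0] == '/');
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0) return 0;
    for (ai = res; ai != NULL; ai = ai->ai_next) {   /* try each address */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0) return 0;

    len = snprintf(req, sizeof req,
                   "HEAD %s HTTP/1.0\r\nHost: %s:%s\r\nConnection: close\r\n\r\n",
                   path, host, port);
    if (len < 0 || (size_t)len >= sizeof req ||
        send(fd, req, (size_t)len, 0) != len) {
        close(fd);
        return 0;
    }
    /* Read until the status line is complete or the buffer is full. */
    while (got < sizeof buf - 1 &&
           (n = recv(fd, buf + got, sizeof buf - 1 - got, 0)) > 0) {
        got += (size_t)n;
        buf[got] = '\0';
        if (strchr(buf, '\n') != NULL) break;
    }
    close(fd);
    buf[got] = '\0';
    int code = parse_status_code(buf);
    return code >= 200 && code < 300;
}
```

For the example URL from the first post, the call would be `url_is_reachable("www.aaa.com", "80", "/aa/aa/aa")`. As apen pointed out, some servers answer 200 OK with an error page, and no status-code check can detect that. Some servers also reject HEAD with 405, in which case sending GET and reading only the status line is the fallback. On Windows, the same socket calls are available through Winsock, which additionally requires `WSAStartup` and uses `closesocket` instead of `close`.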
Author: developing_T    Time: 2006-01-09 18:34
Has anyone written one?




Welcome to Chinaunix (http://bbs.chinaunix.net/) Powered by Discuz! X3.2