Posted on 2015-12-31 14:11:03
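Fatal errors:
1. Line 27: the syntax for precompiling the regular expression is wrong; use re.compile() to precompile it.
2. Line 36: open_url() should not be passed an argument; it should rely on the default value of its parameter.
3. Line 60 prints a variable that is local to a function, which is a serious error! It should be print(url_opener(url)).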
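A minimal sketch of the three fixes above. The names `RE_DIGITS`, `fetch`, and `count_digits` are made up for illustration and are not from the original poster's script: the pattern is compiled once with `re.compile()`, the function is called without arguments so its default applies, and the result is returned and printed by the caller instead of being read from a local variable.

```python
import re

# Compile once at module level; the compiled object is reused on every call.
RE_DIGITS = re.compile(r'\d+')


def fetch(url='http://example.com/'):
    # Hypothetical stand-in for a real download: it only echoes the URL.
    return 'fetched ' + url


def count_digits(text):
    # found is local to this function and is not visible outside it,
    # so the value has to be returned to the caller.
    found = RE_DIGITS.findall(text)
    return len(found)


print(fetch())                      # no argument: the default URL is used
print(count_digits('a1 b22 c333'))  # print the return value, not a local
```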
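Style conventions:
1. I won't go into PEP 8; it's the most basic of basics.
2. The function defined on line 9 uses a parameter with the same name as a variable defined outside it, which easily confuses whoever ends up maintaining your code (me).
3. On lines 17 and 48, reason is spelled reson; I can't tell whether that was deliberate.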
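To illustrate point 2, here is a small sketch with hypothetical names (`url`, `show`, `show_target`). A parameter that shares its name with a module-level variable shadows it inside the function; this is legal, but a reader has to stop and check which one is meant.

```python
url = 'http://example.com/global'


def show(url):
    # Inside the function, url is the parameter; the module-level url is hidden.
    return url


def show_target(target_url):
    # A distinct parameter name removes the ambiguity.
    return target_url


print(show('http://example.com/argument'))
print(url)  # the module-level variable is unchanged
```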
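Personal suggestions:
If this is just for learning, writing a regular expression that matches IP addresses by hand is good practice. I looked at the proxy-list page you gave: the proxies come in two kinds, HTTP and HTTPS, so you could write a new regular expression that matches the proxy IPs of these two protocols separately. As for actually using them, don't push it too hard; they are free, after all, so it's already generous of them if they work without errors.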
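One way the suggestion above could look, as a rough sketch only. It assumes each proxy sits in a table row whose cells hold the IP, the port, and the protocol in that order; `SAMPLE` is an invented excerpt, not the real page, and `OCTET`, `RE_PROXY`, and `split_proxies` are hypothetical names. The real markup would need to be checked and the pattern adjusted.

```python
import re

# Hypothetical excerpt; the real page's markup is an assumption here.
SAMPLE = """
<tr><td>10.0.0.1</td><td>8080</td><td>HTTP</td></tr>
<tr><td>10.0.0.2</td><td>3128</td><td>HTTPS</td></tr>
<tr><td>10.0.0.3</td><td>80</td><td>HTTP</td></tr>
"""

# One IPv4 octet (0-255); the longest alternatives come first.
OCTET = r'(?:25[0-5]|2[0-4]\d|[01]?\d?\d)'
RE_PROXY = re.compile(
    r'<td>(' + OCTET + r'(?:\.' + OCTET + r'){3})</td>'  # IP address
    + r'\s*<td>(\d{1,5})</td>'                            # port
    + r'\s*<td>(HTTPS?)</td>'                             # protocol
)


def split_proxies(html):
    """Return a dict mapping 'http' and 'https' to lists of 'ip:port' strings."""
    result = {'http': [], 'https': []}
    for ip, port, scheme in RE_PROXY.findall(html):
        result[scheme.lower()].append(ip + ':' + port)
    return result


print(split_proxies(SAMPLE))
```

Capturing the port along with the IP means each entry can be handed to a proxy handler as-is, and grouping by protocol lets the caller pick a proxy that matches the scheme of the target URL.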
My code:
import random
import re
import urllib.request as r
from urllib.error import URLError

# Match a dotted-quad IPv4 address; each octet is limited to 0-255.
# The longest alternatives come first so the last octet is not cut short.
RE_IP = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\b')
USER_AGENT = ('Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36')


def report_error(e):
    # HTTPError also carries a reason, so check for code first.
    if hasattr(e, 'code'):
        print('the server could not fulfill the request,')
        print('error code:', e.code)
    elif hasattr(e, 'reason'):
        print('we failed to reach a server,')
        print('reason:', e.reason)


def open_url(url='http://www.xicidaili.com/wn/'):
    # Fetch the proxy-list page; return '' if the request fails.
    req = r.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    try:
        page = r.urlopen(req)
    except URLError as e:
        report_error(e)
        return ''
    else:
        return page.read().decode('utf-8')


def get_ip(html):
    # Return every IPv4 address found in the page text.
    return RE_IP.findall(html)


def url_opener(url):
    # Open url through a randomly chosen proxy.
    # Returns the raw response bytes, or None on failure.
    req = r.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    proxies = get_ip(open_url())
    if not proxies:
        print('no proxy found')
        return None
    # Only the IP is matched, so no port is passed to the proxy handler.
    proxy = random.choice(proxies)
    proxy_support = r.ProxyHandler({'http': proxy})
    opener = r.build_opener(proxy_support)
    r.install_opener(opener)
    try:
        response = r.urlopen(req)  # send req so the User-Agent header is used
    except URLError as e:
        report_error(e)
        return None
    else:
        return response.read()


if __name__ == '__main__':
    link = input('Enter the URL to open: ')
    print(url_opener(link))