import urllib.request
import re


def url_open(url):
    # Build a request with a browser-like User-Agent.
    req = urllib.request.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.104 Safari/537.36 Core/1.53.2372.400 QQBrowser/9.5.10548.400')
    # Open the Request object, not the bare URL; otherwise the header is never sent.
    response = urllib.request.urlopen(req)
    html = response.read()
    return html


# Earlier attempt, kept for reference (note it uses /d where \d was intended):
'''
zongjia=re.compile(r'<div .*?qj-listright btall">.*?class="pri">(.*?)</b>(.*?) .*?(/d/d/d/d.*?)<br>.*?class="showroom">(.*?)</span>(/d{1,2}/d.*?)<br>.*?</div>')
print(zongjia)
'''
# Current pattern, compiled once outside the loop.
zongjia = re.compile(r'<div .*?"qj-listright btall">.*?class="pri">(.*?)</b> (.*?)\s*? .*?(\d\d\d\d.*?)\s.*?class="showroom">.*?(\S*?).*?</span>.*?(\d{1,4}\S*?).*?</div>')

# Fetch listing pages 1 to 4 and print whatever the pattern matches on each.
for a in range(1, 5):
    url = "http://jdz.58.com/ershoufang/pn" + str(a) + "/"
    print(url)
    # Fetch each page once, through url_open, so the User-Agent is applied.
    html = url_open(url).decode("utf-8")
    zongjia_list = zongjia.findall(html)
    print(zongjia_list)
Output:
http://jdz.58.com/ershoufang/pn1/
[]
http://jdz.58.com/ershoufang/pn2/
[]
http://jdz.58.com/ershoufang/pn3/
[]
http://jdz.58.com/ershoufang/pn4/
[]
zongjia_list always comes back empty, no matter how many times I try. Scraping image sites and the like worked fine.
I haven't looked at it closely, but the problem is almost certainly your regular expression.
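One thing worth checking in the regex: by default `.` does not match a newline, so a pattern that relies on `.*?` to span several lines of HTML finds nothing unless `re.S` (`re.DOTALL`) is passed. Whether that is the cause here depends on the real page source, which isn't shown in this thread. The sketch below uses a made-up fragment that only borrows the class names from the pattern above, plus a much shorter hypothetical pattern.

```python
import re

# Hypothetical fragment: the real 58.com markup is not shown in the thread
# and may well differ. Only the class names come from the original pattern.
sample = '''<div class="qj-listright btall">
<b class="pri">85</b>
<span class="showroom">3 bed 2 living</span>
</div>'''

# Simplified pattern (an assumption, not the poster's full regex).
pattern = r'class="pri">(.*?)</b>.*?class="showroom">(.*?)</span>'

# Without re.S, "." stops at the line break between </b> and <span>.
without_flag = re.findall(pattern, sample)

# With re.S, "." also matches "\n", so the pattern can span lines.
with_flag = re.findall(pattern, sample, re.S)

print(without_flag)  # []
print(with_flag)     # [('85', '3 bed 2 living')]
```

If the real page puts these tags on separate lines, passing `re.S` as the second argument to `re.compile` may be enough to get matches; saving `html` to a file and testing the pattern against it is a quick way to confirm.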
Why use regex on a page like this? It's far too error-prone. Use BeautifulSoup4.
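A minimal sketch of what the BeautifulSoup4 route could look like, assuming the `bs4` package is installed (`pip install beautifulsoup4`). The HTML here is a made-up fragment that reuses the class names from the regex (`qj-listright`, `pri`, `showroom`); the real page structure may differ, and `extract_listings` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded page.
sample = """
<div class="qj-listright btall">
  <b class="pri">85</b>
  <span class="showroom">3 bed 2 living</span>
</div>
<div class="qj-listright btall">
  <b class="pri">120</b>
  <span class="showroom">4 bed 2 living</span>
</div>
"""


def extract_listings(html):
    """Return (price, rooms) text pairs, one per listing block."""
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # class_ matches any one of an element's classes, so "qj-listright"
    # also finds elements whose class attribute is "qj-listright btall".
    for block in soup.find_all("div", class_="qj-listright"):
        price = block.find(class_="pri")
        rooms = block.find(class_="showroom")
        rows.append((
            price.get_text(strip=True) if price else None,
            rooms.get_text(strip=True) if rooms else None,
        ))
    return rows


listings = extract_listings(sample)
print(listings)
```

In the original script, the decoded `html` from `url_open(url)` would be passed in place of `sample`. Missing tags come back as `None` instead of silently breaking the whole match, which is the main advantage over one long regex.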