Posted on 2016-3-17 09:43:01
This post was last edited by kunaiai on 2016-3-17 09:45
The ooxx section now has strict anti-scraping checks, so in addheaders you need to add every request header you traced in the browser:
import urllib.request
# Request headers traced from the browser, as (name, value) tuples
heads = []
heads.append(('Host','jandan.net'))
#heads.append(('Connection','keep-alive'))
heads.append(('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'))
heads.append(('User-Agent','Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'))
heads.append(('Accept-Encoding','gzip, deflate, sdch'))
heads.append(('Accept-Language','zh-CN,zh;q=0.8'))
heads.append(('Cookie','aliyungf_tc=AQAAAI0S5U1xHA0A/stacRBEuAXZ+pQe; _gat=1; _ga=GA1.2.355763900.1457580893; Hm_lvt_fd93b7fb546adcfbcf80c4fc2b54da2c=1457580893; Hm_lpvt_fd93b7fb546adcfbcf80c4fc2b54da2c=1457587243'))
# Send these headers with every request made through this opener
opener = urllib.request.build_opener()
opener.addheaders = heads
This is my set, for your reference.
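A minimal sketch of the same setup in Python 3, assuming `urllib.request` is what builds `opener` (the post does not show that part). Assigning a list of `(name, value)` tuples to `opener.addheaders` replaces urllib's default `Python-urllib/x.y` User-Agent. Two deliberate differences from the list above: `Host` is left out because urllib fills it in from the URL and ignores an `addheaders` entry for it, and `Accept-Encoding` advertises only gzip, the one encoding the post's `gzip.decompress` step can undo. `build_jandan_opener` and the cookie placeholder are hypothetical; paste your own cookie from the browser, since cookies expire.

```python
import urllib.request

# Headers copied from a browser request. The cookie value is a hypothetical
# placeholder: real cookies expire and must be re-captured from the browser.
BROWSER_HEADERS = (
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"),
    ("User-Agent", "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"),
    ("Accept-Encoding", "gzip"),  # only ask for what we know how to decode
    ("Accept-Language", "zh-CN,zh;q=0.8"),
    ("Cookie", "PASTE_YOUR_COOKIE_HERE"),
)


def build_jandan_opener(headers=BROWSER_HEADERS):
    """Return an OpenerDirector that sends `headers` on every request (hypothetical helper)."""
    opener = urllib.request.build_opener()
    # Replaces the default [("User-agent", "Python-urllib/x.y")] entry
    opener.addheaders = list(headers)
    return opener


opener = build_jandan_opener()
```

Every `opener.open(url)` call then carries these headers, so the rest of the post's code can use `opener` unchanged.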
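Note, though, that the response then has to be decoded: because Accept-Encoding allows gzip, the server may send a gzip-compressed body.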
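import gzip

def open_url(url):  # function name assumed; the post only shows the body
    # opener is the one configured with heads above
    response = opener.open(url)
    doc = response.read()
    # Decode: gunzip the body if it is gzip-compressed
    try:
        html = gzip.decompress(doc)
    except (OSError, EOFError):
        # Not valid gzip data (e.g. the server sent it uncompressed), so use it as-is
        html = doc
    return html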
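The try/except above attempts gzip on every body. A slightly more explicit alternative (a sketch, not the author's code) reads the response's `Content-Encoding` header and also handles `deflate`, including servers that send raw DEFLATE without the zlib wrapper. `decode_body` and `fetch` are hypothetical names, and `opener` is assumed to be an `OpenerDirector` already configured with the browser headers. `sdch` is not handled: a body with an unknown encoding comes back unchanged.

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding the server applied (hypothetical helper)."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Standard HTTP "deflate" is zlib-wrapped data
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw DEFLATE with no zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No encoding (or one we do not support): return the bytes unchanged
    return body


def fetch(opener, url: str) -> bytes:
    """Open `url` with a preconfigured opener and return the decoded body."""
    with opener.open(url) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

Usage would be `html = fetch(opener, url).decode("utf-8")`, assuming the page is UTF-8. Checking the header avoids guessing, and it fails loudly if the server claims gzip but sends corrupt data instead of silently returning the raw bytes.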