The page source is at: view-source:http://www.xbiquge.la/1/1508/1158793.html
The page URL is: http://www.xbiquge.la/1/1508/1158793.html
Here is my code:
- from urllib.request import Request,urlopen
- from fake_useragent import UserAgent
- url = "http://www.xbiquge.la/1/1508/1158793.html"
- headers={
- "User-Agent":UserAgent().chrome
- }
- request =Request(url,headers=headers)
-
- response = urlopen(request)
- print(response.read())
Strangely, though, the response body turns out to be a pile of bytes like this:
- b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed|iOcY\x9a\xe6\xf7\x92\xf2?\xb8\x90\xa6+B\xca\xc0A,\x95\x19\x19\xcbH\x95U\xd2\x8c\xd4S]j\xe5h\xba\xd5j\x85\x1c\xe0\x0c\xc8 \x80\x04\x93\x91\xd9\xad\x96l\x8cW\xf0\xc2\xbe\xd8\xec\x06\x82\xd5\x06\x0cx\xf7\x7f\xa9\xf4\xb9\xcb\xa7\xf8\x0b\xf3\xbc\xe7\xbd\xbe\xbe\x06\x13e\xd7|\x9d\xd4\r\xa7\xf1=\xe7=\xef\xbe\x9ds\xef\x8b\xdf\xfe\xf1\x9f\xbe\xfd\xee_\xff\xf2\'[\xbf\xeb\xfd\xa0\xed/\xff\xfb\x0f\xff\xf8?\xbf\xb5u=\xb0\xdb\xff\xcf\xe3o\xed\xf6?~\xf7G\xdb\xbf\xfc\x8f\xef\xfe\xd7?\xdaz\xba\x1f\xda\xbe\x1bu\x0c\x8d\r\xb8\x06\x86\x87\x1c\x83v\xfb\x9f\xfe\xdce\xeb\xeaw\xb9F\xbe\xb1\xdb?|\xf8\xd0\xfd\xe1q\xf7\xf0\xe8[\xfbw\xffl\xff\x99`\xf5\xd0d\xe3\xeb\x03\x97efw\x9f\xab\xaf\xeb\xd5\x17\xbfy!W\xfc\xf9\xfd\xe0\xd0\xd8\xcb\x16pz\x9e={\xc6\xd3y\xb0\xd3\xd1G\xff\x7
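After read(), the result cannot be decoded (I tried a whole bunch of encodings and none of them worked).
Could someone please take a look?
I really don't have much experience with web scraping.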
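The output above starts with b'\x1f\x8b', which is the gzip magic number, so the body is most likely gzip-compressed rather than in some unusual text encoding; urlopen does not decompress it for you. Below is a minimal sketch of one way to handle that. The helper name decode_body is made up for illustration, and the utf-8 default is an assumption about this page.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decompress `raw` if it looks like gzip data, then decode it to text.

    `decode_body` is a hypothetical helper; the default encoding is an
    assumption and may need changing for other sites.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)
```

With the code from the question, this would be used as print(decode_body(response.read())).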
- import requests
- from lxml import html
- url = "http://www.xbiquge.la/1/1508/1158793.html"
- headers={
- 'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
- }
- request = requests.get(url,headers=headers)
- request = request.content.decode("utf-8")
-
- selector = html.fromstring(request)
- txt_list = selector.xpath('//div[@id = "content"]/text()')
- txt = ''
- for i in txt_list:
-     # repr() turns \r and \xa0 into literal escape text so they can be stripped
-     i = repr(i).replace(r'\r','').replace(r'\xa0','').replace("'",'')
-     txt +=i
- print(txt)
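The requests version works because requests decompresses gzip response bodies automatically. The cleanup loop can probably also be written without the repr() round trip. A minimal sketch follows; clean_text is a hypothetical name. Unlike the loop above, it keeps real newlines instead of printing them as a literal \n, and it leaves apostrophes in the text.

```python
def clean_text(parts):
    """Join text nodes, dropping carriage returns and non-breaking spaces.

    `parts` is any iterable of strings, such as the list returned by
    selector.xpath('//div[@id = "content"]/text()').
    """
    return "".join(
        part.replace("\r", "").replace("\xa0", "") for part in parts
    )
```

For example, print(clean_text(txt_list)) would replace the for loop and the final print.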