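Below is my code, which crawls a website. Parsing the page with bs4 works, but I cannot print the result or write it to a file.
I suspect the problem is my computer's own encoding. Could someone who knows this area explain what is going on?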
- # -*- coding: utf-8 -*-
- import requests,zlib,gzip
- # from pdfkit import *
- import pdfkit
- from io import StringIO
- from bs4 import BeautifulSoup
- import sys
- # sys.setdefaultencoding("utf-8")
- URL="https://daily.zhihu.com"
- def GetUrl(url):
-     header={
-         "Accept-Encoding":"gzip, deflate",
-         "Accept-Language":"zh-CN,zh;q=0.8",
-         "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3831.602 Safari/537.36"
-     }
-     a=requests.get(url,headers=header)
-     if a.status_code ==200:
-         bs=BeautifulSoup(a.text,"lxml")
-         bs.prettify()
-         return bs
- # def Download(path,)
- bs=GetUrl(URL)
- title=bs.find_all("a",class_="link-button")
- for i in title:
-     print(i)
-     uid=i["href"]
-     img=i.find("img")["src"]
-     name=i.find("span").text
-     # ir=requests.get(img)
-     # open('text.png',"wb").write(ir.content)
-     break
- print(URL+uid)
- proce=GetUrl(URL+uid)
- a=open("text.html","w")
- a.write(proce.text)
Here is the error output:
- <a class="link-button" href="/story/9660557"><img class="preview-image" src="https://pic2.zhimg.com/v2-6be190072ce0664f1549accc74bc51c5.jpg"/><span class="title">伤口愈合这事,你以为简单吧?可是我花了博士四年都还没搞懂</span></a>
- https://daily.zhihu.com/story/9660557
- Traceback (most recent call last):
-   File "D:/OFFICE/learing-program/fishc/chapion/4-4/zhihu.py", line 43, in <module>
-     a.write(proce.text)
- UnicodeEncodeError: 'gbk' codec can't encode character '\xf6' in position 2050: illegal multibyte sequence
- utf-8
- Process finished with exit code 1
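The traceback can probably be reproduced without touching the network. The sketch below assumes that the default text encoding on this Windows machine is GBK, which is what the 'gbk' codec in the message suggests. The helper name `can_encode` is made up for illustration and is not part of the original script.

```python
import locale

# Encoding that open() falls back to when no encoding= argument is given.
# On a Chinese-locale Windows install this is typically cp936 (GBK).
print(locale.getpreferredencoding(False))

ch = "\xf6"  # the character named in the traceback (o with diaeresis)

def can_encode(text, codec):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(ch, "gbk"))    # False
print(can_encode(ch, "utf-8"))  # True
```

If that reading is right, the crawl and the bs4 parsing are fine, and the failure happens only when the text is encoded for output with the system default codec.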
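One possible way around the error is to name the file encoding explicitly instead of relying on the system default. This is a minimal sketch, not a confirmed fix for the script above: `save_html`, the sample string, and the temporary output path are all hypothetical stand-ins.

```python
import os
import tempfile

def save_html(path, html):
    """Write html to path as UTF-8, regardless of the system default."""
    # Passing encoding= explicitly avoids the locale default (GBK here).
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)

# Stand-in for the page markup; includes the character from the traceback.
sample = "<p>K\xf6ln 伤口愈合</p>"
out_path = os.path.join(tempfile.gettempdir(), "text.html")
save_html(out_path, sample)

# Read the file back with the same encoding to check the round trip.
with open(out_path, encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == sample)
```

In the script from the question this would correspond to something like `save_html("text.html", str(proce))`. Note that `proce.text` on a BeautifulSoup object holds only the text nodes, so `str(proce)` is what keeps the markup. If `print` raises the same error on the console, `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) may help, though whether the console then displays the characters correctly depends on the terminal.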