Two small problems in a Python web crawler, looking for answers (nothing gets printed, reload problem)

CC木雨 · posted on 2018-7-18 13:30:47

While working through Chapter 14, 《论一只爬虫的自我修养》 ("On the Self-Cultivation of a Crawler"), I ran into two problems:

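1. In the BeautifulSoup walkthrough that crawls the Baidu Baike entry for "网络爬虫" (web crawler), my code fails with "name 'reload' is not defined" and "'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)". I don't know what causes either message. The code is: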

import urllib.request
import re
from bs4 import BeautifulSoup

def main():
    url = "https://baike.baidu.com/item/网络爬虫/5162711?fr=aladdin"
    response = urllib.request.urlopen(url)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')

    for each in soup.find_all(href = re.compile('view')):
        print(each.text, '->', ''.join(['https://baike.baidu.com', each['href']]))

if __name__ == '__main__':
    main()

Output:

Error in sitecustomize; set PYTHONVERBOSE for traceback:
NameError: name 'reload' is not defined
Traceback (most recent call last):
  File "爬虫.py", line 15, in <module>
    main()
  File "爬虫.py", line 7, in main
    response = urllib.request.urlopen(url)
  File "C:\Python\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Python\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Python\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Python\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Python\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Python\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Python\lib\http\client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Python\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)

***Repl Closed***
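
Note on the UnicodeEncodeError: the last frame of the traceback shows http.client encoding the HTTP request line as ASCII, and positions 10-13 of a request line that begins with "GET /item/" are exactly the four Chinese characters in the URL path. A likely fix, offered as a sketch and not verified against this site, is to percent-encode the URL before passing it to urlopen. The helper name to_ascii_url is hypothetical; urllib.parse.quote is the standard-library function doing the work.

```python
from urllib.parse import quote


def to_ascii_url(url):
    """Percent-encode non-ASCII characters, keeping URL delimiters intact.

    to_ascii_url is a hypothetical helper name, not part of any library.
    """
    # safe= lists the characters quote() must leave alone; other characters
    # outside the unreserved ASCII set become UTF-8 %XX escapes.
    return quote(url, safe=":/?&=")


url = to_ascii_url("https://baike.baidu.com/item/网络爬虫/5162711?fr=aladdin")
print(url)

# The result can now be encoded as ASCII, which is the step that failed above.
url.encode("ascii")
```

Passing this url to urllib.request.urlopen should get past the putrequest step; whether the site then returns the expected page is a separate question. A URL that already contains % escapes or a # fragment would need those characters added to safe as well.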

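2. The second BeautifulSoup example asks for a crawler that lets the user enter a search keyword, visits each entry link, checks whether the entry has a subtitle, and prints the subtitle along with the title if there is one. Here I get the same "name 'reload' is not defined" message as in the first problem. The code is: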

import urllib.request
import urllib.parse
import re
from bs4 import BeautifulSoup
import sys
import importlib
importlib.reload(sys)

def main():
    keyword = print('请输入关键词：')
    keyword = urllib.parse.urlencode({'word': keyword})
    response = urllib.request.urlopen('https://baike.baidu.com/item/%s'%keyword)
    html = response.read()
    soup = BeautifulSoup(html, 'html.parser')

    for each in soup.find_all(href = re.compile('view')):
        content = ''.join([each.text])
        url2 = ''.join(['https://baike.baidu.com', each['href']])
        response2 = urllib.request.urlopen(url2)
        html2 = response2.read()
        soup2 = BeautifulSoup(html2, 'html.parser')
        if soup2.h2:
            content = ''.join([content, soup2.h2.text])
        content = ''.join([content, '->', url2])
        print(content)

if __name__ == "__main__":
    main()

Output:
Error in sitecustomize; set PYTHONVERBOSE for traceback:
NameError: name 'reload' is not defined
请输入关键词：

***Repl Closed***
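
Note on the second script, separate from the reload message: print() returns None, so keyword is None after the first line of main() and the program never waits for input, which matches the output above. The built-in input() is the call that reads a line from the user. In addition, urllib.parse.urlencode returns a "key=value" query string, so formatting it into the /item/ path yields a URL of the form .../item/word=..., which is probably not what was intended. Below is a minimal sketch of both points with the network calls left out. build_item_url is a hypothetical helper, and the /item/<keyword> URL shape is borrowed from the first script, not confirmed against the site.

```python
from urllib.parse import quote, urlencode


def build_item_url(keyword):
    """Hypothetical helper: put a percent-encoded keyword into the /item/ path."""
    return "https://baike.baidu.com/item/" + quote(keyword)


# print() only writes to the screen; its return value is always None.
result = print("Enter a keyword: ")
assert result is None

# urlencode() builds a query string, so the key name ends up in the text.
print(urlencode({"word": "Python"}))   # word=Python

# quote() encodes just the value, which is what a path segment needs.
print(build_item_url("Python"))        # https://baike.baidu.com/item/Python

# In the real script the keyword would come from input(), for example:
# keyword = input("Enter a keyword: ")
```

The input() call is left commented out so that the sketch runs without waiting for the keyboard.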

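From what I found online, reload is only used in Python 2.*, but reload does not appear in my own code, so I don't understand where the message comes from. I have tried every fix I could find online and the error still shows up. I would appreciate any help. Thanks!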

qq335702318 · posted on 2018-7-23 23:06:31

绝园的白色相簿 · posted on 2018-7-29 15:37:35

Buddy, doesn't the module you import in your second script have reload in it (the importlib.reload(sys) line)? Try deleting that.
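
Note on the reload message: "Error in sitecustomize" is what the interpreter prints when importing a module named sitecustomize fails during startup, before the script itself runs. That suggests the bare reload call sits in a sitecustomize.py somewhere on the module search path and not in either script. A common Python 2 leftover is reload(sys) followed by sys.setdefaultencoding(...), whereas importlib.reload(sys) is a different call that is valid in Python 3. This is an inference from the message, not something confirmed in the thread. A small sketch for locating such a file, where find_sitecustomize is a hypothetical helper name:

```python
import importlib.util


def find_sitecustomize():
    """Return the file path of the sitecustomize module, or None if absent.

    find_sitecustomize is a hypothetical helper name. find_spec() only
    locates the module; it does not execute it.
    """
    spec = importlib.util.find_spec("sitecustomize")
    if spec is None:
        return None
    return spec.origin


path = find_sitecustomize()
if path is None:
    print("No sitecustomize module found on sys.path")
else:
    print("sitecustomize would be loaded from:", path)
```

If the file it reports contains a bare reload(sys) line, removing that line (or the file, after checking that nothing else depends on it) is the likely way to make the startup message go away.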
