[Solved] Could someone experienced take a look at what is wrong with my web crawler written in Python 3?

monkeyjz · Posted on 2018-1-5 08:40:33

Please take a look at this crawler I wrote. It does not work, and as a beginner I am not very experienced yet.

import re
import urllib.request

def getHtml(url):
    page = urllib.request.urlopen(url)
    html = page.read()
    print(html)
    return html

def getImg(html):
    reg = r'src="(.*?\.jpg)"'
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)

    for imgurl in imglist:
        urllib.urlretrieve(imgurl, '1.jpg')

html = getHtml("https://tieba.baidu.com/p/3740796143#!/l/p1")
print(getImg(html))

Best answer

时光不老 · Posted on 2018-1-5 09:32:21

Python 3 environment

import re
import urllib.request

def getHtml(url):
    # urlopen() returns a response object; read() gives the page body as bytes
    page = urllib.request.urlopen(url)
    html = page.read()
    # print(html)
    return html

def getImg(html):
    # reg = r'src="(.*?\.jpg)"'
    # imgre = re.compile(reg)
    # Decode the bytes to str so that a str pattern can be applied
    imglist = re.findall(r'src="(.*?\.jpg)"', html.decode('utf-8'))
    for imgurl in imglist:
        # In Python 3, urlretrieve lives in urllib.request
        urllib.request.urlretrieve(imgurl, '1.jpg')

html = getHtml("https://tieba.baidu.com/p/3740796143#!/l/p1")
print(getImg(html))
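
A side note on the code above: every image is saved as '1.jpg', so each download overwrites the previous one and only the last image remains. A minimal sketch of one way around that, giving each image its own numbered file name, could look like this. The helper names numbered_targets and download_all and the example.com URLs are made up for illustration and are not part of the original code.

```python
import os
import urllib.request

def numbered_targets(img_urls, out_dir='.'):
    """Pair each image URL with its own file name: 1.jpg, 2.jpg, ..."""
    return [(url, os.path.join(out_dir, '%d.jpg' % i))
            for i, url in enumerate(img_urls, start=1)]

def download_all(img_urls, out_dir='.'):
    """Download every image to its own numbered file (needs network access)."""
    for url, path in numbered_targets(img_urls, out_dir):
        urllib.request.urlretrieve(url, path)

# Placeholder URLs; nothing is downloaded here, only the file names are built.
targets = numbered_targets(['http://example.com/a.jpg',
                            'http://example.com/b.jpg'])
print(targets)
```

In getImg, calling download_all(imglist) in place of the for loop would keep all of the images.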

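I ran into two errors:
AttributeError: module 'urllib' has no attribute 'urlretrieve'
Fixed by calling urllib.request.urlretrieve() instead; in Python 3 the function lives in the urllib.request submodule.
TypeError: cannot use a string pattern on a bytes-like object
Fixed with html.decode('utf-8'); page.read() returns bytes, which have to be decoded to str before a str pattern can be applied.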
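
Both errors can be reproduced without touching the network. The following is a small sketch under that assumption: the HTML bytes are a made-up sample standing in for what page.read() returns, and find_images is a hypothetical helper name.

```python
import re
import urllib.request

# Made-up sample standing in for page.read(): bytes, not str
html_bytes = (b'<img src="http://example.com/a.jpg">'
              b'<img src="http://example.com/b.jpg">')

pattern = r'src="(.*?\.jpg)"'

def find_images(raw):
    """Decode the raw bytes to str, then apply the str pattern."""
    return re.findall(pattern, raw.decode('utf-8'))

# A str pattern applied to bytes raises TypeError
error_message = None
try:
    re.findall(pattern, html_bytes)
except TypeError as exc:
    error_message = str(exc)

image_urls = find_images(html_bytes)

# urlretrieve is an attribute of urllib.request, not of the urllib package
on_package = hasattr(urllib, 'urlretrieve')
on_submodule = hasattr(urllib.request, 'urlretrieve')

print(error_message)
print(image_urls)
print(on_package, on_submodule)
```

The decode step assumes the page is encoded as UTF-8; a page in another encoding would need that encoding's name instead.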

monkeyjz · Posted on 2018-1-5 19:04:01

Thanks for the help.
