Just started learning web scraping. Where exactly is the problem? Could an expert point me in the right direction?
import urllib.request

def download_html(url):
    header = {
        "User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64)"
        "AppleWebKit/537.36 (KHTML, like Gecko)"
        "Chrome/90.0.4430.85 Safari/537.36"
    }
    req = urllib.request.Request(url = url, headers = header)
    response = urllib.request.urlopen(req)
    html = response.read().decode("utf-8")
    return html

html = duwnload_html("https://movie.douban.com/top250")
import re
pattern = 'https://movie.douban.com/subject/+/'
urls = re.findall(pattern,html)
urls = set(urls)
print("urls count=%d"%(len(urls)))
for url in urls:
    print(url)
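Separately from the call error, the regex in this script probably finds nothing even after the call succeeds. In `'https://movie.douban.com/subject/+/'`, the `+` repeats only the `/` before it, so the pattern needs at least `subject//` and cannot match a normal movie link. The sketch below compares that pattern with an assumed intended version that uses `\d+`, a run of digits for the movie ID, and escapes the dots so they match only a literal `.`. The HTML fragment is made up for illustration.

```python
import re

# Made-up HTML fragment standing in for the Top 250 page (one link repeated)
sample_html = '''
<a href="https://movie.douban.com/subject/1292052/">...</a>
<a href="https://movie.douban.com/subject/1291546/">...</a>
<a href="https://movie.douban.com/subject/1292052/">...</a>
'''

# Pattern as posted: "+" applies only to the "/" before it,
# so the pattern needs "subject//" and never matches a real link.
original = 'https://movie.douban.com/subject/+/'

# Assumed intended pattern: digits (the movie ID) between the slashes,
# with the dots escaped so they match only a literal ".".
fixed = r'https://movie\.douban\.com/subject/\d+/'

print(re.findall(original, sample_html))  # → []
print(sorted(set(re.findall(fixed, sample_html))))
# → ['https://movie.douban.com/subject/1291546/', 'https://movie.douban.com/subject/1292052/']
```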
suchocolate posted on 2021-5-19 16:12
html = download_html("https://movie.douban.com/top250")

8wy403208 posted on 2021-5-19 23:17
Got it, but there's still a problem... Where does the scraped data end up?

suchocolate posted on 2021-5-20 00:04
After the fix, it runs fine on my end.

8wy403208 posted on 2021-5-20 22:33
Why does mine look like this?

Reply:
The function name in the call is misspelled.

I tried it again and it runs now. Turns out it was just one extra letter... Speechless.
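For reference, here is a cleaned-up version of the script with the call spelled `download_html`. It also covers the follow-up question about where the scraped data goes. The original script only prints the URLs to the console and does not save anything, so this sketch also writes them to a text file. This is a sketch, not the poster's code. The following are all assumptions:

* the escaped `\d+` regex
* the added spaces in the User-Agent string (the original adjacent literals concatenate as `x64)AppleWebKit`)
* `timeout=10`
* the hypothetical helper names `extract_subject_urls`, `save_urls` and `main`
* the output file name

```python
import re
import urllib.request

# Browser-like User-Agent; each piece ends with a space so the
# concatenated string reads "... x64) AppleWebKit/... Gecko) Chrome/..."
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/90.0.4430.85 Safari/537.36"
}

# Assumed pattern: the movie ID is a run of digits
SUBJECT_RE = re.compile(r"https://movie\.douban\.com/subject/\d+/")


def download_html(url):
    """Fetch a page with the headers above and decode it as UTF-8."""
    req = urllib.request.Request(url=url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read().decode("utf-8")


def extract_subject_urls(html):
    """Return the unique movie-detail URLs found in the HTML, sorted."""
    return sorted(set(SUBJECT_RE.findall(html)))


def save_urls(urls, path):
    """Write one URL per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")


def main():
    html = download_html("https://movie.douban.com/top250")  # spelled correctly
    urls = extract_subject_urls(html)
    print("urls count=%d" % len(urls))
    for url in urls:
        print(url)
    # Hypothetical output file; the original script only printed to the console
    save_urls(urls, "douban_top250_urls.txt")
```

Call `main()` to run it, for example under an `if __name__ == "__main__":` guard. The URLs are printed and also saved to `douban_top250_urls.txt` in the current working directory. This has not been checked against the live site, which may reject requests or change its markup.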