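Just started learning web scraping: where exactly is the problem? Looking for pointers, Python Discussion, Programming Languages, FishC Forum

8wy403208 posted on 2021-5-19 15:25:13

Just started learning web scraping: where exactly is the problem? Looking for pointers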

import urllib.request

def download_html(url):
    header = {
        "User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64)"
        "AppleWebKit/537.36 (KHTML, like Gecko)"
        "Chrome/90.0.4430.85 Safari/537.36"
    }
    req = urllib.request.Request(url = url, headers = header)
    response = urllib.request.urlopen(req)
    html = response.read().decode("utf-8")
    return html

html = duwnload_html("https://movie.douban.com/top250")

import re

pattern = 'https://movie.douban.com/subject/+/'
urls = re.findall(pattern,html)
urls = set(urls)

print("urls count=%d"%(len(urls)))
for url in urls:
    print(url)

suchocolate posted on 2021-5-19 15:25:14

8wy403208 posted on 2021-5-19 23:17
Got it, but there's still a problem...

After making the fix, it runs fine on my end.

suchocolate posted on 2021-5-19 16:12:02

html = download_html("https://movie.douban.com/top250")
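
For reference, a minimal sketch of the whole script with that call corrected. It also changes two things the thread does not discuss, so treat both as assumptions: the pattern uses \d+ on the guess that a subject ID is a run of digits (the posted pattern has a bare "+" that only repeats the preceding slash), and spaces are added between the User-Agent parts. The helper name extract_subject_urls and the sample HTML are made up for illustration.

```python
import re
import urllib.request


def download_html(url):
    """Fetch a page and return its body decoded as UTF-8."""
    header = {
        # Same browser string as the original post, with spaces between the parts.
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/90.0.4430.85 Safari/537.36"
    }
    req = urllib.request.Request(url=url, headers=header)
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8")


def extract_subject_urls(html):
    """Return the unique movie detail URLs found in html (hypothetical helper)."""
    # Assumption: a subject ID is a run of digits, so \d+ replaces the bare "+".
    pattern = r"https://movie\.douban\.com/subject/\d+/"
    return set(re.findall(pattern, html))


# Offline check on a made-up HTML fragment, so no network access is needed.
sample_html = (
    '<a href="https://movie.douban.com/subject/1292052/">A</a>'
    '<a href="https://movie.douban.com/subject/1291546/">B</a>'
    '<a href="https://movie.douban.com/subject/1292052/">A again</a>'
)
urls = extract_subject_urls(sample_html)
print("urls count=%d" % len(urls))
for url in sorted(urls):
    print(url)
```

Against the live site the call would be extract_subject_urls(download_html("https://movie.douban.com/top250")). Whether the site serves the page to a script is not something this sketch can promise.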

8wy403208 posted on 2021-5-19 23:17:41

suchocolate posted on 2021-5-19 16:12
html = download_html("https://movie.douban.com/top250")

Got it, but there's still a problem...

crfire posted on 2021-5-20 10:22:44

Where does the scraped content end up?
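
As posted, the script keeps its result only in memory, in the set urls, and prints it to the console; nothing is written to disk. If a file is wanted, a minimal sketch could look like the following. The function name save_urls, the file name douban_urls.txt, and the sample URLs are all made up for illustration.

```python
import tempfile
from pathlib import Path


def save_urls(urls, path):
    """Write one URL per line to path and return how many were written."""
    lines = sorted(urls)  # sorted so the file content is stable between runs
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Made-up result set standing in for the `urls` set built by the script.
urls = {
    "https://movie.douban.com/subject/1292052/",
    "https://movie.douban.com/subject/1291546/",
}

# Hypothetical output location: a file in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "douban_urls.txt"
count = save_urls(urls, out_path)
print("saved %d urls to %s" % (count, out_path))
```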

8wy403208 posted on 2021-5-20 22:33:39

suchocolate posted on 2021-5-20 00:04
After making the fix, it runs fine on my end.

Why does it look like this on my end?

suchocolate posted on 2021-5-20 22:34:31

8wy403208 posted on 2021-5-20 22:33
Why does it look like this on my end?

The function name in the call is misspelled.
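
To see why a misspelled call only fails when it runs: Python looks a name up at the moment the line executes, so a typo in a call is not reported when the file is loaded but raises NameError on that line. A small sketch, using a stand-in download_html that returns a fixed string instead of touching the network:

```python
def download_html(url):
    """Stand-in for the real function; returns a fixed string, no network."""
    return "<html></html>"


try:
    # The misspelled name from the original post: nothing called
    # duwnload_html is defined, so this line raises NameError when it runs.
    html = duwnload_html("https://movie.douban.com/top250")
except NameError as exc:
    message = str(exc)
    print(message)

# With the name spelled as it was defined, the call succeeds.
html = download_html("https://movie.douban.com/top250")
```

The error message names the identifier Python could not find, which makes it quick to compare against the spelling in the def line.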

8wy403208 posted on 2021-5-20 22:40:45

I tried again and it runs now. It turns out I had typed an extra letter... I'm speechless.
