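I'm a beginner learning web scraping and keep running into errors. Could someone help me out? Many thanks!
Goal: scrape the movie titles from this page: https://movie.douban.com/cinema/nowplaying/shanghai/
My code is as follows: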
import requests
from bs4 import BeautifulSoup
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
url = requests.get('https://movie.douban.com/cinema/nowplaying/shanghai/',headers=headers)
res = requests.get(url).text
content = BeautifulSoup(res, "html.parser")
data = content.find_all('li', attrs={'stitle': 'title'})
cinema_list = []
for d in data:
    plist = d.find('name')['title']
    cinema_list.append(plist)
print(cinema_list)
Here is the error message:
E:\anaconda3\python.exe C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py
Traceback (most recent call last):
File "E:\anaconda3\lib\site-packages\requests\models.py", line 380, in prepare_url
scheme, auth, host, port, path, query, fragment = parse_url(url)
File "E:\anaconda3\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
return six.raise_from(LocationParseError(source_url), None)
File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: <Response >
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py", line 1690, in <module>
res = requests.get(url).text
File "E:\anaconda3\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "E:\anaconda3\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "E:\anaconda3\lib\site-packages\requests\sessions.py", line 516, in request
prep = self.prepare_request(req)
File "E:\anaconda3\lib\site-packages\requests\sessions.py", line 449, in prepare_request
p.prepare(
File "E:\anaconda3\lib\site-packages\requests\models.py", line 314, in prepare
self.prepare_url(url, params)
File "E:\anaconda3\lib\site-packages\requests\models.py", line 382, in prepare_url
raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: <Response >
Process finished with exit code 1
As a beginner, I'd really appreciate any help. Thanks a lot!
This post was last edited by isdkz on 2023-4-19 12:27
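I made the following changes to the original code:
1. Changed how the page content is fetched: in the original code, you first call requests.get() and get back a response object, then pass that response object into a second requests.get() call as if it were a URL.
That is incorrect, because you already have the response object. In the modified code, I call requests.get() once with the URL string and the request headers, and read the response body from the .text attribute.
Original code: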
url = requests.get('https://movie.douban.com/cinema/nowplaying/shanghai/',headers=headers)
res = requests.get(url).text
Modified code:
url = 'https://movie.douban.com/cinema/nowplaying/shanghai/'
res = requests.get(url, headers=headers).text
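2. Changed how the movie entries are located: the original code looks for li elements that have an stitle attribute. In fact, these li elements have the class name stitle.
So I changed the lookup from attrs={'stitle': 'title'} to attrs={'class': 'stitle'}.
Original code: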
data = content.find_all('li', attrs={'stitle': 'title'})
Modified code:
data = content.find_all('li', attrs={'class': 'stitle'})
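3. Changed how the movie title is extracted from each li element: the original code tries to read the title from the title attribute of a name element.
In fact, the movie title is in the title attribute of the a element. So I changed the extraction from d.find('name')['title'] to d.find('a')['title'].
Original code: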
plist = d.find('name')['title']
Modified code:
plist = d.find('a')['title']
With these changes, the code should run correctly and print the list of movies currently showing in Shanghai.
Complete modified code:
import requests
from bs4 import BeautifulSoup
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
url = 'https://movie.douban.com/cinema/nowplaying/shanghai/'
res = requests.get(url, headers=headers).text
content = BeautifulSoup(res, "html.parser")
data = content.find_all('li', attrs={'class': 'stitle'})
cinema_list = []
for d in data:
    plist = d.find('a')['title']
    cinema_list.append(plist)
print(cinema_list)
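To see what changes 2 and 3 do without requesting the live page, here is a minimal sketch that runs the same lookup on a hand-written HTML fragment. The fragment, the movie names, and the helper name extract_titles are assumptions made up for illustration; only the li class stitle and the a element's title attribute come from the reply above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the structure described in the reply:
# each movie sits in <li class="stitle"> and its name is in the <a> title attribute.
SAMPLE_HTML = """
<ul>
  <li class="stitle"><a href="#" title="Movie A">Movie A</a></li>
  <li class="stitle"><a href="#" title="Movie B">Movie B...</a></li>
  <li class="srating"><span>8.1</span></li>
</ul>
"""


def extract_titles(html):
    """Collect the title attribute of the first <a> inside every li.stitle."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for li in soup.find_all("li", class_="stitle"):
        link = li.find("a")
        # Skip entries with no link or no title attribute instead of raising.
        if link is not None and link.has_attr("title"):
            titles.append(link["title"])
    return titles


print(extract_titles(SAMPLE_HTML))  # ['Movie A', 'Movie B']
```

On this fragment the original lookup, find_all('li', attrs={'stitle': 'title'}), matches nothing, because no li carries an attribute named stitle.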
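isdkz posted on 2023-4-19 12:06
I made the following changes to the original code:
1. Changed how the page content is fetched: in the original code, you first call requests.get() ...
I modified the code, but I still get an error. Why is that?
Modified code: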
import requests
from bs4 import BeautifulSoup
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
url = requests.get('https://movie.douban.com/cinema/nowplaying/shanghai/',headers=headers)
res = requests.get(url, headers=headers).text
content = BeautifulSoup(res, "html.parser")
data = content.find_all('li', attrs={'class': 'stitle'})
cinema_list = []
for d in data:
    plist = d.find('a')['title']
    cinema_list.append(plist)
print(cinema_list)
The error message is as follows:
E:\anaconda3\python.exe C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py
Traceback (most recent call last):
File "E:\anaconda3\lib\site-packages\requests\models.py", line 380, in prepare_url
scheme, auth, host, port, path, query, fragment = parse_url(url)
File "E:\anaconda3\lib\site-packages\urllib3\util\url.py", line 392, in parse_url
return six.raise_from(LocationParseError(source_url), None)
File "<string>", line 3, in raise_from
urllib3.exceptions.LocationParseError: Failed to parse: <Response >
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py", line 1690, in <module>
res = requests.get(url, headers=headers).text
File "E:\anaconda3\lib\site-packages\requests\api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "E:\anaconda3\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "E:\anaconda3\lib\site-packages\requests\sessions.py", line 516, in request
prep = self.prepare_request(req)
File "E:\anaconda3\lib\site-packages\requests\sessions.py", line 449, in prepare_request
p.prepare(
File "E:\anaconda3\lib\site-packages\requests\models.py", line 314, in prepare
self.prepare_url(url, params)
File "E:\anaconda3\lib\site-packages\requests\models.py", line 382, in prepare_url
raise InvalidURL(*e.args)
requests.exceptions.InvalidURL: Failed to parse: <Response >
Process finished with exit code 1
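The traceback shows requests trying to parse `<Response >` as a URL, which suggests that the value passed to requests.get() was a response object rather than a URL string. One way to catch this kind of mix-up early is a small wrapper that checks the argument type first. This is only a sketch: the function name fetch_html, the timeout value, and the raise_for_status() call are assumptions added for illustration and are not part of the code in this thread.

```python
import requests


def fetch_html(url, headers=None, timeout=10):
    """Return the body text for a URL string (hypothetical helper)."""
    # Reject anything that is not a string, e.g. a Response object
    # accidentally stored in a variable named `url`.
    if not isinstance(url, str):
        raise TypeError(f"url must be a str, got {type(url).__name__}")
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


# Intended usage (performs a real network request, so it is left commented out):
# html = fetch_html('https://movie.douban.com/cinema/nowplaying/shanghai/',
#                   headers={'user-agent': 'Mozilla/5.0'})
```

With this guard, passing a response object fails immediately with a TypeError that names the offending type, before any request is prepared.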
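liubulong posted on 2023-4-19 13:38
I modified the code, but I still get an error. Why is that?
Modified code:
import requests
You still haven't changed it. I made three changes in total, and you only applied two of them.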
import requests
from bs4 import BeautifulSoup
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}
url = requests.get('https://movie.douban.com/cinema/nowplaying/shanghai/',headers=headers) # this line needs to change too
res = requests.get(url, headers=headers).text
content = BeautifulSoup(res, "html.parser")
data = content.find_all('li', attrs={'class': 'stitle'})
cinema_list = []
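for d in data:
    plist = d.find('a')['title']
    cinema_list.append(plist)
print(cinema_list)
isdkz posted on 2023-4-19 13:51
You still haven't changed it. I made three changes in total, and you only applied two of them.
import requests
Yes, that was careless of me, sorry. Thank you very much for the guidance!
liubulong posted on 2023-4-19 14:29
Yes, that was careless of me, sorry. Thank you very much for the guidance!
Now I'd like to save the results to an Excel spreadsheet, but I get an error. Could you help me with this one as well? Thanks.
The code is as follows: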
import requests
from bs4 import BeautifulSoup
import openpyxl
def open_url(url):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
    }
    html = requests.get(url, headers=headers).text
    return html
def get_data(html):
    content = BeautifulSoup(html, "html.parser")
    data = content.find_all('li', attrs={'class': 'stitle'})
    cinema_list = []
    for d in data:
        plist = d.find('a')['title']
        cinema_list.append(plist)
    return cinema_list
def save_as_excel(mylist):
    wb = openpyxl.Workbook()
    ws = wb.active
    ws['A1'] = '电影'
    for d in mylist:
        ws.append(d)
    wb.save('2023电影排行.xlsx')
def main():
    url = 'https://movie.douban.com/cinema/nowplaying/shanghai/'
    html = open_url(url)
    mylist = get_data(html)
    save_as_excel(mylist)
if __name__ == '__main__':
    main()
Error message:
E:\anaconda3\python.exe C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py
Traceback (most recent call last):
File "C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py", line 1719, in <module>
main()
File "C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py", line 1716, in main
save_as_excel(mylist)
File "C:/Users/lyl/PycharmProjects/pythonProject1/dangdang/dangdang/spiders/abc.py", line 1708, in save_as_excel
ws.append(d)
File "E:\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 675, in append
self._invalid_row(iterable)
File "E:\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 811, in _invalid_row
raise TypeError('Value must be a list, tuple, range or generator, or a dict. Supplied value is {0}'.format(
TypeError: Value must be a list, tuple, range or generator, or a dict. Supplied value is <class 'str'>
Process finished with exit code 1
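The TypeError says that ws.append() expects a list, tuple, range, generator, or dict, but was given a str: each d in mylist is a single movie title, while append() wants one whole row at a time. A possible fix, sketched below, is to wrap each title in a one-element list so it becomes a one-cell row. The function name build_workbook, the English header text, and the sample titles are assumptions for illustration, not the original poster's code.

```python
import openpyxl


def build_workbook(titles):
    """Return a workbook with a header row and one movie title per row."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["Movie"])      # header row (illustrative header text)
    for title in titles:
        # append() takes a row (list/tuple/...), so wrap the string in a list.
        ws.append([title])
    return wb


# Intended usage (writes a file, so it is left commented out; name is illustrative):
# build_workbook(["Movie A", "Movie B"]).save("movies.xlsx")
```

In the save_as_excel() function above, the equivalent one-line change would be ws.append([d]) in place of ws.append(d).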