Symptom:
Source code:
from selenium.webdriver import Chrome, ChromeOptions
from bs4 import BeautifulSoup
from pandas import DataFrame
from time import sleep

options = ChromeOptions()
options.add_argument("headless")
driver = Chrome(options=options)
url = 'https://movie.douban.com/explore'
driver.get(url)
sleep(2)
source = driver.page_source
soup = BeautifulSoup(source, 'lxml')
names = soup.select('.drc-subject-info-title-text')
names = [name.text for name in names]
types = soup.select('.drc-subject-info-subtitle')
types = [movieType.text for movieType in types]
data = []
for i, name in enumerate(names):
    data.append([name, types[i]])
df = DataFrame(data)
df.to_csv('1.csv', encoding='utf-8')
driver.quit()
(I have run it many times and the problem is still not solved.)
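This problem is probably caused by the encoding used when writing the CSV file. Plain UTF-8 output carries no byte order mark (BOM), so a program such as Excel may guess a different encoding when opening the file and show the Chinese text as garbled characters. Try saving the file with the UTF-8-sig encoding instead, that is, change encoding='utf-8' to encoding='utf-8-sig'.
The modified code is as follows: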
from selenium.webdriver import Chrome, ChromeOptions
from bs4 import BeautifulSoup
from pandas import DataFrame
from time import sleep

# Start Chrome in headless mode and load the page
options = ChromeOptions()
options.add_argument("headless")
driver = Chrome(options=options)
url = 'https://movie.douban.com/explore'
driver.get(url)
sleep(2)  # give the page time to render its content

# Parse the rendered HTML and extract titles and subtitles
source = driver.page_source
soup = BeautifulSoup(source, 'lxml')
names = soup.select('.drc-subject-info-title-text')
names = [name.text for name in names]
types = soup.select('.drc-subject-info-subtitle')
types = [movieType.text for movieType in types]

# Pair each title with its subtitle
data = []
for i, name in enumerate(names):
    data.append([name, types[i]])
df = DataFrame(data)
# The only change: 'utf-8-sig' writes a BOM at the start of the file
df.to_csv('1.csv', encoding='utf-8-sig')
driver.quit()
With this change, the CSV file should be saved in a form that displays correctly.
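To see what the encoding change actually does, here is a minimal sketch that writes the same row with both encodings and compares the first bytes of each file. It uses the standard library csv module rather than pandas so that it runs without extra packages; the sample row, the file names, and the helper functions write_csv and first_bytes are made up for illustration and are not part of the original script.

```python
import csv
import os
import tempfile

# Hypothetical sample row standing in for the scraped data
rows = [["肖申克的救赎", "1994 / 美国 / 剧情"]]


def write_csv(path, encoding):
    """Write the sample rows to path using the given text encoding."""
    # newline="" is the setting the csv module documentation recommends
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)


def first_bytes(path, n=3):
    """Return the first n raw bytes of the file."""
    with open(path, "rb") as f:
        return f.read(n)


tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.csv")
bom_path = os.path.join(tmp_dir, "bom.csv")

write_csv(plain_path, "utf-8")
write_csv(bom_path, "utf-8-sig")

plain_head = first_bytes(plain_path)
bom_head = first_bytes(bom_path)

# The utf-8-sig file starts with the three-byte UTF-8 BOM; the plain one does not
print("utf-8     starts with BOM:", plain_head == b"\xef\xbb\xbf")
print("utf-8-sig starts with BOM:", bom_head == b"\xef\xbb\xbf")
```

Apart from those three leading bytes the two files are identical, so a reader that strips the BOM (for example Python's own 'utf-8-sig' codec) gets the same text from both.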