from bs4 import BeautifulSoup
import pandas as pd

handle = driver.window_handles
driver.switch_to.window(handle[0])
driver.switch_to.frame('mainContent')
driver.switch_to.frame('detailContent')
pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
soup.prettify()
answer = soup.find_all('div', class_="answer")
list = []
for i in range(len(answer)):
    timu = answer[i].find_all('div', class_="timu")
    answer = answer[i].find_all('span')
    truedaan = answer[i].find_all('div', class_="truedaan")
    list.append([timu, answer, truedaan])
df1 = pd.DataFrame(list)
df2 = pd.read_excel(file_name)  # summary file
merge_x = pd.concat([df1, df2])
merge_x.to_excel(file_name, index=None)
print('Files merged successfully')
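I scraped the questions and their answer options and want to organize them into an Excel question bank myself. But the code above produces only one row, with all the options of all the questions crammed together. I want one row per question, with each question's options in separate columns. How should I clean this up? Here is what gets printed: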
[<span>A.核查处理</span>, <span>B.结果审核</span>, <span class='C"' value="C">C.转回访</span>, <span>D.工单办结</span>, <span>A.投诉量</span>, <span>B.频次</span>, <span>C.传播范围</span>, <span>D.覆盖区域</span>, <span class='E"' value="E">E.以上都是</span>, <span>A.10秒</span>, <span>B.15秒</span>, <span>C.30秒</span>, <span class='D"' value="D">D.60秒</span>, <span>A.标准化</span>, <span>B.高质量</span>, <span class='C"' value="C">C.差异化</span>, <span>D.高品质</span>, <span>A.PIN1/PUK1</span>, <span>B.客户星级/会员级别</span>, <span>C.宽带用户性质</span>, <span class='D"' value="D">D.宽带上网账号</span>, <span class='A"' value="A">A.流量费争议</span>, <span>B.无法套餐变更</span>, <span>C.携号转网受限业务</span>, <span>D.擅自停机</span>, <span>A.提交二线</span>, <span class='B"' value="B">B.在线操作</span>, <span>C.提交省份</span>, <span>D.指引到营业厅</span>]
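The problem is in your code: you store all of the options in one list and then append that whole list as a single element of list. That is why the resulting DataFrame has only one row, with every question's options crammed together. On top of that, the line answer = answer[i].find_all('span') overwrites the answer variable that the loop is still indexing into.
To fix this, add each question's options to the list as separate elements. Here is the modified code: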
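from bs4 import BeautifulSoup
import pandas as pd

# driver (a Selenium WebDriver) and file_name are assumed to be defined earlier.
handle = driver.window_handles
driver.switch_to.window(handle[0])
driver.switch_to.frame('mainContent')
driver.switch_to.frame('detailContent')
pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
answer_divs = soup.find_all('div', class_="answer")
rows = []  # not named list, which would shadow the built-in
for answer_div in answer_divs:
    timu = answer_div.find('div', class_="timu")
    truedaan = answer_div.find('div', class_="truedaan")
    # Plain text of every option, without the <span> tags
    options = [span.get_text(strip=True) for span in answer_div.find_all('span')]
    # One row per question: question text, correct answer, then one column
    # per option. The answer comes before the options because the number of
    # options varies (four or five), which would otherwise misalign the columns.
    rows.append([
        timu.get_text(strip=True) if timu else '',
        truedaan.get_text(strip=True) if truedaan else '',
        *options,
    ])
df1 = pd.DataFrame(rows)
df2 = pd.read_excel(file_name)  # existing summary file
merge_x = pd.concat([df1, df2])
merge_x.to_excel(file_name, index=False)
print('Files merged successfully')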
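In the modified code I changed how the answer blocks are traversed: each block is handled on its own instead of overwriting answer, and every option is stored as a separate element rather than as one combined list, so the options no longer end up crammed into a single cell. This assumes that each div with class "answer" holds exactly one question.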
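If the page instead puts every question inside a single container, which the printed output above hints at, the spans arrive as one flat sequence and have to be split into questions afterwards. The following is a minimal sketch of that idea, not a tested solution for this particular site. It assumes that the first option of every question starts with "A."; the helper name group_options is made up for illustration, and the sample strings are the texts of the first three questions from the output above.

```python
def group_options(option_texts):
    """Split a flat list of option strings into one list per question.

    Assumption: the first option of every question starts with "A.".
    """
    questions = []
    for text in option_texts:
        # An "A." option (or the very first item) starts a new question.
        if text.startswith("A.") or not questions:
            questions.append([])
        questions[-1].append(text)
    return questions


# In the real script these texts would come from something like
# [span.get_text(strip=True) for span in soup.find_all('span')]
flat = [
    "A.核查处理", "B.结果审核", "C.转回访", "D.工单办结",
    "A.投诉量", "B.频次", "C.传播范围", "D.覆盖区域", "E.以上都是",
    "A.10秒", "B.15秒", "C.30秒", "D.60秒",
]

rows = group_options(flat)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame(rows) should then give one row per question and one column per option, with questions that have fewer options padded with missing values.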