This post was last edited by yinda_peng on 2024-2-19 17:33
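This example comes from the 【中国大学MOOC】 (China University MOOC) course, which is also available as a 【哔哩哔哩网课】 (bilibili course), where it starts at P29. I found that the example no longer works because the original URL has changed, so I modified the course's sample code to get it working again.

Without further ado, the url: https://www.shanghairanking.cn/rankings/bcur/201611

Code:

    import requests
    from bs4 import BeautifulSoup
    import bs4
    import pandas as pd


    def getHTMLText(url):
        """Fetch the page and return its text, or "" if the request fails."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.text
        except requests.RequestException:
            return ""


    def fillUnivList(ulist, html):
        """Append one [rank, name, score] row to ulist for each table row."""
        soup = BeautifulSoup(html, "html.parser")
        tbody = soup.find('tbody')
        if tbody is None:
            # Empty or unexpected page (for example, the request failed).
            return
        for tr in tbody.children:
            # Skip the whitespace text nodes between <tr> tags.
            if isinstance(tr, bs4.element.Tag):
                tds = tr('div')
                td = tr('td')
                # The slice offsets are hard-coded for the page markup
                # at the time of writing.
                ulist.append([tds[0].string[29:-25], tds[-1].string[:-13], td[-2].string[25:-2]])


    def main():
        uinfo = []
        url = "https://www.shanghairanking.cn/rankings/bcur/201611"
        html = getHTMLText(url)
        fillUnivList(uinfo, html)
        # Column names: rank, university name, total score.
        df = pd.DataFrame(uinfo, columns=["排名", "学校名称", "总分"])
        writer = pd.ExcelWriter('C:/Users/Lenovo/Desktop/university_ranking.xlsx', engine='xlsxwriter')
        df.to_excel(writer, index=False)
        writer.close()
        # "The data has been saved to the Excel file at the given path."
        print('已将数据保存到指定路径的Excel文件中')


    main()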
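
The hard-coded slice offsets in fillUnivList depend on the exact amount of whitespace around each value in the page source, so they are likely to stop working whenever the markup changes. Below is a minimal sketch of one alternative, assuming the slices only trim padding around each value: strip the whitespace instead of counting characters. The helper name clean_cell and the sample strings are hypothetical and not part of the original code.

```python
def clean_cell(raw):
    """Return cell text with surrounding whitespace removed.

    `raw` may be None: BeautifulSoup's `.string` is None when a tag
    contains more than one child node.
    """
    if raw is None:
        return ""
    # split() + join trims both ends and collapses internal runs of
    # whitespace (including newlines) into single spaces.
    return " ".join(raw.split())


# Hypothetical cell contents, padded the way pretty-printed HTML often is.
sample_row = ["\n        1\n      ", "Example   University\n    ", "\n      88.8  "]
cleaned = [clean_cell(cell) for cell in sample_row]
print(cleaned)  # ['1', 'Example University', '88.8']
```

In fillUnivList this could be applied as clean_cell(tds[0].string) in place of tds[0].string[29:-25], and likewise for the other two fields. BeautifulSoup's get_text(strip=True) is another way to get trimmed text from a tag.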
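When you run this yourself, remember to change the output path on the line writer = pd.ExcelWriter("……") in the main function. The result I got is as follows (it seems xlsx attachments can't be uploaded?):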