[Solved] School Python crawler assignment: scraping the book-title catalog of the school library's holdings

skyline2333 · posted 2020-6-10 18:53:14

Last edited by skyline2333 on 2020-6-10 18:57

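This is my first time asking a question.
I'm a first-year economics major, and this is an assignment for my university's Python course.
Task: write a program that searches the Zhejiang University of Finance and Economics library for the book catalog of your own major, displays the retrieved catalog, and saves it to the text file book.txt.
URL: http://opac.zufe.edu.cn:8080/browse/cls_browsing.php
The code I have so far, where I got stuck: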
import requests
from bs4 import BeautifulSoup
url='http://opac.zufe.edu.cn:8080/browse/cls_browsing.php'
response=requests.get(url)
soup=BeautifulSoup(response.text, 'html.parser')
items=soup.findAll('h3')
print(response.text)
for itm in items:
    title_0=itm.find('h3').find('strong').find('a').text
    print(title_0)
    #f=open('C:\\Users\\kingk\\Desktop\\file.txt','a',encoding='utf-8')
    #f.write(title)
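Problem: the books for a given subject only appear after clicking a button on the left. Looking at the page with "Inspect element" shows “”. It seems that inside the html body there is a second layer with its own html, title, and body: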
'''
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link type="text/css" rel="stylesheet" href="../tpl/css/style.css">
<script src="../tpl/js/jquery.js"></script>

</head>
'''
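Tags in the inner layer can't seem to be found with .find directly. I've changed the code many times and nothing works. Do I need to make multiple requests, or is there some other approach?
Experts, please save this kid.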
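
A "second html inside the body" is typical of an embedded frame, whose content lives at its own URL and is not part of the outer page's response. The sketch below assumes that is what is happening here (the tag name did not survive in the post above, so this is a guess). It collects the `src` of every `<iframe>`/`<frame>` and resolves it against the page URL. `FrameCollector`, `find_frame_urls`, and the sample markup are hypothetical names and data for illustration; parsing uses only the standard library so the snippet runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <iframe> and <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("iframe", "frame"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_frame_urls(page_html, base_url):
    """Return the absolute URLs of documents embedded in page_html."""
    collector = FrameCollector()
    collector.feed(page_html)
    # Relative src values are resolved against the URL of the outer page.
    return [urljoin(base_url, src) for src in collector.sources]


# Hypothetical markup for illustration; the real page may look different.
sample = (
    '<html><body>'
    '<iframe src="cls_browsing_tree.php?s_doctype=all&amp;cls=F&amp;lvl=1"></iframe>'
    '</body></html>'
)
base = 'http://opac.zufe.edu.cn:8080/browse/cls_browsing.php'
print(find_frame_urls(sample, base))
```

If the page really does embed a frame, each returned URL can be fetched with a second `requests.get` call and parsed like any other page, which is the "multiple requests" approach asked about above.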

Best answer

suchocolate

2020-6-11 17:13:28

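import html
import requests
from lxml import etree

# Category tree for class F (the cls=F query parameter)
url = 'http://opac.zufe.edu.cn:8080/browse/cls_browsing_tree.php?s_doctype=all&cls=F&lvl=1#nodeF'
headers = {'user-agent': 'firefox'}
r = requests.get(url, headers=headers)
# Convert character references in the response into plain characters
data = html.unescape(r.text)
# Keep a copy of the fetched page for inspection
with open('r.txt', 'w', encoding='utf-8') as f:
    f.write(data)
# Named 'tree' so that it does not shadow the imported html module
tree = etree.HTML(data)
result = tree.xpath('//text()')
# Keep only the text nodes that start with the class letter F
final = []
for item in result:
    if item.startswith('F'):
        final.append(item)
print(final)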
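
The assignment also asks for the catalog to be displayed and saved to book.txt, which the code above stops short of (it writes the raw page to r.txt and prints a list). A minimal sketch of that last step follows, assuming the entries are the text nodes that start with the class letter. `select_catalog_entries` and `save_catalog` are hypothetical helper names, the sample nodes are made up rather than real catalog data, and stripping whitespace and dropping duplicates are additions not present in the answer.

```python
def select_catalog_entries(text_nodes, prefix="F"):
    """Keep stripped, de-duplicated text nodes that start with prefix."""
    entries = []
    seen = set()
    for node in text_nodes:
        entry = node.strip()
        if entry.startswith(prefix) and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


def save_catalog(entries, path="book.txt"):
    """Write one catalog entry per line, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(entry + "\n")


# Hypothetical text nodes for illustration, not real catalog data.
nodes = ["\n", " F0 Economics ", "Home", "F1 World economy", "F0 Economics"]
entries = select_catalog_entries(nodes)
for entry in entries:
    print(entry)          # display the catalog, one entry per line
save_catalog(entries)     # writes book.txt in the current directory
```

In the script above, `result` (the output of the `//text()` query) would take the place of `nodes`. Passing `encoding="utf-8"` explicitly avoids depending on the platform's default encoding when the titles contain Chinese characters.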
