How to extract the span under a <td> with BeautifulSoup, Python Discussion, Programming Languages, FishC Forum

shenshuai posted on 2020-8-16 00:31:43

How to extract the span under a <td> with BeautifulSoup

Last edited by shenshuai on 2020-8-16 00:32

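Hi everyone, I want to extract a string (629,088 in Automotive), as shown in the screenshot. The problem is that there don't seem to be any useful tags or attributes around it. How should I extract it with BeautifulSoup?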

Here is the page URL: https://www.amazon.com/dp/B071HVG69S

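suchocolate posted on 2020-8-16 10:08:53

The body has JS events bound to it, so you can't get that HTML element directly from the raw response. You'll probably need selenium.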

xiaosi4081 posted on 2020-8-17 14:07:59

soup.find("td").span
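One thing to watch with this one-liner: soup.find("td") returns only the first <td> in the document, so .span is whatever span sits in that first cell, which may not be the ranking cell. Below is a minimal sketch that scans every <td> and keeps the one whose span mentions the category. The HTML fragment and the helper name find_span_text are made up for illustration, and the real Amazon markup may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; the real page's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Manufacturer</th><td><span>Example Co.</span></td></tr>
  <tr><th>Best Sellers Rank</th><td><span>#629,088 in Automotive</span></td></tr>
</table>
"""


def find_span_text(html, keyword):
    """Return the text of the first <span> inside a <td> that contains keyword, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for td in soup.find_all("td"):
        span = td.find("span")
        # Skip cells that have no <span> at all.
        if span is not None and keyword in span.get_text():
            return span.get_text(strip=True)
    return None


# The one-liner picks the span of the FIRST <td>, which here is the wrong cell.
first_span = BeautifulSoup(SAMPLE_HTML, "html.parser").find("td").span.get_text()
# Filtering by text finds the ranking cell regardless of its position.
rank_span = find_span_text(SAMPLE_HTML, "Automotive")
```

Matching on the cell text avoids depending on the cell's position in the table, which tends to shift between product pages.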

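luxiaolan6373 posted on 2020-8-17 20:58:07

I couldn't find the string you mentioned in the page source. If the data lives in JS, you'll need the json module to parse it as dictionary-style data. BeautifulSoup parses HTML/XML markup and can't parse JSON. To parse the JSON, first track down the API endpoint and post it here, and I'll write some code for you. Have a look at my threads too; I've posted examples recently.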
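To illustrate the json route: if the data does turn out to be embedded in a script rather than in the HTML, the rough idea is to cut the JSON text out and hand it to json.loads. This is only a sketch under assumptions. The page snippet, the variable name productData, and its fields are all invented, and nothing here is confirmed about what the Amazon page actually contains.

```python
import json
import re

# Hypothetical page source; the variable name and fields are made up.
SAMPLE_PAGE = """
<script>
var productData = {"asin": "B071HVG69S", "rank": "629,088 in Automotive"};
</script>
"""


def extract_embedded_json(page, var_name):
    """Find `var <var_name> = {...};` in the page text and parse the object as JSON."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\});"
    match = re.search(pattern, page, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


data = extract_embedded_json(SAMPLE_PAGE, "productData")
rank = data["rank"] if data is not None else None
```

The non-greedy pattern stops at the first `};`, so it only suits flat objects like this one. For nested data, requesting the JSON endpoint directly is usually more robust.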

lhgzbxhz posted on 2020-8-20 10:04:27

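Written by hand and not debugged, so I'm not sure whether it works.
# soup is a BeautifulSoup object
table = soup.find("div", id="productDetails_db_sections").find("div", class_="a-section table-padding").find("table")
trs = table.find_all("tr")
st = ""  # the string to extract
for tr in trs:
    td = tr.find("td")
    # .text is a property, not a method; also skip rows that have no <td>
    if td is not None and "Automotive" in td.text:
        st = td.text
        break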

shenshuai posted on 2020-8-31 01:59:21

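Thanks for all the answers. I got it working the clumsy way, with the re module. Here it is for reference:
import re

level_category = soup.find(id='productDetails_detailBullets_sections1')  # get the first- and second-level rankings
content = str(level_category)
reg = r'<span>#(.*?) in'
result = re.findall(reg, content)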
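To show what this pattern captures, here is a small self-contained run on a made-up fragment standing in for str(level_category). The markup and the second ranking line are assumptions for illustration only. The captured groups are strings with thousands separators, so the sketch also converts them to integers.

```python
import re

# Hypothetical fragment standing in for str(level_category); real markup may differ.
content = (
    '<table id="productDetails_detailBullets_sections1">'
    "<tr><th>Best Sellers Rank</th><td>"
    "<span><span>#629,088 in Automotive</span>"
    "<br/><span>#1,234 in Example Subcategory</span></span>"
    "</td></tr></table>"
)

# Capture whatever sits between "<span>#" and the following " in".
reg = r"<span>#(.*?) in"
result = re.findall(reg, content)

# Strip the thousands separators to get plain integers.
ranks = [int(r.replace(",", "")) for r in result]
```

Here result is ['629,088', '1,234'] and ranks is [629088, 1234]. The pattern depends on the rank text directly following a <span> tag with no attributes, so it will stop matching if the markup changes.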

bonst posted on 2020-9-1 20:54:33

I think the parsel module is a good option too; it lets you use CSS selectors.
