bs4 - Python Discussion - Programming Languages - FishC Forum

歌者文明清理员 posted on 2023-5-7 14:32:25

bs4

from bs4 import BeautifulSoup
soup = BeautifulSoup('''<html>
<head>
<meta charset='utf-8'>
</head>
<body>
<div>
<p id="myid" class="test">1</p>
</div>
<div>
<span>Span</span>
</div>
</body>
</html>''', 'html.parser')
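Two questions:

1. Given the existing variable tag = soup.findAll('div'), how do I select the parent node of tag?
2. How do I get the id and class of the p child of tag? Ideally, list several ways to do it.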

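陶远航 posted on 2023-5-7 14:34:26

1. You can select the parent node as follows. Note that soup.findAll('div') returns a list-like ResultSet, not a single tag, so calling .parent on it directly raises AttributeError; pick one element first:
tag = soup.findAll('div')
first_div = tag[0] # take one Tag out of the ResultSet
parent = first_div.parent # the direct parent node of that div
parents = first_div.parents # all ancestor nodes, returned as a generator
2. You can get the id and class of the p child as follows:
tag = soup.findAll('div')
p = tag[0].find('p') # the first p found inside the first div
p_id = p['id'] # the id attribute: 'myid'
p_class = p['class'] # the class attribute, returned as a list: ['test']
Hope this helps.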
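
As a minimal runnable sketch of walking from each div to its parent, assuming a trimmed copy of the thread's HTML and the built-in html.parser (the names divs, parent_names, and ancestor_names are illustrative, not from the thread):

```python
from bs4 import BeautifulSoup

html = """<body>
<div>
<p id="myid" class="test">1</p>
</div>
<div>
<span>Span</span>
</div>
</body>"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div")  # find_all is the current spelling of findAll

# A ResultSet behaves like a list of Tag objects, so ask each Tag for its parent.
parent_names = [div.parent.name for div in divs]

# .parents walks up the tree; the last item is the BeautifulSoup object itself.
ancestor_names = [ancestor.name for ancestor in divs[0].parents]

print(parent_names)    # ['body', 'body']
print(ancestor_names)  # ['body', '[document]']
```

Indexing (divs[0]) or looping are both reasonable here; which one fits depends on whether you need one div or all of them.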

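isdkz posted on 2023-5-7 14:36:31

1. To select the parent node of tag, use the parent attribute. Here tag must be a single Tag (for example, one element of the list returned by soup.findAll('div')), not the whole result list:

parent_node = tag.parent

2. There are several ways to get the id and class attributes of the p child of tag: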

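Method 1: use find to locate the p tag, then read each attribute by subscripting the tag with the attribute name.

p_tag = tag.find('p')
p_id = p_tag['id']
p_class = p_tag['class']

Method 2: use find to locate the p tag, then read each attribute with get.

p_tag = tag.find('p')
p_id = p_tag.get('id')
p_class = p_tag.get('class')

Method 3: use select_one to locate the p tag, then read each attribute by subscripting.

p_tag = tag.select_one('p')
p_id = p_tag['id']
p_class = p_tag['class']

Method 4: use select_one to locate the p tag, then read each attribute with get.

p_tag = tag.select_one('p')
p_id = p_tag.get('id')
p_class = p_tag.get('class')

All four reach the same attribute values, so pick whichever suits your needs. The practical differences: subscripting raises KeyError when the attribute is missing, while get returns None; and class is a multi-valued attribute, so bs4 returns it as a list (['test'] here).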
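
The differences between these access styles can be sketched as follows. This is an illustrative example on a one-line document; the variable names and the 'title' attribute lookup are assumptions made for the demonstration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div><p id="myid" class="test">1</p></div>', "html.parser")
div = soup.find("div")
p = div.find("p")

# Subscript access raises KeyError for a missing attribute; .get returns None
# (or the supplied default) instead.
p_id = p["id"]
missing = p.get("title", "no title")

# class is a multi-valued attribute, so bs4 returns a list of class names.
p_classes = p.get("class")
class_string = " ".join(p_classes)

# .attrs exposes every attribute of the tag as a dictionary.
all_attrs = p.attrs

# A CSS selector can also match on the id and class values themselves.
same_p = div.select_one("p#myid.test")

print(p_id, missing, p_classes, class_string)
```

Joining the list with a space is one way to recover the original class string when a single str is needed.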

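歌者文明清理员 posted on 2023-5-7 14:40:23

isdkz posted on 2023-5-7 14:36
1. To select the parent node of tag, use the parent attribute. For example:

One more question: how do I count the child elements of tag (direct children only, not grandchildren and so on)?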

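isdkz posted on 2023-5-7 14:44:16

歌者文明清理员 posted on 2023-5-7 14:40
One more question: how do I count the child elements of tag (direct children only, not grandchildren and so on)?

To count the direct children of tag (excluding grandchildren), you can use the children attribute. Here is an example:

number_of_children = len(list(tag.children))

This takes the direct children of tag, converts them to a Python list, and uses len to get the length of that list. children covers direct children only, not grandchildren, but it yields every child node, including text nodes such as the newlines between tags. To count element children only, use len(tag.find_all(recursive=False)) instead.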
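
To see how the two counts can differ, here is a small sketch. The HTML is a made-up variant of the thread's document with a nested b element added to stand in for a grandchild:

```python
from bs4 import BeautifulSoup, Tag

html = """<div>
<p id="myid" class="test">1</p>
<span><b>nested</b></span>
</div>"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# .children yields text nodes too: "\n", <p>, "\n", <span>, "\n"
all_nodes = len(list(div.children))

# find_all(recursive=False) returns only the Tag objects directly under div.
element_children = len(div.find_all(recursive=False))

# The same count, written as an explicit filter on the node type.
same_count = sum(1 for child in div.children if isinstance(child, Tag))

print(all_nodes, element_children, same_count)  # 5 2 2
```

The nested b element is not counted by either approach, since both look at direct children only.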

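歌者文明清理员 posted on 2023-5-7 14:45:31

isdkz posted on 2023-5-7 14:44
To count the direct children of tag (excluding grandchildren), you can use the children attribute. Here is an example:

Last question: how do I find the position of a child element within its parent, something like index?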

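isdkz posted on 2023-5-7 14:46:48

歌者文明清理员 posted on 2023-5-7 14:45
Last question: how do I find the position of a child element within its parent, something like index?

To find the position of a child within its parent (similar to an index), you can use the following function:

def find_child_index(parent, target_child):
    children_list = list(parent.children)
    for index, child in enumerate(children_list):
        if child is target_child:  # identity check; == would also match a sibling with identical markup
            return index
    return -1  # the child is not in the parent

The function first converts the parent's children attribute to a Python list, then iterates over it with enumerate to track each child's index. If the target child is found, its index is returned; otherwise the function returns -1. Because children includes text nodes, those count toward the index as well.

For example, the following code gets the position of the span child within its parent:

span_tag = soup.find('span')
parent = span_tag.parent
child_index = find_child_index(parent, span_tag)  # 1 for the document above: the newline before <span> is child 0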
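
If the position among element children is what matters, rather than the position among all nodes, one possible variation is sketched below. The HTML and the names link, node_index, and element_index are assumptions made for this illustration:

```python
from bs4 import BeautifulSoup

html = """<div>
<span>Span</span>
<a>Link</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")
parent = link.parent

# Position among all child nodes: the newline strings between tags count too.
node_index = next(i for i, child in enumerate(parent.children) if child is link)

# Position among element children only, ignoring text nodes.
element_children = parent.find_all(recursive=False)
element_index = next(i for i, child in enumerate(element_children) if child is link)

# Counting the preceding sibling tags gives the same element position.
preceding_tags = len(link.find_previous_siblings())

print(node_index, element_index, preceding_tags)  # 3 1 1
```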

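歌者文明清理员 posted on 2023-5-7 14:47:25

isdkz posted on 2023-5-7 14:46
To find the position of a child within its parent (similar to an index), you can use the following function:

Can I just call children_list.index(target_child) directly?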

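isdkz posted on 2023-5-7 14:51:25

歌者文明清理员 posted on 2023-5-7 14:47
Can I just call children_list.index(target_child) directly?

Sorry, I overlooked this more concise approach. Yes, you can call children_list.index(target_child) directly to get the child's position in the parent. Here is the revised code:

def find_child_index(parent, target_child):
    children_list = list(parent.children)
    try:
        return children_list.index(target_child)
    except ValueError:
        return -1  # the child is not in the parent

The function now uses the list's index method to look up target_child in children_list. If the child is found, its index is returned; otherwise index raises ValueError, which we catch and turn into -1. One caveat: index compares with ==, and bs4 treats two tags with the same name, attributes, and contents as equal, so for siblings with identical markup it returns the position of the first one.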
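
The equality caveat is easiest to see with duplicate siblings. The sketch below uses a made-up list with two identical li tags; child_index_by_identity is a hypothetical helper name, while Tag.index is bs4's own identity-based lookup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>a</li><li>a</li><li>b</li></ul>", "html.parser")
ul = soup.find("ul")
items = ul.find_all("li", recursive=False)
second = items[1]

# list.index compares with ==, and the first two <li> tags are equal by content,
# so it stops at the first one.
index_by_equality = list(ul.children).index(second)


def child_index_by_identity(parent, target):
    """Return the position of target among parent's children, or -1 if absent."""
    for position, child in enumerate(parent.children):
        if child is target:
            return position
    return -1


index_by_identity = child_index_by_identity(ul, second)

# Tag.index also compares by identity, but raises ValueError when not found.
index_builtin = ul.index(second)

print(index_by_equality, index_by_identity, index_builtin)  # 0 1 1
```

When siblings are guaranteed to differ, the shorter children_list.index form behaves the same as the identity-based versions.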
