Last edited by YunGuo on 2020-11-13 01:17
If there is whitespace between the tags (as in <td> </td>), you can try the parsel module:
from parsel import Selector
html = """
<tr class="">
<td class="title">
<a title="9.29演出" class="">
9.29演出
</a>
</td>
<td nowrap="nowrap">
<a class="">meu</a>
</td>
<td nowrap="nowrap" class="r-count "> </td>
<td nowrap="nowrap" class="time">2018-09-07</td>
</tr>
<tr class="">
<td class="title">
<a title="9.29演出" class="">
9.29演出
</a>
</td>
<td nowrap="nowrap">
<a class="">meu</a>
</td>
<td nowrap="nowrap" class="r-count ">50</td>
<td nowrap="nowrap" class="time">2018-09-07</td>
</tr>
"""
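sel = Selector(text=html)
# The class attribute is matched exactly, including its trailing space.
# text() returns one string per cell, so the blank cell comes back as ' '.
count = sel.xpath('//*[@class="r-count "]/text()').getall()
print(count)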
If there is no whitespace between the tags (as in <td></td>), you can try the pyquery module:
from pyquery import PyQuery as pq
html = """
<tr class="">
<td class="title">
<a title="9.29演出" class="">
9.29演出
</a>
</td>
<td nowrap="nowrap">
<a class="">meu</a>
</td>
<td nowrap="nowrap" class="r-count "></td>
<td nowrap="nowrap" class="time">2018-09-07</td>
</tr>
<tr class="">
<td class="title">
<a title="9.29演出" class="">
9.29演出
</a>
</td>
<td nowrap="nowrap">
<a class="">meu</a>
</td>
<td nowrap="nowrap" class="r-count ">50</td>
<td nowrap="nowrap" class="time">2018-09-07</td>
</tr>
"""
doc = pq(html)
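# .items() yields each matching cell as its own PyQuery object.
count = doc('.r-count').items()
counts = []
for i in count:
    # .text() returns '' for an empty cell, so every row still contributes an entry.
    num = i.text()
    counts.append(num)
print(counts)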
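For comparison, here is a rough sketch of the same idea (one entry per r-count cell, whether or not it holds whitespace) using only the standard library's html.parser. This is not the parsel or pyquery approach above. The names CellTextParser, cell_texts and sample are made up for illustration, and the sketch assumes the matching cells are not nested inside each other.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> whose class list contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.cells = []       # one entry per matching cell, even if it is empty
        self._inside = False  # True while we are inside a matching <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # split() ignores the trailing space in class="r-count "
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self._inside = True
                self.cells.append("")

    def handle_data(self, data):
        if self._inside:
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._inside = False


def cell_texts(html, wanted_class="r-count"):
    """Return the stripped text of each matching cell, in document order."""
    parser = CellTextParser(wanted_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.cells]


# Hypothetical sample covering a blank cell, an empty cell and a filled cell.
sample = (
    '<tr><td class="r-count "> </td></tr>'
    '<tr><td class="r-count "></td></tr>'
    '<tr><td class="r-count ">50</td></tr>'
)
print(cell_texts(sample))  # ['', '', '50']
```

Because an entry is appended as soon as the opening tag is seen, the result stays aligned with the table rows even when a cell has no text at all.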