A question about scraping articles from a web page
I want to scrape abstracts from a literature site. When the soul object is non-empty, the first abstract prints fine, but none of the later ones do. Here is the code:

import requests
from bs4 import BeautifulSoup

# article_links and Headers are defined elsewhere in the script
for link in article_links:
    abstract = []
    print(link)
    response = requests.get(link, headers=Headers)
    soul = BeautifulSoup(response.content, 'lxml')
    soul = soul.select('#en-abstract > p')  # select() returns a list of matching tags
    print('soul is:', soul)
    if len(soul) == 0:
        print('soul is empty, this article has no abstract!')
    else:
        # .string is read from the first matched <p>
        print('Abstract:{}'.format(soul[0].string))

Here is the output:

https://pubmed.ncbi.nlm.nih.gov/32646971/
soul is: [<p> The general functions of lymphatic vessels in fluid transport and immunosurveillance are well recognized. However, accumulating evidence indicates that lymphatic vessels play active and versatile roles in a tissue- and organ-specific manner during homeostasis and in multiple disease processes. This Review discusses recent advances to understand previously unidentified functions of adult mammalian lymphatic vessels, including immunosurveillance and immunomodulation upon pathogen invasion, transport of dietary fat, drainage of cerebrospinal fluid and aqueous humor, possible contributions toward neurodegenerative and neuroinflammatory diseases, and response to anticancer therapies. </p>]
Abstract: The general functions of lymphatic vessels in fluid transport and immunosurveillance are well recognized. However, accumulating evidence indicates that lymphatic vessels play active and versatile roles in a tissue- and organ-specific manner during homeostasis and in multiple disease processes. This Review discusses recent advances to understand previously unidentified functions of adult mammalian lymphatic vessels, including immunosurveillance and immunomodulation upon pathogen invasion, transport of dietary fat, drainage of cerebrospinal fluid and aqueous humor, possible contributions toward neurodegenerative and neuroinflammatory diseases, and response to anticancer therapies.
https://pubmed.ncbi.nlm.nih.gov/32646974/
soul is: []
soul is empty, this article has no abstract!
https://pubmed.ncbi.nlm.nih.gov/32646972/
soul is: [<p>Marine invertebrate ascidians display embryonic reproducibility: Their early embryonic cell lineages are considered invariant and are conserved between distantly related species, despite rapid genomic divergence. Here, we address the drivers of this reproducibility. We used light-sheet imaging and automated cell segmentation and tracking procedures to systematically quantify the behavior of individual cells every 2 minutes during <i>Phallusia mammillata</i> embryogenesis. Interindividual reproducibility was observed down to the area of individual cell contacts. We found tight links between the reproducibility of embryonic geometries and asymmetric cell divisions, controlled by differential sister cell inductions. We combined modeling and experimental manipulations to show that the area of contact between signaling and responding cells is a key determinant of cell communication. Our work establishes the geometric control of embryonic inductions as an alternative to classical morphogen gradients and suggests that the range of cell signaling sets the scale at which embryonic reproducibility is observed.</p>]
Abstract:None
https://pubmed.ncbi.nlm.nih.gov/32647004/
soul is: [<p>The dentitions of extant fishes and land vertebrates vary in both pattern and type of tooth replacement. It has been argued that the common ancestral condition likely resembles the nonmarginal, radially arranged tooth files of arthrodires, an early group of armoured fishes. We used synchrotron microtomography to describe the fossil dentitions of so-called acanthothoracids, the most phylogenetically basal jawed vertebrates with teeth, belonging to the genera <i>Radotina</i>, <i>Kosoraspis</i>, and <i>Tlamaspis</i> (from the Early Devonian of the Czech Republic). Their dentitions differ fundamentally from those of arthrodires; they are marginal, carried by a cheekbone or a series of short dermal bones along the jaw edges, and teeth are added lingually as is the case in many chondrichthyans (cartilaginous fishes) and osteichthyans (bony fishes and tetrapods). We propose these characteristics as ancestral for all jawed vertebrates.</p>]
Abstract:None
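Try using .text instead of .string in the final print and see if that helps:

import requests
from bs4 import BeautifulSoup

# article_links and Headers are defined elsewhere in the script
for link in article_links:
    abstract = []
    print(link)
    response = requests.get(link, headers=Headers)
    soul = BeautifulSoup(response.content, 'lxml')
    soul = soul.select('#en-abstract > p')  # select() returns a list of matching tags
    print('soul is:', soul)
    if len(soul) == 0:
        print('soul is empty, this article has no abstract!')
    else:
        # .text concatenates all text inside the first matched <p>, nested tags included
        print('Abstract:{}'.format(soul[0].text))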
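If it helps, the extraction step can also be pulled out of the loop into a small function that works on the page HTML alone. This is only a sketch under a few assumptions: extract_abstract is a hypothetical helper name, the sample HTML is made up to mimic an abstract wrapped in an element with id en-abstract, and html.parser is used instead of lxml so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup


def extract_abstract(html):
    """Return the abstract text from page HTML, or None if there is no abstract.

    Hypothetical helper: assumes the abstract paragraphs match '#en-abstract > p',
    the same selector used in the loop above.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.select("#en-abstract > p")
    if not paragraphs:
        return None
    # get_text() joins the text of every nested node, so inline tags such as
    # <i> do not cause a None result; strip() trims surrounding whitespace.
    return "\n".join(p.get_text().strip() for p in paragraphs)


# Made-up sample pages, not real site markup.
page_with_abstract = (
    '<div id="en-abstract"><p> Cells of <i>Phallusia mammillata</i>, imaged. </p></div>'
)
page_without_abstract = "<div><p>No abstract section here.</p></div>"

print(extract_abstract(page_with_abstract))
print(extract_abstract(page_without_abstract))
```

Inside the loop this would be called as extract_abstract(response.content), with a None result meaning the article has no abstract.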
That was exactly the problem. I checked the docs, and it seems to come down to how .string works.
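For anyone who lands here later: according to the Beautiful Soup documentation, .string returns None when a tag has more than one child, which is what happens when an abstract contains inline tags such as <i>. The snippet below is a minimal sketch of that behavior using two made-up paragraphs (not taken from the site) and html.parser rather than lxml.

```python
from bs4 import BeautifulSoup

# A <p> whose only child is a single text node.
plain = BeautifulSoup("<p>Only text here.</p>", "html.parser").p

# A <p> with three children: text, an <i> tag, and more text.
mixed = BeautifulSoup(
    "<p>Study of <i>Phallusia mammillata</i> embryos.</p>", "html.parser"
).p

print(plain.string)      # the single text node
print(mixed.string)      # None, because the tag has more than one child
print(mixed.get_text())  # all nested text joined together
```

.text is a property that gives the same result as get_text() with no arguments, so either one avoids the None.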