Web scraping question, Python Discussion, Programming Languages Section, FishC Forum (鱼C论坛)

wtfitis posted on 2021-9-14 16:24:21

Web scraping question

https://www.ncbi.nlm.nih.gov/biosample/SAMEA1030995
import requests
from lxml import etree

# headers is defined elsewhere in the script (not shown in this post)
oresp = requests.get(url="https://www.ncbi.nlm.nih.gov/biosample/SAMEA1030995", headers=headers)
print(oresp.text)
html1 = etree.HTML(oresp.content)
htmldata1 = html1.xpath('//div[@id="maincontent"]/div/div/div/div/dl/dd/table/tbody/tr/td/text()')

Why can't this XPath extract the 'GP580' value in the middle of the page?

suchocolate posted on 2021-9-14 16:24:22

I tried etree, but it seems to raise an error when reading this HTML, so I extracted the value with re instead.
import re

import requests


def main():
    url = 'https://www.ncbi.nlm.nih.gov/biosample/SAMEA1030995'
    # Send a browser-like User-Agent header with the request
    headers = {'user-agent': 'firefox'}
    r = requests.get(url, headers=headers)
    # Non-greedy capture of the cell that follows the "sample name" header
    result = re.findall(r'sample name</th><td>(.*?)</td>', r.text)
    print(result)


if __name__ == '__main__':
    main()
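
To see what the regular expression above does without hitting the network, here is a minimal offline sketch. The SAMPLE string is a hypothetical fragment modelled on the pattern in the reply, not the real page markup, and extract_sample_name and the second table row are made up for illustration. The slightly looser pattern (optional whitespace, optional attributes on the td tag) is an assumption about what the page might contain.

```python
import re

# Hypothetical fragment shaped like the markup the regex in the reply expects;
# the real NCBI page may differ.
SAMPLE = (
    '<tr><th>sample name</th><td>GP580</td></tr>'
    '<tr><th>strain</th><td>placeholder</td></tr>'
)

# The pattern from the reply: the non-greedy (.*?) stops at the first </td>.
ORIGINAL = r'sample name</th><td>(.*?)</td>'

# A slightly more tolerant variant (assumption): allows whitespace between
# the tags and attributes on <td>, and ignores letter case.
TOLERANT = re.compile(
    r'sample name</th>\s*<td[^>]*>(.*?)</td>',
    re.IGNORECASE | re.DOTALL,
)


def extract_sample_name(html):
    """Return the text of the cell after the 'sample name' header, or None."""
    match = TOLERANT.search(html)
    return match.group(1).strip() if match else None


if __name__ == '__main__':
    print(re.findall(ORIGINAL, SAMPLE))
    print(extract_sample_name(SAMPLE))
    print(extract_sample_name('<p>no table here</p>'))
```

With a greedy (.*) in place of (.*?), the capture would run on to the last </td> in the string, which is why the non-greedy form matters here.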

wtfitis posted on 2021-9-14 17:25:57

Bumping my own thread. Can anyone help me out?

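wtfitis posted on 2021-9-14 17:42:36

suchocolate posted on 2021-9-14 16:24
I tried etree, but it seems to raise an error when reading this HTML, so I extracted the value with re instead.

Actually, I also solved it with re. But I would still like to locate the value through the document structure with etree. I haven't tried BeautifulSoup yet, though it should work.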
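
For the structure-based approach, one common cause of an empty XPath result is worth checking, although the thread does not confirm it for this page: browsers insert a tbody element into tables in the rendered DOM, so a path copied from developer tools can contain a /tbody/ step that does not exist in the raw HTML that requests downloads. The sketch below illustrates the effect using the standard library's xml.etree.ElementTree as a stand-in for lxml's etree. The SAMPLE fragment, its second row, and the cell_for helper are hypothetical, and ElementTree needs well-formed input, whereas lxml's etree.HTML tolerates real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modelled on the regex used in this
# thread; the real page markup may differ.
SAMPLE = """
<div id="maincontent">
  <dl>
    <dd>
      <table>
        <tr><th>sample name</th><td>GP580</td></tr>
        <tr><th>strain</th><td>placeholder</td></tr>
      </table>
    </dd>
  </dl>
</div>
"""

root = ET.fromstring(SAMPLE)

# A path that insists on <tbody> finds nothing, because the source has none.
with_tbody = [td.text for td in root.findall(".//table/tbody/tr/td")]

# The same path without the <tbody> step matches every data cell.
without_tbody = [td.text for td in root.findall(".//table/tr/td")]


def cell_for(tree, header):
    """Return the <td> text of the row whose <th> text equals header."""
    for row in tree.iter("tr"):
        th, td = row.find("th"), row.find("td")
        if th is not None and td is not None and (th.text or "").strip() == header:
            return (td.text or "").strip()
    return None


if __name__ == '__main__':
    print(with_tbody)
    print(without_tbody)
    print(cell_for(root, "sample name"))
```

Selecting the row by its header text, as cell_for does, tends to be less brittle than a long chain of div steps, since it keeps working if the nesting around the table changes.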
