[Solved] Help with a web scraper problem

isdkz · Posted on 2022-4-15 18:10:32

Last edited by isdkz on 2022-4-16 08:23

First question:

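The value of each link's tip attribute is itself an HTML fragment, so you can run it through BeautifulSoup a second time and pull the title and author out of it.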
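As a rough sketch of that idea, the snippet below parses a made-up page fragment and then parses the tip value again. The markup, the "Author:" label, the sample names, and the helper parse_tip are all assumptions for illustration; the real page may lay out the tip text differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tip attribute holds escaped HTML, as a guess
# at what the forum page might contain.
html = '''<div class="subjectbox">
<a href="thread-1.html" tip="&lt;strong&gt;Sample title&lt;/strong&gt;&lt;br/&gt;Author: alice (2022-4-15)">Sample title</a>
</div>'''


def parse_tip(tip_html):
    """Parse the HTML fragment stored in a tip attribute (illustrative helper)."""
    tip_soup = BeautifulSoup(tip_html, "html.parser")
    title = tip_soup.find("strong").text
    # The text node right after <br> is assumed to look like "Author: name (date)".
    after_br = tip_soup.find("br").next_element
    author = after_br.split(":")[1].split("(")[0].strip()
    return title, author


soup = BeautifulSoup(html, "html.parser")
link = soup.find("div", attrs={"class": "subjectbox"}).find("a")
title, author = parse_tip(link["tip"])
print(title, author, link["href"])
```

The parser unescapes the attribute value when it reads the outer page, which is why the second BeautifulSoup call sees ordinary tags.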

Second question:

If you want the output to line up more neatly, you can use the prettytable library. Install it first with the following command.

pip install prettytable -i https://mirrors.aliyun.com/pypi/simple

Here is your code with those changes applied:

import requests
from bs4 import BeautifulSoup
from prettytable import PrettyTable


def gethttpText(url):  # fetch the page text
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception as e:
        print("Function gethttpText failed:", e)


def parserPage(itl, html):  # parse the page data
    try:
        soup = BeautifulSoup(html, "html.parser")
        # find the div with class="subjectbox"
        find_shuju = soup.find('div', attrs={'class': "subjectbox"})
        # collect every <a> tag inside it
        find_a = find_shuju.find_all('a')
        for each in find_a:  # iterate over the <a> tags
            # the tip attribute is itself HTML, so parse it again
            tip_soup = BeautifulSoup(each['tip'], "html.parser")
            ls_name = tip_soup.find('strong').text
            zz_name = tip_soup.find('br').next_element.split(':')[1].split('(')[0]
            itl.append([ls_name, zz_name, each.get("href")])  # store in the itl list
    except Exception as e:
        print("Function parserPage failed:", e)


def main():
    try:
        url = 'http://bbs.lwhfishing.com/forum.php'
        html = gethttpText(url)
        itl = []
        parserPage(itl, html)
        table = PrettyTable(field_names=("No.", "Title", "Author", "Link"))
        for i, j in enumerate(itl):
            table.add_row([i] + j)
        print(table)
    except Exception as e:
        print("Function main failed:", e)


if __name__ == "__main__":
    main()