As the title says. The page source is as follows:
My code:
- import bs4
-
- def find_company(res):
-     soup = bs4.BeautifulSoup(res.text, 'html.parser')
-     # Company name
-     company = []
-     targets = soup.find_all("span", class_="ttspan")
-     for each in targets:
-         company.append(each.text)
-     # Main products
-     products = []
-     targets = soup.find_all("span", class_="clr")
-     for each in targets:
-         products.append(each.text)
-     # Business model
-     busmdl = []
-     targets = soup.find_all("span", class_="clr")
-     for each in targets:
-         busmdl.append(each.text)
-     # Date established
-     estabtime = []
-     targets = soup.find_all("span", class_="clr")
-     for each in targets:
-         estabtime.append(each.text)
-     # Company address
-     address = []
-     targets = soup.find_all("span", class_="clr")
-     for each in targets:
-         address.append(each.text)
-     result = []
-     length = len(company)
-     for i in range(length):
-         result.append([company[i], products[i], busmdl[i], estabtime[i], address[i]])
-     return result
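Written this way, the same loop over the class "clr" spans runs four times, so all four fields (products, business model, date established, address) end up with identical content. How can I use bs4 to pick out each specific piece of data?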
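To make the symptom concrete, here is a minimal sketch that reproduces it. The HTML fragment is made up for illustration (the real page source is not shown in this post); only the class names ttspan and clr come from the code above.

```python
import bs4

# Hypothetical markup: an assumption for illustration, not the real page.
SAMPLE_HTML = """
<div>
  <span class="ttspan">Company A</span>
  <span class="clr">Widgets</span>
  <span class="clr">Manufacturer</span>
  <span class="clr">2010</span>
  <span class="clr">Beijing</span>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# The same query is issued for two different fields, so both fields
# receive the full list of "clr" spans rather than one span each.
products = [each.text for each in soup.find_all("span", class_="clr")]
busmdl = [each.text for each in soup.find_all("span", class_="clr")]

print(products)            # all four values, not just the product
print(products == busmdl)  # True: the two "fields" are the same list
```

Each call to find_all with the same arguments returns the same spans, so repeating the loop cannot separate the fields.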
This post was last edited by 歌者文明清理员 on 2023-6-1 14:53
Why not fetch the class "clr" spans just once and split the result by position?
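- import bs4
-
- def find_company(res):
-     soup = bs4.BeautifulSoup(res.text, 'html.parser')
-     company = [each.text for each in soup.find_all("span", class_="ttspan")]
-     # Query the "clr" spans once; this assumes every company has exactly
-     # four of them, always in the same order.
-     ress = [each.text for each in soup.find_all("span", class_="clr")]
-     products = ress[::4]     # 1st of every 4: main products
-     busmdl = ress[1::4]      # 2nd of every 4: business model
-     estabtime = ress[2::4]   # 3rd of every 4: date established
-     address = ress[3::4]     # 4th of every 4: company address
-     result = [list(i) for i in zip(company, products, busmdl, estabtime, address)]
-     return result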
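The slicing above only lines up if every company contributes exactly four "clr" values in a fixed order, and zip silently truncates to the shortest list when they do not. As a hedged sketch, the same stride slicing can be wrapped in a small helper that checks the count first. The name split_fields and the sample values are hypothetical, introduced here only for illustration.

```python
def split_fields(values, n_fields):
    """Split a flat list into n_fields columns using stride slicing.

    Hypothetical helper: raises instead of silently misaligning when the
    list length is not a multiple of n_fields.
    """
    if len(values) % n_fields != 0:
        raise ValueError(
            f"expected a multiple of {n_fields} values, got {len(values)}"
        )
    # values[i::n_fields] takes item i, then every n_fields-th item after it.
    return [values[i::n_fields] for i in range(n_fields)]


# Made-up data standing in for the text of the "clr" spans of two companies.
flat = [
    "Widgets", "Manufacturer", "2010", "Beijing",
    "Gadgets", "Trader", "2015", "Shanghai",
]

products, busmdl, estabtime, address = split_fields(flat, 4)
print(products)  # ['Widgets', 'Gadgets']
print(address)   # ['Beijing', 'Shanghai']
```

If some companies on the real page are missing a field, the check fails loudly, which is usually a better signal than rows whose columns have quietly shifted.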