As the title says. The page source is as follows:
My code:
import bs4

def find_company(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Company name
    company = []
    targets = soup.find_all("span", class_="ttspan")
    for each in targets:
        company.append(each.text)
    # Main products
    products = []
    targets = soup.find_all("span", class_="clr")
    for each in targets:
        products.append(each.text)
    # Business model
    busmdl = []
    targets = soup.find_all("span", class_="clr")
    for each in targets:
        busmdl.append(each.text)
    # Date established
    estabtime = []
    targets = soup.find_all("span", class_="clr")
    for each in targets:
        estabtime.append(each.text)
    # Company address
    address = []
    targets = soup.find_all("span", class_="clr")
    for each in targets:
        address.append(each.text)
    result = []
    length = len(company)
    for i in range(length):
        result.append([company[i], products[i], busmdl[i], estabtime[i], address[i]])
    return result
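Written this way, the same loop over the "clr" spans runs four times, so all four fields end up with identical content. How can I use bs4 to pick out each specific piece of data?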
This post was last edited by 歌者文明清理员 on 2023-6-1 14:53
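Why not just collect the spans once and split them up?

import bs4

def find_company(res):
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    company = [each.text for each in soup.find_all("span", class_="ttspan")]
    # Every company contributes four "clr" spans in a fixed order, so collect them once
    ress = [each.text for each in soup.find_all("span", class_="clr")]
    products = ress[::4]     # 1st of every four: main products
    busmdl = ress[1::4]      # 2nd: business model
    estabtime = ress[2::4]   # 3rd: date established
    address = ress[3::4]     # 4th: company address
    # zip stops at the shortest list, so an incomplete trailing entry is dropped
    result = [list(i) for i in zip(company, products, busmdl, estabtime, address)]
    return result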
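To see what the stride slicing does without fetching the page, here is a minimal sketch in plain Python. The values in clr_texts and company are made up for illustration and stand in for the text of the "clr" and "ttspan" spans. It also assumes every company has exactly four "clr" spans in the same order, which the thread does not confirm, so the sketch checks the count before slicing.

```python
# Made-up sample values (not from the real page), listed in page order:
# products, business model, date established, address, repeated per company.
clr_texts = [
    "bolts, nuts", "manufacturer", "2005", "Shenzhen",
    "LED lamps", "trading company", "2012", "Ningbo",
]
company = ["Acme Hardware", "Bright Lighting"]

FIELDS_PER_COMPANY = 4  # assumption: four "clr" spans per company

# If a company is missing a field, every later slice would shift by one,
# so fail loudly instead of silently pairing the wrong values.
if len(clr_texts) != FIELDS_PER_COMPANY * len(company):
    raise ValueError("unexpected number of 'clr' spans for the companies found")

products = clr_texts[0::FIELDS_PER_COMPANY]   # items 0, 4, 8, ...
busmdl = clr_texts[1::FIELDS_PER_COMPANY]     # items 1, 5, 9, ...
estabtime = clr_texts[2::FIELDS_PER_COMPANY]  # items 2, 6, 10, ...
address = clr_texts[3::FIELDS_PER_COMPANY]    # items 3, 7, 11, ...

# One row per company, in the same column order as the original function.
result = [list(row) for row in zip(company, products, busmdl, estabtime, address)]
print(result)
```

If the real page sometimes omits a field, the count check will raise, and it would probably be safer to search inside each company's own container element instead of slicing one flat list.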