Reward: 20 fish coins
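After scraping weather data with requests, I get the DataFrame shown in the image below. I need to split the data in column [b] into three columns: [天气, 最高温度, 最低温度] (weather, high temperature, low temperature).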
- import bs4
- import requests
- import numpy as np
- import pandas as pd
- url = 'http://m.apporid.com/xian/1yue.html'
- headers = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, '
- 'like Gecko) Chrome/91.0.4472.164 Mobile Safari/537.36',
- 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,'
- '*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 ',
- }
- res = requests.get(url, headers=headers)
- temperature = []
- soup = bs4.BeautifulSoup(res.text, "html.parser")
- targets = soup.find("ul", class_="tqullist tqicon28x20")
- targets = targets.find_all('li')
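- for each in targets:
-     # each <li> is expected to yield two whitespace-separated tokens: the date and the combined weather/temperature string
-     temperature.append(each.text.split())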
- t1 = np.array(temperature).reshape(len(temperature), 2)
- df = pd.DataFrame(t1, columns=["日期", "b"])
- print(df)
[Image: screenshot of the DataFrame]
Please don't use a method that splits by positional index.
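df.b.str.split('/', expand=True)  # This usually works. But here the Chinese weather text and the low temperature are stuck together, so a plain split is awkward whichever way you try it. Maybe a regex expert can pry them apart for you, haha.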
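One way a regex could pry them apart is pandas' str.extract with named groups. This is only a minimal sketch: the sample rows and the assumed layout of column b (weather text, then the low temperature, a slash, then the high temperature) are hypothetical, since the real values only appear in the screenshot. Adjust the pattern if your strings look different.

```python
import pandas as pd

# Hypothetical sample rows; the real contents of column b are not shown in the post.
df = pd.DataFrame({
    "日期": ["01-01", "01-02", "01-03"],
    "b": ["晴-3℃/8℃", "多云0℃/10℃", "小雪-5℃/-1℃"],
})

# weather: leading run of characters that are neither digits nor a minus sign
# low / high: optionally negative integers on either side of the slash
# (assumed order: low first, then high)
pattern = r"^(?P<weather>[^\d-]+)(?P<low>-?\d+)[^\d/-]*/(?P<high>-?\d+)"

# str.extract returns one column per named group
parts = df["b"].str.extract(pattern)
parts = parts.astype({"low": int, "high": int})
parts = parts.rename(columns={"weather": "天气", "high": "最高温度", "low": "最低温度"})

# Put the date back in front and order the columns as requested
result = pd.concat([df[["日期"]], parts[["天气", "最高温度", "最低温度"]]], axis=1)
print(result)
```

No positional slicing is involved: the pattern decides where the weather text ends, so it keeps working when the weather description changes length.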
That said, wouldn't it be better to split the data into 3 columns right when you scrape it? For example:
- import bs4, requests, re
- import numpy as np
- import pandas as pd
- url = 'http://m.apporid.com/xian/1yue.html'
- res = requests.get(url,)
- temperature = []
- soup = bs4.BeautifulSoup(res.text, "html.parser")
- targets = soup.find("ul", class_="tqullist tqicon28x20")
- dt = targets.find_all('b')
- tq = targets.find_all('div')
- temp = targets.find_all('span')
- for d, i, j in zip(dt, tq, temp):
-     temperature.append([d.text, i.text.strip(), j.text])
-     # print([d.text, i.text.strip(), j.text])
- t1 = np.array(temperature).reshape(len(temperature), 3)
- df = pd.DataFrame(t1, columns=["日期", "天气", "温度"])
- print(df)
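The code above still leaves both temperatures in a single 温度 column. Once the weather text is out of the way, the plain split on '/' should be enough to get the remaining two columns. A rough sketch follows, assuming values shaped like "-3℃/8℃" with the low temperature first; the sample rows are made up and the real format may differ.

```python
import pandas as pd

# Hypothetical rows in the shape produced by the scraper above; real values may differ.
df = pd.DataFrame({
    "日期": ["01-01", "01-02"],
    "天气": ["晴", "小雪"],
    "温度": ["-3℃/8℃", "-5℃/-1℃"],
})

# Split on the slash, then drop the unit sign and convert each part to int
temps = df["温度"].str.split("/", expand=True)
temps = temps.apply(lambda col: col.str.replace("℃", "", regex=False).astype(int))
temps.columns = ["最低温度", "最高温度"]  # assumed order: low first, then high

# Replace the combined column with the two numeric ones
result = df.drop(columns="温度").join(temps)
result = result[["日期", "天气", "最高温度", "最低温度"]]
print(result)
```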