Python: Scraping the Forum's Medals

qiuyouzhi · posted on 2020-3-27 10:48:47


Last edited by qiuyouzhi at 2020-3-27 10:50


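I took a look yesterday: the other emoticons are all easy to scrape. The Ali (阿狸) set, for example, is simply named ali1, ali2, and so on, so a loop that downloads each one is all you need. I won't go into detail on that here.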
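For the Ali set, the loop could look something like this sketch. It only builds the URLs; each one would then go through open_url() and be written to disk like the medals below. The base URL, the .gif extension and the count are assumptions (hypothetical placeholders), since the post doesn't give them, so check the page source for the real values.

```python
# Sketch: build download URLs for emoticons named ali1, ali2, ..., aliN.
# ASSUMPTION: base_url, ext and count are hypothetical; the post does not give them.

def ali_urls(base_url, count, ext="gif"):
    """Return the URLs for ali1.<ext> through ali<count>.<ext> under base_url."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [f"{base_url}ali{i}.{ext}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only.
    for url in ali_urls("https://example.com/smiley/ali", 3):
        print(url)
```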

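Modules used:

pypinyin: the medal names are in Chinese, so each image is saved under the pinyin of its name.
requests: needs no introduction; the go-to tool for fetching web pages.
bs4: parses the page and extracts the data.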

Getting straight to the approach:

First, import the modules we need and write a function that downloads a page:

from pypinyin import lazy_pinyin as l  # lazy_pinyin returns pinyin without tone marks
from requests import get
from bs4 import BeautifulSoup as BS

def open_url(url):
    res = get(url)  # our beloved FishC doesn't require a User-Agent
    return res


Analyzing the page:

(My computer is a potato, so no screenshots here.)

Digging through the page source, you'll find:

<p>活跃小鱼</p>
<p class="mtn">
自主申请
</p>
</div>
</div>
<div id="medal_34" class="mg_img" onmouseover="showMenu({'ctrlid':this.id, 'menuid':'medal_34_menu', 'pos':'12!'});"><img src="static/image/common/huoyuexiaoyu.gif" alt="活跃小鱼" style="margin-top: 20px;width:auto; height: auto;" /></div>
<p class="xw1">活跃小鱼</p>
<p>
已拥有
</p>
</li>
<li>


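Whoa, the names are all inside <p> tags! The one we want is <p class="xw1">, which sits right under each medal image; filtering on that class skips the other <p> tags (the plain <p>活跃小鱼</p> and the "自主申请" / "已拥有" status lines). Notice also that the image file huoyuexiaoyu.gif is just the pinyin of 活跃小鱼, which is why pypinyin comes in.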
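If you want to double-check that filtering on class="xw1" really picks out only the names, here's a small sketch that swaps bs4 for the standard library's html.parser (so it runs without extra installs) and feeds it a trimmed copy of the snippet above. It only illustrates the selection rule; the actual script uses soup.find_all('p', class_='xw1'), and MedalNameParser is a hypothetical name.

```python
from html.parser import HTMLParser


class MedalNameParser(HTMLParser):
    """Collect the text inside every <p class="xw1"> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "xw1" in classes:
            self._in_name = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_name:
            self.names.append("".join(self._buf).strip())
            self._in_name = False

    def handle_data(self, data):
        if self._in_name:
            self._buf.append(data)


# Trimmed copy of the page snippet quoted in the post (attributes shortened).
SNIPPET = """
<p>活跃小鱼</p>
<p class="mtn">
自主申请
</p>
</div>
</div>
<div id="medal_34" class="mg_img"><img src="static/image/common/huoyuexiaoyu.gif" alt="活跃小鱼" /></div>
<p class="xw1">活跃小鱼</p>
<p>
已拥有
</p>
"""

parser = MedalNameParser()
parser.feed(SNIPPET)
parser.close()
print(parser.names)  # ['活跃小鱼']
```

Only the <p class="xw1"> line is collected, so each medal name appears once.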

Now the code:

def get_pinyin(name):  # convert each medal name into a list of pinyin syllables
    pinyin = [l(each) for each in name]
    return pinyin

def zhizun():  # special case: the 至尊VIP medal's image is vip.gif
    url = 'https://fishc.com.cn/static/image/common/vip.gif'
    res = open_url(url)
    with open("zhizunvip.gif", "wb") as f:
        f.write(res.content)

def find_name(res):  # extract the medal names
    name = []
    soup = BS(res.text, "html.parser")
    target = soup.find_all('p', class_='xw1')
    for each in target:
        name.append(each.text.strip())
    return name


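You may be wondering what zhizun() is for. The medal is called 至尊VIP, but its image URL is just vip.gif, with no "zhizun" (至尊) in it, so it needs a function of its own (slicing would work too).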
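The slicing alternative might look like this minimal sketch. SPECIAL_PREFIXES and image_name() are hypothetical names, and it assumes 至尊VIP is the only medal whose image name differs from its full pinyin.

```python
# Sketch of the "slicing" alternative to a dedicated zhizun() function.
# ASSUMPTION: 至尊VIP is the only medal whose image name is not its full pinyin;
# SPECIAL_PREFIXES and image_name() are hypothetical names for illustration.

SPECIAL_PREFIXES = {"zhizunvip": "zhizun"}  # full slug -> prefix missing from the URL


def image_name(slug):
    """Return the file stem used in the image URL for a medal's pinyin slug."""
    prefix = SPECIAL_PREFIXES.get(slug)
    if prefix is not None:
        return slug[len(prefix):]  # "zhizunvip"[6:] -> "vip"
    return slug


print(image_name("zhizunvip"))     # vip
print(image_name("huoyuexiaoyu"))  # huoyuexiaoyu
```

Inside get_Img() you would then build the URL from image_name(each) while still saving the file as f'{each}.gif', which makes zhizun() unnecessary.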

Now for the last step: saving the images!

def get_Img():
    res = open_url("https://fishc.com.cn/home.php?mod=medal")
    name = find_name(res)
    pinyin = get_pinyin(name)
    for each in pinyin:
        each = "".join(each).lower()  # each is a list of syllables, not a string, so join it first
        url = 'https://fishc.com.cn/static/image/common/%s.gif' % each
        print(url)
        res = open_url(url)
        with open(f'{each}.gif', 'wb') as f:
            f.write(res.content)
    zhizun()

if __name__ == "__main__":
    get_Img()
    print("DONE!")

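One thing get_Img() doesn't check is whether each download actually succeeded. If a medal's pinyin doesn't match its image name, the server will presumably answer with an error page, and that page would still be written out under a .gif name. Here's a minimal guard, assuming you only want files that start with a GIF header (GIF87a or GIF89a); save_gif() is a hypothetical helper, not part of the original script.

```python
import os
import tempfile

# GIF files start with one of these two 6-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def save_gif(content, path, status_code=200):
    """Write content to path only if the request succeeded and the bytes look like a GIF.

    Returns True if the file was written, False if it was skipped.
    """
    if status_code != 200 or not content.startswith(GIF_SIGNATURES):
        return False
    with open(path, "wb") as f:
        f.write(content)
    return True


if __name__ == "__main__":
    # Demo with made-up bytes instead of real downloads.
    with tempfile.TemporaryDirectory() as tmp:
        ok = save_gif(b"GIF89a" + b"\x00" * 10, os.path.join(tmp, "demo.gif"))
        skipped = save_gif(b"<!DOCTYPE html>", os.path.join(tmp, "missing.gif"))
        print(ok, skipped)  # True False
```

In the loop, save_gif(res.content, f'{each}.gif', res.status_code) would replace the with open(...) block; both attributes exist on a requests response.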

Mission accomplished!

Acknowledgments

Thanks to zltzlt; without him I'd still be stuck on the get_Img() function. (A little embarrassing.)

Full code:

from pypinyin import lazy_pinyin as l  # lazy_pinyin returns pinyin without tone marks
from requests import get
from bs4 import BeautifulSoup as BS

def open_url(url):
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
    res = get(url, headers=headers)
    return res

def get_pinyin(name):  # convert each medal name into a list of pinyin syllables
    pinyin = [l(each) for each in name]
    return pinyin

def zhizun():  # special case: the 至尊VIP medal's image is vip.gif
    url = 'https://fishc.com.cn/static/image/common/vip.gif'
    res = open_url(url)
    with open("zhizunvip.gif", "wb") as f:
        f.write(res.content)

def find_name(res):  # extract the medal names
    name = []
    soup = BS(res.text, "html.parser")
    target = soup.find_all('p', class_='xw1')
    for each in target:
        name.append(each.text.strip())
    return name

def get_Img():
    res = open_url("https://fishc.com.cn/home.php?mod=medal")
    name = find_name(res)
    pinyin = get_pinyin(name)
    for each in pinyin:
        each = "".join(each).lower()  # each is a list of syllables, not a string, so join it first
        url = 'https://fishc.com.cn/static/image/common/%s.gif' % each
        print(url)
        res = open_url(url)
        with open(f'{each}.gif', 'wb') as f:
            f.write(res.content)
    zhizun()

if __name__ == "__main__":
    get_Img()
    print("DONE!")


qiuyouzhi · posted on 2020-3-27 10:53:24

Some of these medals I'll never get, so looking at them will have to do.
