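Finding the most popular programming courses on Bilibili, 《极客Python之效率革命》 (Geek Python: The Efficiency Revolution), Python交流 (Python Discussion), 鱼C论坛 (FishC Forum)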

逃兵 posted on 2020-9-9 18:02:36

I want to know

Sasuke1989 posted on 2020-9-9 21:32:27

Squeak squeak squeak squeak

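opengjie posted on 2020-9-16 18:30:17

Why can't I see any result in the IDLE interactive shell, even though I copied the code exactly?
res.text does produce output, but the filtered result never shows up, even when I put the code into a file.

opengjie posted on 2020-9-16 18:51:36

Hello 小甲鱼, I wrote my code following this post, so why can't I read the data out in either the interactive shell or PyCharm?
Also, when I copy your complete code directly, the table it produces is empty. Why?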

import requests
import bs4

res = requests.get("https://search.bilibili.com/all?keyword=编程&order=totalrank&duration=0&tids_1=0")
print(res.text)
soup = bs4.BeautifulSoup(res.text, "html.parser")
# Each search result is expected to be an <li> carrying this class
titles = soup.find_all("li", class_="video matrix")
for each in titles:
    print(each.a["title"])

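opengjie posted on 2020-9-16 18:57:55

It turns out Bilibili changed the class name on the tag.
Changing the line to the following makes it work.
Sigh, that's what I get for following the book blindly.
titles = soup.find_all("li", class_="video-item matrix")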
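
For anyone hitting the same empty result, the effect can be reproduced offline. This is only a minimal sketch: the HTML fragment is made up to imitate the renamed list items and is not Bilibili's real markup. It shows that a multi-word class_ string is compared against the whole class attribute, and that a CSS selector on a single class is a bit more tolerant.

```python
import bs4

# Hypothetical fragment imitating the renamed list items; not real Bilibili markup.
html = """
<ul>
  <li class="video-item matrix"><a title="Course A" href="//example.com/a"></a></li>
  <li class="video-item matrix"><a title="Course B" href="//example.com/b"></a></li>
</ul>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# A multi-word class_ string must equal the full class attribute value,
# so the old string finds nothing once the site renames a class.
old_matches = soup.find_all("li", class_="video matrix")

# The updated string equals the full attribute value and matches again.
new_matches = soup.find_all("li", class_="video-item matrix")

# Selecting on one class still works if more classes are added or reordered.
css_matches = soup.select("li.video-item")

old_titles = [li.a["title"] for li in old_matches]
new_titles = [li.a["title"] for li in new_matches]
css_titles = [li.a["title"] for li in css_matches]
print(old_titles, new_titles, css_titles)
```

If the site renames the class again, the selector has to be updated either way; printing len(titles) right after the search is a quick way to notice that nothing matched.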

opengjie posted on 2020-9-18 15:05:31

There is a lot I don't understand.
Could the experts please help?
1:
tags = video.select("div > span") What does this mean? What kind of method is select?
for tag in tags:
    datas.append(''.join(tag.text.split())) I understand this even less. Is the method after the dot that follows the '' join? And is there a slice inside it?
2:
"s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
return zip(**n) This one leaves me completely lost. How should I make sense of it?

3: I understand this one even less.
for page in range(1, pages+1):
    url = "https://search.bilibili.com/all?keyword={}&order={}&duration=4&tids_1=36&page={}".format(keyword, order, page)
    text = get_html(url)
    datas = get_datas(text)
    # Create a separate text file for each sort order
    with open(order_name+'.txt', 'a', encoding="utf-8") as file:
        for video_title, video_URL, video_watch, video_dm, video_time, video_up in grouped(datas, 6):
            file.write(' + '.join())
            index += 1
    # Be a well-behaved crawler; don't put load on the server
    time.sleep(1)
Please give me some guidance.

wrw5192 posted on 2020-9-19 10:31:11

Hello, is this 110? Someone is showing off here and things are getting out of hand.
{:10_279:}

okkboss posted on 2020-9-20 18:26:24

This

种花家兔子 posted on 2020-9-24 14:25:37

1111

agusth posted on 2020-9-27 17:44:55

get_tags

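qq1484730945 posted on 2020-10-23 17:14:47

I know

qq1484730945 posted on 2020-10-23 17:24:25

opengjie posted on 2020-9-18 15:05
There is a lot I don't understand.
Could the experts please help?
1:

Ah, did you figure it out, bro? On your first question: select is a method in the bs4 module for parsing the page. It takes a CSS selector and returns the matching elements, so you get just the content you want. Have a look at 小龟龟's article; go through it once and you'll get the general idea. To get fluent you still have to practice a lot yourself.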
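
To make the "div > span" part concrete, here is a small sketch on a made-up fragment (the markup is an assumption for illustration, not the real search page). The ">" means "direct child", so a span nested deeper inside the div is skipped.

```python
import bs4

# Hypothetical markup: two spans directly inside the <div>, one nested inside a <p>.
html = """
<li class="video-item matrix">
  <div class="tags">
    <span> 12000 </span>
    <p><span>nested</span></p>
    <span>2020-09-09</span>
  </div>
</li>
"""

video = bs4.BeautifulSoup(html, "html.parser").li

# "div > span" selects <span> elements whose parent is a <div>.
tags = video.select("div > span")
direct = [tag.text.strip() for tag in tags]

# Without ">", the descendant selector also picks up the nested span.
every = [tag.text.strip() for tag in video.select("div span")]

print(direct)
print(every)
```

select always returns a list of tags, which is why the original code loops over tags with for.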

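Second question: the outer call is join, which concatenates strings; the split() inside it splits the string using whitespace as the separator (it is a method call, not a slice).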
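
A tiny sketch of that split-then-join step, using a made-up string in place of tag.text:

```python
# Made-up text with stray spaces, a tab and newlines, like text pulled out of HTML
raw = "\n   Python   course \t 101 \n"

# split() with no argument cuts on any run of whitespace and drops empty pieces
parts = raw.split()

# ''.join(...) glues the pieces back together with nothing in between
compact = "".join(parts)

print(parts)
print(compact)
```

The '' in front of .join is just an empty string used as the glue; writing ' '.join(parts) instead would keep single spaces between the words.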

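Third question: ah, I don't really understand it either, but zip() seems to return tuples made up of the items taken together from each of its iterable arguments.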
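
The zip(**n) quoted in the question looks like the forum swallowed part of the line. Assuming it was the widely used grouping idiom zip(*[iter(iterable)] * n) (an assumption, since the original line is not visible in this thread), a sketch of how grouped could work:

```python
def grouped(iterable, n):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), ..."
    # [iter(iterable)] * n is a list holding the SAME iterator n times,
    # so zip pulls n consecutive items for every tuple it builds.
    return zip(*[iter(iterable)] * n)

# Made-up flat list standing in for datas: three fields per video plus one stray item
datas = ["title1", "url1", "views1", "title2", "url2", "views2", "leftover"]
rows = list(grouped(datas, 3))
print(rows)
```

Because zip stops as soon as one pull fails, an incomplete group at the end ("leftover" here) is silently dropped.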

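Fourth question: if you didn't store them separately, wouldn't everything be crammed into a single file?
It just creates 4 files and puts the content for each sort order into its own file.
{:10_256:}{:10_256:}As for the code, the only way to understand it is to type it out and read it a lot yourself.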
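
Here is a rough offline sketch of that "one file per sort order" loop. The names ORDERS, build_url and save_rows are made up for illustration, the rows are placeholders rather than scraped data, and no request is sent.

```python
import os
import tempfile
from urllib.parse import urlencode

# Hypothetical sort keys mapped to file names; the original script's lists may differ.
ORDERS = {"click": "most_viewed", "dm": "most_danmaku"}

def build_url(keyword, order, page):
    # urlencode percent-encodes the values, so non-ASCII keywords are safe in the URL
    query = urlencode({"keyword": keyword, "order": order, "page": page})
    return "https://search.bilibili.com/all?" + query

def save_rows(folder, order_name, rows):
    # Mode "a" appends, so every page of one sort order ends up in the same file
    path = os.path.join(folder, order_name + ".txt")
    with open(path, "a", encoding="utf-8") as file:
        for row in rows:
            file.write(" + ".join(row) + "\n")
    return path

out_dir = tempfile.mkdtemp()
for order, order_name in ORDERS.items():
    for page in range(1, 3):
        url = build_url("python", order, page)
        # Placeholder row standing in for the scraped fields of one video
        save_rows(out_dir, order_name, [(order_name, str(page), url)])

files = sorted(os.listdir(out_dir))
print(files)
```

In a real crawl the placeholder row would be replaced by the parsed data, and a time.sleep(1) between pages keeps the load on the server low, as the comment in the original code says.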

qq1484730945 posted on 2020-10-23 17:26:14

{:10_257:}{:10_257:}{:10_257:}{:10_254:}{:10_254:}Why does 小龟龟's code always look so complicated? Here is mine:
import bs4
import requests

def get_url(url):
    # Fetch the page once and keep only its HTML text
    res = requests.get(url).text

    soup = bs4.BeautifulSoup(res, "html.parser")

    titles = soup.find_all("li", class_="video-item matrix")
    b = ""
    for i in titles:
        # All <span> tags of this result; the index of each field is listed in the note at the end of this post
        spans = i.find_all("span")
        b += "视频名字:" + i.a["title"] + "---->"  # video title
        b += "视频地址:" + i.a["href"] + "---->"  # video link
        b += "时长:" + spans[0].text.strip() + "---->"  # duration
        b += "标签:" + spans[2].text.strip() + "---->"  # search tag
        b += "播放数量:" + spans[3].text.strip() + "---->"  # view count
        b += "弹幕数量:" + spans[4].text.strip() + "---->"  # danmaku count
        b += "上传时间:" + spans[5].text.strip() + "---->"  # upload time
        b += "up主名字:" + spans[6].text.strip() + "\n\n\n"  # uploader name

    return b

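def main():
    key = input("请输入要搜索的关键词：")  # prompt: keyword to search for
    yeshu = int(input("请输入要搜索多少页(请输入整数):"))  # prompt: number of pages (integer)
    conut = 0
    # Output file names, one per sort order: most clicks, newest, most danmaku, most favorites
    name_1 = ["最多点击","最新发布","最多弹幕","最多收藏"]
    name = ["&order=click&duration=4&tids_1=0","&order=pubdate&duration=4&tids_1=0","&order=dm&duration=4&tids_1=0","&order=stow&duration=4&tids_1=0"]
    # Note: do not reuse the same loop variable name in nested for loops
    for i in range(1, yeshu+1):  # pages start at 1; starting the range at 0 gives a page error
        print("正在爬取中............当前第%d页" % i)  # progress message: crawling page i
        for j in name_1:
            # name[conut] is the query string that belongs to file name j
            url = "https://search.bilibili.com/all?keyword=%s%s&page=%s" % (key, name[conut], i)
            b = get_url(url)
            with open("%s.txt" % j, "a", encoding="utf-8") as f:
                f.write(b)
            conut += 1

        conut = 0  # reset to 0; conut is used as the list index
    print("程序结束")  # finished

if __name__ == "__main__":
    main()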
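# i.find_all("span")[n].text.strip()
# Pick the index n (0 to 6) of the content you need
""" index 0 = duration
index 1 = none
index 2 = search tag?
index 3 = view count
index 4 = danmaku count
index 5 = upload time
index 6 = uploader name
"""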

嘉岳呀 posted on 2020-10-31 17:32:26

I want to know

a761530700 posted on 2020-11-6 11:30:53

{:5_109:}{:5_109:}{:5_109:}{:5_109:}{:5_109:}{:5_109:}{:5_109:}{:5_109:}

叼辣条闯世界 posted on 2020-11-8 09:19:39

11223344

zzayy0120 posted on 2020-11-11 10:57:15

白影如光 posted on 2020-11-17 21:50:35

Studying

Raymand posted on 2020-12-2 23:14:15

{:5_109:}

小健不会Python posted on 2020-12-3 20:17:17

I want to know


鱼C论坛's Archiver