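Wrote a bilibili scraper in my spare time to see how many hours I've studied, Python Discussion, Programming Languages, 鱼C论坛

Daniel_Zhang posted on 2021-1-7 04:13:24

Wrote a bilibili scraper in my spare time to see how many hours I've studied

This post was last edited by Daniel_Zhang on 2021-1-8 21:27

Here are the modules:

File name: get_data_set.py
Remember to change LOCATION to your own folder.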
import os
import pickle
import easygui as g
import get_total_time as data

# folder that holds the data file; change this to your own folder (it must already exist)
LOCATION = os.getcwd() + '/html_css_js_flask/learning_progress_count/'
DATA_FILE = LOCATION + 'html_learn_progress.testing'  # the file name can be anything

# these two lines also run on import, so time_adding.py prompts for the url as well
url = input('enter the url address\n')
my_time = data.get_durations(url)

# store the list my_time in a binary file
def add_data_set(my_time):
    # 'wb' is write binary: the file is created if it does not exist, otherwise overwritten
    with open(DATA_FILE, 'wb') as pickle_file:
        pickle.dump(my_time, pickle_file)  # dump the list into the file

def data_set_read():
    if __name__ != '__main__':
        add_data_set(my_time)  # when imported, save the freshly scraped data first
    with open(DATA_FILE, 'rb') as pickle_file:  # 'rb' is read binary
        my_list2 = pickle.load(pickle_file)  # load the binary data
    if __name__ == '__main__':
        print(my_list2)  # show the data
    length_data_set = len(my_list2)
    g.msgbox(msg='data set inserted successfully :)' + '\n\n' + 'total inserted: ' + str(length_data_set), title='System Warning', ok_button='Get it !')
    return my_list2

if __name__ == '__main__':
    add_data_set(my_time)
    data_set_read()
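
The save-and-load step of get_data_set.py can be tried on its own, without the scraper or easygui. Here is a minimal sketch, assuming the stored data is a list of [minutes, seconds] pairs. The names save_durations and load_durations, the sample values and the temporary file path are made up for this illustration and are not part of the original module.

```python
import os
import pickle
import tempfile


def save_durations(path, durations):
    """Write the list to `path` in binary mode, replacing any existing file."""
    with open(path, 'wb') as f:
        pickle.dump(durations, f)


def load_durations(path):
    """Read the pickled list back from `path`."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# made-up sample data: three videos as [minutes, seconds]
sample = [[12, 30], [8, 5], [20, 0]]

# a throwaway folder, so the sketch does not touch the real LOCATION
path = os.path.join(tempfile.mkdtemp(), 'progress.pkl')

save_durations(path, sample)
restored = load_durations(path)
print(restored == sample)
```

The `with` blocks close the file even if dumping or loading raises an error, which is the main reason to prefer them over calling close() by hand.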

Another module, the scraper, which fetches the data automatically. File name: get_total_time.py
import re, ssl
import requests

def open_url(url):
    headers = {
        'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
        'Accept': 'text/html',
        'Cookie': "_uuid=1DBA4F96-2E63-8488-DC25-B8623EFF40E773841infoc; buvid3=FE0D3174-E871-4A3E-877C-A4ED86E20523155831infoc; LIVE_BUVID=AUTO8515670521735348; sid=l765gx48; DedeUserID=33717177; DedeUserID__ckMd5=be4de02fd64f0e56; SESSDATA=cf65a5e0%2C1569644183%2Cc4de7381; bili_jct=1e8cdbb5755b4ecd0346761a121650f5; CURRENT_FNVAL=16; stardustvideo=1; rpdid=|(umY))|ukl~0J'ulY~uJm)kJ; UM_distinctid=16ce0e51cf0abc-02da63c2df0b4b-5373e62-1fa400-16ce0e51cf18d8; stardustpgcv=0606; im_notify_type_33717177=0; finger=b3372c5f; CURRENT_QUALITY=112; bp_t_offset_33717177=300203628285382610"
    }
    # turn off certificate checks for urllib-based HTTPS clients
    ssl._create_default_https_context = ssl._create_unverified_context
    html = requests.get(url, headers=headers).text  # fetch the page source
    # save the page source to a file; it is the reference for writing the regular expressions below
    with open('testing_new.txt', 'w', encoding='utf-8') as f:
        f.write(html)
    return html

def get_durations(url):
    html = open_url(url)
    m = r'"cid":.+?,'
    match = re.findall(m, html)[0]
    match = match.split(':')[-1]
    match = match.split(',')[0]  # cid of the first video, used to locate the full playlist
    p = r'\[{"cid":' + match + '.+?]'
    pic = re.findall(p, html)  # the full playlist
    final_result = []
    q = r'"duration":.+?,'
    pic = str(pic)
    duration = re.findall(q, pic)  # duration entry of every video (still includes the key name, needs more cleanup)
    duration = str(duration)
    y = r':.+?,'
    time_get = re.findall(y, duration)  # duration of every video, still wrapped in ':' and ','

    for each in range(len(time_get)):  # strip everything that is not needed
        time_get[each] = time_get[each].split(':')[-1]
        time_get[each] = time_get[each].split(',')[0]
        temp = int(time_get[each])  # duration in seconds
        final_result.append([temp // 60, temp % 60])  # convert to [minutes, seconds], the form time_adding.py reads
    return final_result

if __name__ == '__main__':
    url = input('enter\n')
    #open_url(url)
    get_durations(url)
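
The string clean-up in get_durations can also be done in one pass with a capture group, so the number comes out of the regular expression directly. This is only a sketch of that idea, not a drop-in replacement: parse_durations and sample_source are made-up names, and the sample text is a hand-written stand-in for the playlist part of the page source, not real bilibili output. On a real page it would still make sense to cut out the playlist first, as the module does, because other "duration" fields may appear elsewhere.

```python
import re


def parse_durations(page_source):
    """Return [minutes, seconds] for every "duration":<seconds> field found."""
    # the parentheses capture only the digits, so no split() is needed afterwards
    seconds = re.findall(r'"duration":(\d+)', page_source)
    # divmod(s, 60) gives (s // 60, s % 60) in one call
    return [list(divmod(int(s), 60)) for s in seconds]


# hand-written sample, shaped like the playlist fragment the module searches for
sample_source = ('[{"cid":101,"page":1,"duration":754,"part":"lesson 1"},'
                 '{"cid":102,"page":2,"duration":61,"part":"lesson 2"}]')

result = parse_durations(sample_source)
print(result)  # [[12, 34], [1, 1]]
```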

Here is the main program. File name: time_adding.py
import easygui as g
import get_data_set as data_set

# calculate the progress
def calculate(my_list2, already_take):
    sum_second = 0  # time already spent, in seconds
    total_time = 0  # total length of the playlist, in seconds
    for each in range(len(my_list2)):
        # every entry is [minutes, seconds]
        seconds = 60 * int(my_list2[each][0]) + int(my_list2[each][1])
        total_time += seconds
        if each < already_take:  # the first already_take videos count as finished
            sum_second += seconds
    string1 = "hours already taken: " + str(sum_second/(60*60)) + ' / ' + str(total_time/(60*60))
    string2 = "current percentage: " + str((sum_second/total_time) * 100) + ' %'
    g.msgbox(msg = string1 + '\n\n' + string2,title='System Warning',ok_button='Get it !')

if __name__ == "__main__":
    already_take = input('how many units have you finished so far?\n')
    try:
        already_take = int(already_take)
    except ValueError:
        g.msgbox(msg = 'the input seems wrong, please check it and enter again!', title='System Warning',ok_button='Get it !')
        exit()
    my_list2 = data_set.data_set_read()
    calculate(my_list2, already_take)
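
To check the arithmetic without opening a message box, the calculation can be pulled into a function that only returns numbers. A minimal sketch, assuming each entry is a [minutes, seconds] pair and that the first finished_count videos count as done; the name progress and the sample list are made up for this illustration.

```python
def progress(durations, finished_count):
    """Return (seconds done, total seconds, percent done)."""
    total = sum(60 * int(m) + int(s) for m, s in durations)
    done = sum(60 * int(m) + int(s) for m, s in durations[:finished_count])
    # guard against an empty playlist, where total would be 0
    percent = 100 * done / total if total else 0.0
    return done, total, percent


# made-up sample: three videos of 10, 20 and 30 minutes, first one finished
done, total, percent = progress([[10, 0], [20, 0], [30, 0]], 1)
print(done / 3600, '/', total / 3600, 'hours')
print(round(percent, 2), '%')
```

With the sample above, 600 of 3600 seconds are done, which is one sixth of the playlist.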

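P.S. Put all three files in the same folder.
You only need to enter the bilibili video link (the one that contains the BV number or av number) and how many chapters you have finished (enter 1 if you have finished the first lecture, and so on).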

Daniel_Zhang posted on 2021-1-7 04:15:35

I also came across a rather interesting API that can fetch what you need from the BV number and cid. The article is here:

https://www.bilibili.com/read/cv5245087
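
As a rough idea of what the API route could look like: the sketch below assumes a page-list endpoint at https://api.bilibili.com/x/player/pagelist that takes a bvid parameter and returns JSON with a "data" list whose items carry "cid" and "duration" in seconds. That endpoint and response shape are my assumption and have not been checked against the linked article. The sample payload is hand-written, the BV number is a placeholder, and no network request is made.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: endpoint name and parameter are not confirmed by the post above
PAGELIST_ENDPOINT = 'https://api.bilibili.com/x/player/pagelist'


def pagelist_url(bvid):
    """Build the request url for one video's page list."""
    return PAGELIST_ENDPOINT + '?' + urlencode({'bvid': bvid})


def durations_from_payload(payload_text):
    """Turn the (assumed) JSON response into [minutes, seconds] pairs."""
    payload = json.loads(payload_text)
    return [list(divmod(int(page['duration']), 60))
            for page in payload.get('data', [])]


# hand-written stand-in for a response, not real data
sample_payload = ('{"code": 0, "data": ['
                  '{"cid": 101, "page": 1, "duration": 754}, '
                  '{"cid": 102, "page": 2, "duration": 61}]}')

request_url = pagelist_url('BV_EXAMPLE')  # placeholder BV number
durations = durations_from_payload(sample_payload)
print(request_url)
print(durations)
```

If the response really has this shape, the text returned by requests.get(request_url).text could be passed to durations_from_payload in place of the regular expressions.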

三柒得拾 posted on 2021-1-7 08:50:56

Learning from the OP here. I've only just started, and I don't know when I'll be this good.

hornwong posted on 2021-1-7 10:51:19

{:5_95:}

19924269098 posted on 2021-1-7 11:21:15

Learning.

守望星星 posted on 2021-1-7 11:53:50

{:10_254:}

愷龍 posted on 2021-1-7 13:19:59

Here to learn.

a64021a posted on 2021-1-7 16:51:12

Just started learning; I hope to become as good as the OP.

Daniel_Zhang posted on 2021-1-7 17:14:26

I've raised the odds of winning a little. If you didn't get any 鱼币 (fish coins), feel free to try a few more times {:10_254:}

小古比鱼 posted on 2021-1-7 18:34:46

{:10_254:}

luckyboyzzz posted on 2021-1-7 22:12:46

Just started learning; there's a lot I can't follow.

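Daniel_Zhang posted on 2021-1-7 22:16:05

luckyboyzzz posted on 2021-1-7 22:12
Just started learning; there's a lot I can't follow.

Keep at it. Once you've worked through 小甲鱼's course properly, I'm sure you'll be able to follow it. I only finished it recently myself.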

Daniel_Zhang posted on 2021-1-8 21:27:35

Don't let this thread sink, there are still coins left, and I've changed the drop rate {:10_245:}

石泊远 posted on 2021-1-8 22:08:58

Learning.

Daniel_Zhang posted on 2021-1-10 03:46:50

Don't let this thread sink, there are still fish coins left {:10_266:}

sinaop posted on 2021-1-10 09:54:09

{:5_111:}

昨非 posted on 2021-1-10 17:59:47

Bumping this.

呆萌的月饼不呆 posted on 2021-1-10 21:19:22

{:10_257:}

红桃J posted on 2021-1-10 23:01:06

All I've learned so far is print.

zgwzgw posted on 2021-1-13 20:20:04

Learning.
