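Bounty: 100 fish coins
This post was last edited by mskyer on 2017-12-2 18:33
I built an e-book sharing site with WordPress, http://kindleshare.cn, and I keep it updated with Python.
Mostly I upload e-books downloaded from the web to my hosting space, then use walk.py to collect each file's name and path.
walk.py looks like this: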
    # -*- coding: utf-8 -*-
    import os

    count = 0
    path = input("input path:")
    # Open the output file once instead of reopening it for every book.
    with open('bookinfo.txt', 'a', encoding='utf-8') as fp:
        for root, dirs, files in os.walk(path):
            for name in files:
                count += 1
                full_path = os.path.join(root, name)
                print(full_path)
                fp.write(full_path + '\n')
    print(count)
Then gengxin.py reads the book information and publishes it:
    # -*- coding: utf-8 -*-
    import os

    # function.py is my own module; it has to provide search(),
    # get_single_book_data() and fabu().
    from function import *

    if __name__ == '__main__':
        i = int(input("Number of posts to update: "))
        while i > 0:
            # Pop the first path off the queue file and write the rest back.
            with open("tmp_book.txt", "r", encoding='UTF-8') as f:
                books = f.readline()
                lines = f.readlines()
            if not books:
                break  # the queue file is empty
            with open("tmp_book.txt", "w", encoding='UTF-8') as f:
                f.writelines(lines)

            book = books.rstrip('\r\n')  # drop the trailing line ending
            url = book.replace("/www/wwwroot/", "http://kindleshare.cn/")
            (filepath, book_name) = os.path.split(book)
            (file_name, book_detail) = os.path.splitext(book_name)
            print(book_name)
            try:
                dizhi = search(file_name)
                (info, tagg) = get_single_book_data(dizhi)
                tagg.append(book_detail.replace('.', ''))  # file extension as a tag
                # The link text "点击下载" means "Click to download".
                info += ('\n\n<span style="font-size: 18pt;">'
                         '<a href="' + url + '">点击下载</a></span>')
                print(info)
                for s in tagg:
                    print(s)
                fabu(file_name, info, list(map(str, tagg)))
                print("++++++++++++++++++ E-book \"" + book_name
                      + "\" published successfully ++++++++++++++++++")
            except Exception as e:
                # Record the failed path so it can be retried later.
                with open('Errors.txt', 'a', encoding='UTF-8') as fp:
                    fp.write(book + '\n')
                print(e)
            i = i - 1
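So here is my question: I need to finish two functions. The first is search(file_name), which should take a file name and find the link to the matching e-book on Douban.
The second is get_single_book_data(dizhi), which should take the result of the first function and collect the information about that e-book.
I wrote both myself, but they don't work well: the match accuracy is not very high. Could any expert here lend a hand?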
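One possible starting point for search(file_name), assuming the low hit rate comes mostly from noisy file names: clean the name first, then keep only the search result whose title is most similar. The sketch below covers just the matching step. `clean_title`, `best_match`, the `(title, url)` candidate list and the 0.6 threshold are all assumptions made for illustration, and fetching the candidates from Douban is left out.

```python
import re
from difflib import SequenceMatcher

# Notes in (), [], （） or 【】 are usually author/edition remarks, not the title.
_NOTES = re.compile(r"\([^)]*\)|\[[^\]]*\]|（[^）]*）|【[^】]*】")


def clean_title(file_name):
    """Reduce a file name (without extension) to a bare title for searching."""
    name = _NOTES.sub(" ", file_name)
    name = name.replace("《", " ").replace("》", " ")  # keep the title, drop the marks
    name = re.sub(r"[_\-+.]+", " ", name)              # common separators become spaces
    return " ".join(name.split()).lower()


def best_match(file_name, candidates, threshold=0.6):
    """Return the URL of the candidate whose title is closest to the file name.

    candidates: list of (title, url) pairs, however they were obtained.
    Returns None when nothing is similar enough, so the caller can log the
    book for manual review instead of publishing a wrong match.
    """
    wanted = clean_title(file_name)
    best_url, best_score = None, 0.0
    for title, url in candidates:
        score = SequenceMatcher(None, wanted, clean_title(title)).ratio()
        if score > best_score:
            best_url, best_score = url, score
    return best_url if best_score >= threshold else None


# Made-up candidates; the URLs are placeholders, not real Douban links.
candidates = [
    ("三体全集", "https://example.com/1"),
    ("三体", "https://example.com/2"),
    ("球状闪电", "https://example.com/3"),
]
print(best_match("《三体》(刘慈欣)[精校版]", candidates))
```

Returning None below the threshold fits the existing try/except in gengxin.py: search() can raise on None, and the path ends up in Errors.txt rather than being published with the wrong book's data.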
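For get_single_book_data(dizhi), a rough sketch of the parsing half, using only the standard library. It assumes the book page exposes `<meta property="..." content="...">` tags; the property names `og:title`, `og:description` and `book:author` are assumptions about the page, not something confirmed here, and `MetaCollector` and `parse_book_meta` are hypothetical names. Downloading the page from `dizhi` is left out, so the function takes the HTML text directly.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta property="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        if prop and content:
            # A property may repeat (e.g. several authors), so keep a list.
            self.meta.setdefault(prop, []).append(content.strip())


def parse_book_meta(html):
    """Return (info, tags), the shape gengxin.py unpacks from get_single_book_data."""
    collector = MetaCollector()
    collector.feed(html)
    meta = collector.meta
    title = meta.get("og:title", [""])[0]              # assumed property name
    summary = meta.get("og:description", [""])[0]      # assumed property name
    info = title + "\n\n" + summary
    tags = list(meta.get("book:author", []))           # assumed property name
    return info, tags


# Made-up sample page, only to show the expected input shape.
sample = """
<html><head>
<meta property="og:title" content="三体" />
<meta property="og:description" content="科幻小说">
<meta property="book:author" content="刘慈欣">
</head></html>
"""
info, tags = parse_book_meta(sample)
print(info)
print(tags)
```

Keeping the parser separate from the download makes it easy to save a few real pages to disk and check the output against them before running a whole batch.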