Bounty: 100 Fish Coins (鱼币)
This post was last edited by mskyer on 2017-12-2 18:33
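I built an e-book sharing site, http://kindleshare.cn, on WordPress, and I update it with Python scripts.
The workflow is: I put e-books downloaded from the web onto the server, then use walk.py to collect each file's name and path.
walk.py looks like this: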
 
# -*- coding: utf-8 -*-
# Written for Python 3, like gengxin.py.
import os

count = 0
path = input("input path:")
# Open the output file once instead of reopening it for every book.
with open('bookinfo.txt', 'a', encoding='UTF-8') as fp:
    for root, dirs, files in os.walk(path):
        for file in files:
            count += 1
            full_path = os.path.join(root, file)
            print(full_path)
            fp.write(full_path + '\n')  # one path per line
print(count)
Then gengxin.py reads the book information:
 
# -*- coding: utf-8 -*-
import os

from function import *  # expected to provide search, get_single_book_data, fabu

if __name__ == '__main__':
    i = int(input("Number of posts to update: "))
    while i > 0:
        # Take the first path off the list and write the remaining lines back.
        with open("tmp_book.txt", "r", encoding='UTF-8') as f:
            books = f.readline()
            lines = f.readlines()
        with open("tmp_book.txt", "w", encoding='UTF-8') as f:
            f.writelines(lines)

        book = books.rstrip('\r\n')
        if not book:
            break  # nothing left to process
        # Map the path on the server to its public download URL.
        url = book.replace("/www/wwwroot/", "http://kindleshare.cn/")
        (filepath, book_name) = os.path.split(book)
        (file_name, book_detail) = os.path.splitext(book_name)
        print(book_name)
        try:
            dizhi = search(file_name)
            (info, tagg) = get_single_book_data(dizhi)
            tagg.append(book_detail.replace('.', ''))  # file extension as a tag

            # Append the download link; the link text means "Click to download".
            info += ('\n\n<span style="font-size: 18pt;">'
                     '<a href="' + url + '">点击下载</a></span>')
            print(info)
            for s in tagg:
                print(s)

            fabu(file_name, info, list(map(str, tagg)))
            print("++++++++++++++++++ E-book '" + book_name +
                  "' published successfully ++++++++++++++++++")
        except Exception as e:
            # Record the failed book so it can be retried later.
            with open('Errors.txt', 'a', encoding='UTF-8') as fp:
                fp.write(book + '\n')
            print(e)
        i = i - 1
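So here is my problem: I need to finish two functions. The first is search(file_name), which should take a file name and find the link to the matching e-book on Douban.
The second is get_single_book_data(dizhi), which takes the result of the first function and scrapes the details of that e-book.
I wrote both myself, but they don't work well and the match rate is not very high. Could any expert here lend a hand?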
 