Python still shows PdfReadError: File has not been decrypted after calling encrypt

大麦1 posted on 2021-8-6 23:27:04


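I have recently been working on the practice project "PDF Paranoia" in Section 13.6.1, Chapter 13 of Automate the Boring Stuff with Python. The task reads: Using the os.walk() function from Chapter 9, write a script that will go through every PDF in a folder (and its subfolders) and encrypt the PDFs using a password provided on the command line. Save each encrypted PDF with an _encrypted.pdf suffix added to the original filename. Before deleting the original file, have the program attempt to read and decrypt the file to ensure that it was encrypted correctly. Then, write a program that finds all encrypted PDFs in a folder (and its subfolders) and creates a decrypted copy of the PDF using a provided password. If the password is incorrect, the program should print a message to the user and continue to the next PDF.
My program is as follows (run in Jupyter Notebook):
Step 1, encryption: the program runs normally and generates files with the suffix '_encrypted.pdf'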
import os,PyPDF2
for foldername,subfolders,filenames in os.walk(r'D:\python_workdzx\automate_boring_stuff'):
    for file in filenames:
        if not file.endswith('.pdf'):
            continue
        pdfFileobj=open(os.path.join(foldername,file),'rb')
        pdfreader=PyPDF2.PdfFileReader(pdfFileobj)
        pdfWriter=PyPDF2.PdfFileWriter()
        for i in range(pdfreader.numPages):
            pdfWriter.addPage(pdfreader.getPage(i))
        # encrypt the file
        pdfWriter.encrypt('entrypted')
        # file name without '.pdf', plus the new suffix
        newname=file.split('.pdf')[0]+'_encrypted.pdf'
        with open(os.path.join(foldername,newname),'wb') as f:
            pdfWriter.write(f)
        pdfFileobj.close()
Step 2, decryption: the program fails and reports that the file has not been decrypted: PdfReadError: File has not been decrypted

import os,PyPDF2
for foldername,subfolders,filenames in os.walk(r'D:\python_workdzx\automate_boring_stuff'):
    for file in filenames:
        if not file.endswith('_encrypted.pdf'):
            continue
        f1=open(os.path.join(foldername,file),'rb')
        pdfreader=PyPDF2.PdfFileReader(f1)
        try:
            pdfreader.decrypt('entrypted')
        except UnicodeEncodeError:
            print('the password is incorrect.')
        else:
            pdfwriter=PyPDF2.PdfFileWriter()
            for i in range(pdfreader.numPages):
                pdfwriter.addPage(pdfreader.getPage(i))
            newname=file.split('_encrypted')[0]+'_decrypted.pdf'
            with open(os.path.join(foldername,newname),'wb') as f:
                pdfwriter.write(f)
        f1.close()

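The problem is that I clearly did decrypt the file, and encrypting and decrypting a single file works fine, so why does it fail once it is inside the loop? The traceback is as follows: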

UnicodeEncodeError                      Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in getNumPages(self)
1146             self._override_encryption = True
-> 1147             self.decrypt('')
1148             return self.trailer["/Root"]["/Pages"]["/Count"]

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in decrypt(self, password)
1986       try:
-> 1987          return self._decrypt(password)
1988       finally:

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in _decrypt(self, password)
2016                      new_key += b_(chr(utils.ord_(key[l]) ^ i))
-> 2017                   val = utils.RC4_encrypt(new_key, val)
2018             userpass = val

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\utils.py in RC4_encrypt(key, plaintext)
180       t = S[(S[i] + S[j]) % 256]
--> 181       retval += b_(chr(ord_(plaintext[x]) ^ t))
182 return retval

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\utils.py in b_(s)
237       else:
--> 238          r = s.encode('latin-1')
239          if len(s) < 2:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u016c' in position 0: ordinal not in range(256)

During handling of the above exception, another exception occurred:

PdfReadError                            Traceback (most recent call last)
<ipython-input-6-042e54025b54> in <module>
12       else:
13          pdfwriter=PyPDF2.PdfFileWriter()
---> 14          for i in range(pdfreader.numPages):
15             pdfwriter.addPage(pdfreader.getPage(i))
16          newname=file.split('_encrypted')[0]+'_decrypted.pdf'

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in <lambda>(self)
1156          return len(self.flattenedPages)
1157
-> 1158 numPages = property(lambda self: self.getNumPages(), None, None)
1159 """
1160 Read-only property that accesses the

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in getNumPages(self)
1148             return self.trailer["/Root"]["/Pages"]["/Count"]
1149          except:
-> 1150             raise utils.PdfReadError("File has not been decrypted")
1151          finally:
1152             self._override_encryption = False

PdfReadError: File has not been decrypted
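
The first exception in the traceback can be reproduced without any PDF at all. The following is a minimal sketch using only standard Python: the traceback shows `b_` calling `s.encode('latin-1')`, and Latin-1 only covers code points 0 to 255, so a character such as '\u016c' (code point 364) cannot be encoded. The helper name `fits_latin1` is made up for this illustration and is not part of PyPDF2.

```python
def fits_latin1(text):
    """Return True if `text` can be encoded as Latin-1 (code points 0-255)."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True


print(fits_latin1('\u006c'))  # 'l', code point 108, inside Latin-1
print(fits_latin1('\u016c'))  # code point 364, outside Latin-1
```

This only illustrates why the encode step raises; it does not explain how a value above 255 ended up in the RC4 input in the first place.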

txxcat posted on 2021-8-7 02:30:28

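This problem has nothing to do with the loop; it is a PyPDF issue. For some encrypted PDF files, using the correct password, as in your pdfreader.decrypt('entrypted'), returns 1 and raises no error, but the file is not actually decrypted, so pdfreader.numPages in the next statement triggers "PdfReadError: File has not been decrypted". As far as PyPDF goes there is no fix: it has not been updated for a long time, so this bug will probably never be corrected.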
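
A rough sketch of how the loop could at least keep going in this situation. It assumes the PyPDF2 1.x behaviour in which `decrypt()` returns 0 for a wrong password (rather than raising), 1 for a user-password match and 2 for an owner-password match. The function name `page_count_after_decrypt` is hypothetical, and `reader` is simply anything that behaves like `PyPDF2.PdfFileReader`. It does not work around the bug described above; it only reports the file and moves on.

```python
def page_count_after_decrypt(reader, password):
    """Return the page count, or None if the PDF cannot be opened.

    Assumption: `reader` behaves like PyPDF2.PdfFileReader (1.x API),
    where decrypt() returns 0 when the password does not match.
    """
    if reader.decrypt(password) == 0:
        print('the password is incorrect.')
        return None
    try:
        # With PyPDF2 this is where PdfReadError surfaces for affected files.
        return reader.numPages
    except Exception as exc:
        print('decrypt() reported success, but the file is still unreadable:', exc)
        return None


# Possible use inside the os.walk loop (requires PyPDF2 to be installed):
#     pdfreader = PyPDF2.PdfFileReader(f1)
#     if page_count_after_decrypt(pdfreader, 'entrypted') is None:
#         f1.close()
#         continue
```

Catching a broad `Exception` is a deliberate simplification here; with PyPDF2 imported, catching `PyPDF2.utils.PdfReadError` would be more precise.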
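If this is only homework, just skip this exercise. If you really need it, the workaround suggested online is to download a third-party command-line tool, qpdf, call it from Python to decrypt the PDF file, and then carry on with the next step. It seems to be a Linux tool; I don't know whether it is available on Windows.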
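
A minimal sketch of calling qpdf from Python, assuming the `qpdf` executable is installed and on the PATH. It relies on qpdf's `--password=` and `--decrypt` options; the function names and the file names in the usage comment are made up for illustration.

```python
import subprocess


def build_qpdf_decrypt_command(password, src, dst):
    """Build the argument list for: qpdf --password=... --decrypt src dst"""
    return ['qpdf', '--password=' + password, '--decrypt', src, dst]


def decrypt_with_qpdf(password, src, dst):
    """Run qpdf and return True if it exits with status 0.

    Raises FileNotFoundError if the qpdf executable cannot be found.
    Any non-zero exit status is treated as a failure in this sketch.
    """
    result = subprocess.run(build_qpdf_decrypt_command(password, src, dst))
    return result.returncode == 0


# Possible use (hypothetical file names):
#     if not decrypt_with_qpdf('entrypted', 'a_encrypted.pdf', 'a_decrypted.pdf'):
#         print('qpdf could not decrypt a_encrypted.pdf')
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.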

大麦1 posted on 2021-8-9 20:38:27

txxcat posted on 2021-8-7 02:30
This problem has nothing to do with the loop; it is a PyPDF issue. For some encrypted PDF files, using the correct password, as in your pdfreader.decryp ...

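All right. PyPDF is not very convenient to use. Today I tried saving an image as a PDF file, and the text in it could not be extracted either; it can only extract text that could already be copied directly.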
