Python still raises PdfReadError: File has not been decrypted after calling encrypt

大麦1 · posted 2021-8-6 23:27:04

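I've recently been working on the "PDF Paranoia" practice project in section 13.6.1 of Chapter 13 of Automate the Boring Stuff with Python (《Python编程快速上手-让繁琐工作自动化》). The task reads: Using the os.walk() function from Chapter 9, write a script that goes through every PDF in a folder (including its subfolders) and encrypts the PDFs with a password provided on the command line. Save each encrypted PDF under the original filename with an _encrypted.pdf suffix. Before deleting the original file, have a program try to read and decrypt the new file to make sure it was encrypted correctly. Then write a program that finds all encrypted PDFs in a folder (including its subfolders) and creates a decrypted copy of each one using a provided password. If the password is wrong, the program should print a message and continue with the next PDF.
My program is below (run in Jupyter Notebook):
Step 1, encryption: the program runs normally and produces files with the '_encrypted.pdf' suffix.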
import os, PyPDF2
for foldername, subfolders, filenames in os.walk(r'D:\python_workdzx\automate_boring_stuff'):
    for file in filenames:
        if not file.endswith('.pdf'):
            continue
        pdfFileobj = open(os.path.join(foldername, file), 'rb')
        pdfreader = PyPDF2.PdfFileReader(pdfFileobj)
        pdfWriter = PyPDF2.PdfFileWriter()
        # copy every page of the original into the writer
        for i in range(pdfreader.numPages):
            pdfWriter.addPage(pdfreader.getPage(i))
        # encrypt the file
        pdfWriter.encrypt('entrypted')
        newname = file.split('.pdf')[0] + '_encrypted.pdf'
        with open(os.path.join(foldername, newname), 'wb') as f:
            pdfWriter.write(f)
        pdfFileobj.close()
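
A side note on the file naming in the encryption loop above: `file.split('.pdf')[0]` cuts at the first `.pdf` anywhere in the name, and the `endswith('.pdf')` filter also matches files that already end in `_encrypted.pdf`, so a second run would encrypt those copies again. A minimal, stdlib-only sketch of a stricter walk could look like this. The helper names `encrypted_name` and `iter_plain_pdfs` are made up for illustration and come from neither the book nor PyPDF2.

```python
import os

SUFFIX = '_encrypted.pdf'

def encrypted_name(filename):
    """Replace only the final extension, e.g. 'report.pdf' -> 'report_encrypted.pdf'."""
    stem, _ext = os.path.splitext(filename)
    return stem + SUFFIX

def iter_plain_pdfs(root):
    """Yield (source_path, target_path) for each PDF under root that is not already an encrypted copy."""
    for foldername, _subfolders, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            # skip non-PDFs and files produced by an earlier run
            if not lower.endswith('.pdf') or lower.endswith(SUFFIX):
                continue
            yield (os.path.join(foldername, name),
                   os.path.join(foldername, encrypted_name(name)))
```

Each yielded pair could then be fed to the same PdfFileReader/PdfFileWriter code as in the loop above.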
Step 2, decryption: the program fails with PdfReadError: File has not been decrypted, as if the file had never been decrypted.

import os, PyPDF2
for foldername, subfolders, filenames in os.walk(r'D:\python_workdzx\automate_boring_stuff'):
    for file in filenames:
        if not file.endswith('_encrypted.pdf'):
            continue
        f1 = open(os.path.join(foldername, file), 'rb')
        pdfreader = PyPDF2.PdfFileReader(f1)
        try:
            pdfreader.decrypt('entrypted')
        except UnicodeEncodeError:
            print('the password is incorrect.')
        else:
            pdfwriter = PyPDF2.PdfFileWriter()
            for i in range(pdfreader.numPages):
                pdfwriter.addPage(pdfreader.getPage(i))
            newname = file.split('_encrypted')[0] + '_decrypted.pdf'
            with open(os.path.join(foldername, newname), 'wb') as f:
                pdfwriter.write(f)

        f1.close()
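
A note on the `try`/`except` in the decryption loop above: in PyPDF2 1.x, `PdfFileReader.decrypt()` reports a wrong password through its return value (0 for failure, 1 when the user password matched, 2 when the owner password matched) rather than by raising, so catching `UnicodeEncodeError` does not detect a wrong password. A minimal sketch of checking the return value instead follows. `password_accepted` is a hypothetical helper name, and it relies only on the reader's `isEncrypted` attribute and `decrypt()` method.

```python
def password_accepted(reader, password):
    """Return True if the reader is unencrypted or reports the password as accepted."""
    if not reader.isEncrypted:
        return True
    try:
        # 0 = wrong password, 1 = user password matched, 2 = owner password matched
        return reader.decrypt(password) != 0
    except NotImplementedError:
        # PyPDF2 1.x raises this for encryption algorithms it does not support
        return False
```

In the loop, this check could replace the `try`/`except`: when it returns False, print the message, close the file, and `continue` with the next PDF.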

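The thing is, I clearly did decrypt the file, and encrypting and decrypting a single file works fine, so why does it break once it runs in a loop? The error output is below: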

UnicodeEncodeError                      Traceback (most recent call last)
D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in getNumPages(self)
1146                self._override_encryption = True
-> 1147                self.decrypt('')
1148                return self.trailer["/Root"]["/Pages"]["/Count"]

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in decrypt(self, password)
1986       try:
-> 1987          return self._decrypt(password)
1988       finally:

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in _decrypt(self, password)
2016                      new_key += b_(chr(utils.ord_(key[l]) ^ i))
-> 2017                   val = utils.RC4_encrypt(new_key, val)
2018                userpass = val

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\utils.py in RC4_encrypt(key, plaintext)
180       t = S[(S[i] + S[j]) % 256]
--> 181       retval += b_(chr(ord_(plaintext[x]) ^ t))
182    return retval

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\utils.py in b_(s)
237       else:
--> 238          r = s.encode('latin-1')
239          if len(s) < 2:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u016c' in position 0: ordinal not in range(256)

During handling of the above exception, another exception occurred:

PdfReadError                            Traceback (most recent call last)
<ipython-input-6-042e54025b54> in <module>
   12       else:
   13          pdfwriter=PyPDF2.PdfFileWriter()
---> 14          for i in range(pdfreader.numPages):
   15                pdfwriter.addPage(pdfreader.getPage(i))
   16          newname=file.split('_encrypted')[0]+'_decrypted.pdf'

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in <lambda>(self)
1156          return len(self.flattenedPages)
1157
-> 1158    numPages = property(lambda self: self.getNumPages(), None, None)
1159    """
1160    Read-only property that accesses the

D:\ProgramData\Anaconda3\lib\site-packages\PyPDF2\pdf.py in getNumPages(self)
1148                return self.trailer["/Root"]["/Pages"]["/Count"]
1149          except:
-> 1150                raise utils.PdfReadError("File has not been decrypted")
1151          finally:
1152                self._override_encryption = False

PdfReadError: File has not been decrypted
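
Reading the traceback above from the bottom up: the `PdfReadError` is raised inside `getNumPages`, which on an encrypted file calls `self.decrypt('')` itself. That internal call is what hits the `UnicodeEncodeError`, and the bare `except:` then replaces it with "File has not been decrypted". One thing that might be worth trying, untested against these files, is to avoid `numPages` altogether and fetch pages by index until the reader runs out. The sketch below assumes `getPage()` raises `IndexError` past the last page, which should hold when the reader keeps its pages in a plain list. `iter_pages` is a hypothetical helper name.

```python
def iter_pages(reader):
    """Yield pages one by one without asking the reader for its page count."""
    index = 0
    while True:
        try:
            page = reader.getPage(index)
        except IndexError:
            # ran past the last page
            return
        yield page
        index += 1
```

The copy loop would then read `for page in iter_pages(pdfreader): pdfwriter.addPage(page)`. Whether this sidesteps the failure depends on whether the earlier `decrypt()` call really set up a usable key.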

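txxcat · posted 2021-8-7 02:30:28

This has nothing to do with the loop; it is a problem in PyPDF. For some encrypted PDF files, supplying the correct password, as in your pdfreader.decrypt('entrypted'), returns 1 and raises no error, yet the file is not actually decrypted, so the pdfreader.numPages on the next line fails with "PdfReadError: File has not been decrypted". As far as PyPDF itself goes there is no fix: it has not been updated in a long time, so this bug will probably never be fixed.
If this is just homework, skip this exercise. If you really need it, the workaround found online is to download qpdf, a third-party command-line tool, call it from Python to decrypt the PDF, and then carry on with the next step. It appears to be a Linux tool; I don't know whether a Windows version exists.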
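
For the qpdf route, a minimal sketch of calling the tool from Python is below. qpdf publishes Windows builds as well as Linux ones. The sketch assumes the `qpdf` executable is on PATH and uses its `--password=` and `--decrypt` options. The function names are made up for illustration, and treating every non-zero exit code as failure is a conservative choice of this sketch.

```python
import shutil
import subprocess

def build_qpdf_command(src, dst, password, qpdf='qpdf'):
    """Build the argument list for: qpdf --password=... --decrypt src dst"""
    return [qpdf, '--password=' + password, '--decrypt', src, dst]

def decrypt_with_qpdf(src, dst, password):
    """Write a decrypted copy of src to dst; return True if qpdf exits with status 0."""
    if shutil.which('qpdf') is None:
        raise FileNotFoundError('qpdf executable not found on PATH')
    result = subprocess.run(build_qpdf_command(src, dst, password),
                            capture_output=True, text=True)
    if result.returncode != 0:
        # qpdf writes its diagnostics (e.g. an invalid password) to stderr
        print(result.stderr.strip())
    return result.returncode == 0
```

Inside the os.walk() loop, a False return would be the cue to print the wrong-password message and move on to the next file.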

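大麦1 · posted 2021-8-9 20:38:27

txxcat posted 2021-8-7 02:30
This has nothing to do with the loop; it is a problem in PyPDF. For some encrypted PDF files, supplying the correct password, as in your pdfreader.decryp ...

OK then. PyPDF is not very convenient to use. Today I tried saving an image as a PDF file, and the text inside it could not be extracted either; it can only extract text that could already be copied directly.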
