齐大胖 posted on 2020-11-29 17:42:07

Error reading a PDF file: AttributeError: 'PDFDocument' object has no attribute 'seek'

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfdevice import PDFDevice
from pdfminer.pdfinterp import PDFResourceManager,PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator
from pdfminer.layout import LTTextBoxHorizontal,LAParams
from pdfminer.pdfpage import PDFTextExtractionNotAllowed

pdf0 = open('E:\\齐大胖\\参考文献\\仅由两条供气管线驱动的气动双腔细管尺蠖机构.pdf','rb')

parser = PDFParser(pdf0)
doc = PDFDocument(parser)

parser.set_document(doc)


resources = PDFResourceManager()
laparam = LAParams()

device = PDFPageAggregator(resources,laparam)

interpreter = PDFPageInterpreter(resources,device)

for i,page in PDFPage.get_pages(doc):
    interpreter.process_page(page)
    layout = device.get_result()

    for out in layout:
      if hasattr(out,'get_text'):
            print(out.get_text())


Traceback (most recent call last):
File "E:\齐大胖\try.py", line 25, in <module>
    for i,page in PDFPage.get_pages(doc):
File "E:\PYTHON\lib\site-packages\pdfminer\pdfpage.py", line 120, in get_pages
    parser = PDFParser(fp)
File "E:\PYTHON\lib\site-packages\pdfminer\pdfparser.py", line 43, in __init__
    PSStackParser.__init__(self, fp)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 515, in __init__
    PSBaseParser.__init__(self, fp)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 169, in __init__
    self.seek(0)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 527, in seek
    PSBaseParser.seek(self, pos)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 199, in seek
    self.fp.seek(pos)
AttributeError: 'PDFDocument' object has no attribute 'seek'

笨鸟学飞 posted on 2020-11-29 19:35:00

Traceback (most recent call last):
File "E:\齐大胖\try.py", line 25, in <module>      # the error is on line 25
    for i,page in PDFPage.get_pages(doc):         # the offending code
File "E:\PYTHON\lib\site-packages\pdfminer\pdfpage.py", line 120, in get_pages
    parser = PDFParser(fp)
File "E:\PYTHON\lib\site-packages\pdfminer\pdfparser.py", line 43, in __init__
    PSStackParser.__init__(self, fp)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 515, in __init__
    PSBaseParser.__init__(self, fp)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 169, in __init__
    self.seek(0)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 527, in seek
    PSBaseParser.seek(self, pos)
File "E:\PYTHON\lib\site-packages\pdfminer\psparser.py", line 199, in seek
    self.fp.seek(pos)
AttributeError: 'PDFDocument' object has no attribute 'seek'  # attribute error: a 'PDFDocument' object was passed where a seekable file object is expected
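
The traceback itself shows the cause: inside get_pages the first argument is handed straight to PDFParser(fp), so get_pages expects the open file, not a PDFDocument. Below is a minimal sketch of one way to fix it, assuming pdfminer.six. The function name iter_page_text is made up for illustration, and the imports sit inside the function only so the snippet loads where the package is missing. Two further changes are assumptions based on my reading of the library: pages come back as bare PDFPage objects (so `for i,page in ...` cannot unpack them), and the second positional parameter of PDFPageAggregator is pageno, so LAParams is passed by keyword.

```python
def iter_page_text(pdf_path):
    """Yield (page_number, text) for every horizontal text box in the PDF."""
    # Imported here so this sketch can be loaded without pdfminer.six installed.
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBoxHorizontal
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser

    with open(pdf_path, "rb") as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)
        parser.set_document(doc)  # kept from the original script

        resources = PDFResourceManager()
        # Keyword argument: passed positionally, LAParams would land in pageno.
        device = PDFPageAggregator(resources, laparams=LAParams())
        interpreter = PDFPageInterpreter(resources, device)

        # create_pages() takes the PDFDocument; get_pages() would take fp instead.
        # enumerate() supplies the page index that the original loop tried to unpack.
        for number, page in enumerate(PDFPage.create_pages(doc), start=1):
            interpreter.process_page(page)
            layout = device.get_result()
            for element in layout:
                if isinstance(element, LTTextBoxHorizontal):
                    yield number, element.get_text()


# Example use (the path is a placeholder):
# for number, text in iter_page_text("example.pdf"):
#     print(number, text)
```

If you would rather keep get_pages, passing the file object opened with open(..., 'rb') instead of doc should also work, since that is what PDFParser(fp) inside it calls seek on.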

齐大胖 posted on 2020-12-10 09:26:53
