Original poster | Posted on 2023-7-18 21:40:51
from pdf2image import convert_from_path
import pytesseract
from PIL import Image
import io

# Path to your PDF file
pdf_path = 'f:\\' + input('Enter the name of the file to convert') + '.pdf'

# Convert the PDF file into a list of PIL Image objects
images = convert_from_path(pdf_path)

# Start with an empty string to collect the text
result_text = ''

# Loop over all the page images
for i, img in enumerate(images):
    # Run OCR on the image to get its text
    text = pytesseract.image_to_string(img, lang='chi_sim')  # 'chi_sim' selects Simplified Chinese recognition
    # Append the recognized text to the result
    result_text += text

# Save the result to a txt file
with open('f:\\' + input('Enter the name of the file to save') + '.txt', 'w', encoding='utf-8') as file:
    file.write(result_text)
Error message:

Enter the name of the file to convert456
Traceback (most recent call last):
  File "C:\Users\ssq\AppData\Roaming\Python\Python39\site-packages\pdf2image\pdf2image.py", line 568, in pdfinfo_from_path
    proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
  File "C:\Program Files (x86)\Python39-32\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files (x86)\Python39-32\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Python\pdf2ocr.py", line 10, in <module>
    images = convert_from_path(pdf_path)
  File "C:\Users\ssq\AppData\Roaming\Python\Python39\site-packages\pdf2image\pdf2image.py", line 127, in convert_from_path
    page_count = pdfinfo_from_path(
  File "C:\Users\ssq\AppData\Roaming\Python\Python39\site-packages\pdf2image\pdf2image.py", line 594, in pdfinfo_from_path
    raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

Process finished with exit code 1
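The traceback suggests that the file Windows could not find is not the PDF itself but the external pdfinfo program from poppler, which pdf2image starts as a subprocess. A minimal sketch for checking that, using only the standard library; the helper names find_poppler_tool and describe_poppler_status are made up for illustration and are not part of pdf2image:

```python
import shutil
from typing import Optional


def find_poppler_tool(name: str = "pdfinfo") -> Optional[str]:
    """Return the full path of a poppler command-line tool, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    # (on Windows it also tries the usual executable extensions such as .exe).
    return shutil.which(name)


def describe_poppler_status() -> str:
    """Build a short human-readable message about whether pdfinfo can be found."""
    path = find_poppler_tool()
    if path is None:
        return "pdfinfo was not found on PATH; poppler is probably missing or its bin folder is not on PATH"
    return f"pdfinfo found at {path}"


if __name__ == "__main__":
    print(describe_poppler_status())
```

If this reports that pdfinfo is missing, the usual remedy is to install poppler for Windows and add its bin folder to PATH. As an alternative, convert_from_path accepts a poppler_path argument, so a call such as convert_from_path(pdf_path, poppler_path=r'C:\path\to\poppler\bin') should also work; the folder shown here is only a placeholder for wherever poppler was actually unpacked.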