Python: Converting PDF to Images

Preface

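A recent project needed to extract content from PDFs. The PDFs are scanned documents, so the plan is to convert each page to an image, use image analysis to separate out the paragraphs, and then run OCR on them to obtain structured data.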

So the first thing to get working is converting PDFs to images.

Environment: macOS 10.12.6 (16G29)

Walkthrough

Install dependencies

Note on ImageMagick: the latest major version, 7, is not supported at the time of writing, so you have to install version 6.

brew install freetype
brew install ghostscript
brew install imagemagick@6
brew link --overwrite --force imagemagick@6
echo 'export MAGICK_HOME=/usr/local/opt/imagemagick@6' >> ~/.bash_profile
echo 'export PATH="$MAGICK_HOME/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
pip install Wand
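Because Wand locates ImageMagick through the MAGICK_HOME variable exported above, a quick sanity check after installing can save debugging time later. The following is a minimal sketch, not part of the original setup: it assumes the Homebrew path from the install steps, and the helper names parse_magick_version and is_supported are hypothetical.

```python
import os
import re
from typing import Optional, Tuple

# Assumption: same Homebrew keg path as the MAGICK_HOME exported above.
# Wand reads this variable when it is first imported, so set it before `import wand`.
os.environ.setdefault("MAGICK_HOME", "/usr/local/opt/imagemagick@6")


def parse_magick_version(version_string: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from a string like 'ImageMagick 6.9.10-10 Q16 ...'."""
    match = re.search(r"ImageMagick (\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def is_supported(version_string: str) -> bool:
    """Return True only if the version string reports ImageMagick major version 6."""
    version = parse_magick_version(version_string)
    return version is not None and version[0] == 6
```

With Wand installed, you would pass wand.version.MAGICK_VERSION (the version string of the library Wand actually loaded) to is_supported; a False result usually means Wand picked up a different ImageMagick than the one installed above.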

Python script

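from wand.image import Image

# Convert the first page of the PDF into a JPG.
# The "[0]" suffix tells ImageMagick to read only page index 0.
# Paths are relative to the current working directory.
with Image(filename="thumbnail.pdf[0]") as img:
    img.save(filename="temp.jpg")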
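The script above renders only the first page, at ImageMagick's default density of 72 DPI, which is often too coarse for OCR on scanned pages. As a minimal sketch, assuming Wand's Image(filename=..., resolution=...) constructor and its sequence attribute, the following renders every page at 300 DPI (an assumed value; tune it to your scans) into separate PNG files. The function names pdf_to_images and page_output_path and the scan-001.png naming scheme are hypothetical.

```python
import os
from typing import List


def page_output_path(out_dir: str, stem: str, index: int, ext: str = "png") -> str:
    """Build a zero-padded, 1-based output path such as out/scan-001.png."""
    return os.path.join(out_dir, f"{stem}-{index + 1:03d}.{ext}")


def pdf_to_images(pdf_path: str, out_dir: str, resolution: int = 300) -> List[str]:
    """Render every page of pdf_path to a PNG in out_dir; return the written paths."""
    # Imported lazily so page_output_path can be used without ImageMagick installed.
    from wand.image import Image

    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    written = []
    # resolution must be given when reading: it sets the DPI Ghostscript rasterizes at.
    with Image(filename=pdf_path, resolution=resolution) as pdf:
        for i, page in enumerate(pdf.sequence):
            # Copy each page into its own Image so it can be saved on its own.
            with Image(image=page) as page_img:
                page_img.format = "png"
                path = page_output_path(out_dir, stem, i)
                page_img.save(filename=path)
                written.append(path)
    return written
```

For example, pdf_to_images("thumbnail.pdf", "pages") would write pages/thumbnail-001.png and so on. PNG is chosen here because it is lossless, so JPEG compression artifacts cannot interfere with the later paragraph detection and OCR; note that Wand loads all pages into memory at once, so very long PDFs at high DPI can use a lot of RAM.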

Source: https://www.qcloud.com/developer/article/1359230
