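Installing and deploying PaddleOCR on Ubuntu
The official documentation used as the reference for this installation is here: https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/quick_start.html#211
1. Create a Python virtual environment.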
mkdir paddleocr
cd paddleocr
python -m venv .
source bin/activate
The shell prompt should now show the (paddleocr) prefix.
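If the prompt prefix is not visible (some shells hide it), the activation can also be checked from Python. This is a minimal sketch; the function name in_virtualenv is only illustrative and is not part of PaddleOCR.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print(f"virtual environment {state}: {sys.prefix}")
```

Run it with the same python command that will later run pip, so the check applies to the right interpreter.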
2. Install the required dependencies
pip install --upgrade pip
pip install pysocks -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddleocr -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pymupdf -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyfftw -i https://pypi.tuna.tsinghua.edu.cn/simple
sudo apt install -y ccache
sudo apt install libgomp1
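After the pip commands finish, a short script can confirm that each package actually landed in the active environment. The sketch below uses only the standard library; the list of names is copied from the pip commands above, and the helper name installed_versions is an illustrative choice.

```python
from importlib import metadata

# Distribution names taken from the pip install commands above.
REQUIRED = ["pysocks", "paddlepaddle", "paddleocr", "pymupdf", "pyfftw"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:<14}{version or 'MISSING'}")
```

This only checks the pip packages. The two apt packages (ccache and libgomp1) are system-level and are not visible to importlib.metadata.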
3. Verify the installation
paddleocr -h
4. Convert a PDF file to text
paddleocr --image_dir ./2.pdf --use_angle_cls true --use_gpu false
paddleocr --image_dir ./2.pdf --use_angle_cls true --use_gpu false --savefile true
The output looks like this:
[2025/04/22 11:11:53] ppocr INFO: for usage help, please use
paddleocr --help
[2025/04/22 11:11:53] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, use_mlu=False, use_gcu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir='./2.pdf', page_num=0, det_algorithm='DB', det_model_dir='/home/hesy/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/home/hesy/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/media/hesy/Elements/python-all/paddleocr/lib/python3.11/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='/home/hesy/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, 
use_onnx=False, onnx_providers=False, onnx_sess_options=False, return_word_box=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, formula_algorithm='LaTeXOCR', formula_model_dir=None, formula_char_dict_path=None, formula_batch_num=1, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, formula=False, ocr=True, recovery=False, recovery_to_markdown=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='ch', det=True, rec=True, type='ocr', savefile=True, ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2025/04/22 11:11:54] ppocr INFO: ./2.pdf
The generated file is written to the output directory, as shown below:
head 2.txt
[[[132.0, 132.0], [552.0, 132.0], [552.0, 211.0], [132.0, 211.0]],
('财富的真相', 0.9918341636657715)] [[[129.0, 232.0], [555.0, 233.0],
[555.0, 262.0], [129.0, 261.0]], ('一种学校不教却人人需要的知识',
0.9969227910041809)] [[[300.0, 326.0], [384.0, 326.0], [384.0, 348.0], [300.0, 348.0]], ('李笑来著', 0.9979466795921326)] [[[248.0, 942.0],
[309.0, 942.0], [309.0, 965.0], [248.0, 965.0]], ('WGS',
0.6117124557495117)] [[[320.0, 948.0], [436.0, 948.0], [436.0, 966.0], [320.0, 966.0]], ('广东经济出版社', 0.9972420930862427)]
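Each entry in the result file pairs a four-point bounding box with a (text, confidence) tuple. The sketch below shows one way to read such entries back into Python. It assumes the entries are Python literals separated by whitespace, which is inferred from the head output above rather than from a documented file format. The names split_entries, parse_results and SAMPLE are illustrative.

```python
import ast


def split_entries(text):
    """Yield each top-level [...] chunk, ignoring brackets inside quoted strings."""
    depth = 0
    start = None
    quote = None      # quote character of the string we are inside, if any
    escaped = False
    for i, ch in enumerate(text):
        if quote is not None:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in "'\"":
            quote = ch
        elif ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


def parse_results(text):
    """Return a list of (box, line, score) tuples parsed from the result text."""
    results = []
    for chunk in split_entries(text):
        # literal_eval only accepts literals, so it is safe on untrusted text.
        box, (line, score) = ast.literal_eval(chunk)
        results.append((box, line, score))
    return results


# Two entries copied from the head output above.
SAMPLE = """[[[132.0, 132.0], [552.0, 132.0], [552.0, 211.0], [132.0, 211.0]],
('财富的真相', 0.9918341636657715)] [[[248.0, 942.0],
[309.0, 942.0], [309.0, 965.0], [248.0, 965.0]], ('WGS',
0.6117124557495117)]"""

if __name__ == "__main__":
    # Keep only lines recognized with high confidence.
    for box, line, score in parse_results(SAMPLE):
        if score >= 0.9:
            print(f"{score:.3f}  {line}")
```

To process a real result file, read its contents with open(path, encoding="utf-8").read() and pass the string to parse_results. The 0.9 threshold is an arbitrary example value.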
Fixes for a few errors encountered along the way:
Error 1:
ERROR: Could not install packages due to an OSError: Missing dependencies for SOCKS support.
WARNING: There was an error checking the latest version of pip.
Solution:
unset all_proxy
unset ALL_PROXY
pip install pysocks -i https://pypi.tuna.tsinghua.edu.cn/simple
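This error appears when pip is routed through a SOCKS proxy while the PySocks package is not yet installed, which is why unsetting the proxy variables lets the install go through. The sketch below lists the proxy variables that point at a SOCKS proxy. The set of variable names checked and the helper name socks_proxies are assumptions made for illustration.

```python
import os

# Common proxy variables; the source only mentions all_proxy and ALL_PROXY.
PROXY_VARS = (
    "all_proxy", "ALL_PROXY",
    "http_proxy", "HTTP_PROXY",
    "https_proxy", "HTTPS_PROXY",
)


def socks_proxies(environ=None):
    """Return the proxy variables whose value uses a socks-style scheme."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        value = environ.get(name, "")
        if value.lower().startswith("socks"):
            found[name] = value
    return found


if __name__ == "__main__":
    hits = socks_proxies()
    if hits:
        for name, value in hits.items():
            print(f"unset {name}    # currently {value}")
    else:
        print("no SOCKS proxy variables set")
```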
Error 2:
Running paddleocr -h prints the following warning:
/home/hesy/paddleocr/lib/python3.11/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
Solution:
sudo apt install -y ccache
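To confirm that the ccache binary is now visible on PATH, a standard-library lookup is enough. This is a small sketch; find_tool is an illustrative helper name.

```python
import shutil


def find_tool(name="ccache"):
    """Return the full path of an executable on PATH, or None if it is absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    print(f"ccache found at {path}" if path else "ccache not found on PATH")
```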
Error 3:
Running paddleocr -h fails with the following error:
ImportError: libgomp-24e2ab19.so.1.0.0: cannot open shared object file: No such file or directory
Solution:
pip install pyfftw
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/paddleocr/lib/python3.11/site-packages/pyFFTW.libs
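The path in the export line depends on where the virtual environment lives and on the Python version, so it may differ on another machine. The sketch below searches the active environment's site-packages for bundled libgomp files and prints a matching export line. It assumes that wheels keep their bundled shared libraries in directories ending in .libs, as pyFFTW.libs does above. The helper name find_bundled_libs is illustrative.

```python
import sysconfig
from pathlib import Path


def find_bundled_libs(pattern="libgomp*.so*", root=None):
    """List shared libraries matching pattern in any '*.libs' directory under root."""
    if root is None:
        # Site-packages directory of the interpreter running this script.
        root = sysconfig.get_paths()["platlib"]
    return sorted(Path(root).glob(f"*.libs/{pattern}"))


if __name__ == "__main__":
    matches = find_bundled_libs()
    if not matches:
        print("no bundled libgomp found in site-packages")
    for lib in matches:
        print(f"export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{lib.parent}")
```

Run it inside the activated environment so that the reported directory belongs to the same interpreter that runs paddleocr.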