Articles published by const


Deploying and Using Tongyi Wanxiang 2.1 (Wan2.1) on Ubuntu 22

Author: const
Date: 2025-02-28
Category: Ubuntu

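This was deployed and run on Ubuntu 22.04. The GPU is an NVIDIA GeForce RTX 3090 with 24 GB of VRAM. According to the official claims, a consumer-grade GPU is enough for deployment.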

Official model download: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B
GitHub repository: https://github.com/Wan-Video/Wan2.1

Installation:

git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
pip install -r requirements.txt

During the pip install, the build of flash-attn can appear to hang for a very long time at this step:
Building wheel for flash-attn (setup.py)

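Many people online report the same problem. The quickest fix is to download a prebuilt wheel instead of compiling.

Download page: https://github.com/Dao-AILab/flash-attention/releases

Choose the wheel that matches your environment, for example: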
https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

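"Environment" here means the torch, Python, and CUDA versions. The wheel linked above is tagged for CUDA 12 (cu12), torch 2.4 (torch2.4), and Python 3.10 (cp310). In the author's experience torch 2.4 or newer works with it (torch 2.6.0 is used here), although a wheel built for your exact torch version is the safer choice.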
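To make the tags in the wheel name easier to compare against a local setup, here is a minimal sketch that splits such a file name into its parts. The pattern is an assumption based only on the file name shown above, and `parse_flash_attn_wheel` is a hypothetical helper, not part of flash-attn or pip.

```python
import re

# Pattern inferred from the release file name used in this post, e.g.
# flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
WHEEL_RE = re.compile(
    r"flash_attn-(?P<version>[^+]+)\+cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)"
    r"cxx11abi(?P<cxx11abi>TRUE|FALSE)-cp(?P<py>\d+)-cp\d+-(?P<platform>.+)\.whl"
)


def parse_flash_attn_wheel(filename):
    """Split a flash-attn wheel file name into its build tags (hypothetical helper)."""
    m = WHEEL_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    py = m.group("py")
    return {
        "flash_attn": m.group("version"),
        "cuda_major": m.group("cuda"),
        "torch": m.group("torch"),
        "cxx11abi": m.group("cxx11abi") == "TRUE",
        "python": f"{py[0]}.{py[1:]}",  # "310" -> "3.10"
        "platform": m.group("platform"),
    }


if __name__ == "__main__":
    name = "flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
    for key, value in parse_flash_attn_wheel(name).items():
        print(f"{key}: {value}")
```

The printed values can then be checked by eye against the output of `torch.__version__`, `torch.version.cuda`, and `python --version`.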

Check the torch version:

import torch
print(torch.__version__)
2.6.0+cu124

Check the CUDA version as follows. The output above already shows that the CUDA version is 12.4.

import torch
print(torch.version.cuda)
12.4

After the download finishes, install the wheel directly with pip:

pip install flash_attn-2.7.4.post1+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

To be safe, run the requirements install again:

pip install -r requirements.txt

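Next, download the model. On a home PC only the 1.3B model is practical (it is said to need only 8 GB of VRAM, but actual usage here was about 18 GB). The 14B model is out of reach on this hardware.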

Download the model:

pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-1.3B --local_dir ./Wan2.1-T2V-1.3B

Then wait. The download is moderately large, about 17 GB:

~/wan/Wan2.1-T2V-1.3B$ du -h -d1
252K    ./examples
6.3M    ./assets
20K    ./._____temp
21M    ./google
17G    .

Once the model has finished downloading, generation can begin.

python3 generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "纪实摄影风格, 中景镜头,一位20岁精致妆容的韩国美女，露出白皙的皮肤, 充满青春与活力。半身特写,锐利的边缘" --frame_num=81 --save_file=15.mp4

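Brief explanation of the parameters:
../Wan2.1-T2V-1.3B is the directory where the model is stored (--ckpt_dir).
frame_num is the number of frames. The log below shows sample_fps is 16. The frame count should have the form 4n+1, which is consistent with the temporal stride of 4 in the logged vae_stride; n is just an integer, not the fps. If frame_num is not specified, the default is 81 frames (81 = 4*20+1), which at 16 fps is about 5 seconds.
save_file is the output file name. If it is omitted, the generated file name is very long because it is derived from the prompt.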
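The relationship between frame_num, the 16 fps sample rate, and clip length can be sketched in a few lines. This is only arithmetic under the assumption that the frame count must be 4n+1; the function names are hypothetical and are not part of generate.py.

```python
def is_valid_frame_num(frame_num):
    """True when frame_num has the form 4n+1 for an integer n >= 0."""
    return frame_num >= 1 and (frame_num - 1) % 4 == 0


def clip_seconds(frame_num, fps=16):
    """Clip length in seconds for a given frame count and frame rate."""
    return frame_num / fps


def nearest_valid_frame_num(seconds, fps=16):
    """Closest 4n+1 frame count for a target duration (hypothetical helper)."""
    n = max(0, round((seconds * fps - 1) / 4))
    return 4 * n + 1


print(is_valid_frame_num(81))      # True, since 81 = 4*20 + 1
print(clip_seconds(81))            # 5.0625
print(nearest_valid_frame_num(5))  # 81
```

The 5.0625 s result matches the 00:00:05.06 duration that ffmpeg reports for the generated file further down.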
The program output follows:

[2025-02-28 10:23:09,245] INFO: offload_model is not specified, set to True.
[2025-02-28 10:23:09,245] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=81, ckpt_dir='../Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=False, dit_fsdp=False, save_file='15.mp4', prompt='纪实摄影风格,中景镜头,一位20岁精致妆容的韩国美女，露出白皙的皮肤, 充满青春与活力。半身特写,锐利的边缘', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=6044133464219050096, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-02-28 10:23:09,246] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-02-28 10:23:09,246] INFO: Input prompt: 纪实摄影风格,中景镜头,一位20岁精致妆容的韩国美女，露出白皙的皮肤, 充满青春与活力。半身特写,锐利的边缘
[2025-02-28 10:23:09,246] INFO: Creating WanT2V pipeline.
[2025-02-28 10:24:11,819] INFO: loading ../Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-02-28 10:24:19,237] INFO: loading ../Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-02-28 10:24:19,634] INFO: Creating WanModel from ../Wan2.1-T2V-1.3B
[2025-02-28 10:24:21,254] INFO: Generating video ...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [08:03<00:00, 9.67s/it]
[2025-02-28 10:32:55,448] INFO: Saving generated video to 15.mp4
[2025-02-28 10:32:56,941] INFO: Finished.

Sampling took 8 minutes 3 seconds and produced a video of only about 5 seconds. The file details:

ffmpeg -i 15.mp4 -hide_banner
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '15.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf61.1.100
  Duration: 00:00:05.06, start: 0.000000, bitrate: 4443 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 832x480, 4440 kb/s, 16 fps, 16 tbr, 16384 tbn, 32 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc61.3.100 libx264

Source video: https://const.net.cn/ai/15.mp4

Screenshot:
截图 2025-02-28 10-37-04.png

The face fills too much of the frame. For a half-body shot, adjust the prompt and generate again:

python3 generate.py  --task t2v-1.3B --size 832*480 --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "纪实摄影风格, 中景镜头,一位20岁精致妆容的韩国美女，穿着白色的T-shirt,露出白皙的皮肤,充满青春与活力。半身特写,锐利的边缘" --frame_num=81 --save_file=16.mp4

While waiting, you can also check the GPU usage:

nvidia-smi 
Fri Feb 28 10:40:36 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:65:00.0 Off |                  N/A |
| 72%   65C    P2             348W / 350W |  18472MiB / 24576MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1282546      C   python3                                   18466MiB |
+---------------------------------------------------------------------------------------+

VRAM usage is indeed about 18 GB.
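To log memory usage during a run instead of reading the table by eye, the same numbers can be requested from nvidia-smi in CSV form. This is a minimal sketch: `parse_memory_csv` and `query_gpu_memory` are hypothetical helper names, and the demo feeds in the values from the table above rather than calling the GPU.

```python
import subprocess


def parse_memory_csv(text):
    """Parse lines like '18472, 24576' into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory per GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)


if __name__ == "__main__":
    # Sample values copied from the nvidia-smi table above; on a machine
    # with a GPU, call query_gpu_memory() instead.
    for used, total in parse_memory_csv("18472, 24576\n"):
        print(f"{used} MiB / {total} MiB ({used / total:.0%})")
```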

Generated video: https://const.net.cn/ai/16.mp4
Video screenshot:

截图 2025-02-28 10-50-17.png
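If you do not want to set up the model yourself, you can try it on Alibaba's website or app; registering with a phone number is enough. Logging in each day grants 50 inspiration points, each generated video costs some points, and using the Tongyi app on a phone adds another 50 points per day, which should be enough for a trial. Official site: https://tongyi.aliyun.com/wanxiang/creation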

DVPP Video Decoding Performance Issues on Atlas Inference Series Products

Author: const
Date: 2025-03-20
Category: Cloud Notes

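Video decoding performance issues when using Atlas inference series products
Updated: 2025-3-20
Problem description
Business scenario: DVPP
Applicable processors: Atlas inference series products
Processor form: EP, RC
Symptom:

VDEC decoding performance drops below the officially published VDEC performance specification, causing stuttering and similar symptoms.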

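Cause analysis
Stuttering caused by degraded VDEC performance may have the following causes:

1. The video decoding callback takes too long, which slows decoding.
2. The proportion of I-frames in the input stream is too high. Decoding an I-frame takes longer than decoding a P-frame, which slows decoding.
3. The input stream contains abnormal frames, which slows decoding.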
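Solutions
For the possible causes above, handle them as follows:

1. Add timing points in the callback and check whether it takes too long. The maximum time allowed for the callback depends on the frame rate: maximum time = 1/frame rate. For example, at 30 fps the maximum time is 1/30 s ≈ 0.033 s.
2. Open the input stream with a third-party tool and check whether the proportion of I-frames is too high. The GOP length is typically 30 (an I-frame every 30 frames). If the proportion is too high, replace the stream with a normal one for performance testing.
3. Open the input stream with a third-party tool and check for abnormal frames (for example, the tool shows a corrupted picture or reports decoding errors). Abnormal frames will cause the specification not to be met.
Referenced from:https://www.hiascend.com/document/caselibrary/detail/topic_0000001321265632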
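The timing check on the callback can be illustrated with a small wrapper that compares each call against the 1/frame-rate budget. This is a language-neutral sketch written in Python, not AscendCL code and not taken from the referenced page; `timed_callback`, `report`, and `on_frame_decoded` are hypothetical names, and the sleep stands in for real per-frame work.

```python
import time
from functools import wraps


def frame_budget_seconds(fps):
    """Maximum callback time per frame: 1 / frame rate."""
    return 1.0 / fps


def timed_callback(fps, report):
    """Wrap a decode callback and report calls that exceed the frame budget.

    `report` is a hypothetical sink, called as report(elapsed, budget).
    """
    budget = frame_budget_seconds(fps)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > budget:
                    report(elapsed, budget)
        return wrapper
    return decorator


overruns = []


@timed_callback(fps=30, report=lambda elapsed, budget: overruns.append((elapsed, budget)))
def on_frame_decoded(frame):
    # Placeholder for the real per-frame work done in the decode callback.
    time.sleep(0.05)  # deliberately slower than the 1/30 s budget
    return frame


on_frame_decoded(b"fake-frame")
print(f"budget = {frame_budget_seconds(30):.3f} s, overruns = {len(overruns)}")
```

In a real pipeline, heavy work found this way is usually moved out of the callback, for example onto a queue consumed by another thread, so the callback itself returns within the budget.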

The eseye u Video Tool

Author: const
Date: 2025-03-20
Category: Cloud Notes

The full name of the eseye u video tool is Elecard StreamEye Tools.
It is a tool for analyzing video frames: Elecard StreamEye Tools, eseye_u.exe

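It is mentioned often in Huawei Ascend material, for example:

Check whether the input source stream has problems.
Use a third-party tool (such as eseye u) to inspect the input stream and check whether it is abnormal.

DVPP media data processing video decoding issue cases (https://www.hiascend.com/forum/thread-0281137042944264034-1-1.html)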
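When a GUI analyzer is not at hand, a rough check of the I-frame proportion can be scripted. The sketch below swaps in ffprobe (shipped with ffmpeg) in place of eseye u, which is an assumption of this example and not something the Huawei material recommends. `picture_types` and `i_frame_ratio` are hypothetical helper names, and the demo uses a synthetic frame list rather than a real file.

```python
import subprocess
from collections import Counter


def picture_types(path):
    """List the picture type (I, P, B) of each frame in the first video stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip().strip(",") for line in out.splitlines() if line.strip()]


def i_frame_ratio(types):
    """Fraction of frames that are I-frames; 0.0 for an empty list."""
    counts = Counter(types)
    total = sum(counts.values())
    return counts["I"] / total if total else 0.0


# Synthetic example: with a GOP of 30, one frame in every 30 is an I-frame.
sample = (["I"] + ["P"] * 29) * 4
print(f"I-frame ratio: {i_frame_ratio(sample):.3f}")
```

For a real stream, `i_frame_ratio(picture_types("input.h264"))` gives the same figure; a value far above 1/30 suggests the stream has an unusually short GOP.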

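StreamEye: a product from Elecard for analyzing and debugging video compression algorithms. It can analyze many video stream formats, including H.264/AVC, MPEG-2, and VP9. Its official page is https://www.elecard.com/zh/products/video-analysis/streameye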

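Elecard StreamEye Tools is a powerful analyzer for video sequences and bitstreams. It covers YUV analysis and H.264 file analysis and is a handy aid for studying H.264 video coding. Elecard StreamEye Suite is a set of powerful tools for professional video compression that helps users analyze video sequences in depth. The StreamEye interface feels more approachable, and its video window can be zoomed, which makes it easy to operate. It is less powerful than VISA, but that is acceptable for beginners. It provides a visual representation of the encoded video and stream structure analysis for MPEG-1/2/4 or AVC/H.264 VES (video elementary streams), SS (MPEG-1 system streams), PS (MPEG-2 program streams), and TS (MPEG-2 transport streams).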

破解方法，复制以下两个文件到C:Program FilesCommon FilesElecard文件夹下，覆盖原来的LC.dll，然后运行Registrator.exe，选中Elecard StreamEye Tools点击Activate，提示Enter Serial number，什么都不用填，直接OK即可.

eseye u 下载地址:http://www.rsdown.cn/down/164892.html

Verifying the Digital Signatures of Ascend Software Packages

Author: const
Date: 2025-03-28
Category: Cloud Notes

ls -lh Ascend-cann-*
-rwxrwxrwx 1 hesy hesy 698M  3月 28 17:23 Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run
-rwxrwxrwx 1 hesy hesy  490  3月 28 17:17 Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.asc
-rwxrwxrwx 1 hesy hesy 8.5K  3月 28 17:17 Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.p7s
-rwxrwxrwx 1 hesy hesy 2.2G  3月 28 17:39 Ascend-cann-toolkit_8.1.RC1.alpha001_linux-aarch64.run
-rwxrwxrwx 1 hesy hesy  490  3月 28 17:17 Ascend-cann-toolkit_8.1.RC1.alpha001_linux-aarch64.run.asc
-rwxrwxrwx 1 hesy hesy 8.5K  3月 28 17:17 Ascend-cann-toolkit_8.1.RC1.alpha001_linux-aarch64.run.p7s

gpg --verify Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.asc

gpg: 假定被签名的数据在‘Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run’
gpg: 签名建立于 2025年03月14日星期五 18时02分40秒 CST
gpg: 使用 RSA 密钥 99AD81DF27A74824
gpg: 无法检查签名：缺少公钥

gpg --keyserver hkp://keyserver.ubuntu.com --recv-keys 99AD81DF27A74824

gpg: 密钥 99AD81DF27A74824：公钥 “OpenPGP signature key for Huawei software (created on 30th Dec,2013) <support@huawei.com>” 已导入
gpg: 处理的总数：1
gpg: 已导入：1

gpg --verify Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.asc

gpg: 假定被签名的数据在‘Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run’
gpg: 签名建立于 2025年03月14日星期五 18时02分40秒 CST
gpg: 使用 RSA 密钥 99AD81DF27A74824
gpg: 完好的签名，来自于 “OpenPGP signature key for Huawei software (created on 30th Dec,2013) <support@huawei.com>” [未知]
gpg: 警告：此密钥未被受信任签名认证！
gpg: 没有证据表明此签名属于其声称的所有者。
主密钥指纹： B100 0AC3 8C41 525A 19BD C087 99AD 81DF 27A7 4824

gpg --verify Ascend-cann-toolkit_8.1.RC1.alpha001_linux-aarch64.run.asc

gpg: 假定被签名的数据在‘Ascend-cann-toolkit_8.1.RC1.alpha001_linux-aarch64.run’
gpg: 签名建立于 2025年03月14日星期五 18时04分50秒 CST
gpg: 使用 RSA 密钥 99AD81DF27A74824
gpg: 完好的签名，来自于 “OpenPGP signature key for Huawei software (created on 30th Dec,2013) <support@huawei.com>” [未知]
gpg: 警告：此密钥未被受信任签名认证！
gpg: 没有证据表明此签名属于其声称的所有者。
主密钥指纹： B100 0AC3 8C41 525A 19BD C087 99AD 81DF 27A7 4824
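When the check has to run in a script, the localized messages above are awkward to parse. gpg can also emit machine-readable status lines through --status-fd. The sketch below reads those lines; `parse_gpg_status` and `verify_detached` are hypothetical helper names, and the status text in the demo is an illustrative sample that reuses the key ID from this post, not captured output.

```python
import subprocess


def parse_gpg_status(status_text):
    """Return (ok, key_id) from the output of gpg --status-fd."""
    good, bad = None, None
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "[GNUPG:]":
            continue
        if parts[1] == "GOODSIG":
            good = parts[2]
        elif parts[1] in ("BADSIG", "ERRSIG", "NO_PUBKEY"):
            bad = parts[2]
    # Any failure line wins over a GOODSIG line.
    return (good is not None and bad is None), (bad or good)


def verify_detached(asc_path):
    """Run gpg --verify on a detached .asc signature and parse the status lines."""
    proc = subprocess.run(
        ["gpg", "--status-fd", "1", "--verify", asc_path],
        capture_output=True, text=True,
    )
    return parse_gpg_status(proc.stdout)


if __name__ == "__main__":
    # Illustrative status text, not real captured output.
    sample = "[GNUPG:] GOODSIG 99AD81DF27A74824 OpenPGP signature key for Huawei software\n"
    print(parse_gpg_status(sample))
```

A good signature only proves the file was signed by that key. As the gpg warning above says, whether the key really belongs to the vendor still has to be confirmed separately, for example by comparing the fingerprint with one published through an official channel.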

List all local keys
Run gpg --list-keys to list every key in the local keyring:

gpg --list-keys

/home/hesy/.gnupg/pubring.kbx
pub rsa2048 2013-12-30 [SC]
  B1000AC38C41525A19BDC08799AD81DF27A74824
uid [ 未知 ] OpenPGP signature key for Huawei software (created on 30th Dec,2013) <support@huawei.com>

Export the public key

gpg --output rsa_pub.pgp --armor --export B1000AC38C41525A19BDC08799AD81DF27A74824

The --armor option exports the key as ASCII text; the default export format is binary.

cat rsa_pub.pgp

-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFLBPjsBCACtQyXqecsm1a3GvRoPHpfrB9ITrYeN0vfSFlJeL4esOgQuA5l3
ILTS9bdGH9OsLuWnryVcGVRVHrpmjhuqSJycYPn/3VXtUWMm27zVSHrKhOTmm3z2
rIeuOdJzaKpgwtkwRzeuutDPW8GqpwRQaGEOGLNcMv+FRYdjmtVru6SKcC7zjhfY
2/TIj4nGGVCP+ebjQGoLBMjC2o9fJi+UV5IxHnB896YpcRKs+JTaV5KSnQ8fG23D
lDofBgwrl9icaNXm17bPx2WYRutaURS6HKh17Vffv4T4t3BoaKj0xE/cF761FPrD
NIFxSpRqQ3As4kS3UOUFg1/5NUBkev4MI5ifABEBAAG0WU9wZW5QR1Agc2lnbmF0
dXJlIGtleSBmb3IgSHVhd2VpIHNvZnR3YXJlIChjcmVhdGVkIG9uIDMwdGggRGVj
LDIwMTMpIDxzdXBwb3J0QGh1YXdlaS5jb20+iQE3BBMBAgAhBQJSwT47AhsDBgsJ
CAcDAgYVCAIJCgsDFgIBAh4BAheAAAoJEJmtgd8np0gk0GQIAIJKdFLMivJdxlS8
INxZHejGaqToh9GqK1u6HQ3Hp59OKRPINBgd61NFuuOwcO0WqBArXfGLLQpBSWLa
5cIOHrEi+Pq2XkdxL3hZhnw/G/GHIHJTjIHNamTikalCz4B+BcsQ0UnFVKZDTkBA
F9a9Md21dJsgzEaAyqpozd1MKsed4Jcj7gY75L/DSWDdPEfeJCQ1j5RQpxbDn4sg
KiZU314DCz8+Iiz30c3l+WHryYC+g1gMeRWKYlS1waltoqpKJeb5b6VEarJKDJXm
LkV3kCmB8AMq8yS0y92uBR58WNlxw37o01grXaQMvTl7GkBQ20xLFHvIdU9UUzVX
86Xyy7A=
=0zUT
-----END PGP PUBLIC KEY BLOCK-----

For background, the basic formula of a digital signature is: signature = privkey_encrypt(hash_alg(content))
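The formula can be made concrete with a toy example. The sketch below uses textbook RSA without padding and a deliberately tiny key built from two Mersenne primes, so it is for illustration only. Real signatures, including the ones on these packages, use much larger keys and a padding scheme, and all names here are hypothetical.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(content):
    """hash_alg(content): SHA-256 digest, reduced modulo N to fit the toy key."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content):
    """signature = privkey_encrypt(hash_alg(content))"""
    return pow(digest_as_int(content), D, N)


def verify(content, signature):
    """Recover the digest with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(content)


package = b"contents of the .run installer"
sig = sign(package)
print(verify(package, sig))                # True
print(verify(package + b"tampered", sig))  # False
```

Verification needs only the public values N and E, which is why publishing the public key is enough for anyone to check a package.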

openssl pkcs7 -inform DER -in Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.p7s -out Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.pkcs7
cat Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.pkcs7

-----BEGIN PKCS7-----
...
-----END PKCS7-----

openssl pkcs7 -print_certs -in Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.pkcs7 -out Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.pkcs7.cert

cat Ascend-cann-kernels-310b_8.1.RC1.alpha001_linux-aarch64.run.pkcs7.cert

subject=C = CN, O = Huawei Technologies, OU = Huawei Trust Service, CN = Huawei Release-Signing Authority 1 - G2
issuer=C = CN, O = Huawei Technologies, OU = Huawei Certification Authority, CN = Huawei Integrity CA 1 - G2
-----BEGIN CERTIFICATE-----
...

Installing and Deploying PaddleOCR on Ubuntu

Author: const
Date: 2025-04-22
Category: Ubuntu

The official documentation followed for this installation is here: https://paddlepaddle.github.io/PaddleOCR/latest/ppocr/quick_start.html#211

1. Create a Python virtual environment.

mkdir paddleocr
cd paddleocr
python -m venv .
cd bin
source activate

The shell prompt then shows the (paddleocr) prefix.

2. Install the required dependencies

pip install --upgrade pip
pip install pysocks -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddlepaddle -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddleocr -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pymupdf -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyfftw -i https://pypi.tuna.tsinghua.edu.cn/simple
sudo apt install -y ccache
sudo apt install libgomp1

3. Verify the installation

paddleocr -h

4. Run OCR on a PDF file

paddleocr --image_dir ./2.pdf --use_angle_cls true --use_gpu false
paddleocr --image_dir ./2.pdf --use_angle_cls true --use_gpu false --savefile true

The output is as follows:

[2025/04/22 11:11:53] ppocr INFO: for usage help, please use paddleocr --help
[2025/04/22 11:11:53] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, use_mlu=False, use_gcu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir='./2.pdf', page_num=0, det_algorithm='DB', det_model_dir='/home/hesy/.paddleocr/whl/det/ch/ch_PP-OCRv4_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/home/hesy/.paddleocr/whl/rec/ch/ch_PP-OCRv4_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/media/hesy/Elements/python-all/paddleocr/lib/python3.11/site-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='/home/hesy/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, 
use_onnx=False, onnx_providers=False, onnx_sess_options=False, return_word_box=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, formula_algorithm='LaTeXOCR', formula_model_dir=None, formula_char_dict_path=None, formula_batch_num=1, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, formula=False, ocr=True, recovery=False, recovery_to_markdown=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='ch', det=True, rec=True, type='ocr', savefile=True, ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2025/04/22 11:11:54] ppocr INFO: ./2.pdf

The generated files end up in the output directory, for example:

head 2.txt

[[[132.0, 132.0], [552.0, 132.0], [552.0, 211.0], [132.0, 211.0]],
('财富的真相', 0.9918341636657715)] [[[129.0, 232.0], [555.0, 233.0],
[555.0, 262.0], [129.0, 261.0]], ('一种学校不教却人人需要的知识',
0.9969227910041809)] [[[300.0, 326.0], [384.0, 326.0], [384.0, 348.0], [300.0, 348.0]], ('李笑来著', 0.9979466795921326)] [[[248.0, 942.0],
[309.0, 942.0], [309.0, 965.0], [248.0, 965.0]], ('WGS',
0.6117124557495117)] [[[320.0, 948.0], [436.0, 948.0], [436.0, 966.0], [320.0, 966.0]], ('广东经济出版社', 0.9972420930862427)]
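Each record in the saved text pairs a four-point bounding box with a (text, score) tuple, so it can be read back with the standard library. The sketch below assumes one record per line, which may not match the real file layout since the listing above is wrapped; `parse_ocr_line` and `confident_text` are hypothetical helpers, and the sample records are copied from the output above.

```python
import ast


def parse_ocr_line(line):
    """Parse one '[[box], (text, score)]' record into (box, text, score)."""
    box, (text, score) = ast.literal_eval(line.strip())
    return box, text, score


def confident_text(lines, min_score=0.9):
    """Keep only the recognized strings whose score is at least min_score."""
    texts = []
    for line in lines:
        if not line.strip():
            continue
        _, text, score = parse_ocr_line(line)
        if score >= min_score:
            texts.append(text)
    return texts


sample = [
    "[[[132.0, 132.0], [552.0, 132.0], [552.0, 211.0], [132.0, 211.0]], ('财富的真相', 0.9918341636657715)]",
    "[[[248.0, 942.0], [309.0, 942.0], [309.0, 965.0], [248.0, 965.0]], ('WGS', 0.6117124557495117)]",
]
print(confident_text(sample))  # ['财富的真相']
```

Filtering on the score is a simple way to drop low-confidence fragments such as the 'WGS' entry, which scored about 0.61.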

Fixes for several errors encountered:
Error 1:
ERROR: Could not install packages due to an OSError: Missing dependencies for SOCKS support.
WARNING: There was an error checking the latest version of pip.

Solution:

unset all_proxy
unset ALL_PROXY
pip install pysocks -i https://pypi.tuna.tsinghua.edu.cn/simple

Error 2:
Running paddleocr -h prints the following warning:
/home/hesy/paddleocr/lib/python3.11/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)

Solution:

sudo apt install -y ccache

Error 3:
Running paddleocr -h fails with the following error:
ImportError: libgomp-24e2ab19.so.1.0.0: cannot open shared object file: No such file or directory

Solution:

pip install pyfftw
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/paddleocr/lib/python3.11/site-packages/pyFFTW.libs