
A Baseline for Tianchi Street View Character Recognition with PaddleOCR 2.4

Updated: 2026-02-15 11:15:32


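This article walks through an entry for the Tianchi street view character recognition competition. The competition data comes from the SVHN dataset and includes 30,000 training images and 10,000 validation images. The model is trained with PaddleOCR, using the CRNN algorithm with a MobileNetV3 backbone. The steps covered are data preparation, parameter configuration, evaluation, and prediction, ending with a submission file and a baseline score.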

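I. The Tianchi Street View Character Recognition Competition

Competition page: https://tianchi.aliyun.com/competition/entrance/531795/information

1. Data source

The competition dataset is sampled from the Google Street View house number dataset (The Street View House Numbers Dataset, SVHN).

2. Dataset overview

The images are house numbers photographed in real street scenes. The training set contains 30,000 images and the validation set contains 10,000 images. Each image comes with the character labels and their exact positions. To keep the competition fair, test set A contains 40,000 images, and test set B likewise contains 40,000 images.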


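3. Sample images

4. Field table

All annotations (training, validation, and test sets) are stored as JSON and indexed by file name. If an image contains several characters, each field holds a list with one entry per character. The fields are:

- `top`: Y coordinate of the top-left corner
- `height`: character height
- `left`: X coordinate of the top-left corner
- `width`: character width
- `label`: character code

Note: the data comes from SVHN and has been anonymized and had noise added. Please train only on the officially provided dataset.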
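To make the field table concrete, here is a minimal sketch of reading one annotation entry and turning it into per-character boxes. The JSON snippet is a hypothetical example written for illustration (the coordinates are invented), and `boxes_for` is a helper name introduced here, not part of the competition kit.

```python
import json

# Hypothetical annotation in the layout described by the field table:
# file name -> per-character lists of box coordinates and labels.
# The coordinate values are invented for illustration.
sample_json = """
{
  "000000.png": {"top": [77, 81], "height": [219, 219],
                 "left": [246, 323], "width": [81, 96], "label": [1, 9]}
}
"""


def boxes_for(annotation):
    """Return one (left, top, right, bottom, label) tuple per character."""
    boxes = []
    for left, top, width, height, label in zip(
        annotation["left"], annotation["top"],
        annotation["width"], annotation["height"], annotation["label"],
    ):
        boxes.append((left, top, left + width, top + height, label))
    return boxes


annotations = json.loads(sample_json)
for name, ann in annotations.items():
    # Join the per-character labels into the full house number string.
    code = "".join(str(d) for d in ann["label"])
    print(name, code, boxes_for(ann))
```

Only the `label` list is needed for text recognition; the box fields matter if you want to crop the digits before recognition.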

II. Environment setup

PaddleOCR (https://github.com/paddlepaddle/PaddleOCR) is a powerful, practical OCR toolkit that works out of the box and runs fast.

```python
# Clone the PaddleOCR code from Gitee (the GitHub URL works as well)
!git clone https://gitee.com/paddlepaddle/PaddleOCR.git --depth=1
# Upgrade pip
!pip install -U pip
# Install the dependencies
%cd ~/PaddleOCR
%pip install -r requirements.txt
```

```python
%cd ~/PaddleOCR/
!tree -L 1
```

The top level of the PaddleOCR repository looks like this:

```bash
/home/aistudio/PaddleOCR
├── benchmark
├── configs
├── deploy
├── doc
├── __init__.py
├── LICENSE
├── MANIFEST.in
├── paddleocr.py
├── ppocr
├── PPOCRLabel
├── ppstructure
├── README_ch.md
├── README.md
├── requirements.txt
├── setup.py
├── StyleText
├── test_tipc
└── tools
```

These directories and files are what a PaddleOCR-based project is built from.

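III. Data preparation

The dataset ships as separate archives for the training set (30,000 images), the validation set (10,000 images), and test set A (40,000 images). Extract each archive and check the file counts.

1. Download and extract the data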

```python
# Extract the dataset archive
%cd ~
!unzip -qoa data/data124095/street_code_rec_data.zip -d ~/data/
```

```text
/home/aistudio
```

```python
# Rename the folder
!mv data/街景编码识别 data/street_code_rec_data
```

```python
# Extract the test set
!unzip -qoa data/street_code_rec_data/mchar_test_a.zip -d data/street_code_rec_data/
```

```python
# Extract the validation set
!unzip -qoa data/street_code_rec_data/mchar_val.zip -d data/street_code_rec_data/
```

```python
# Extract the training set
!unzip -qoa data/street_code_rec_data/mchar_train.zip -d data/street_code_rec_data/
```

```python
# Check that the training folder holds 30,000 files
!cd data/street_code_rec_data/mchar_train && ls -l | grep "^-" | wc -l
```

```python
# Check that the test folder holds 40,000 files
!cd data/street_code_rec_data/mchar_test_a && ls -l | grep "^-" | wc -l
```

```python
# Check that the validation folder holds 10,000 files
!cd data/street_code_rec_data/mchar_val && ls -l | grep "^-" | wc -l
```

```python
# Delete the archives once everything is extracted
%cd data/street_code_rec_data
!rm *.zip
%cd ~
```

```text
/home/aistudio/data/street_code_rec_data
/home/aistudio
```
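The `ls -l | grep "^-" | wc -l` checks above count the regular files in a folder. As a rough Python equivalent, the sketch below does the same with `pathlib`. `count_files` is a helper name introduced here, and the demonstration runs on a temporary directory rather than on the competition data.

```python
import tempfile
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside `folder` (subdirectories are skipped)."""
    return sum(1 for entry in Path(folder).iterdir() if entry.is_file())


# Self-contained demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"{i:06d}.png").touch()
    (Path(tmp) / "nested").mkdir()  # directories must not be counted
    demo_count = count_files(tmp)

print(demo_count)
```

In the notebook you would call it on a real folder, for example `count_files('data/street_code_rec_data/mchar_train')`, and compare the result with the expected size.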

2. Converting the labels

```python
import json


def trans(path):
    """Convert <path>.json annotations into a tab-separated <path>.csv label file."""
    with open(path + '.json', 'r') as f:
        json_data = json.load(f)
    print(len(json_data))
    with open(path + '.csv', 'w') as ff:
        for item in json_data:
            # Join the per-character digits into one string, e.g. [1, 9] -> "19"
            label = json_data[item]['label']
            label = [str(x) for x in label]
            label = ''.join(label)
            ff.write(item + '\t' + label + '\n')
```

```python
trans('data/street_code_rec_data/mchar_val')
trans('data/street_code_rec_data/mchar_train')
```

```text
10000
30000
```
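Because the label files use one `file name<TAB>label` line per image, a quick format check before training can catch broken lines early. The following is a minimal sketch under that assumption. `check_label_lines` is a helper name introduced here, and the sample lines are hypothetical.

```python
import io


def check_label_lines(lines):
    """Return (line_number, text) for every line that is not 'name<TAB>digits'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        parts = text.split("\t")
        # A valid line has exactly two fields: a file name and a digit string.
        if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
            problems.append((number, text))
    return problems


# Hypothetical sample: line 2 uses a space instead of a tab, line 3 has no label.
sample = io.StringIO("000000.png\t19\n000001.png 23\n000002.png\t\n")
bad_lines = check_label_lines(sample)
print(bad_lines)
```

On the real files you would pass an open file handle, for example `check_label_lines(open('data/street_code_rec_data/mchar_train.csv'))`, and expect an empty list.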

3. Inspecting the data

```python
!head data/street_code_rec_data/mchar_val.csv
```

```text
000000.png	5
000001.png	210
000002.png	6
000003.png	1
000004.png	9
000005.png	1
000006.png	183
000007.png	65
000008.png	144
000009.png	16
```

```python
!head data/street_code_rec_data/mchar_train.csv
```

```text
000000.png	19
000001.png	23
000002.png	25
000003.png	93
000004.png	31
000005.png	33
000006.png	28
000007.png	744
000008.png	128
000009.png	16
```

```python
from PIL import Image

img = Image.open('data/street_code_rec_data/mchar_train/000000.png')
print(img.size)
img
```

```text
(741, 350)
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=741x350 at 0x7F134A1CAB10>
```
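Beyond eyeballing the first rows, it can help to look at how long the labels are, since the longest label has to fit within the configured maximum text length. Here is a small sketch that builds a length histogram from label lines. `label_length_histogram` is a helper name introduced here, and the sample is just the first ten rows of the training label file shown earlier, not the whole dataset.

```python
from collections import Counter


def label_length_histogram(lines):
    """Map label length -> number of samples for tab-separated label lines."""
    lengths = Counter()
    for line in lines:
        _, label = line.rstrip("\n").split("\t")
        lengths[len(label)] += 1
    return lengths


# The first rows of mchar_train.csv as printed by `head`.
sample_lines = [
    "000000.png\t19", "000001.png\t23", "000002.png\t25", "000003.png\t93",
    "000004.png\t31", "000005.png\t33", "000006.png\t28", "000007.png\t744",
    "000008.png\t128", "000009.png\t16",
]
histogram = label_length_histogram(sample_lines)
print(sorted(histogram.items()))
print("longest label:", max(histogram))
```

Running the same function over the full label file gives the real distribution.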

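IV. Training configuration

The configuration is based on PaddleOCR/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml.

1. Model architecture

The model uses the CRNN algorithm with a MobileNetV3 backbone, and CTCLoss as the loss function.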

```yaml
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001
```
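A CTC head emits one class prediction per time step, and the final text is obtained by collapsing repeated predictions and dropping the blank symbol. The sketch below shows that greedy decoding rule in plain Python. It illustrates the general CTC technique, not PaddleOCR's own decoder, and the blank index of 0 and the digit-only character set are assumptions made for the example.

```python
def ctc_greedy_decode(frame_ids, charset, blank=0):
    """Collapse repeated ids, drop blanks, and map the rest to characters.

    `frame_ids` holds the arg-max class id of every time step. Ids 1..N
    index into `charset`; id `blank` is the CTC blank (an assumption of
    this sketch).
    """
    decoded = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != previous and idx != blank:
            decoded.append(charset[idx - 1])
        previous = idx
    return "".join(decoded)


digits = "0123456789"
# Time steps: blank, '1', '1', blank, '9', '9', '9', blank, '9'
frames = [0, 2, 2, 0, 10, 10, 10, 0, 10]
decoded_text = ctc_greedy_decode(frames, digits)
print(decoded_text)
```

The blank between the two runs of `9` is what lets the model output a repeated digit, which matters for house numbers such as 199.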

2. Data configuration

Set `data_dir` and `label_file_list` under both `Train.dataset` and `Eval.dataset`:

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_train
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_train.csv"]
    ...
  ...
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_val
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_val.csv"]
```

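3. GPU and evaluation switches

`use_gpu` turns GPU use on or off, and `cal_metric_during_train` turns metric computation during training on or off.

```yaml
Global:
  use_gpu: false  # set to true to use the GPU
  ...
  cal_metric_during_train: False  # set to True to compute metrics during training
```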

4. Data-loading workers

```yaml
Train.loader.num_workers: 4
Eval.loader.num_workers: 4
```

5. Full configuration

```yaml
Global:
  use_gpu: True
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_en_number_lite
  save_epoch_step: 3
  # evaluation is run every 100 iterations after the 1000th iteration
  eval_batch_step: [1000, 100]
  # if pretrained_model is saved in static mode, load_static_weights must set to True
  cal_metric_during_train: True
  pretrained_model: ./en_number_mobile_v2.0_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  # for data or label process
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25
  infer_mode: False
  use_space_char: True

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00001

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_train
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_train.csv"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug:
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_val
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_val.csv"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 8
```

```python
# The config file is already prepared; overwrite (-f) the stock one with it
!cp -f ~/rec_en_number_lite_train.yml ~/PaddleOCR/configs/rec/multi_language/rec_en_number_lite_train.yml
```
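The `image_shape: [3, 32, 320]` entry means every image is brought to a height of 32 and a width of at most 320 before it reaches the network. The sketch below works out the resulting width for a given input, assuming the resize keeps the aspect ratio and pads the remaining columns, which is the usual approach for CRNN-style recognizers. `resized_width` is an illustrative helper introduced here, not a PaddleOCR function.

```python
import math


def resized_width(image_w, image_h, target_h=32, max_w=320):
    """Width after scaling to `target_h` with the aspect ratio kept, capped at `max_w`."""
    ratio = image_w / float(image_h)
    return min(max_w, int(math.ceil(target_h * ratio)))


# The sample training image inspected earlier is 741 x 350 pixels.
content_w = resized_width(741, 350)
print(content_w, "content columns,", 320 - content_w, "padding columns")
```

Under this assumption, a full street scene like the sample image occupies only a small part of the 320-pixel canvas, which is one reason cropping to the digit boxes first may be worth trying.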

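6. Using a pretrained model

Starting from a pretrained model reportedly makes training faster.

The models PaddleOCR offers for download come in four kinds: inference models, trained models, pretrained models, and slim models. They differ as follows:

- Inference model
  - Format: inference.pdmodel, inference.pdiparams
  - Description: used by the prediction engine for inference.
- Trained model, pretrained model
  - Format: *.pdparams, *.pdopt, *.states
  - Description: the model parameters, optimizer state, and intermediate training information saved during training. They are mostly used to evaluate model metrics and to resume training.
- Slim model
  - Format: *.nb
  - Description: a model compressed with PaddleSlim, suited to on-device deployment such as mobile or IoT. It is deployed with Paddle Lite.

The relationship between these model types is shown in the diagram below.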


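English recognition model

- Model name: en_number_mobile_slim_v2.0_rec (slim pruned and quantized version)
- Description: an ultra-lightweight model that recognizes English letters and digits
- Configuration file: rec_en_number_lite_train.yml
- Downloads: inference model / trained model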

```python
%cd ~/PaddleOCR/
# mobile model
!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar
!tar -xf en_number_mobile_v2.0_rec_train.tar
```

```text
/home/aistudio/PaddleOCR
--2022-01-02 00:10:41--  https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9123840 (8.7M) [application/x-tar]
Saving to: ‘en_number_mobile_v2.0_rec_train.tar’

en_number_mobile_v2 100%[===================>]   8.70M  8.63MB/s    in 1.0s

2022-01-02 00:10:42 (8.63 MB/s) - ‘en_number_mobile_v2.0_rec_train.tar’ saved [9123840/9123840]
```

V. Training

```python
%cd ~/PaddleOCR/
# mobile model
!python tools/train.py -c ./configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.checkpoints=./output/rec_en_number_lite/latest
```
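The `-o Global.checkpoints=...` argument overrides a single value in the YAML file by its dotted path. To show the idea, here is a minimal sketch of how such an override can be merged into a nested dictionary. It illustrates the concept only and is not the code PaddleOCR uses; `apply_override` is a helper name introduced here.

```python
def apply_override(config, dotted_key, value):
    """Set config['A']['b'] = value for dotted_key 'A.b', creating dicts as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


# A tiny stand-in for the parsed YAML configuration.
config = {"Global": {"use_gpu": True, "checkpoints": None}}
option = "Global.checkpoints=./output/rec_en_number_lite/latest"
key, value = option.split("=", 1)
apply_override(config, key, value)
print(config["Global"]["checkpoints"])
```

In the training command above, this is what makes the run resume from the `latest` checkpoint without editing the config file.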

1. Choose a suitable batch size

2. Training log

```text
[2022/01/02 01:28:23] root INFO: save model in ./output/rec_en_number_lite/latest
[2022/01/02 01:28:23] root INFO: Initialize indexs of datasets:['/home/aistudio/data/street_code_rec_data/mchar_train.csv']
[2022/01/02 01:28:54] root INFO: epoch: [27/500], iter: 180, lr: 0.000986, loss: 1.043328, acc: 0.765624, norm_edit_dis: 0.863509, reader_cost: 2.26051 s, batch_cost: 2.59590 s, samples: 7168, ips: 276.12724
[2022/01/02 01:29:18] root INFO: epoch: [27/500], iter: 190, lr: 0.000986, loss: 1.056450, acc: 0.765624, norm_edit_dis: 0.864510, reader_cost: 1.18228 s, batch_cost: 1.65932 s, samples: 10240, ips: 617.12064
[2022/01/02 01:29:34] root INFO: epoch: [27/500], iter: 200, lr: 0.000985, loss: 1.069025, acc: 0.759277, norm_edit_dis: 0.860254, reader_cost: 0.74316 s, batch_cost: 1.15521 s, samples: 10240, ips: 886.42030
eval model:: 100%|| 10/10 [00:07<00:00, 2.12it/s]
[2022/01/02 01:29:42] root INFO: cur metric, acc: 0.6261999373800062, norm_edit_dis: 0.7362716930394972, fps: 4054.7339744968563
[2022/01/02 01:29:42] root INFO: save best model is to ./output/rec_en_number_lite/best_accuracy
[2022/01/02 01:29:42] root INFO: best metric, acc: 0.6261999373800062, start_epoch: 21, norm_edit_dis: 0.7362716930394972, fps: 4054.7339744968563, best_epoch: 27
```
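To track progress without a dashboard, the training lines in this log can be parsed into numbers. The sketch below pulls epoch, iteration, learning rate, loss, and accuracy out of one line with a regular expression. The pattern is written against the log format shown here and is an assumption about its stability; `parse_train_line` is a helper name introduced for illustration.

```python
import re

# Matches the "epoch: [27/500], iter: 180, lr: ..., loss: ..., acc: ..." part.
LOG_PATTERN = re.compile(
    r"epoch: \[(?P<epoch>\d+)/(?P<total>\d+)\], iter: (?P<iter>\d+), "
    r"lr: (?P<lr>[\d.]+), loss: (?P<loss>[\d.]+), acc: (?P<acc>[\d.]+)"
)


def parse_train_line(line):
    """Return the metrics of one training log line, or None if it is another kind of line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "epoch": int(fields["epoch"]),
        "iter": int(fields["iter"]),
        "lr": float(fields["lr"]),
        "loss": float(fields["loss"]),
        "acc": float(fields["acc"]),
    }


line = ("[2022/01/02 01:28:54] root INFO: epoch: [27/500], iter: 180, lr: 0.000986, "
        "loss: 1.043328, acc: 0.765624, norm_edit_dis: 0.863509")
record = parse_train_line(line)
print(record)
```

Note that the pattern expects plain decimal numbers; a learning rate printed in scientific notation would need a wider character class.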

3. VisualDL visualization

- Install VisualDL locally: pip install visualdl
- Download the training logs to your machine
- Start VisualDL: visualdl --logdir ./
- Open http://localhost:8040/ in a browser

VI. Model evaluation

```python
# Evaluate on the GPU; Global.checkpoints is the weight file to evaluate
%cd ~/PaddleOCR/
# mobile model
!python -m paddle.distributed.launch tools/eval.py -c ./configs/rec/multi_language/rec_en_number_lite_train.yml \
    -o Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams
```

```text
/home/aistudio/PaddleOCR
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: None
heter_devices: 
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: tools/eval.py
training_script_args: ['-c', './configs/rec/multi_language/rec_en_number_lite_train.yml', '-o', 'Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams']
worker_num: None
workers: 
------------------------------------------------
WARNING 2022-01-02 01:32:26,892 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-01-02 01:32:26,894 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    | Distributed Envs                Value                                                 |
    +---------------------------------------------------------------------------------------+
    | PADDLE_TRAINER_ID               0                                                     |
    | PADDLE_CURRENT_ENDPOINT         127.0.0.1:33420                                       |
    | PADDLE_TRAINERS_NUM             1                                                     |
    | PADDLE_TRAINER_ENDPOINTS        127.0.0.1:33420                                       |
    | PADDLE_RANK_IN_NODE             0                                                     |
    | PADDLE_LOCAL_DEVICE_IDS         0                                                     |
    | PADDLE_WORLD_DEVICE_IDS         0                                                     |
    | FLAGS_selected_gpus             0                                                     |
    | FLAGS_selected_accelerators     0                                                     |
    +=======================================================================================+

INFO 2022-01-02 01:32:26,894 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:1384 idx:0
[2022/01/02 01:32:28] root INFO: Architecture : 
[2022/01/02 01:32:28] root INFO:     Backbone : 
[2022/01/02 01:32:28] root INFO:         model_name : small
[2022/01/02 01:32:28] root INFO:         name : MobileNetV3
[2022/01/02 01:32:28] root INFO:         scale : 0.5
[2022/01/02 01:32:28] root INFO:         small_stride : [1, 2, 2, 2]
[2022/01/02 01:32:28] root INFO:     Head : 
[2022/01/02 01:32:28] root INFO:         fc_decay : 1e-05
[2022/01/02 01:32:28] root INFO:         name : CTCHead
[2022/01/02 01:32:28] root INFO:     Neck : 
[2022/01/02 01:32:28] root INFO:         encoder_type : rnn
[2022/01/02 01:32:28] root INFO:         hidden_size : 48
[2022/01/02 01:32:28] root INFO:         name : SequenceEncoder
[2022/01/02 01:32:28] root INFO:     Transform : None
[2022/01/02 01:32:28] root INFO:     algorithm : CRNN
[2022/01/02 01:32:28] root INFO:     model_type : rec
[2022/01/02 01:32:28] root INFO: Eval : 
[2022/01/02 01:32:28] root INFO:     dataset : 
[2022/01/02 01:32:28] root INFO:         data_dir : /home/aistudio/data/street_code_rec_data/mchar_val
[2022/01/02 01:32:28] root INFO:         label_file_list : ['/home/aistudio/data/street_code_rec_data/mchar_val.csv']
[2022/01/02 01:32:28] root INFO:         name : SimpleDataSet
[2022/01/02 01:32:28] root INFO:         transforms : 
[2022/01/02 01:32:28] root INFO:             DecodeImage : 
[2022/01/02 01:32:28] root INFO:                 channel_first : False
[2022/01/02 01:32:28] root INFO:                 img_mode : BGR
[2022/01/02 01:32:28] root INFO:             CTCLabelEncode : None
[2022/01/02 01:32:28] root INFO:             RecResizeImg : 
[2022/01/02 01:32:28] root INFO:                 image_shape : [3, 32, 320]
[2022/01/02 01:32:28] root INFO:             KeepKeys : 
[2022/01/02 01:32:28] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/01/02 01:32:28] root INFO:     loader : 
[2022/01/02 01:32:28] root INFO:         batch_size_per_card : 1024
[2022/01/02 01:32:28] root INFO:         drop_last : False
[2022/01/02 01:32:28] root INFO:         num_workers : 8
[2022/01/02 01:32:28] root INFO:         shuffle : False
[2022/01/02 01:32:28] root INFO: Global : 
[2022/01/02 01:32:28] root INFO:     cal_metric_during_train : True
[2022/01/02 01:32:28] root INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/01/02 01:32:28] root INFO:     checkpoints : ./output/rec_en_number_lite/best_accuracy.pdparams
[2022/01/02 01:32:28] root INFO:     debug : False
[2022/01/02 01:32:28] root INFO:     distributed : False
[2022/01/02 01:32:28] root INFO:     epoch_num : 500
[2022/01/02 01:32:28] root INFO:     eval_batch_step : [100, 100]
[2022/01/02 01:32:28] root INFO:     infer_img : None
[2022/01/02 01:32:28] root INFO:     infer_mode : False
[2022/01/02 01:32:28] root INFO:     log_smooth_window : 20
[2022/01/02 01:32:28] root INFO:     max_text_length : 25
[2022/01/02 01:32:28] root INFO:     pretrained_model : ./en_number_mobile_v2.0_rec_train/best_accuracy.pdparams
[2022/01/02 01:32:28] root INFO:     print_batch_step : 10
[2022/01/02 01:32:28] root INFO:     save_epoch_step : 3
[2022/01/02 01:32:28] root INFO:     save_inference_dir : None
[2022/01/02 01:32:28] root INFO:     save_model_dir : ./output/rec_en_number_lite
[2022/01/02 01:32:28] root INFO:     use_gpu : True
[2022/01/02 01:32:28] root INFO:     use_space_char : True
[2022/01/02 01:32:28] root INFO:     use_visualdl : False
[2022/01/02 01:32:28] root INFO: Loss : 
[2022/01/02 01:32:28] root INFO:     name : CTCLoss
[2022/01/02 01:32:28] root INFO: Metric : 
[2022/01/02 01:32:28] root INFO:     main_indicator : acc
[2022/01/02 01:32:28] root INFO:     name : RecMetric
[2022/01/02 01:32:28] root INFO: Optimizer : 
[2022/01/02 01:32:28] root INFO:     beta1 : 0.9
[2022/01/02 01:32:28] root INFO:     beta2 : 0.999
[2022/01/02 01:32:28] root INFO:     lr : 
[2022/01/02 01:32:28] root INFO:         learning_rate : 0.001
[2022/01/02 01:32:28] root INFO:         name : Cosine
[2022/01/02 01:32:28] root INFO:     name : Adam
[2022/01/02 01:32:28] root INFO:     regularizer : 
[2022/01/02 01:32:28] root INFO:         factor : 1e-05
[2022/01/02 01:32:28] root INFO:         name : L2
[2022/01/02 01:32:28] root INFO: PostProcess : 
[2022/01/02 01:32:28] root INFO:     name : CTCLabelDecode
[2022/01/02 01:32:28] root INFO: Train : 
[2022/01/02 01:32:28] root INFO:     dataset : 
[2022/01/02 01:32:28] root INFO:         data_dir : /home/aistudio/data/street_code_rec_data/mchar_train
[2022/01/02 01:32:28] root INFO:         label_file_list : ['/home/aistudio/data/street_code_rec_data/mchar_train.csv']
[2022/01/02 01:32:28] root INFO:         name : SimpleDataSet
[2022/01/02 01:32:28] root INFO:         transforms : 
[2022/01/02 01:32:28] root INFO:             DecodeImage : 
[2022/01/02 01:32:28] root INFO:                 channel_first : False
[2022/01/02 01:32:28] root INFO:                 img_mode : BGR
[2022/01/02 01:32:28] root INFO:             RecAug : None
[2022/01/02 01:32:28] root INFO:             CTCLabelEncode : None
[2022/01/02 01:32:28] root INFO:             RecResizeImg : 
[2022/01/02 01:32:28] root INFO:                 image_shape : [3, 32, 320]
[2022/01/02 01:32:28] root INFO:             KeepKeys : 
[2022/01/02 01:32:28] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/01/02 01:32:28] root INFO:     loader : 
[2022/01/02 01:32:28] root INFO:         batch_size_per_card : 1024
[2022/01/02 01:32:28] root INFO:         drop_last : True
[2022/01/02 01:32:28] root INFO:         num_workers : 8
[2022/01/02 01:32:28] root INFO:         shuffle : True
[2022/01/02 01:32:28] root INFO: profiler_options : None
[2022/01/02 01:32:28] root INFO: train with paddle 2.2.1 and device CUDAPlace(0)
[2022/01/02 01:32:28] root INFO: Initialize indexs of datasets:['/home/aistudio/data/street_code_rec_data/mchar_val.csv']
W0102 01:32:28.580307  1384 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0102 01:32:28.584791  1384 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/01/02 01:32:33] root INFO: resume from ./output/rec_en_number_lite/best_accuracy
[2022/01/02 01:32:33] root INFO: metric in ckpt ***************
[2022/01/02 01:32:33] root INFO: acc:0.6261999373800062
[2022/01/02 01:32:33] root INFO: start_epoch:28
[2022/01/02 01:32:33] root INFO: norm_edit_dis:0.7362716930394972
[2022/01/02 01:32:33] root INFO: fps:4054.7339744968563
[2022/01/02 01:32:33] root INFO: best_epoch:27
```
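The two metrics in this log, `acc` and `norm_edit_dis`, can be reproduced on a handful of predictions to get a feel for them. The sketch below assumes `acc` is exact-match accuracy and that `norm_edit_dis` is one minus the edit distance divided by the longer string's length, averaged over samples. That normalization is an assumption of this example, and the predictions are made up.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def score(predictions, targets):
    """Return (exact-match accuracy, mean normalized edit similarity)."""
    exact = 0
    similarity = 0.0
    for pred, target in zip(predictions, targets):
        exact += pred == target
        longest = max(len(pred), len(target), 1)
        similarity += 1.0 - levenshtein(pred, target) / longest
    n = len(targets)
    return exact / n, similarity / n


# Hypothetical predictions against four labels.
acc, norm_edit_sim = score(["19", "23", "744", "18"], ["19", "23", "74", "128"])
print(acc, round(norm_edit_sim, 4))
```

The edit-based score gives partial credit for near misses such as one extra or one missing digit, which is why it sits above the exact-match accuracy in the log.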

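This was only a quick run, so there is plenty of room to improve. Good luck, everyone! Adding more training data and training for more rounds should raise the score.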
