NAVER-OCR-Accepted_Pretrained_Models.xlsx

	A	B	C	D	E
1	Pretrained model	Link	Pretraining data	Note	Approval
2	ViT (Vision Transformer), base version	https://huggingface.co/google/vit-base-patch16-224	ImageNet		Accepted
3	ViT/Base pretrained on ImageNet following the DeiT paper: Training data-efficient image transformers & distillation through attention	https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth	ImageNet		Accepted
4	Text image super-resolution: https://github.com/mjq11302010044/Real-CE/tree/main	https://drive.google.com/file/d/1wga0xFdBSkAt_Pif3wPMG4tnHA9wQ7wD/view?usp=sharing	ImageNet	Super-resolution task	Accepted
5	VGG19 backbone trained on ImageNet-1K for classification	https://download.pytorch.org/models/vgg19_bn-c79401a0.pth	ImageNet		Accepted
6	ABINet: Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition. Trained on two text recognition datasets, MJSynth and SynthText, and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets	https://paddleocr.bj.bcebos.com/rec_r45_abinet_train.tar	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
7	VGG19	https://pytorch.org/vision/main/models/generated/torchvision.models.vgg19_bn.html	ImageNet		Accepted
8	VGG19	https://pytorch.org/vision/stable/models/generated/torchvision.models.vgg19_bn.html#torchvision.models.VGG19_BN_Weights	ImageNet		Accepted
9	ViT/Base trained on ImageNet following the DeiT paper: Training data-efficient image transformers & distillation through attention	https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth	ImageNet		Accepted
10	Well-known text recognition model from Clova AI Research (2019)	https://drive.google.com/file/d/1b59rXuGGmKne1AuHnkgDzoYgKeETNMv9/view?usp=sharing	Pretrained on MJSynth (MJ) [1] and SynthText (ST)	Synthetic data	Accepted
11	Pretrained CLIP used as backbone	https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384	ImageNet		Accepted
12	VGG pretrained on the ImageNet dataset	https://pytorch.org/vision/stable/models/vgg.html	ImageNet		Accepted
13	ABINet	https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/algorithm_rec_abinet_en.md	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
14	Multilingual XLM-RoBERTa model, base version	https://huggingface.co/xlm-roberta-base	ImageNet	Language model	Accepted
15	ViT/Small pretrained on ImageNet following the DeiT paper: Training data-efficient image transformers & distillation through attention	https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth	ImageNet		Accepted
16	Text image super-resolution: https://github.com/csxmli2016/textbsr	https://github.com/csxmli2016/textbsr/releases/download/0.2.0/bsrgan_text_256.pth	ImageNet	Super-resolution task	Accepted
17	Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition. Trained on two text recognition datasets, MJSynth and SynthText, and evaluated on the IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE datasets.	https://paddleocr.bj.bcebos.com/contribution/rec_resnet_rfl_att_train.tar	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
18	ResNet	https://pytorch.org/vision/stable/models/resnet.html	ImageNet		Accepted
19	ResNet50	https://pytorch.org/vision/master/models/generated/torchvision.models.resnet50.html	ImageNet		Accepted
20	PaddleOCR	https://github.com/PaddlePaddle/PaddleOCR	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
21	ViT/Small trained on ImageNet following the DeiT paper: Training data-efficient image transformers & distillation through attention	https://dl.fbaipublicfiles.com/deit/deit_small_patch16_224-cd65a155.pth	ImageNet		Accepted
22	Pretrained CLIP used as backbone	https://huggingface.co/timm/convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320	ImageNet		Accepted
23	ResNet pretrained on the ImageNet dataset		ImageNet		Accepted
24	SVTR	https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/algorithm_rec_svtr_en.md	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
25	MATRN English pretrained model	https://www.dropbox.com/s/pjcarm73cqwbxh4/best-train-matrn.pth?dl=0	Synthetic datasets: SynthText, MJSynth, WikiText	Synthetic data	Accepted
26	From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. Trained on two text recognition datasets, MJSynth and SynthText, and evaluated on the IIIT, SVT, IC13, IC15, SVTP, and CUTE datasets.	https://paddleocr.bj.bcebos.com/VisionLAN/rec_r45_visionlan_train.tar	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
27	ViTSTR is a simple single-stage model that uses a pretrained Vision Transformer (ViT) for scene text recognition (STR). Its accuracy is comparable to state-of-the-art STR models even though it uses significantly fewer parameters and FLOPS. ViTSTR is also fast thanks to the inherently parallel computation of the ViT architecture.	https://github.com/roatienza/deep-text-recognition-benchmark	Synthetic training datasets MJSynth (MJ) and SynthText	Synthetic data	Accepted
28	YOLO	https://github.com/ultralytics/ultralytics	ImageNet		Accepted
29	ViT/Base trained on English data following the ViTSTR paper: Vision Transformer for Fast and Efficient Scene Text Recognition	https://github.com/roatienza/deep-text-recognition-benchmark/releases/download/v0.1.0/vitstr_base_patch16_224_aug.pth	Synthetic training datasets MJSynth (MJ) and SynthText	Synthetic data	Accepted
30	Swin Transformer (large-sized model)	https://huggingface.co/microsoft/swin-base-patch4-window12-384-in22k	ImageNet		Accepted
31	ViT/Small trained on English data following the ViTSTR paper: Vision Transformer for Fast and Efficient Scene Text Recognition	https://github.com/roatienza/deep-text-recognition-benchmark/releases/download/v0.1.0/vitstr_small_patch16_224_aug.pth	Synthetic training datasets MJSynth (MJ) and SynthText	Synthetic data	Accepted
32	SVTR: Scene Text Recognition with a Single Visual Model (SVTR Large)	https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_large_none_ctc_en_train.tar	Pretrained on ImageNet		Accepted
33	BEiT (base-sized model, fine-tuned on ImageNet-22k)	https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k	ImageNet		Accepted
34	Pretrained backbone	https://huggingface.co/timm/mobilenetv3_large_100.ra_in1k	ImageNet		Accepted
35	YOLOS (base-sized) model	https://huggingface.co/hustvl/yolos-base	Pretrained on ImageNet; fine-tuned on COCO detection		Accepted
36	ABINet	https://paddleocr.bj.bcebos.com/rec_r45_abinet_train.tar	Synthetic training datasets MJSynth (MJ) and SynthText	Synthetic data	Accepted
37	SVTR	https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_large_none_ctc_en_train.tar	Synthetic training datasets MJSynth (MJ) and SynthText	Synthetic data	Accepted
38	Pretrained backbones collected in the timm library	https://huggingface.co/timm	ImageNet		Accepted
39	SRN	https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/algorithm_rec_srn_en.md	Pretrained on SynthText and MJSynth	Synthetic data	Accepted
40	VGG model	https://pytorch.org/vision/main/models/vgg.html	ImageNet		Accepted
41	Vision Transformer (base-sized model)	https://huggingface.co/google/vit-base-patch16-224	ImageNet		Accepted
42	Pretrained ViT used as backbone	timm/vit_large_patch14_clip_224.openai_ft_in12k_in1k	ImageNet		Accepted
43	EfficientNet pretrained on the ImageNet dataset	https://pytorch.org/vision/stable/models/efficientnet.html	ImageNet		Accepted
44	ResNet	https://paperswithcode.com/method/resnet	ImageNet		Accepted
45	Pretrained backbone	https://huggingface.co/timm/resnet50.a1_in1k	ImageNet		Accepted
46	Vision Transformer pretrained on the CIFAR and ImageNet datasets	https://github.com/google-research/vision_transformer	CIFAR, ImageNet		Accepted
47	VGG19-BN	https://pytorch.org/vision/main/models/generated/torchvision.models.vgg19_bn.html	ImageNet		Accepted
48	ABINet++ English pretrained model	https://drive.google.com/file/d/1p6Pw053fFtwmOWd7Qiw3w4qYKf13-bDg/view?usp=share_link	Pretrained on SynthText and MJSynth	Uses synthetic text	Accepted
49	English pretrained models from MMOCR, such as ABINet, SATRN, SAR, MASTER, ...	https://github.com/open-mmlab/mmocr	The pretrained models are trained on OCR-related datasets (depending on the weights, either synthetic data only or purely collected data)	Pretrained weights allowed for: ABINet, SATRN, SVTR, NRTR, MASTER, ASTER, CRNN	Accepted
50	SVTR: Scene Text Recognition with a Single Visual Model (SVTR Tiny).	https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar		Trained on synthetic data	Accepted
51	PARSeq STR model	https://github.com/baudm/parseq	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
52	PARSeq	https://github.com/baudm/parseq	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
53	Pretrained weights for the PARSeq model, trained on English scene text data by the PARSeq authors in the experiments that evaluate models in the PARSeq paper	https://github.com/baudm/parseq/releases/download/v1.0.0/parseq-bb5792a6.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
54	Pretrained weights of the PARSeq model	https://github.com/baudm/parseq/releases/download/v1.0.0/parseq-bb5792a6.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
55	With an ensemble of autoregressive (AR) models, the authors unify the current STR decoding methods (context-aware AR and context-free non-AR) with the bidirectional (cloze) refinement model. With the correct decoder parameterization, the model can be trained with Permutation Language Modeling to enable inference at arbitrary output positions given arbitrary subsets of the input context. This approach yields a unified STR model, PARSeq, capable of context-free and context-aware inference as well as iterative prediction refinement using bidirectional context, without requiring a standalone language model.	https://github.com/baudm/parseq	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
56	The TrOCR model is an encoder-decoder model, consisting of an image Transformer as encoder, and a text Transformer as decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa.	https://huggingface.co/microsoft/trocr-base-printed	Fine-tuned on IAM, SROIE. Pretraining: document-style data collected from the web (including the handwritten IIIT-HWS dataset) with a large share of synthetic data	Pretraining uses IIIT-HWS, a handwriting dataset	Rejected
57	PARSeq-tiny	https://github.com/baudm/parseq/releases/tag/v1.0.0	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
58	CRNN model combining VGG19 and a Transformer	https://vocr.vn/data/vietocr/config/vgg-transformer.yml	Trained on 10M text images for the text recognition task	Pretrained on a Vietnamese text dataset	Rejected
59	VietOCR: a model combining a CNN and a Transformer	https://github.com/pbcquoc/vietocr	Trained on 10M text images for the text recognition task	Pretrained on a Vietnamese text dataset	Rejected
60	Pretrained weights for the ABINet model, trained on English scene text data by the PARSeq authors in the experiments that evaluate models in the PARSeq paper	https://github.com/baudm/parseq/releases/download/v1.0.0/abinet-1d1e373e.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
61	PARSeq English pretrained model	https://github.com/baudm/parseq/releases/download/v1.0.0/parseq-bb5792a6.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
62	TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models	https://github.com/microsoft/unilm/tree/master/trocr	Combination of synthetic and collected data	The pretrained checkpoints were trained on non-synthetic text data	Rejected
63	Pretrained weights of the ABINet model	https://github.com/baudm/parseq/releases/download/v1.0.0/abinet-1d1e373e.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
64	VietOCR is an implementation of the Transformer OCR model for recognizing handwritten and typed Vietnamese text. The architecture combines a CNN with a Transformer (the model underlying the well-known BERT).	https://github.com/pbcquoc/vietocr	Trained on 10M text images for the text recognition task	Pretrained on a Vietnamese text dataset	Rejected
65	Microsoft's Transformer OCR model fine-tuned on English handwriting	https://huggingface.co/microsoft/trocr-base-handwritten	Fine-tuned on an English text dataset	The pretrained checkpoints were trained on handwriting data	Rejected
66	PARSeq	https://github.com/baudm/parseq/releases/tag/v1.0.0	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
67	Pretrained weights for the ViTSTR model, trained on English scene text data by the PARSeq authors in the experiments that evaluate models in the PARSeq paper	https://github.com/baudm/parseq/releases/download/v1.0.0/vitstr-26d0fcf4.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
68	Pretrained weights of the TRBA model	https://github.com/baudm/parseq/releases/download/v1.0.0/trba-cfaed284.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
69	parseq_small_patch16_224	https://github.com/baudm/parseq/releases/tag/v1.0.0	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
70	Pretrained weights of the ViTSTR model	https://github.com/baudm/parseq/releases/download/v1.0.0/vitstr-26d0fcf4.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
71	Pretrained weights of the CRNN model	https://github.com/baudm/parseq/releases/download/v1.0.0/crnn-679d0e31.pt	Synthetic training datasets MJSynth (MJ) [30] and SynthText, combined with other text recognition datasets: COCO-Text, Uber-Text, RCTW17, ArT, MLT19, …	Uses non-synthetic text images during training: COCO-Text, Uber-Text, ...	Rejected
72	Backbone consisting of ResNet and a Transformer combined with position attention	https://awscv-public-data.s3.us-west-2.amazonaws.com/semimtr/semimtr_vision_model_real_l_and_u.pth		Pretraining data could not be determined	Rejected
73	A variant of the decoder block in the Transformer model	https://awscv-public-data.s3.us-west-2.amazonaws.com/semimtr/abinet_language_model.pth		Pretraining data could not be determined	Rejected
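
The sheet above can be used as a lookup before starting from a checkpoint. The following is only a minimal sketch, assuming the sheet has been exported as tab-separated text and reduced to two columns; the column names `link` and `approval` and the helpers `load_approvals` and `check_checkpoint` are hypothetical names chosen for illustration, and the three sample rows are copied from rows 2, 6 and 53.

```python
import csv
import io

# Three rows copied from the sheet, reduced to link and approval status.
# The header names "link" and "approval" are assumptions for this sketch.
SAMPLE_ROWS = (
    "link\tapproval\n"
    "https://huggingface.co/google/vit-base-patch16-224\tAccepted\n"
    "https://paddleocr.bj.bcebos.com/rec_r45_abinet_train.tar\tAccepted\n"
    "https://github.com/baudm/parseq/releases/download/v1.0.0/parseq-bb5792a6.pt\tRejected\n"
)


def load_approvals(tsv_text):
    """Map each checkpoint link to its approval status (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    approvals = {}
    for row in reader:
        link = (row.get("link") or "").strip()
        if link:  # some sheet rows have no link; skip them
            approvals[link] = (row.get("approval") or "").strip()
    return approvals


def check_checkpoint(url, approvals):
    """Return the sheet's status for a URL, or 'Unlisted' if it is absent."""
    return approvals.get(url.strip(), "Unlisted")


approvals = load_approvals(SAMPLE_ROWS)
print(check_checkpoint("https://huggingface.co/google/vit-base-patch16-224", approvals))
print(check_checkpoint("https://example.com/some-other-model.pth", approvals))
```

An exact-match lookup is deliberately strict: a link that is not in the sheet comes back as "Unlisted" rather than being guessed from a similar row.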