
Run PaddleOCR-VL-1.5 on Novita
What is PaddleOCR-VL-1.5
PaddleOCR-VL-1.5 is the latest iteration of the PaddleOCR-VL series. Building on a comprehensive optimization of the core capabilities of version 1.0, the model achieves 94.5% accuracy on the document parsing benchmark OmniDocBench v1.5, surpassing top-tier general-purpose large models and specialized document parsing models. It adds support for localizing document elements with irregular bounding boxes, which lets it perform well in real-world conditions such as scanning, tilting, bending, screen photography, and complex lighting, where it achieves SOTA results. The model also integrates seal recognition and text detection/recognition tasks, with key metrics that continue to lead mainstream models.
Features
- Achieved 94.5% accuracy on OmniDocBench v1.5 with 0.9B parameters, surpassing the previous generation SOTA model PaddleOCR-VL, with significantly improved recognition capabilities for tables, formulas, and text.
- The world's first document parsing model to support irregular bounding box localization, capable of accurately returning polygon detection boxes in tilted and curved scenarios. Its accuracy outperforms current mainstream open-source and closed-source models across 5 scenarios: scanning, bending, tilting, screen photography, and lighting changes.
- Added text line localization/recognition and seal recognition capabilities, with all reported metrics setting new SOTA results in the field.
- Refined recognition for special scenarios and multiple languages. Improved recognition of rarely used characters, ancient books, multilingual tables, underlines, and checkboxes, and expanded language support to include Tibetan and Bengali.
- Supports automatic merging of cross-page tables and recognition of cross-page paragraph headers, solving the problem of fragmentation in long document parsing.
- Inference speed is further improved. When tested with PDF files on an A100, the model can process 1.43 document pages per second, which is 43% faster than MinerU2.5 and more than twice the speed of DeepSeek-OCR. You can check the details on the official website of the project.
How to get started
Demo
Step 1
The following Python script is the test case for this demo. Save it as test.py.
```python
import base64
import pathlib

import requests

API_URL = "http://localhost:8080/layout-parsing"  # Service URL

image_path = "./demo.jpg"

# Encode local image to Base64
with open(image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {
    "file": image_data,  # Base64 encoded file content or file URL
    "fileType": 1,  # File type, 1 means image file
}

# Call the API
response = requests.post(API_URL, json=payload)

# Process the API response data
assert response.status_code == 200
result = response.json()["result"]
for i, res in enumerate(result["layoutParsingResults"]):
    print(res["prunedResult"])
    md_dir = pathlib.Path(f"markdown_{i}")
    md_dir.mkdir(exist_ok=True)
    (md_dir / "doc.md").write_text(res["markdown"]["text"])
    for img_path, img in res["markdown"]["images"].items():
        img_path = md_dir / img_path
        img_path.parent.mkdir(parents=True, exist_ok=True)
        img_path.write_bytes(base64.b64decode(img))
    print(f"Markdown document saved at {md_dir / 'doc.md'}")
    for img_name, img in res["outputImages"].items():
        img_path = f"{img_name}_{i}.jpg"
        pathlib.Path(img_path).parent.mkdir(exist_ok=True)
        with open(img_path, "wb") as f:
            f.write(base64.b64decode(img))
        print(f"Output image saved at {img_path}")
```
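The request payload in the test script is built inline for a single image. A minimal sketch of a reusable helper is shown here. The function name `build_payload` is hypothetical, and mapping `fileType` 0 to PDF is an assumption: the script only states that 1 means an image file, so check the value against the service documentation before relying on it.

```python
import base64
from pathlib import Path

FILE_TYPE_PDF = 0    # assumption: not stated in the test script
FILE_TYPE_IMAGE = 1  # stated in the test script: 1 means image file


def build_payload(path: str) -> dict:
    """Build the JSON payload for the layout-parsing request from a local file."""
    file_path = Path(path)
    # Pick the file type from the extension; anything that is not .pdf is treated as an image.
    file_type = FILE_TYPE_PDF if file_path.suffix.lower() == ".pdf" else FILE_TYPE_IMAGE
    # The service expects the file content as a Base64 string.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {"file": encoded, "fileType": file_type}
```

With this helper, the script could call `requests.post(API_URL, json=build_payload("./demo.jpg"))` instead of assembling the dictionary by hand.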
Step 2
Prepare the image that needs OCR. This demo uses one of the official PaddleOCR test files.
https://github.com/PaddlePaddle/PaddleOCR/blob/main/tests/test_files/book.jpg
```bash
curl https://raw.githubusercontent.com/PaddlePaddle/PaddleOCR/main/tests/test_files/book.jpg -o demo.jpg
```
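A download from the repository page URL instead of the raw URL saves an HTML page under the name demo.jpg, and the service then receives something that is not an image. The following sketch is an optional sanity check, not part of the official demo. It relies only on the fact that JPEG files begin with the bytes `FF D8 FF`; the helper names are hypothetical.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the bytes begin with the JPEG start-of-image marker."""
    return data.startswith(JPEG_MAGIC)


def check_image_file(path: str) -> bool:
    """Return True if the file exists and its first bytes look like a JPEG."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as fh:
        return looks_like_jpeg(fh.read(len(JPEG_MAGIC)))
```

Calling `check_image_file("demo.jpg")` before running the test script catches a wrong download early.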
Step 3
Copy the port mapping address of your instance and use it to replace the API_URL value in test.py.
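Instead of editing the URL by hand, the address can be read from the environment. The sketch below is one way to do it under stated assumptions: the environment variable name `PADDLEOCR_VL_ADDRESS` and the function `build_api_url` are hypothetical, the `/layout-parsing` path is taken from the test script, and plain HTTP is assumed when the address has no scheme.

```python
import os
from urllib.parse import urlsplit, urlunsplit

ENDPOINT_PATH = "/layout-parsing"  # path used by the test script


def build_api_url(address: str) -> str:
    """Turn a port mapping address into the full layout-parsing URL."""
    address = address.strip()
    if "://" not in address:
        # Assumption: fall back to plain HTTP when no scheme is given.
        address = "http://" + address
    parts = urlsplit(address)
    # Keep scheme and host:port, and always point at the layout-parsing path.
    return urlunsplit((parts.scheme, parts.netloc, ENDPOINT_PATH, "", ""))


# PADDLEOCR_VL_ADDRESS is a hypothetical variable name; the default matches the test script.
API_URL = build_api_url(os.environ.get("PADDLEOCR_VL_ADDRESS", "localhost:8080"))
```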
Step 4
Run test.py and check the result.
```
$ python test.py
{'page_count': None, 'width': 1100, 'height': 708, 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_chart_recognition': False, 'use_seal_recognition': False, 'use_ocr_for_image_block': False, 'format_block_content': False, 'merge_layout_blocks': True, 'markdown_ignore_labels': ['number', 'footnote', 'header', 'header_image', 'footer', 'footer_image', 'aside_text'], 'return_layout_polygon_points': True}, 'parsing_res_list': [{'block_label': 'text', 'block_content': "chances of the lottery jachts are also use combination formulas to work out the chances of the other prizes, but it all starts to get a bit fiddly so we'll move on to something else. (How to work out the other lottery chances is just one of the amazing features you'll find at: www.murderousmaths.co.uk)", 'block_bbox': [180, 0, 512, 109], 'block_id': 0, 'block_order': 1, 'group_id': 0, 'block_polygon_points': [[180.0, 0.0], [512.0, 0.0], [512.0, 109.0], [180.0, 109.0]]}, {'block_label': 'paragraph_title', 'block_content': 'The disappearing sum', 'block_bbox': [180, 113, 310, 137], 'block_id': 1, 'block_order': 2, 'group_id': 1, 'block_polygon_points': [[179.04934692382812, 119.45269012451172], [308.8138122558594, 110.3464126586914], [310.1516418457031, 129.41043090820312], [180.38717651367188, 138.51669311523438]]}, {'block_label': 'text', 'block_content': "It's Friday evening. The lovely Veronica Gumfloss has been out with the football team who have all escorted her safely back to her doorstep. It's that tender moment when each hopeful player closes his eyes and leans forward with quivering lips. Unfortunately Veronica's parents heard them clumping down the road and Veronica knows she only has time to kiss four out of the eleven of them if she's going to do it properly.", 'block_bbox': [175, 126, 505, 289], 'block_id': 2, 'block_order': 3, 'group_id': 2, 'block_polygon_points': [[175, 137], [175, 281], [499, 285], [504, 134], [455, 126], [302, 126]]}, {'block_label': 'image', 'block_content': '', 'block_bbox': [179, 282, 491, 471], 'block_id': 3, 'block_order': None, 'group_id': 3, 'block_polygon_points': [[179.0, 282.0], [491.0, 282.0], [491.0, 471.0], [179.0, 471.0]]}, {'block_label': 'vision_footnote', 'block_content': "How many choices has she got? It's $ ^{11}C_{4} $ which is $ ^{111}4l \\times 7 $ but for goodness sake DON'T reach for the calculator! The most brilliant thing about perms and", 'block_bbox': [164, 455, 493, 531], 'block_id': 4, 'block_order': None, 'group_id': 4, 'block_polygon_points': [[164, 459], [164, 505], [345, 527], [492, 527], [492, 474], [377, 470], [323, 466], [246, 459], [207, 455], [170, 455]]}, {'block_label': 'number', 'block_content': '94', 'block_bbox': [301, 546, 326, 563], 'block_id': 5, 'block_order': None, 'group_id': 5, 'block_polygon_points': [[301.0, 546.0], [325.0, 546.0], [325.0, 562.0], [301.0, 562.0]]}, {'block_label': 'text', 'block_content': "means that EVERYTHING ON THE BOTTOM ALWAYS CANCELS OUT! It's probably the best fun you'll ever have with a pencil so here we go...", 'block_bbox': [552, 0, 892, 85], 'block_id': 6, 'block_order': 4, 'group_id': 6, 'block_polygon_points': [[552.6058349609375, -9.254895210266113], [895.4388427734375, 13.18508529663086], [890.72705078125, 85.17122650146484], [547.89404296875, 62.73124313354492]]}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11!}{4!\\times7!}=\\quad\\frac{11\\times10\\times9\\times8\\times7\\times6\\times5\\times4\\times3\\times2\\times1}{4\\times3\\times2\\times1\\times7\\times6\\times5\\times4\\times3\\times2\\times1} $$ ', 'block_bbox': [573, 74, 880, 128], 'block_id': 7, 'block_order': 5, 'group_id': 7, 'block_polygon_points': [[573, 89], [573, 109], [650, 113], [700, 117], [879, 127], [879, 96], [869, 92], [770, 85], [688, 78], [644, 74], [579, 74]]}, {'block_label': 'text', 'block_content': "(Before we continue, grab this book and show somebody this sum. Rub their face on it if you need to and tell them that this is the sort of thing you do for fun without a calculator these days because you're so brilliant.)", 'block_bbox': [550, 123, 889, 219], 'block_id': 8, 'block_order': 6, 'group_id': 8, 'block_polygon_points': [[550, 123], [550, 204], [660, 208], [883, 218], [888, 141], [697, 127], [648, 123]]}, {'block_label': 'text', 'block_content': "Off we go then. For starters we'll get rid of the 7! bit from top and bottom and get:", 'block_bbox': [549, 203, 887, 253], 'block_id': 9, 'block_order': 7, 'group_id': 9, 'block_polygon_points': [[549, 203], [549, 238], [886, 252], [886, 218], [792, 214], [676, 207]]}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11\\times10\\times9\\times8}{4\\times3\\times2\\times1} $$ ', 'block_bbox': [677, 255, 769, 292], 'block_id': 10, 'block_order': 8, 'group_id': 10, 'block_polygon_points': [[677.0, 255.0], [769.0, 255.0], [769.0, 292.0], [677.0, 292.0]]}, {'block_label': 'text', 'block_content': "Pow! That's already got rid of more than half the numbers. Next we'll see that the $ 4 \\times 2 $ on the bottom cancels out the 8 on top (and we don't need that “×1” on the bottom either). We're left with...", 'block_bbox': [547, 300, 886, 376], 'block_id': 11, 'block_order': 9, 'group_id': 11, 'block_polygon_points': [[547.0, 299.99993896484375], [886.40771484375, 307.02911376953125], [885.0, 375.0], [545.59228515625, 367.9708251953125]]}, {'block_label': 'display_formula', 'block_content': ' $$ \\frac{11\\times10\\times9}{3} $$ ', 'block_bbox': [685, 384, 756, 417], 'block_id': 12, 'block_order': 10, 'group_id': 12, 'block_polygon_points': [[685.0, 384.0], [756.0, 384.0], [756.0, 417.0], [685.0, 417.0]]}, {'block_label': 'text', 'block_content': "Then the 3 on the bottom divides into the 9 on top leaving it as a 3 so all we've got now is:", 'block_bbox': [545, 429, 884, 468], 'block_id': 13, 'block_order': 11, 'group_id': 13, 'block_polygon_points': [[545.0, 429.0], [884.0, 429.0], [884.0, 468.0], [545.0, 468.0]]}, {'block_label': 'text', 'block_content': "Veronica's choices = 11 × 10 × 3", 'block_bbox': [618, 477, 817, 496], 'block_id': 14, 'block_order': 12, 'group_id': 14, 'block_polygon_points': [[618.0, 477.0], [816.0, 477.0], [816.0, 495.0], [618.0, 495.0]]}, {'block_label': 'text', 'block_content': 'Look! No bottom.', 'block_bbox': [543, 508, 666, 529], 'block_id': 15, 'block_order': 13, 'group_id': 15, 'block_polygon_points': [[542.9999389648438, 508.0], [664.9999389648438, 508.0], [664.9999389648438, 528.0], [542.9999389648438, 528.0]]}, {'block_label': 'number', 'block_content': '95', 'block_bbox': [705, 555, 729, 571], 'block_id': 16, 'block_order': None, 'group_id': 16, 'block_polygon_points': [[705.0, 555.0], [728.0, 555.0], [728.0, 570.0], [705.0, 570.0]]}, {'block_label': 'image', 'block_content': '', 'block_bbox': [938, 0, 1099, 647], 'block_id': 17, 'block_order': None, 'group_id': 17, 'block_polygon_points': [[938.0, -2.0], [1099.0, -2.0], [1099.0, 647.0], [938.0, 647.0]]}], 'layout_det_res': {'boxes': [{'cls_id': 22, 'label': 'text', 'score': 0.9220595955848694, 'coordinate': [180, 0, 512, 109], 'order': 1, 'polygon_points': [[180.0, 0.0], [512.0, 0.0], [512.0, 109.0], [180.0, 109.0]]}, {'cls_id': 17, 'label': 'paragraph_title', 'score': 0.8456085920333862, 'coordinate': [180, 113, 310, 137], 'order': 2, 'polygon_points': [[179.04934692382812, 119.45269012451172], [308.8138122558594, 110.3464126586914], [310.1516418457031, 129.41043090820312], [180.38717651367188, 138.51669311523438]]}, {'cls_id': 22, 'label': 'text', 'score': 0.9213816523551941, 'coordinate': [175, 126, 505, 289], 'order': 3, 'polygon_points': [[175, 137], [175, 281], [499, 285], [504, 134], [455, 126], [302, 126]]}, {'cls_id': 14, 'label': 'image', 'score': 0.9448813199996948, 'coordinate': [179, 282, 491, 471], 'order': None, 'polygon_points': [[179.0, 282.0], [491.0, 282.0], [491.0, 471.0], [179.0, 471.0]]}, {'cls_id': 24, 'label': 'vision_footnote', 'score': 0.8173566460609436, 'coordinate': [164, 455, 493, 531], 'order': None, 'polygon_points': [[164, 459], [164, 505], [345, 527], [492, 527], [492, 474], [377, 470], [323, 466], [246, 459], [207, 455], [170, 455]]}, {'cls_id': 16, 'label': 'number', 'score': 0.7621420621871948, 'coordinate': [301, 546, 326, 563], 'order': 4, 'polygon_points': [[301.0, 546.0], [325.0, 546.0], [325.0, 562.0], [301.0, 562.0]]}, {'cls_id': 22, 'label': 'text', 'score': 0.913713276386261, 'coordinate': [552, 0, 892, 85], 'order': 5, 'polygon_points': [[552.6058349609375, -9.254895210266113], [895.4388427734375, 13.18508529663086], [890.72705078125, 85.17122650146484], [547.89404296875, 62.73124313354492]]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.8774852156639099, 'coordinate': [573, 74, 880, 128], 'order': 6, 'polygon_points': [[573, 89], [573, 109], [650, 113], [700, 117], [879, 127], [879, 96], [869, 92], [770, 85], [688, 78], [644, 74], [579, 74]]}, {'cls_id': 22, 'label': 'text', 'score': 0.9250841736793518, 'coordinate': [550, 123, 889, 219], 'order': 7, 'polygon_points': [[550, 123], [550, 204], [660, 208], [883, 218], [888, 141], [697, 127], [648, 123]]}, {'cls_id': 22, 'label': 'text', 'score': 0.8921533823013306, 'coordinate': [549, 203, 887, 253], 'order': 8, 'polygon_points': [[549, 203], [549, 238], [886, 252], [886, 218], [792, 214], [676, 207]]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.7999240159988403, 'coordinate': [677, 255, 769, 292], 'order': 9, 'polygon_points': [[677.0, 255.0], [769.0, 255.0], [769.0, 292.0], [677.0, 292.0]]}, {'cls_id': 22, 'label': 'text', 'score': 0.9141753315925598, 'coordinate': [547, 300, 886, 376], 'order': 10, 'polygon_points': [[547.0, 299.99993896484375], [886.40771484375, 307.02911376953125], [885.0, 375.0], [545.59228515625, 367.9708251953125]]}, {'cls_id': 5, 'label': 'display_formula', 'score': 0.849932074546814, 'coordinate': [685, 384, 756, 417], 'order': 11, 'polygon_points': [[685.0, 384.0], [756.0, 384.0], [756.0, 417.0], [685.0, 417.0]]}, {'cls_id': 22, 'label': 'text', 'score': 0.8802617192268372, 'coordinate': [545, 429, 884, 468], 'order': 12, 'polygon_points': [[545.0, 429.0], [884.0, 429.0], [884.0, 468.0], [545.0, 468.0]]}, {'cls_id': 22, 'label': 'text', 'score': 0.7239603400230408, 'coordinate': [618, 477, 817, 496], 'order': 13, 'polygon_points': [[618.0, 477.0], [816.0, 477.0], [816.0, 495.0], [618.0, 495.0]]}, {'cls_id': 22, 'label': 'text', 'score': 0.8236865997314453, 'coordinate': [543, 508, 666, 529], 'order': 14, 'polygon_points': [[542.9999389648438, 508.0], [664.9999389648438, 508.0], [664.9999389648438, 528.0], [542.9999389648438, 528.0]]}, {'cls_id': 16, 'label': 'number', 'score': 0.552054762840271, 'coordinate': [705, 555, 729, 571], 'order': 15, 'polygon_points': [[705.0, 555.0], [728.0, 555.0], [728.0, 570.0], [705.0, 570.0]]}, {'cls_id': 14, 'label': 'image', 'score': 0.8069510459899902, 'coordinate': [938, 0, 1099, 647], 'order': None, 'polygon_points': [[938.0, -2.0], [1099.0, -2.0], [1099.0, 647.0], [938.0, 647.0]]}]}}
Markdown document saved at markdown_0/doc.md
Output image saved at layout_det_res_0.jpg
```
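The printed result is a single dictionary, which is hard to read as raw text. The sketch below shows one way to walk it, assuming only the keys visible in the output above (`parsing_res_list`, `block_order`, `block_content`, `block_polygon_points`). The helper names are hypothetical, and the sample is a hand-copied subset of three blocks from that output. Blocks whose `block_order` is `None` (images, footnotes, and page numbers in this run) are skipped, and the polygon area is computed with the shoelace formula.

```python
def blocks_in_reading_order(pruned_result: dict) -> list:
    """Return blocks that have a reading order, sorted by block_order."""
    blocks = [
        block
        for block in pruned_result["parsing_res_list"]
        if block["block_order"] is not None
    ]
    return sorted(blocks, key=lambda block: block["block_order"])


def polygon_area(points: list) -> float:
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    count = len(points)
    for k in range(count):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % count]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Three blocks copied from the output above, reduced to the fields used here.
sample = {
    "parsing_res_list": [
        {"block_id": 16, "block_label": "number", "block_content": "95",
         "block_order": None,
         "block_polygon_points": [[705.0, 555.0], [728.0, 555.0], [728.0, 570.0], [705.0, 570.0]]},
        {"block_id": 14, "block_label": "text",
         "block_content": "Veronica's choices = 11 × 10 × 3", "block_order": 12,
         "block_polygon_points": [[618.0, 477.0], [816.0, 477.0], [816.0, 495.0], [618.0, 495.0]]},
        {"block_id": 10, "block_label": "display_formula",
         "block_content": " $$ \\frac{11\\times10\\times9\\times8}{4\\times3\\times2\\times1} $$ ",
         "block_order": 8,
         "block_polygon_points": [[677.0, 255.0], [769.0, 255.0], [769.0, 292.0], [677.0, 292.0]]},
    ]
}

for block in blocks_in_reading_order(sample):
    area = polygon_area(block["block_polygon_points"])
    print(block["block_order"], block["block_label"], area, block["block_content"].strip())
```

In test.py, the same helpers could be applied to `res["prunedResult"]` in place of the `print` call.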