
Run GLM-OCR on Novita
What is GLM-OCR
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
Features
1. Compact and accurate: achieves 94.5% accuracy on OmniDocBench v1.5 with only 0.9B parameters, surpassing the previous-generation SOTA model PaddleOCR-VL, with significantly improved recognition of tables, formulas, and text.
2. State-of-the-art performance: scores 94.62 on OmniDocBench v1.5, ranking #1 overall, and delivers state-of-the-art results across major document-understanding benchmarks, including formula recognition, table recognition, and information extraction.
3. Optimized for real-world scenarios: designed and tuned for practical business use cases, maintaining robust performance on complex tables, code-heavy documents, seals, and other challenging real-world layouts.
4. Efficient inference: with only 0.9B parameters, GLM-OCR supports deployment via vLLM, SGLang, and Ollama, significantly reducing inference latency and compute cost, making it well suited to high-concurrency services and edge deployments.
5. Easy to use: fully open-sourced with a comprehensive SDK and inference toolchain, offering simple installation, one-line invocation, and smooth integration into existing production pipelines.
You can find more details on the project's official website.
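Feature 4 above mentions self-hosted deployment through engines such as vLLM. As a hedged sketch of what that launch could look like, the snippet below composes a `vllm serve` command; the model repository id is a placeholder, so substitute the actual name published on the project page.

```shell
# Hypothetical self-hosted launch via vLLM (one of the engines listed above).
# MODEL is a placeholder: replace it with the actual model repository id
# from the project's official website.
MODEL="org-name/GLM-OCR"
PORT=8000

# Compose the launch command; `vllm serve` exposes an OpenAI-compatible
# /v1/chat/completions endpoint on the chosen port.
CMD="vllm serve $MODEL --port $PORT"
echo "$CMD"
```

Run the printed command on a GPU host; once the server is up, the same request shown in the demo later in this guide can be pointed at it.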
Run GLM-OCR on Novita
Step 1: Console Entry
Open the GPU console and click Get Started to access deployment management.
Step 2: Package Selection
Locate GLM-OCR in the template catalog and start the installation.
Step 3: Infrastructure Setup
Configure compute parameters, including memory allocation, storage requirements, and network settings.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real-time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Environment Access
Open the development environment via the Connect interface, then click Start Web Terminal.
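Once the terminal is open, it can help to sanity-check that the service is reachable before sending documents. This is a minimal sketch that assumes the instance exposes the OpenAI-compatible API on port 8000; adjust the endpoint if your deployment uses another port.

```shell
# Assumed endpoint: the OpenAI-compatible server inside the instance,
# listening on port 8000 (an assumption; adjust for your deployment).
ENDPOINT="http://127.0.0.1:8000"

# /v1/models lists the models the server is currently serving; an OCR
# deployment should report an entry for the GLM-OCR model.
HEALTH_CMD="curl -sS $ENDPOINT/v1/models"
echo "$HEALTH_CMD"
```

Run the printed command in the Web Terminal; a JSON reply confirms the service is up and names the model to use in requests.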
How to get started
Demo
```shell
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Text Recognition:" },
          { "type": "image_url", "image_url": { "url": "https://b0.bdstatic.com/ugc/HH-A0qTDkRlm6XZuGRFAsQ011d39f49244e4aadc2b34f17cd87d04.jpg" } }
        ]
      }
    ],
    "max_tokens": 256,
    "temperature": 0
  }' | jq .
```
Note: Replace http://127.0.0.1:8000 with your instance's actual access address.
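The demo above reads a remote image URL. For local files, OpenAI-compatible servers conventionally also accept a base64 data URL in the image_url field (confirm against the project docs). A minimal sketch, using stand-in bytes in place of a real scan:

```shell
# Sketch: send a local file by embedding it as a base64 data URL.
# The sample bytes below stand in for a real document scan.
IMG=sample.png
printf 'stand-in-image-bytes' > "$IMG"

# Encode and strip newlines so the data URL is a single line.
DATA_URL="data:image/png;base64,$(base64 < "$IMG" | tr -d '\n')"

# Build the request body; model name and prompt mirror the demo above.
cat > request.json <<EOF
{
  "model": "glm-ocr",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Text Recognition:" },
        { "type": "image_url", "image_url": { "url": "$DATA_URL" } }
      ]
    }
  ],
  "max_tokens": 256,
  "temperature": 0
}
EOF

# Uncomment once your instance is Running; jq extracts just the text.
# curl -sS http://127.0.0.1:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @request.json | jq -r '.choices[0].message.content'
```

Replace the stand-in bytes with your own file and keep the MIME prefix (`image/png`, `image/jpeg`, etc.) consistent with its format.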