
Run GLM-OCR on Novita
What is GLM-OCR
GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.
Features
1. Compact and accurate: achieves 94.5% accuracy on OmniDocBench v1.5 with only 0.9B parameters, surpassing the previous-generation SOTA model PaddleOCR-VL, with significantly improved recognition of tables, formulas, and text.
2. State-of-the-art performance: scores 94.62 on OmniDocBench v1.5, ranking #1 overall, and delivers state-of-the-art results across major document-understanding benchmarks, including formula recognition, table recognition, and information extraction.
3. Optimized for real-world scenarios: designed and tuned for practical business use cases, maintaining robust performance on complex tables, code-heavy documents, seals, and other challenging real-world layouts.
4. Efficient inference: with only 0.9B parameters, GLM-OCR supports deployment via vLLM, SGLang, and Ollama, significantly reducing inference latency and compute cost, making it well suited to high-concurrency services and edge deployments.
5. Easy to use: fully open-sourced with a comprehensive SDK and inference toolchain, offering simple installation, one-line invocation, and smooth integration into existing production pipelines.
You can find more details on the project's official website.
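Feature 4 above mentions self-hosted deployment through engines such as vLLM. As a hedged sketch of what that launch could look like, the snippet below composes a `vllm serve` command; the model repository id is a placeholder, so substitute the actual name published on the project page.

```shell
# Hypothetical self-hosted launch via vLLM (one of the engines listed above).
# MODEL is a placeholder: replace it with the actual model repository id
# from the project's official website.
MODEL="org-name/GLM-OCR"
PORT=8000

# Compose the launch command; `vllm serve` exposes an OpenAI-compatible
# /v1/chat/completions endpoint on the chosen port.
CMD="vllm serve $MODEL --port $PORT"
echo "$CMD"
```

Run the printed command on a GPU host; once the server is up, the same request shown in the demo later in this guide can be pointed at it.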
Run GLM-OCR on Novita
Step 1: Console Entry
Open the GPU console and click Get Started to access deployment management.
Step 2: Package Selection
Locate GLM-OCR in the template catalog and start the installation.
Step 3: Infrastructure Setup
Configure compute parameters, including memory allocation, storage requirements, and network settings.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real-time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Environment Access
Open the development environment via the Connect interface, then click Start Web Terminal.
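Once the terminal is open, it can help to sanity-check that the service is reachable before sending documents. This is a minimal sketch that assumes the instance exposes the OpenAI-compatible API on port 8000; adjust the endpoint if your deployment uses another port.

```shell
# Assumed endpoint: the OpenAI-compatible server inside the instance,
# listening on port 8000 (an assumption; adjust for your deployment).
ENDPOINT="http://127.0.0.1:8000"

# /v1/models lists the models the server is currently serving; an OCR
# deployment should report an entry for the GLM-OCR model.
HEALTH_CMD="curl -sS $ENDPOINT/v1/models"
echo "$HEALTH_CMD"
```

Run the printed command in the Web Terminal; a JSON reply confirms the service is up and names the model to use in requests.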
How to get started
Demo
```shell
curl -sS http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Text Recognition:" },
          { "type": "image_url", "image_url": { "url": "https://b0.bdstatic.com/ugc/HH-A0qTDkRlm6XZuGRFAsQ011d39f49244e4aadc2b34f17cd87d04.jpg" } }
        ]
      }
    ],
    "max_tokens": 256,
    "temperature": 0
  }' | jq .
```
Note: Replace http://127.0.0.1:8000 with your instance's actual access address.
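The demo above reads a remote image URL. For local files, OpenAI-compatible servers conventionally also accept a base64 data URL in the image_url field (confirm against the project docs). A minimal sketch, using stand-in bytes in place of a real scan:

```shell
# Sketch: send a local file by embedding it as a base64 data URL.
# The sample bytes below stand in for a real document scan.
IMG=sample.png
printf 'stand-in-image-bytes' > "$IMG"

# Encode and strip newlines so the data URL is a single line.
DATA_URL="data:image/png;base64,$(base64 < "$IMG" | tr -d '\n')"

# Build the request body; model name and prompt mirror the demo above.
cat > request.json <<EOF
{
  "model": "glm-ocr",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "Text Recognition:" },
        { "type": "image_url", "image_url": { "url": "$DATA_URL" } }
      ]
    }
  ],
  "max_tokens": 256,
  "temperature": 0
}
EOF

# Uncomment once your instance is Running; jq extracts just the text.
# curl -sS http://127.0.0.1:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @request.json | jq -r '.choices[0].message.content'
```

Replace the stand-in bytes with your own file and keep the MIME prefix (`image/png`, `image/jpeg`, etc.) consistent with its format.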