Overview

DeepSeek-OCR is designed for document recognition and image-to-text scenarios, and it explores how far text can be compressed into visual form. The model can render long texts into highly compressed images, achieving about 97% OCR accuracy at a near-lossless compression ratio of 10x and around 60% accuracy at 20x compression.
This model currently supports only single-turn, independent recognition tasks; multi-turn conversations are not supported. Each request can include only one image. Using one of the following preset prompts is strongly recommended for best results:
# Convert the document contents to markdown format
<|grounding|>Convert the document to markdown.

# Perform text recognition on this image
<|grounding|>OCR this image.

# Extract all text without layout consideration
Free OCR.

# Parse any figures or tables in the document
Parse the figure.

# Provide a detailed description of the image content
Describe this image in detail.

# Locate the position of <|ref|>xxxx<|/ref|> in the image
Locate <|ref|>xxxx<|/ref|> in the image.
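
The preset prompts and the single-image, single-turn constraint described above can be captured in a small helper. This is a minimal sketch rather than part of any official SDK: the dictionary keys and the function names `locate_prompt` and `build_messages` are hypothetical names chosen for illustration, while the prompt strings are copied verbatim from the list above.

```python
# Preset prompts copied from the list above. The dictionary keys are
# hypothetical labels chosen for this sketch, not API identifiers.
PRESET_PROMPTS = {
    "markdown": "<|grounding|>Convert the document to markdown.",
    "ocr": "<|grounding|>OCR this image.",
    "free_ocr": "Free OCR.",
    "parse_figure": "Parse the figure.",
    "describe": "Describe this image in detail.",
}


def locate_prompt(target: str) -> str:
    """Fill the xxxx placeholder of the locate prompt with the text to find."""
    return f"Locate <|ref|>{target}<|/ref|> in the image."


def build_messages(image_url: str, prompt: str) -> list:
    """Build a single-turn message list that carries exactly one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The returned list can be passed as the `messages` argument of a chat completion request. Keeping the prompts in one place makes it harder to send a mistyped variant by accident.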

Usage Example

This example uses the preset prompt "<|grounding|>OCR this image." to recognize the text in an image.
from openai import OpenAI

# Point the OpenAI-compatible client at the Novita endpoint.
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key="<Your API Key>",
)

# Single-turn request: one user message with exactly one image and one preset prompt.
response = client.chat.completions.create(
    model="deepseek/deepseek-ocr",
    messages=[
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.png"
            }
          },
          {
            "type": "text",
            "text": "<|grounding|>OCR this image."
          }
        ]
      }
    ],
    stream=False,
    max_tokens=4096
)

content = response.choices[0].message.content

print(content)
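
If the image is a local file rather than a public URL, one common approach with OpenAI-compatible endpoints is to embed it as a base64 data URL. Whether this endpoint accepts data URLs is an assumption that should be checked before relying on it. The helper name `image_bytes_to_data_url` is hypothetical.

```python
import base64
import mimetypes


def image_bytes_to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the file name.

    Assumption: the endpoint accepts data URLs in the "url" field of an
    "image_url" content part. This is not confirmed by the text above.
    """
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string would take the place of "https://example.com/image.png" in the request, still with only one image per request.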
Example input image:
[Image: OCR example image]
Example output:
<|/ref|><|det|>[[37, 48, 279, 140]]<|/det|>
<|ref|>Deploy open-source and specialized models<|/ref|><|det|>[[42, 48, 857, 133]]<|/det|>
<|ref|>smarterandfasterwithsimpleApls.Accessthe<|/ref|><|det|>[[44, 185, 902, 246]]<|/det|>
<|ref|>latest chat, code, image, audio, video models and<|/ref|><|det|>[[41, 291, 945, 370]]<|/det|>
<|ref|>more,ready for production with built-in<|/ref|><|det|>[[40, 407, 756, 488]]<|/det|>
<|ref|>scalability.<|/ref|><|det|>[[39, 515, 232, 606]]<|/det|>
<|ref|>Explore<|/ref|><|det|>[[87, 813, 266, 879]]<|/det|>
<|ref|>Models<|/ref|><|det|>[[289, 816, 432, 878]]<|/det|>
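
The grounded output above pairs each recognized text span with a bracketed list of four integers. A short parser can turn it into structured data. This is a minimal sketch: the function name `parse_regions` is hypothetical, and reading the four numbers as a box in the order (x1, y1, x2, y2) is an assumption, since the coordinate system is not stated here.

```python
import re

# One region looks like: <|ref|>text<|/ref|><|det|>[[a, b, c, d]]<|/det|>
REGION_RE = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|>"
    r"<\|det\|>\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]<\|/det\|>"
)


def parse_regions(output: str) -> list:
    """Return (text, box) pairs, where box is a tuple of four integers.

    Assumption: the four integers are read as (x1, y1, x2, y2).
    Entries without an opening <|ref|> tag are skipped.
    """
    regions = []
    for match in REGION_RE.finditer(output):
        text = match.group(1)
        box = tuple(int(match.group(i)) for i in range(2, 6))
        regions.append((text, box))
    return regions
```

Applied to the last two lines of the example output, this yields `("Explore", (87, 813, 266, 879))` and `("Models", (289, 816, 432, 878))`. The recognized text is returned exactly as the model produced it, including any missing spaces.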