MiniCPM-V-2_6

Empower Your Applications with MiniCPM-V 2.6 on Novita AI.

Original Author: openbmb | Update Time: 2024-09-25 | Source: Hugging Face

Run MiniCPM-V-2_6 on Novita AI

Exploring MiniCPM-V

What is MiniCPM-V?

MiniCPM-V is a series of end-side multimodal Large Language Models (MLLMs) designed for vision-language understanding, capable of processing images, videos, and text to produce high-quality text outputs.

Key Features

High-Performance Models: MiniCPM-V 2.6 with 8B parameters and MiniCPM-V 2.0 with 2B parameters, both demonstrating strong performance in vision-language tasks.
Real-time Video Understanding: MiniCPM-V 2.6 supports real-time video understanding on devices like iPad.
Strong OCR Capability: Advanced OCR capabilities with support for images of any aspect ratio and up to 1.8 million pixels.
Multilingual Support: The models support over 30 languages, including but not limited to English, Chinese, German, French, Italian, and Korean.
Efficient Deployment: Models are optimized for efficient deployment on end-side devices with CPU and GPU optimizations.
Fine-tuning Support: The models support fine-tuning with frameworks like Hugging Face and SWIFT.
Inference with vLLM: vLLM now officially supports MiniCPM-V 2.6, MiniCPM-Llama3-V 2.5, and MiniCPM-V 2.0.
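
The 1.8-million-pixel figure in the feature list can be turned into a simple pre-check before an image is sent to the model. The sketch below is illustrative only: the function name `fit_within_pixel_budget` is hypothetical, and treating 1.8 million as a hard ceiling is an assumption, since the model's own preprocessing (shipped with the checkpoint) decides how large images are actually handled.

```python
import math

MAX_PIXELS = 1_800_000  # figure quoted in the feature list; assumed here to be a hard cap


def fit_within_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down with the aspect ratio preserved so
    that width * height <= max_pixels. Sizes already within budget are
    returned unchanged."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height
    # Scaling both sides by sqrt(budget / area) brings the area down to the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))
    return new_w, new_h
```

With Pillow, the result could be applied as `image.resize(fit_within_pixel_budget(*image.size))`, since `Image.size` is a `(width, height)` tuple.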

Evaluation

[Figure: evaluation results]

Examples

[Figure: example outputs]

Know more about MiniCPM-V 2.6

What is MiniCPM-V 2.6?

MiniCPM-V 2.6 is an advanced model in the MiniCPM-V series, optimized for real-time performance. With 8 billion parameters, it efficiently interprets complex multimodal data, making it ideal for interactive applications. This model empowers developers and researchers to harness AI for dynamic content generation and sophisticated analysis, unlocking new possibilities in artificial intelligence.

Inputs and Outputs of MiniCPM-V 2.6

Inputs

Single Image: Up to 1.8 million pixels.
Multiple Images: Engages in conversation and reasoning across multiple images.
Video: Accepts video inputs and generates dense captions for spatiotemporal information.
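
For video input, a common approach is to decode a subset of frames and pass them to the model as a sequence of images. The sketch below shows only the frame-selection step, in plain Python. The function name `select_frame_indices`, the one-frame-per-second sampling rate, and the cap of 64 frames are assumptions chosen for illustration; decoding itself (for example with `decord`, which appears in the requirements list) is left out.

```python
def select_frame_indices(total_frames, avg_fps, max_frames=64):
    """Pick roughly one frame per second of video, then thin the selection
    uniformly if it exceeds max_frames. Returns a list of frame indices."""
    if total_frames <= 0:
        return []
    step = max(1, round(avg_fps))  # about one sampled frame per second
    indices = list(range(0, total_frames, step))
    if len(indices) > max_frames:
        # Take the midpoint of each of max_frames equal-width buckets.
        gap = len(indices) / max_frames
        indices = [indices[int(i * gap + gap / 2)] for i in range(max_frames)]
    return indices
```

The cap matters because every extra frame adds visual tokens to the prompt, so longer videos are sampled more sparsely rather than producing longer inputs.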

Outputs

Text: Generates coherent and relevant text responses.
Image/Visual Understanding: Provides detailed captions, analysis, and insights about visual content.

MiniCPM-V 2.6: A Powerful Upgrade Over Previous Models

The enhancements in MiniCPM-V 2.6 are not just incremental but transformative. It introduces advanced OCR capabilities and expanded multilingual support, which are significant upgrades from previous versions like MiniCPM-Llama3 and MiniCPM-Llama3-V 2.5. The model's efficiency in edge deployment has also seen considerable improvements, making it a more accessible and practical solution for various applications.

How to Use MiniCPM-V 2.6

Inference uses Hugging Face transformers on NVIDIA GPUs. The following requirements were tested on Python 3.10:

Pillow==10.1.0
torch==2.1.2
torchvision==0.16.2
transformers==4.40.0
sentencepiece==0.1.99
decord

In-context few-shot learning

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the model and tokenizer; trust_remote_code is needed because the model code ships with the checkpoint.
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Two labelled examples, followed by the image to query.
question = "production date"
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

# Each example is a user turn (image + question) followed by the assistant's answer.
msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

# The images travel inside msgs, so the image argument stays None.
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
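
When there are more than a couple of examples, the `msgs` list above can be built in a loop instead of by hand. The helper below is a minimal sketch: the name `build_few_shot_msgs` is hypothetical and not part of the model's API, and it only reproduces the message layout used in the snippet above.

```python
def build_few_shot_msgs(examples, test_image, question):
    """Build a few-shot conversation in the layout used above.

    examples: list of (image, answer) pairs that act as demonstrations.
    test_image: the image the model should answer the question about.
    """
    msgs = []
    for image, answer in examples:
        # One demonstration = a user turn plus the expected assistant reply.
        msgs.append({'role': 'user', 'content': [image, question]})
        msgs.append({'role': 'assistant', 'content': [answer]})
    # The final user turn has no reply; the model is asked to complete it.
    msgs.append({'role': 'user', 'content': [test_image, question]})
    return msgs
```

With the variables from the snippet above, `build_few_shot_msgs([(image1, answer1), (image2, answer2)], image_test, question)` yields the same list that was written out manually.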

Watch the complete installation video, if you are interested.

MiniCPM-V 2.6: GPU Cloud vs. Local Deployment Showdown

Performance

Local deployment of MiniCPM-V 2.6 is limited by hardware, potentially leading to underperformance. In contrast, GPU cloud deployment uses high-performance GPUs for faster processing and better efficiency.

Scalability

Local deployment of MiniCPM-V 2.6 requires significant investment and time to scale, whereas GPU cloud deployment easily adjusts resources based on project needs, providing greater flexibility.

Cost Efficiency

Local deployment of MiniCPM-V 2.6 comes with higher initial setup costs and potential ongoing maintenance fees, whereas GPU cloud deployment features pay-as-you-go pricing, ensuring you only pay for what you use, making it more cost-effective.

Setup and Maintenance

Local deployment of MiniCPM-V 2.6 requires manual installation and ongoing maintenance, while GPU cloud deployment offers quick setup with automatic updates, significantly minimizing workload.

Running MiniCPM-V 2.6 on Novita AI

You can run MiniCPM-V 2.6 on Novita AI from a pre-configured image, so no manual installation is needed. Select the model image and start working right away. Novita AI handles performance tuning and scaling, letting you focus on your AI projects.

Build Your Own MiniCPM-V 2.6 Template

If the template you need isn’t available, you can create your own MiniCPM-V 2.6 template on Novita AI. Our platform allows full customization to suit your specific project requirements.

Request Custom Templates on Discord

You can explore more templates in the Novita AI Template Catalogue. If you can’t find what you’re looking for in the marketplace, feel free to join our Discord community and submit a request. We’ll review it and consider adding it in future updates.

Common MiniCPM-V 2.6 Errors and Fixes

Memory Issues

If the model crashes due to insufficient memory, consider using GPU cloud services or running the model in lower precision (FP16) to reduce memory usage.
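
A rough estimate of the memory needed for the weights alone helps decide whether lower precision is enough. The sketch below is back-of-the-envelope arithmetic, not a measurement: it uses the nominal 8 billion parameter count quoted earlier, the function name `weight_memory_gib` is hypothetical, and it ignores activations, the attention cache, and framework overhead, all of which add to the real footprint.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "int4"):
        print(f"{dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
```

Under these assumptions, halving the precision from FP32 to FP16 halves the weight memory, from roughly 29.8 GiB to roughly 14.9 GiB for 8 billion parameters.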

Slow Performance

To combat slow processing or latency, ensure you are using GPUs, update your hardware drivers, or allocate more resources in a cloud environment for better performance.

Fine-Tuning Problems

If fine-tuning fails, check your dataset formatting and ensure correct training parameters. Adjust settings like learning rate and batch size for optimal results.

Frequently Asked Questions

What type of inputs do MiniCPM-V models accept?

The models accept image, video, and text inputs.

Does MiniCPM-V support multilingual capabilities?

Yes, MiniCPM-V supports over 30 languages, extending its capabilities beyond just English and Chinese.

What is the significance of the token density mentioned in MiniCPM-V 2.6?

The superior token density in MiniCPM-V 2.6 allows for efficient encoding of high-resolution images into fewer tokens, improving inference speed, first-token latency, memory usage, and power consumption.
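
Token density can be read as the number of image pixels encoded into each visual token. The small calculation below illustrates the idea. The inputs are assumptions taken from the upstream model card rather than from this page: a 1344x1344 image (about 1.8 million pixels) encoded into 640 visual tokens. The function name `token_density` is hypothetical.

```python
def token_density(width, height, num_visual_tokens):
    """Pixels represented by each visual token."""
    if num_visual_tokens <= 0:
        raise ValueError("num_visual_tokens must be positive")
    return width * height / num_visual_tokens


if __name__ == "__main__":
    # Assumed figures: 1344x1344 image, 640 visual tokens.
    print(round(token_density(1344, 1344, 640)))
```

A higher value means fewer tokens for the same image, which is why it affects inference speed, first-token latency, and memory usage.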

How can MiniCPM-V models be deployed on different devices?

The models can be deployed on various devices, including those with NVIDIA GPUs, CPUs, and even mobile phones with Android operating systems.

What fine-tuning options are available for MiniCPM-V models?

MiniCPM-V models support simple fine-tuning with Hugging Face and more advanced fine-tuning with the SWIFT framework.

Are there any open-source versions of MiniCPM-V models?

Yes, certain versions of MiniCPM-V, such as MiniCPM-V 2.6 and MiniCPM-Llama3-V 2.5, have been open-sourced.

What types of licenses does MiniCPM-V operate under?

The repository is released under the Apache-2.0 License, and the usage of MiniCPM-V model weights must follow the MiniCPM Model License.

License

The code in this repo is released under the Apache-2.0 License.
The usage of MiniCPM-V series model weights must strictly follow MiniCPM Model License.md.
The models and weights of MiniCPM are completely free for academic research. After filling out a "questionnaire" for registration, MiniCPM-V 2.6 weights are also available for free commercial use.

Source site: https://huggingface.co/openbmb/MiniCPM-V-2_6

Get in Touch:

Email: iris@novita.ai
Discord: novita.ai

Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
