Wenxin

ERNIE 4.5 VL 28B A3B

baidu/ernie-4.5-vl-28b-a3b
The open-source ERNIE 4.5 series uses a Mixture-of-Experts (MoE) architecture with a heterogeneous multimodal structure: parameters shared across modalities fuse cross-modal knowledge, while each modality also keeps its own dedicated parameters. This design suits continued pre-training from a large language model to a multimodal model, improving multimodal understanding while maintaining or even improving performance on text tasks. Training, inference, and deployment all run on the PaddlePaddle deep learning framework. During large language model pre-training, Model FLOPs Utilization (MFU) reaches 47%. Experimental results show that the series achieves state-of-the-art (SOTA) performance on multiple text and multimodal benchmarks, with particularly strong results in instruction following, world knowledge memorizatio

Features

Serverless API

baidu/ernie-4.5-vl-28b-a3b is available through Novita's serverless API, billed per token. There are several ways to call the API, including OpenAI-compatible endpoints.

Available Serverless

Run queries immediately, pay only for usage

Input: $0.14 / M tokens
Output: $0.56 / M tokens
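
As a rough illustration of per-token billing, the sketch below estimates the cost of a single request from the two listed rates. The function name `estimate_cost_usd` is hypothetical and not part of any Novita SDK, and actual invoices may differ (rounding, minimum charges, or price changes are not modeled).

```python
from decimal import Decimal

# Rates listed on this page, in USD per one million tokens.
INPUT_USD_PER_M = Decimal("0.14")
OUTPUT_USD_PER_M = Decimal("0.56")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the listed rates.

    Hypothetical helper for illustration only; not a Novita API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_M
        + Decimal(output_tokens) * OUTPUT_USD_PER_M
    )
    return total / TOKENS_PER_M
```

For example, a request with 22,000 input tokens and 8,000 output tokens comes to (22,000 × 0.14 + 8,000 × 0.56) / 1,000,000 = $0.00756.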

Use the following code examples to integrate with our API:

from openai import OpenAI

# Point the OpenAI client at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

# Send a chat completion request to the ERNIE 4.5 VL model.
response = client.chat.completions.create(
    model="baidu/ernie-4.5-vl-28b-a3b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=8000,  # matches the model's listed maximum output
    temperature=0.7
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)
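
Because the model accepts image input, a request can also carry an image next to the text prompt. The sketch below builds a `messages` list in the OpenAI chat format, with the image embedded as a base64 data URL. It assumes Novita's OpenAI-compatible endpoint accepts `image_url` content parts and data URLs for this model, which this page does not confirm. The helper name `build_image_messages` is hypothetical.

```python
import base64


def build_image_messages(
    image_bytes: bytes,
    question: str,
    mime_type: str = "image/png",
) -> list[dict]:
    """Build an OpenAI-style chat message holding a question and one image.

    Hypothetical helper. Assumes the endpoint accepts `image_url` content
    parts with base64 data URLs, as in the OpenAI chat format.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create`, for instance with bytes read from a local file via `open("photo.png", "rb").read()` (the file name here is only a placeholder).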

Info

Provider: BAIDU
Quantization: fp16

Supported Functionality

Context Length: 30,000 tokens
Max Output: 8,000 tokens
Serverless: Supported
Function Calling: Supported
Reasoning: Supported
Input Capabilities: text, image
Output Capabilities: text
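
The context length and maximum output listed above jointly limit how large `max_tokens` can be for a given prompt. The sketch below clamps a requested value to both limits. It assumes the 30,000-token context window covers prompt and completion together, which is the usual convention but is not stated on this page. `clamp_max_tokens` is a hypothetical helper, and the prompt token count must come from a tokenizer or from the `usage` field of an earlier response.

```python
CONTEXT_LENGTH = 30_000  # "Context Length" listed above
MAX_OUTPUT = 8_000       # "Max Output" listed above


def clamp_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return a max_tokens value that fits both listed limits.

    Assumption: prompt and completion share the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if requested < 1:
        raise ValueError("requested must be at least 1")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt fills the whole context window")
    return min(requested, MAX_OUTPUT, remaining)
```

Under that assumption, a 25,000-token prompt leaves room for at most 5,000 output tokens, while any prompt of 22,000 tokens or fewer can use the full 8,000.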
