GLM-4.7-Flash
Official template
novitalabs/glm-4.7-flash:v0.02
Updated: 27 Jan 2026

Run GLM-4.7-Flash on Novita

What is GLM-4.7-Flash

GLM-4.7-Flash is a 30B-A3B Mixture-of-Experts (MoE) model: 30B total parameters with about 3B activated per token. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

You can find more details on the project's official website.

Run GLM-4.7-Flash on Novita

Step 1: Console Entry
Open the GPU console and select Get Started to access deployment management.
Step 2: Package Selection
Locate GLM-4.7-Flash in the template library and start the installation.
Step 3: Infrastructure Setup
Configure the computing parameters, including memory allocation, storage requirements, and network settings, then select Deploy.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Verify Instance Status
Click the Logs button to view the instance logs and confirm that the GLM-4.7-Flash inference service has started properly.
Step 8: Environment Access
Connect to the instance through the Connect interface, then click Start Web Terminal.
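Once the web terminal is open, you can quickly check that the model endpoint is live. A minimal sketch, assuming the template exposes an OpenAI-compatible server on 127.0.0.1:8000 (the same address the demo below uses):

```bash
# List the models served by the OpenAI-compatible endpoint.
# A healthy instance should return a JSON model list that
# includes "zai-org/GLM-4.7-Flash".
curl http://127.0.0.1:8000/v1/models
```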

How to start

Demo

Request:

```bash
curl --location --request POST 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--data-raw '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [
        {
            "role": "system",
            "content": "you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "max_tokens": 20,
    "stream": false
}'
```

Example response:

```json
{
  "id": "chatcmpl-943f20f1c3a690ba",
  "object": "chat.completion",
  "created": 1768823899,
  "model": "zai-org/GLM-4.7-Flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. **Analyze the Input:** The user said \"hello\".\n2. **Ident",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 14,
    "total_tokens": 34,
    "completion_tokens": 20,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
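Note the "finish_reason": "length" in the response: the reply was cut off at the 20-token max_tokens limit, which is why the content ends mid-word. Raise max_tokens to get complete answers. The endpoint also supports incremental output via "stream": true; a minimal sketch against the same local server, following the standard OpenAI streaming convention:

```bash
# Streaming variant: the server returns server-sent events,
# one "data: {...}" JSON chunk at a time, terminated by
# "data: [DONE]".
curl --location --request POST 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [
        {"role": "user", "content": "hello"}
    ],
    "max_tokens": 128,
    "stream": true
}'
```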