GLM-4.7-Flash
Official template
novitalabs/glm-4.7-flash:v0.02
Updated: 27 Jan 2026

Run GLM-4.7-Flash on Novita

What is GLM-4.7-Flash

GLM-4.7-Flash is a 30B-A3B Mixture-of-Experts (MoE) model: 30B total parameters with about 3B activated per token. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

You can find more details on the project's official website.

Run GLM-4.7-Flash on Novita

Step 1: Console Entry
Open the GPU console and select Get Started to access deployment management.
Step 2: Package Selection
Locate GLM-4.7-Flash in the template library and start the installation.
Step 3: Infrastructure Setup
Configure the computing parameters, including memory allocation, storage requirements, and network settings, then select Deploy.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Verify Instance Status
Click the Logs button to view the instance logs and confirm that the GLM-4.7-Flash inference service has started properly.
Step 8: Environment Access
Connect to the instance through the Connect interface, then click Start Web Terminal.
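Once the web terminal is open, you can quickly check that the model endpoint is live. A minimal sketch, assuming the template exposes an OpenAI-compatible server on 127.0.0.1:8000 (the same address the demo below uses):

```bash
# List the models served by the OpenAI-compatible endpoint.
# A healthy instance should return a JSON model list that
# includes "zai-org/GLM-4.7-Flash".
curl http://127.0.0.1:8000/v1/models
```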

How to start

Demo

Request:

```bash
curl --location --request POST 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--data-raw '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [
        {
            "role": "system",
            "content": "you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "max_tokens": 20,
    "stream": false
}'
```

Example response:

```json
{
  "id": "chatcmpl-943f20f1c3a690ba",
  "object": "chat.completion",
  "created": 1768823899,
  "model": "zai-org/GLM-4.7-Flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1. **Analyze the Input:** The user said \"hello\".\n2. **Ident",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 14,
    "total_tokens": 34,
    "completion_tokens": 20,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
```
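Note the "finish_reason": "length" in the response: the reply was cut off at the 20-token max_tokens limit, which is why the content ends mid-word. Raise max_tokens to get complete answers. The endpoint also supports incremental output via "stream": true; a minimal sketch against the same local server, following the standard OpenAI streaming convention:

```bash
# Streaming variant: the server returns server-sent events,
# one "data: {...}" JSON chunk at a time, terminated by
# "data: [DONE]".
curl --location --request POST 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [
        {"role": "user", "content": "hello"}
    ],
    "max_tokens": 128,
    "stream": true
}'
```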