
Run GLM-4.7-Flash on Novita
What is GLM-4.7-Flash
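GLM-4.7-Flash is a 30B-A3B Mixture-of-Experts (MoE) model: roughly 30B total parameters, of which about 3B are active for each token. Its developers position it as the strongest model in the 30B class, offering a new option for lightweight deployment that balances performance and efficiency.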
You can check the details on the official website of the project.
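To make the 30B-A3B label concrete, the sketch below estimates how much memory the weights alone occupy at a few storage precisions, and what share of the parameters is active per token. It assumes the nominal round figures read off the model name (30B total, 3B active) and typical bytes-per-parameter values; exact parameter counts, the precision the Novita template actually uses, and the additional memory for the KV cache and activations are not covered by this guide and will raise the real requirement.

```python
# Rough weight-memory estimate for a "30B-A3B" MoE model.
# Both counts are nominal figures taken from the model name (assumption).
TOTAL_PARAMS = 30e9   # "30B": total parameters
ACTIVE_PARAMS = 3e9   # "A3B": parameters active per token

# Typical storage sizes per parameter (assumed, not template-specific).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of parameters used for each token."""
    return active / total


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gb(TOTAL_PARAMS, size):.0f} GB of weights")
print(f"active per token: {active_fraction():.0%}")
```

All experts must still be resident in memory even though only a fraction is used per token, so the total parameter count drives the memory allocation while the active count mainly affects compute per token.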
Run GLM-4.7-Flash on Novita
Step 1: Console Entry
Open the GPU console and select Get Started to access deployment management.
Step 2: Package Selection
Locate GLM-4.7-Flash in the template repository and start the installation.
Step 3: Infrastructure Setup
Configure the compute parameters, including memory allocation, storage requirements, and network settings, then select Deploy.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real-time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Verify Instance Status
Click the Logs button to view the instance logs and confirm that the GLM-4.7-Flash inference service has started properly.
Step 8: Environmental Access
Open the development environment through the Connect interface, then select Start Web Terminal.
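Once the web terminal is open, it can help to confirm that the service is answering before sending real requests, since model loading may continue after the instance shows Running. The sketch below polls the local server until it responds. The host and port are the ones used in the demo request; the /v1/models route is an assumption (OpenAI-compatible servers commonly expose it, but this guide does not confirm it for this template), and the retry count and delay are arbitrary.

```python
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host and port from the demo request


def is_ready(base_url: str = BASE_URL, timeout_s: float = 3.0) -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200.

    The /v1/models route is an assumption, not confirmed by this guide.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError derive from OSError).
        return False


def wait_until_ready(probe=is_ready, attempts: int = 30, delay_s: float = 10.0) -> bool:
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False

# Usage from the web terminal (performs real network calls):
#   print("ready" if wait_until_ready() else "not reachable yet")
```

If the check keeps failing, the instance logs from Step 7 are the place to look for startup errors.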
How to start
Demo
curl --location --request POST 'http://127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: */*' \
--header 'Connection: keep-alive' \
--data-raw '{
    "model": "zai-org/GLM-4.7-Flash",
    "messages": [
        {
            "role": "system",
            "content": "you are a helpful assitant."
        },
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "max_tokens": 20,
    "stream": false
}'

{"id":"chatcmpl-943f20f1c3a690ba","object":"chat.completion","created":1768823899,"model":"zai-org/GLM-4.7-Flash","choices":[{"index":0,"message":{"role":"assistant","content":"1. **Analyze the Input:** The user said \"hello\".\n2. **Ident","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":14,"total_tokens":34,"completion_tokens":20,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
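The same request can be issued from Python using only the standard library. The sketch below reuses the endpoint, model name, and message layout from the curl demo; the helper names (build_payload, parse_reply, chat) and the default max_tokens of 256 are illustrative choices, not part of the template. It also reports whether the reply was cut off: in the sample response above, finish_reason is "length" because max_tokens was set to 20, so the answer stops mid-sentence.

```python
import json
import urllib.request

# Endpoint and model name copied from the curl demo.
API_URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "zai-org/GLM-4.7-Flash"


def build_payload(user_text: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for a non-streaming chat completion request."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,  # 256 is an arbitrary default (assumption)
        "stream": False,
    }


def parse_reply(response: dict) -> tuple[str, bool]:
    """Return (text, truncated) from a chat.completion response body."""
    choice = response["choices"][0]
    text = choice["message"]["content"] or ""
    return text, choice["finish_reason"] == "length"


def chat(user_text: str, max_tokens: int = 256, timeout_s: float = 120.0) -> tuple[str, bool]:
    """POST the request to the local server and return the parsed reply."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_text, max_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return parse_reply(json.load(resp))

# Usage from the web terminal (performs a real network call):
#   text, truncated = chat("hello")
#   print(text, "(truncated)" if truncated else "")
```

If truncated comes back True, raising max_tokens gives the model room to finish its answer.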