Run GLM-Image on Novita
What is GLM-Image
GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture, effectively pushing the upper bound of visual fidelity and fine-grained detail. In general image generation quality it is on par with industry-standard LDM-based approaches, while demonstrating significant advantages in knowledge-intensive image generation scenarios.
You can find more details on the project's official website.
Run GLM-Image on Novita
Step 1: Console Entry
Launch the GPU interface and select Get Started to access deployment management.
Step 2: Package Selection
Locate the GLM-Image template in the template repository and begin the installation sequence.
Step 3: Infrastructure Setup
Configure computing parameters, including memory allocation, storage requirements, and network settings, then continue to the review step.
Step 4: Review and Create
Double-check your configuration details and cost summary. When satisfied, click Deploy to start the creation process.
Step 5: Wait for Creation
After initiating deployment, the system will automatically redirect you to the instance management page. Your instance will be created in the background.
Step 6: Monitor Download Progress
Track the image download progress in real-time. Your instance status will change from Pulling to Running once deployment is complete. You can view detailed progress by clicking the arrow icon next to your instance name.
Step 7: Verify Instance Status
Click the Logs button to view the instance logs and confirm that the model service has started properly.
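Optionally, you can also open a terminal on the instance and run a quick sanity check that PyTorch can see the GPU before running any examples (a minimal sketch; the exact device name and memory depend on the instance type you chose):

```python
import torch

# Fail fast if CUDA is not visible to PyTorch on this instance.
assert torch.cuda.is_available(), "CUDA is not available -- check the instance type and drivers"
print(torch.cuda.get_device_name(0))
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
```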
How to get started
Example: text2image.py

```python
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline

pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")

prompt = "A beautifully designed modern food magazine style dessert recipe illustration, themed around a raspberry mousse cake. The overall layout is clean and bright, divided into four main areas: the top left features a bold black title 'Raspberry Mousse Cake Recipe Guide', with a soft-lit close-up photo of the finished cake on the right, showcasing a light pink cake adorned with fresh raspberries and mint leaves; the bottom left contains an ingredient list section, titled 'Ingredients' in a simple font, listing 'Flour 150g', 'Eggs 3', 'Sugar 120g', 'Raspberry puree 200g', 'Gelatin sheets 10g', 'Whipping cream 300ml', and 'Fresh raspberries', each accompanied by minimalist line icons (like a flour bag, eggs, sugar jar, etc.); the bottom right displays four equally sized step boxes, each containing high-definition macro photos and corresponding instructions, arranged from top to bottom as follows: Step 1 shows a whisk whipping white foam (with the instruction 'Whip egg whites to stiff peaks'), Step 2 shows a red-and-white mixture being folded with a spatula (with the instruction 'Gently fold in the puree and batter'), Step 3 shows pink liquid being poured into a round mold (with the instruction 'Pour into mold and chill for 4 hours'), Step 4 shows the finished cake decorated with raspberries and mint leaves (with the instruction 'Decorate with raspberries and mint'); a light brown information bar runs along the bottom edge, with icons on the left representing 'Preparation time: 30 minutes', 'Cooking time: 20 minutes', and 'Servings: 8'. The overall color scheme is dominated by creamy white and light pink, with a subtle paper texture in the background, featuring compact and orderly text and image layout with clear information hierarchy."

image = pipe(
    prompt=prompt,
    height=32 * 32,
    width=36 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("output_t2i.png")
```
You can modify the prompt in text2image.py, or run the existing example directly.
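Note that both examples in this guide express height and width as multiples of 32 (e.g. 32 * 32 and 36 * 32), which suggests the pipeline expects dimensions aligned to a 32-pixel grid. If you want to target an arbitrary resolution, a small hypothetical helper like the one below can snap it to the nearest aligned size (the multiple-of-32 constraint is inferred from the examples, not from official documentation):

```python
def snap_to_multiple(x: int, base: int = 32) -> int:
    """Round x to the nearest multiple of `base`, with a floor of one `base`."""
    return max(base, round(x / base) * base)

# e.g. a target of 1000x1150 becomes 992x1152
height, width = snap_to_multiple(1000), snap_to_multiple(1150)
print(height, width)  # 992 1152
```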
python3 text2image.py

```
$ python3 text2image.py
Couldn't connect to the Hub: Cannot reach https://huggingface.co/api/models/zai-org/GLM-Image: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
Will try to load from local cache.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.47it/s]
Loading weights: 100%|██████████████████████████████| 111/111 [00:00<00:00, 1391.52it/s, Materializing param=shared.weight]
Loading pipeline components...: 71%|████████████████████████████████████▏ | 5/7 [00:02<00:00, 2.91it/s]Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Loading weights: 100%|██████████████████████████████| 1011/1011 [00:02<00:00, 359.59it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.02s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:11<00:00, 2.69it/s]
```
output_t2i.png
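The `HF_HUB_OFFLINE` warning in the log above is harmless when the model weights are already in the local cache. If you want to pre-fetch the weights yourself (for example, before enabling offline mode), a minimal sketch using the standard `huggingface_hub` API:

```python
from huggingface_hub import snapshot_download

# Download the full GLM-Image repository into the local Hugging Face cache;
# later from_pretrained() calls can then resolve it even with HF_HUB_OFFLINE=1.
local_dir = snapshot_download(repo_id="zai-org/GLM-Image")
print(local_dir)
```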
Example: image2image.py

```python
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline
from PIL import Image

pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")

image_path = "cond.jpg"
prompt = "Replace the background of the snow forest with an underground station featuring an automatic escalator."
image = Image.open(image_path).convert("RGB")

image = pipe(
    prompt=prompt,
    image=[image],  # can input multiple images for multi-image-to-image generation such as [image, image1]
    height=33 * 32,
    width=32 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("output_i2i.png")
```
You can modify the prompt and input image in image2image.py, or run the existing example directly.
```
$ python3 image2image.py

Couldn't connect to the Hub: Cannot reach https://huggingface.co/api/models/zai-org/GLM-Image: offline mode is enabled. To disable it, please unset the `HF_HUB_OFFLINE` environment variable.
Will try to load from local cache.
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Loading weights: 100%|██████████████████████████████| 1011/1011 [00:02<00:00, 360.88it/s, Materializing param=model.vqmodel.quantize.embedding.weight]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.46it/s]
Loading weights: 100%|██████████████████████████████| 111/111 [00:00<00:00, 1426.62it/s, Materializing param=shared.weight]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.03s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:10<00:00, 2.97it/s]
```
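As the inline comment in image2image.py notes, the `image` argument accepts a list, so the same pipeline can condition on several reference images at once. A minimal sketch (the second input file `cond2.jpg` is a hypothetical placeholder; supply your own reference images):

```python
import torch
from diffusers.pipelines.glm_image import GlmImagePipeline
from PIL import Image

pipe = GlmImagePipeline.from_pretrained("zai-org/GLM-Image", torch_dtype=torch.bfloat16, device_map="cuda")

ref_a = Image.open("cond.jpg").convert("RGB")
ref_b = Image.open("cond2.jpg").convert("RGB")  # hypothetical second reference image

# Pass both references; the prompt should describe how to combine them.
image = pipe(
    prompt="Place the subject from the first image into the scene from the second image.",
    image=[ref_a, ref_b],
    height=33 * 32,
    width=32 * 32,
    num_inference_steps=30,
    guidance_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("output_multi_i2i.png")
```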