Qwen2-Audio-7B-Instruct

Empower Your Audio with Qwen2 on Novita AI


Run Qwen2-Audio-7B-Instruct on Novita AI

What is Qwen2-Audio-7B-Instruct?

Qwen2-Audio-7B-Instruct is a large audio-language model from the Qwen series, designed to accept a variety of audio signal inputs and perform tasks such as audio analysis or respond directly to speech instructions with text. It supports two interaction modes: voice chat and audio analysis.

Key Features

  • Voice chat mode for free-form voice interactions without text input.

  • Audio analysis mode for providing audio and text instructions for analysis.

  • Pretrained model and chat model versions available.

  • ChatML format for dialog interaction (see the sketch after this list).

  • Batch inference support for processing multiple conversations simultaneously.
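
In practice, the "ChatML format" mentioned above means each turn is a role plus a content list that can mix audio and text entries. Below is a minimal sketch of a single user turn; the audio URL is a placeholder, not a real file:

```python
# A single ChatML-style turn: a role plus a content list mixing audio and text.
# The audio_url below is a placeholder, not a real file.
user_turn = {
    "role": "user",
    "content": [
        {"type": "audio", "audio_url": "https://example.com/clip.wav"},
        {"type": "text", "text": "What can you hear in this clip?"},
    ],
}
```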

Why use Qwen2-Audio?

Audio serves as a crucial medium for interaction and communication among humans and other living beings, carrying rich information. A comprehensive understanding of the various forms of audio signals is paramount to achieving Artificial General Intelligence (AGI).

According to the evaluation results from AIR-Bench, Qwen2-Audio outperformed previous SOTAs, such as Gemini-1.5-pro, in tests focused on audio-centric instruction-following capabilities. Qwen2-Audio is open-sourced with the aim of fostering the advancement of the multi-modal language community.

How to deploy Qwen2-Audio-7B-Instruct

The deployment steps for Qwen2-Audio-7B-Instruct are as follows:

Step 1: Ensure the latest Hugging Face transformers build is installed. If needed, install from source with `pip install git+https://github.com/huggingface/transformers`.

Step 2: Use the provided Python code to load the processor and model from the Qwen repository.

Step 3: Prepare the conversation data in the specified format.

Step 4: Process the audio files and text using the AutoProcessor.

Step 5: Generate responses using the model with the processed inputs.
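
As a quick sanity check for step 1, the snippet below (a minimal sketch) verifies that your transformers build exposes the Qwen2-Audio classes:

```python
# Sanity check for step 1: Qwen2-Audio requires a recent transformers build.
import transformers

print("transformers version:", transformers.__version__)

try:
    from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor  # noqa: F401
    print("Qwen2-Audio classes found.")
except ImportError:
    # If this triggers, install transformers from source:
    #   pip install git+https://github.com/huggingface/transformers
    print("Qwen2-Audio classes missing; upgrade transformers from source.")
```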

If you're interested, you can watch the full video on how to install Qwen2-Audio locally.

How to Use Audio Analysis Inference with Qwen2-Audio-7B-Instruct?

In audio analysis mode, users can provide both audio and text instructions for analysis:

```python
from io import BytesIO
from urllib.request import urlopen

import librosa
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor

# Load the processor and model from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-Audio-7B-Instruct")
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-Audio-7B-Instruct", device_map="auto")

# A multi-turn conversation mixing audio and text instructions.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/glass-breaking-151256.mp3"},
        {"type": "text", "text": "What's that sound?"},
    ]},
    {"role": "assistant", "content": "It is the sound of glass shattering."},
    {"role": "user", "content": [
        {"type": "text", "text": "What can you do when you hear that?"},
    ]},
    {"role": "assistant", "content": "Stay alert and cautious, and check if anyone is hurt or if there is any damage to property."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"},
        {"type": "text", "text": "What does the person say?"},
    ]},
]

# Render the conversation with the chat template, then load every audio
# entry at the sampling rate the feature extractor expects.
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios = []
for message in conversation:
    if isinstance(message["content"], list):
        for ele in message["content"]:
            if ele["type"] == "audio":
                audios.append(
                    librosa.load(
                        BytesIO(urlopen(ele["audio_url"]).read()),
                        sr=processor.feature_extractor.sampling_rate)[0]
                )

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs.input_ids = inputs.input_ids.to("cuda")

# Generate, then strip the prompt tokens from the output before decoding.
generate_ids = model.generate(**inputs, max_length=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]

response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
```

If you would like to see the inference code for other features, you can view it here.
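
For example, voice chat mode follows the same pattern as the audio analysis code above but omits the text instruction: the user turn contains audio only. The sketch below assumes the processor and model are already loaded as shown, and reuses the speech sample from the example above:

```python
# Voice chat mode (a minimal sketch): the user turn contains audio only, with
# no text instruction; the model replies to the speech directly.
# Assumes `processor` and `model` are loaded as in the audio analysis example.
from io import BytesIO
from urllib.request import urlopen

import librosa

# Reusing the speech sample from the audio analysis example above.
audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"

conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": audio_url},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios = [librosa.load(BytesIO(urlopen(audio_url).read()),
                       sr=processor.feature_extractor.sampling_rate)[0]]

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs.input_ids = inputs.input_ids.to("cuda")

generate_ids = model.generate(**inputs, max_length=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generate_ids, skip_special_tokens=True,
                                  clean_up_tokenization_spaces=False)[0]
print(response)
```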

How to Vocode audio in Audacity?

  • Import Tracks: Your first step is to import the tracks you'll use. The voice track serves as your modulator, and the synthesized sound acts as your carrier.

  • Mono Conversion: Ensure your voice track is mono for clarity in the vocoding process. You can easily convert stereo tracks to mono in Audacity.

  • Combine Tracks: Merge your voice and synthesized sound into a stereo track, setting the stage for the vocoding magic to happen.

  • Apply the Vocoder Effect: Navigate through Audacity's effects menu to find the vocoder and adjust its settings to match the sound you're aiming for. This step is where your sound begins to transform, taking on new qualities as the effect is applied.

You can see a detailed video on how to use the Audacity vocoder here.

Run Qwen2-Audio-7B-Instruct on Novita AI

Novita AI offers a ready-to-use Template for Qwen2-Audio-7B-Instruct, eliminating the need for manual deployment. It can be started with a single click.

How to run the Qwen2-Audio-7B-Instruct with Novita AI Templates

Get started with your own Qwen2-Audio-7B-Instruct template on Novita AI with ease. If our pre-existing templates don't quite fit your project's needs, the platform lets you create a fully customized template: with versatile tools and extensive customization options, you can tweak every aspect of a template so it aligns seamlessly with your project goals. Start designing your template on Novita AI today to boost the efficiency and effectiveness of your project.

You can explore the extensive Novita AI Template Catalogue for a range of template options. If you don't find what you're seeking in the catalogue, we invite you to join our Discord community and submit a request for a custom template. Each submission will be carefully evaluated by our team, and if it aligns with our standards and user needs, we will consider incorporating it into future catalogue updates.

How to Use Qwen2-Audio-7B-Instruct in the Novita AI LLM API

Novita AI offers developers access to the Qwen2-Audio-7B-Instruct API. Detailed guides and documentation can be found on our Docs page, and you can also refer to the LLM API reference available here. Experience the capabilities of this model firsthand in our LLM playground: upon signing up, you'll receive free vouchers you can use to select this model and input prompts for testing.
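
As a starting point, the sketch below shows what a call through an OpenAI-compatible client could look like. The base URL, the model identifier, and this model's availability on the LLM API are assumptions here; confirm the exact values in the Novita AI docs and model catalogue.

```python
# A minimal sketch of calling the model through an OpenAI-compatible endpoint.
# The base_url and model id below are assumptions -- check the Novita AI docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # assumed endpoint; see docs
    api_key="<YOUR_NOVITA_API_KEY>",
)

resp = client.chat.completions.create(
    model="qwen/qwen2-audio-7b-instruct",  # assumed model id; see the catalogue
    messages=[
        {"role": "user", "content": "Transcribe and summarize the attached audio."},
    ],
)
print(resp.choices[0].message.content)
```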

Some Limitations of the Qwen2-Audio-7B-Instruct

The model is not perfect, and it has some limitations. Let’s take a closer look at some of them.

  • Limited Context Understanding: The model may not fully grasp the context of conversations, leading to potentially mismatched responses. User oversight is recommended for accuracy.

  • Dependence on Audio Quality: Performance heavily relies on the clarity of the audio input. Noisy or unclear audio can impair the model's effectiveness, necessitating high-quality inputs for best results.

  • Limited Domain Knowledge: The model's expertise varies across domains, with potential inaccuracies or gaps in specialized areas. Users should verify critical information independently.

  • Potential for Misinterpretation: There is a risk of misinterpreting audio inputs, which can result in incorrect responses. Careful review and confirmation of outputs are advised.

Frequently Asked Questions

What are the two distinct audio interaction modes of Qwen2-Audio?

The two modes are voice chat and audio analysis.

How can I start using Qwen2-Audio-7B-Instruct for inference?

You can start using it by following the Quickstart guide in the README, which demonstrates how to set up and use the model for both voice chat and audio analysis.

What is the purpose of the ChatML format in Qwen2-Audio-7B-Instruct?

The ChatML format is used for structuring dialog interactions with the model.

How can I handle multiple conversations at once with Qwen2-Audio-7B-Instruct?

Batch inference is supported, allowing you to process multiple conversations simultaneously using the provided code.
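
A minimal sketch of what batch inference can look like, assuming the processor and model are loaded as in the audio analysis example above; the two conversations reuse the audio URLs from that example, and the `gather_audios` helper is just the same audio-loading loop factored out:

```python
# Batch inference (a minimal sketch): process several conversations at once.
# Assumes `processor` and `model` are loaded as in the audio analysis example.
from io import BytesIO
from urllib.request import urlopen

import librosa

def gather_audios(conversation):
    """Load every audio entry in a conversation at the processor's sampling rate."""
    audios = []
    for message in conversation:
        if isinstance(message["content"], list):
            for ele in message["content"]:
                if ele["type"] == "audio":
                    audios.append(librosa.load(
                        BytesIO(urlopen(ele["audio_url"]).read()),
                        sr=processor.feature_extractor.sampling_rate)[0])
    return audios

conversations = [
    [{"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/glass-breaking-151256.mp3"},
        {"type": "text", "text": "What's that sound?"},
    ]}],
    [{"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"},
        {"type": "text", "text": "What does the person say?"},
    ]}],
]

texts = [processor.apply_chat_template(c, add_generation_prompt=True, tokenize=False)
         for c in conversations]
audios = [a for c in conversations for a in gather_audios(c)]

inputs = processor(text=texts, audios=audios, return_tensors="pt", padding=True)
inputs.input_ids = inputs.input_ids.to("cuda")

generate_ids = model.generate(**inputs, max_length=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]
responses = processor.batch_decode(generate_ids, skip_special_tokens=True,
                                   clean_up_tokenization_spaces=False)
```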

What should I do if I encounter a KeyError when trying to use Qwen2-Audio?

If you encounter a KeyError, build the latest Hugging Face transformers from source using the pip command from step 1 above: `pip install git+https://github.com/huggingface/transformers`.

License

View on Hugging Face

Source site: https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct

Boost AI Projects with Novita AI and GPU Cloud Solutions

Empower your AI projects by teaming up with Novita AI. Our advanced GPU cloud platform supports the creation of tailored templates, streamlines your workflow, and increases your project’s visibility through effective promotional strategies. Partner with us to harness your creative potential and advance your success in the AI space!

enter image description here

Get in Touch:

Novita AI is the all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless deployment, and GPU instances: the cost-effective tools you need. Eliminate infrastructure overhead, start free, and make your AI vision a reality.

Other Recommended Templates

Ollama Open WebUI

Streamline Your AI Workflows with Ollama Open WebUI

View more

Meta Llama 3.1 8B Instruct

Accelerate AI Innovation with Meta Llama 3.1 8B Instruct, Powered by Novita AI

View more

MiniCPM-V-2_6

Empower Your Applications with MiniCPM-V 2.6 on Novita AI.

View more

kohya-ss

Unleash the Power of Kohya-ss with Novita AI

View more

stable-diffusion-3-medium

Transform Creativity with Stable Diffusion 3 Medium on Novita AI

View more
Join Our Community

Join Discord to connect with other users and share your experiences. Provide feedback on any issues, and suggest new templates you'd like to see added.

Join Discord