Search results for "AUDIO"

Kimi releases a brand new universal audio foundation model Kimi-Audio

Jin10 Data reported on April 26 that Kimi has released a new open-source project, the universal audio foundation model Kimi-Audio. According to the introduction, the model supports tasks such as speech recognition, audio understanding, audio-to-text conversion, and voice dialogue.
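As a rough illustration of how a single audio foundation model can expose several of these tasks behind one interface, here is a minimal Python sketch. The wrapper class, its methods, and the task names are illustrative assumptions, not Kimi-Audio's published API; actual usage is documented in the project's open-source repository.

```python
# Illustrative sketch only: this wrapper class and its methods are
# hypothetical stand-ins, not the actual Kimi-Audio API.
from dataclasses import dataclass

@dataclass
class AudioTask:
    name: str    # e.g. "asr", "audio_understanding", "dialogue"
    prompt: str  # task instruction passed alongside the audio

class UnifiedAudioModel:
    """Toy stand-in for a universal audio foundation model that routes
    every task through one generate() call, varying only the prompt."""

    def generate(self, audio_path: str, task: AudioTask) -> str:
        # A real model would tokenize the waveform and decode text and/or
        # speech; here we just echo the request to show the interface shape.
        return f"[{task.name}] result for {audio_path!r} given: {task.prompt}"

model = UnifiedAudioModel()
for task in [
    AudioTask("asr", "Transcribe the speech in this clip."),
    AudioTask("audio_understanding", "Describe the sounds you hear."),
    AudioTask("dialogue", "Reply to the speaker conversationally."),
]:
    print(model.generate("meeting.wav", task))
```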

Alibaba Tongyi open-sources the audio language model Qwen2-Audio; related paper accepted at top conference ACL 2024

Jin10 Data reported on August 13 that Alibaba's Tongyi large models continue to go open source: the Qwen2 open-source family has added the audio language model Qwen2-Audio. Qwen2-Audio can answer voice questions directly, with no text input required, understanding and analyzing user-provided audio signals including human speech, natural sounds, and music. The model significantly surpasses the previous best models on multiple authoritative evaluations. The Tongyi team also released a new benchmark for evaluating audio understanding models, and the related paper has been accepted at ACL 2024, the top international conference being held this week.
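For readers who want to try the released model, the sketch below shows direct voice Q&A in the style of the Hugging Face model card. It assumes a transformers version that includes Qwen2-Audio support and the `Qwen/Qwen2-Audio-7B-Instruct` checkpoint; the audio file path is a placeholder.

```python
# Sketch of voice Q&A with Qwen2-Audio via Hugging Face transformers,
# following the model card; assumes a transformers build that ships
# Qwen2AudioForConditionalGeneration. "question.wav" is a placeholder.
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# No text question is needed: the spoken query itself is the input.
conversation = [
    {"role": "user", "content": [{"type": "audio", "audio_url": "question.wav"}]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("question.wav", sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True)
output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the model's answer.
answer_ids = output_ids[:, inputs.input_ids.shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```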

Perfect World Games and NVIDIA continue to explore the application of AI in gaming scenarios

According to Perfect World Games' official WeChat account, the NVIDIA AI conference GTC 2024 opened in the early morning of March 19, Beijing time, at the SAP Center in San Jose, California. NVIDIA CEO Jensen Huang delivered a keynote titled "Witnessing AI's Transformative Moment," sharing how NVIDIA's accelerated computing platform is driving the next wave of AI, digital twins, cloud technology, and sustainable computing. GTC also announced that Perfect World Games' xianxia MMORPG client game "Zhuxian World" has officially integrated NVIDIA's Audio2Face technology, a generative AI that converts audio into facial animation, and the conference showcased the results of the "Zhuxian World" integration to a global audience. The two companies will continue close exchanges and cooperation across multiple AI fields and scenarios.

Meta announces the audio2photoreal AI framework, which generates character dialogue scenes from voice-over files

Meta recently unveiled an AI framework called audio2photoreal that can generate a series of realistic NPC character models from an existing voice-over file and automatically lip-sync and pose them. According to the official research report, after receiving a voice-over file the audio2photoreal framework first generates a set of NPC models, then produces their motions using a quantization technique combined with a diffusion algorithm: the quantization technique provides motion sample references for the framework, while the diffusion algorithm refines the generated character motions. In a controlled experiment, 43% of evaluators were "strongly satisfied" with the character dialogue scenes the framework generated, leading the researchers to conclude that audio2photoreal produces "more dynamic and expressive" motion than competing products in the industry. The research team has published the relevant code and dataset on GitHub.
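To make the two-stage design concrete, here is a toy Python sketch of the pipeline shape the report describes: vector quantization snaps audio features to coarse guide poses, and a diffusion-style refinement fills in the final motion. All function names, array shapes, and the denoising loop are illustrative assumptions; the real implementation is the code Meta published on GitHub.

```python
# Toy sketch of a quantize-then-diffuse motion pipeline; shapes and the
# denoising update are illustrative, not Meta's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def quantize_guide_poses(audio_feats: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Stage 1 (illustrative): snap each audio frame to its nearest pose
    codebook entry, producing coarse guide poses for the motion."""
    dists = np.linalg.norm(audio_feats[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[np.argmin(dists, axis=1)]

def diffusion_refine(guide_poses: np.ndarray, steps: int = 50) -> np.ndarray:
    """Stage 2 (illustrative): start from noise and iteratively denoise
    toward the guide poses, a toy stand-in for the learned diffusion model."""
    motion = rng.standard_normal(guide_poses.shape)
    for t in range(1, steps + 1):
        blend = t / steps
        motion = (1.0 - blend) * motion + blend * guide_poses  # toy update
    return motion

# 120 audio frames with 32-dim features; a codebook of 256 candidate poses.
audio_feats = rng.standard_normal((120, 32))
codebook = rng.standard_normal((256, 32))
motion = diffusion_refine(quantize_guide_poses(audio_feats, codebook))
print(motion.shape)  # (120, 32): one refined pose vector per audio frame
```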