According to foreign media reports, OpenAI has announced the launch of a new flagship generative AI model, GPT-4o (the “o” stands for “omni”, referring to the model’s ability to process text, voice, and video).
According to the demonstration video, it can hold near-real-time voice conversations with users, displaying a human-like personality and behavior.
(Image source: OpenAI)
GPT-4o provides “GPT-4-level” intelligence, but is faster and improves on GPT-4’s ability to work across multiple modalities and media.
Mira Murati, chief technology officer of OpenAI, said: “GPT-4o can reason across speech, text and vision.
This is very important because we are studying the future of human-machine interaction.”
GPT-4o greatly improves the experience of OpenAI’s AI chatbot, ChatGPT.
The platform has long offered a voice mode that reads out the chatbot’s responses using a text-to-speech model, but GPT-4o enhances this, allowing users to interact with ChatGPT in a more assistant-like way.
For example, a user can ask a question of ChatGPT, powered by GPT-4o, and interrupt it while it is answering.
OpenAI said the model provides “real-time” responsiveness and can even pick up nuances in a user’s voice to generate “a range of different emotional styles, including singing”.
GPT-4o also upgrades ChatGPT’s visual capabilities.
Now, when shown a photo or a desktop screen, ChatGPT can quickly answer questions on topics ranging from “What is happening in this software code?” to “What brand of shirt is this person wearing?” Murati said these capabilities will be developed further in the future.
Currently, GPT-4o can look at a menu photographed in a different language and translate it.
In the future, for example, the model could enable ChatGPT to “watch” a live sports game and explain the rules to the user.
Murati said: “These models are becoming increasingly complex, but we want the actual interaction experience to become more natural and easy, so that users don’t have to focus on the user interface at all, but only on their collaboration with ChatGPT.
Over the past few years, we have been very focused on improving the intelligence of these models, and this is the first time we have really taken a big step forward in ease of use.”
To make advanced artificial intelligence easier to access and use globally, GPT-4o’s language capabilities have been improved in both quality and speed.
ChatGPT now supports more than 50 languages, including for registration, login, and user settings.
OpenAI plans to first roll out GPT-4o’s new audio capabilities to “a small group of trusted partners” in the coming weeks.