The 2024 World Artificial Intelligence Conference and High-Level Conference on Global Governance of Artificial Intelligence (WAIC 2024) was held in Shanghai from July 4 to July 7.
Shangtang Jueying unveiled a number of intelligent driving and intelligent cockpit products based on the newly released Shangtang “Rixin 5.5” native multimodal large model at this year’s WAIC, leading “people-oriented” interactive innovation in intelligent vehicles.
As a strategic partner in accelerating smart cars into the AGI era, Shangtang Jueying showcased DriveAGI, an interpretable and interactive autonomous driving large model, and also released the industry’s first in-vehicle generative interface, FlexInterface, together with on-board AI Agent applications such as AgentFlow.
In addition, the Shangtang Jueying autonomous driving minibus was also on display at this WAIC and was the only L4 autonomous minibus undertaking shuttle services at the conference.
At the “Great Love Without Frontier, Toward New Power” artificial intelligence forum held on July 5 by WAIC 2024 strategic partner Shangtang Technology, Shangtang released “Rixin 5o”, China’s first “what you see is what you get” model, delivering a real-time streaming multimodal interactive experience benchmarked against GPT-4o and demonstrating the strength of the Shangtang “Rixin 5o” model with its hybrid device-cloud collaborative expert architecture.
Wang Xiaogang, co-founder and Chief Scientist of Shangtang Technology and President of the Jueying Intelligent Vehicle Business Group, said: “The native multimodal large model is the key that opens the door to AGI. Shangtang Jueying is unleashing the creativity of AGI, promoting the deep integration of multimodal large models and smart cars, creating a series of brand-new in-vehicle intelligent products, and accelerating the evolution of smart vehicles into super-agents.”
Leading a “people-oriented” transformation of intelligent vehicle interaction.
Wang Xiaogang shared Shangtang’s latest technology and product progress at the “Great Love Without Frontier, Toward New Power” forum.
Truly “people-oriented”: the native multimodal large model leads the innovation of intelligent vehicle interaction.
A multimodal large model can efficiently and deeply fuse modalities such as voice, text, image, gesture and video, providing a richer and more natural human-computer interaction experience. In the past, many models handled different modalities by first converting inputs such as speech into text, analyzing the text together with images, and then converting the textual output back into speech. Such a pipeline loses a great deal of information and introduces high latency.
By contrast, the new Shangtang multimodal large model supported by the “Rixin 5.5” system is end-to-end: text, voice, video and other modalities are fed in together, and after unified processing the model directly outputs information in the corresponding modalities.
Compared with the earlier pipeline schemes, the technical difficulty of this kind of multimodal fusion increases geometrically, and overcoming it is a direct embodiment of Shangtang’s industry-leading native multimodal capability.
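To make the architectural difference concrete, here is a minimal, self-contained Python sketch (not SenseTime code) contrasting a cascaded speech-to-text pipeline with a native end-to-end multimodal call; the model functions are simple string stubs that only illustrate how information flows.

```python
# Sketch only: stubbed models showing where information is lost in a
# cascaded pipeline versus a native end-to-end multimodal model.

def speech_to_text(audio: str) -> str:
    # Stub ASR: paralinguistic cues (tone, emotion) are dropped here.
    return f"transcript({audio})"

def text_llm(prompt: str) -> str:
    # Stub text-only LLM.
    return f"text_answer({prompt})"

def text_to_speech(text: str) -> str:
    # Stub TTS: prosody must be re-synthesized from plain text.
    return f"speech({text})"

def multimodal_llm(audio: str, image: str, output: str) -> str:
    # Stub native multimodal model: all modalities processed jointly.
    return f"{output}_answer({audio}+{image})"

def cascaded_reply(audio: str, image: str) -> str:
    """Legacy pipeline: ASR -> text LLM -> TTS, each hop losing information."""
    transcript = speech_to_text(audio)
    answer = text_llm(transcript + " with " + image)
    return text_to_speech(answer)

def native_reply(audio: str, image: str) -> str:
    """End-to-end model: raw modalities in, target modality out."""
    return multimodal_llm(audio=audio, image=image, output="speech")

if __name__ == "__main__":
    print(cascaded_reply("driver_voice", "cabin_camera_frame"))
    print(native_reply("driver_voice", "cabin_camera_frame"))
```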
Rixin 5.0, released in April this year, was the first domestic large model to benchmark GPT-4 Turbo. The system has now been comprehensively upgraded: mathematical reasoning, English proficiency, instruction following and interaction quality are all markedly improved, and a number of core metrics now benchmark GPT-4o.
The earlier release of GPT-4o showed consumers what multimodal real-time interaction can look like, let more people appreciate the appeal of multimodal perception and interaction, and began to open up the commercial possibilities of multimodal large models.
Compared with mobile phones, smart cars are a more suitable scenario for deploying large multimodal models. Because the various cameras inside and outside a smart car are usually always on, users can interact with the car in real time in multimodal ways. At the same time, the growing number of smart cars generates rich end-user feedback and data, allowing the model to iterate and improve.
The combination of these factors points to an exciting direction for the future development of smart cars: from smart cars to super-agents, with multimodal large models as the core driving force of this process.
Unlike OpenAI and similar companies, Shangtang Jueying is a core supplier to smart-car makers with extensive mass-production experience in intelligent driving and intelligent cockpits, and it will use the multimodal large model as the core to accelerate “people-oriented” interactive innovation in intelligent vehicles.
The human-computer interaction of intelligent cars is changing from “car-centered” to “human-centered”.
At the current stage of this transformation, users still have to provide information to the smart vehicle through text or voice in order to obtain passive services; other information is simply lost, and the vehicle has not yet reached the point of actively serving the user.
Shangtang Jueying is using the multimodal large model to build genuinely “human-centered” intelligent vehicle interaction, covering both the cockpit and the environment around the car so that information about the “person” is never ignored, and even breaking through the limits of space to connect cabin users with the broader physical and digital world.
Shangtang Jueying is the first in the industry to deploy native multimodal large models on the vehicle side. The performance of Shangtang’s on-device 8B multimodal model is industry-leading, and the ability to deploy models on the vehicle is an indispensable technical guarantee for interactive innovation in intelligent vehicles.
Shangtang Jueying can flexibly deploy multimodal large models across the full stack, on the cloud side, in device-cloud combination, or on the device side, so that Shangtang’s native multimodal capabilities can quickly land in smart cars.
At this year’s WAIC, Shangtang Jueying took the lead in the industry in deploying native multimodal large models on the vehicle side, demonstrating the adaptability of its 2.1B and 8B on-device multimodal models running on three different computing platforms. Compared with cloud deployment schemes that incur several seconds of delay, Shangtang Jueying’s on-vehicle 8B multimodal model achieves a first-packet latency of under 300 ms and an inference speed of 40 tokens per second, safeguarding “people-oriented” interactive innovation in intelligent vehicles.
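For readers unfamiliar with the two figures quoted above, the sketch below shows, under purely illustrative assumptions, how first-packet latency and decode speed are typically measured for a streaming model; the generator is a stand-in, not the actual on-device runtime.

```python
# Sketch only: measuring first-packet latency and tokens/second for a
# streaming decoder, using a fake generator with hard-coded delays.

import time
from typing import Iterator

def fake_streaming_model(prompt: str) -> Iterator[str]:
    # Stand-in for an on-device streaming decoder.
    time.sleep(0.25)                 # time to first token ("first packet")
    for i in range(40):
        time.sleep(0.02)             # ~50 tokens/s decode in this toy example
        yield f"tok{i}"

def benchmark(stream: Iterator[str]) -> None:
    t0 = time.perf_counter()
    first_packet = None
    n = 0
    for _ in stream:
        if first_packet is None:
            first_packet = time.perf_counter() - t0   # first-packet latency
        n += 1
    total = time.perf_counter() - t0
    print(f"first packet: {first_packet * 1000:.0f} ms")
    print(f"decode speed: {(n - 1) / (total - first_packet):.1f} tokens/s")

if __name__ == "__main__":
    benchmark(fake_streaming_model("hello cockpit"))
```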
DriveAGI: interpretable, interactive and multimodal, making end-to-end intelligent driving safe and reliable.
At the end of 2022, Shangtang and its joint laboratory proposed UniAD, the industry’s first general model for perception-decision-integrated autonomous driving, which won the Best Paper Award at the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) the following year. Shangtang Jueying demonstrated UniAD on public roads at this year’s Beijing Auto Show and continues to lead the innovative trend of end-to-end autonomous driving.
Since the Beijing Auto Show, UniAD has advanced steadily. Through continuous data collection, ground-truth production, model training and real-vehicle testing, the stability of the UniAD system has been greatly enhanced, and the continuity and comfort of the driving experience have kept improving.
At this year’s WAIC, Shangtang Jueying showed UniAD using only seven cameras, giving real-vehicle demonstrations of map-free, end-to-end driving in complex urban, rural and other road scenarios.
The intelligent driving model continues to evolve iteratively.
UniAD has significantly improved the driving capability of the intelligent driving system, but a purely end-to-end autonomous driving model is not the final answer. The ability to perceive, reason, decide and interact with the open world will be an important hallmark of smart cars moving toward super-agents. Shangtang therefore pioneered DriveAGI, the first multimodal-large-model-based system applied to driving decision-making and planning, which makes end-to-end intelligent driving interpretable and interactive.
DriveAGI enhances the interpretability of the end-to-end system: it not only enables the vehicle to understand the complex real world the way a human does, to see into the motives of all kinds of traffic participants, to quickly learn various traffic rules and to grasp rapidly changing road information, but also to explain the reasoning behind its driving decisions to users.
At present, the Shangtang Jueying DriveAGI model can safely and smoothly pass through the narrow gap formed by two stone piers on a road without clear lane markings. It can also accurately recognize and understand all kinds of traffic signs, including bus lanes, tidal lanes and construction zones, and change lanes or take avoiding action on its own. Even when an ambulance approaches from behind, DriveAGI reasons about the situation and changes lanes in time to give way: it not only recognizes ambulances but actively yields to those on duty.
The multimodal large model also gives DriveAGI strong interactivity.
Users can not only ask DriveAGI to explain its decision-making process, but also control autonomous driving behavior through voice or gesture commands.
For example, during autonomous driving the navigation may instruct the vehicle to make a U-turn at the next intersection to reach the destination, but the driver knows there is a shortcut ahead where the car can turn directly; the driver only needs to tell the system “turn left ahead”, and the system will carry out the instruction according to the current road conditions.
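The following is a highly simplified, hypothetical Python sketch of that interaction pattern: a spoken instruction is checked against the current road state before it overrides the navigation plan. The rule and data structures are illustrative assumptions, not DriveAGI internals.

```python
# Sketch only: reconciling a driver's spoken command with the current
# road state before overriding the navigation plan.

from dataclasses import dataclass

@dataclass
class RoadState:
    left_turn_allowed: bool
    lane_clear: bool

def plan_maneuver(nav_suggestion: str, voice_command: str, road: RoadState) -> str:
    """Prefer the driver's instruction when it is legal and safe,
    otherwise fall back to the navigation plan and explain why."""
    if voice_command == "turn left ahead":
        if road.left_turn_allowed and road.lane_clear:
            return "executing: left turn ahead (driver override)"
        return f"keeping nav plan '{nav_suggestion}': left turn not safe or legal here"
    return f"keeping nav plan '{nav_suggestion}'"

if __name__ == "__main__":
    print(plan_maneuver("U-turn at next intersection", "turn left ahead",
                        RoadState(left_turn_allowed=True, lane_clear=True)))
    print(plan_maneuver("U-turn at next intersection", "turn left ahead",
                        RoadState(left_turn_allowed=False, lane_clear=True)))
```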
The impressive performance of the UniAD and DriveAGI models depends on Shangtang’s powerful modeling capability as well as on large amounts of high-quality data for learning and training. As a “new quality productive force”, large models, represented by multimodal models, greatly improve the efficiency of end-to-end intelligent driving training and iteration.
Based on real multimodal data, a series of large cloud-side models, such as Shangtang Jueying’s world model and traffic-flow simulation model, continuously produce high-quality data. Working together, these models provide scene generation, traffic-flow simulation, ground-truth production, system diagnosis and related capabilities, creating an end-to-end data closed loop for the era of intelligent driving large models and providing a strong guarantee for the deployment and evolution of end-to-end autonomous driving solutions.
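As a rough illustration of what such a data closed loop looks like, the stubbed sketch below chains scene generation, traffic simulation, automatic ground-truth production, training and diagnosis; the stage names and outputs are generic placeholders rather than Shangtang component names.

```python
# Sketch only: a toy data closed loop where diagnosis of the trained
# model steers the next round of scene generation.

def generate_scene(seed: int) -> dict:
    return {"scene_id": seed, "weather": "rain" if seed % 2 else "clear"}

def simulate_traffic(scene: dict) -> dict:
    return {**scene, "agents": [f"vehicle_{i}" for i in range(3)]}

def auto_label(episode: dict) -> dict:
    return {**episode, "ground_truth": "trajectories+lanes"}

def train_step(samples: list[dict]) -> str:
    return f"model_ckpt_after_{len(samples)}_samples"

def diagnose(ckpt: str) -> list[int]:
    # Failure cases become seeds for the next round of scene generation.
    return [7, 13]

if __name__ == "__main__":
    seeds = [1, 2, 3]
    for _ in range(2):                      # two turns of the closed loop
        samples = [auto_label(simulate_traffic(generate_scene(s))) for s in seeds]
        ckpt = train_step(samples)
        seeds = diagnose(ckpt)              # hard cases feed the next iteration
        print(ckpt, "-> next seeds:", seeds)
```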
Multimodality is woven into the intelligent cockpit, making the smart car your personal “Jarvis”.
Today’s smart cars are equipped with rich, powerful hardware and create an independent interactive environment for users, making them the best scenario for AGI to land. For the intelligent cockpit, Shangtang Jueying fully releases the strong perception and interaction capabilities of the multimodal large model and opens up more room for imagination.
Relying on its industry-leading multimodal capability, Shangtang Jueying is building “CockpitBrain”, a multimodal large-model engine product, along with a matrix of AI large-model cockpit products, so that Iron Man’s AI assistant “Jarvis” can enter the smart car and become every user’s AI travel companion.
At this year’s WAIC, Shangtang officially unveiled the industry’s first generative-interface products, FlexInterface and AgentFlow, which aim to fundamentally change the way users interact with the vehicle system through AI.
Relying on the AI large model’s ability to generate and modify interactive interfaces in real time, FlexInterface analyzes user needs and, within the framework and paradigms of the design system, generates highly dynamic and personalized interfaces. Whether it is the weather, the time, a festival, an anniversary or a change in the surrounding environment, FlexInterface can automatically adapt the style of the interface to provide the best user experience.
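A toy sketch of the underlying idea, generating a theme from free-form context but validating it against a fixed design system, is shown below; the design tokens and selection rule are invented for illustration and do not come from FlexInterface.

```python
# Sketch only: context goes in, a theme spec comes out, and the result
# is constrained to a small set of design-system tokens.

import json

DESIGN_TOKENS = {
    "palettes": {"euro_cup": ["#003399", "#FFFFFF"], "festival": ["#D4AF37", "#8B0000"]},
    "layouts": ["grid", "focus_card"],
}

def generate_theme(context: dict) -> dict:
    # Stand-in for the large-model call; here a simple rule picks a palette.
    palette = "euro_cup" if "football" in context.get("event", "") else "festival"
    theme = {"palette": palette, "layout": "focus_card",
             "widgets": ["match_schedule", "media", "navigation"]}
    assert theme["palette"] in DESIGN_TOKENS["palettes"]   # validate against
    assert theme["layout"] in DESIGN_TOKENS["layouts"]     # the design system
    return theme

if __name__ == "__main__":
    print(json.dumps(generate_theme({"event": "football tournament"}), indent=2))
```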
AgentFlow uses the reasoning ability of the large model to simulate human click operations and directly operate apps and websites. Users only need natural language to let the AI choose among multiple tools and complete complex tasks, with no additional R&D adaptation required from the automaker. This not only makes operation more convenient but also greatly expands the functional range of the vehicle system.
For example, a user can have AgentFlow automatically search for and book a bar suitable for watching a match, providing a one-stop service from search to reservation.
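The bar-booking example can be pictured as a simple tool-selection loop like the hypothetical sketch below; the tool names, routing rule and outputs are illustrative stand-ins, not the AgentFlow implementation.

```python
# Sketch only: natural language in, a sequence of stubbed tool calls out.

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "search_venues": lambda q: f"[3 sports bars near destination for '{q}']",
    "book_venue": lambda v: f"[reservation confirmed at {v}]",
}

def plan_tools(request: str) -> list[str]:
    # Stand-in for the large model's reasoning step.
    if "book" in request and "bar" in request:
        return ["search_venues", "book_venue"]
    return []

def run_agent(request: str) -> list[str]:
    trace, last = [], request
    for name in plan_tools(request):
        last = TOOLS[name](last)        # each tool consumes the previous result
        trace.append(f"{name} -> {last}")
    return trace

if __name__ == "__main__":
    for step in run_agent("book a bar suitable for watching the game tonight"):
        print(step)
```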
In Shangtang Jueying’s live demonstration, a user generated a “European Cup” themed interface through FlexInterface, and the large model automatically produced a center-console desktop and icons with European Cup elements; at the same time, the user could ask AgentFlow at any moment to play European Cup or football-related music, showing the capability and flexibility of these innovative products in real use.
Beyond the “European Cup” themed interface generated by FlexInterface, at this year’s WAIC Shangtang also built a “multimodal sentinel” on top of the traditional smart-car “sentry mode”. It can fully understand and handle all kinds of unpredictable, potentially damaging behaviors toward the vehicle in the open world, such as scratching the car, spray-painting the body, slapping or smashing the car, yanking door handles, prying doors and stepping on the vehicle, keeping the vehicle secure with no blind spots.
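A schematic sketch of such a sentinel loop, with the multimodal model call and threat categories stubbed out as placeholders, might look like the following; none of it reflects the shipped feature’s actual interfaces.

```python
# Sketch only: camera frames are captioned by a stubbed multimodal model
# and matched against a placeholder list of threat behaviors.

THREATS = {"scratching", "spray painting", "smashing", "pulling door handle",
           "prying door", "stepping on vehicle"}

def describe_frame(frame: str) -> str:
    # Stand-in for a multimodal model captioning the scene.
    return {"frame_001": "a person pulling door handle repeatedly",
            "frame_002": "a pedestrian walking past"}.get(frame, "nothing notable")

def sentinel(frames: list[str]) -> list[str]:
    alerts = []
    for f in frames:
        caption = describe_frame(f)
        if any(t in caption for t in THREATS):
            alerts.append(f"{f}: ALERT ({caption})")
    return alerts

if __name__ == "__main__":
    print(sentinel(["frame_001", "frame_002"]))
```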
With the help of multimodal large models, Shangtang Jueying will make the smart car the user’s exclusive “Jarvis”, taking it a step further toward the form of a super-agent.
Blossoming across mass production, opening a new paradigm of travel and accelerating into the AGI era.
As a strategic partner in accelerating smart cars into the AGI era, Shangtang Jueying not only leads in large-model technology but is also blossoming across mass production and deployment.
In the field of intelligent cockpit, Shangtang’s large model products have been widely used in the mass-produced models of many mainstream automobile manufacturers.
For example, Shangtang’s large model fully supports the in-car voice scenarios of the Xiaomi SU7’s voice assistant.
On June 25, the Yizhen L380 was officially launched, putting the latest version of the Shangtang “Rixin” model into mass production. Based on the “Shangliang” large language model and the “Miaohua” text-to-image model, Shangtang customized AI large-model cockpit products and functions for the Yizhen L380, such as “AI Chat”, “Meitu Wallpaper”, “Fairy-Tale Picture Book” and “AI Consultation”, upgrading the smart cockpit experience of this “land Airbus”.
In the field of smart driving, Shangtang Jueying’s mass-produced intelligent driving products have been delivered on brands and models including the GAC Aion LX Plus, Hezhong Nezha S, GAC Hyper GT and Hongqi, and functions such as highway NOA have also begun to roll out.
At the same time, Jueying is pushing forward deliveries for more models and has the ability to mass-produce and deliver full-stack intelligent driving technology from perception to planning and control.
In early June, GAC and FAW were selected into the country’s first batch of L3 pilot programs, and Shangtang Jueying provides them with L3-oriented perception algorithms.
Moreover, many of Shangtang Jueying’s current mass-production intelligent driving solutions can be upgraded to an end-to-end architecture in the future.
In the field of higher-level L4 autonomous driving, the Shangtang Jueying autonomous minibus was the only L4 autonomous minibus undertaking shuttle services at WAIC 2024, providing demand-responsive autonomous bus rides between multiple locations. Behind this are Shangtang Jueying’s hard-core technical strength and strong operational capabilities.
At present, the total test and operating mileage of Jueying’s L4 autonomous minibuses has exceeded 3,000,000 kilometers, and autonomous shuttle services have been launched in Wuxi in Jiangsu, the Xixian New Area in Shaanxi, and other places.
In Lingang, Shanghai, the Jueying L4 autonomous minibus is already in daily operation open to the public. The intelligent connected public-transport service jointly created by Shangtang Jueying and Shanghai Lingang New Area Public Transportation Co., Ltd. adopts a demand-responsive bus model, and cumulative passenger bookings have exceeded 16,000.
Shangtang Jueying’s large-model products are being integrated into every aspect of smart cars and smart travel, opening up a new travel paradigm and accelerating smart cars’ entry into the AGI era.