Multimodal large models “get on board” as Shangtang Jueying achieves new breakthroughs

As large models move from single-modal to multimodal, the field is entering a new round of technological arms race.

Unlike past models trained on a single category of data, a so-called multimodal large model is jointly trained on data from multiple modalities, such as voice, text, images, gestures, and video. It can therefore fully capture the correlations and complementary information between modalities and achieve more comprehensive and accurate analysis and prediction.

For intelligent cars, for example, the strong analysis and reasoning capabilities of multimodal large models can not only deliver a safer, more human-like intelligent driving experience but also create richer and more natural human-computer interaction.

Recently, at WAIC 2024, Shangtang Jueying showcased a number of intelligent driving and intelligent cockpit products built on Shangtang's newly released “Rishin 5.5” native multimodal large model. These include DriveAGI, an interpretable and interactive self-driving large model, as well as the in-vehicle generative interactive interface “FlexInterface” and “AgentFlow”. With the multimodal large model at their core, these products are meant to accelerate the evolution of smart cars into true super-agents.

Dual-track layout to accelerate large models into vehicles

In deeply integrating multimodal large models with intelligent vehicles, Shangtang Jueying focuses on two major application scenarios: intelligent driving and the intelligent cockpit.

In intelligent driving, as early as the end of 2022, Shangtang Jueying took the lead in launching the industry's first general self-driving model integrating perception and decision-making, which greatly improved the continuity and comfort of the intelligent driving experience.

However, Shangtang Jueying believes that a pure end-to-end self-driving model is not the final answer for autonomous driving, and that the ability to perceive, reason, make decisions, and interact with the open world will be an important marker of smart cars becoming super-agents.

Photo source: Shangtang Jueying

Therefore, building on its existing UniAD, Shangtang Jueying has further developed DriveAGI, an intelligent driving large model for driving decision-making and planning based on the multimodal large model. DriveAGI improves the interpretability of the end-to-end system while enabling vehicles to understand the complex real world as people do, and even to explain the reasoning behind their driving decisions to users.

In Shangtang Jueying's live demonstration at WAIC 2024, thanks to DriveAGI's strong analytical and reasoning abilities, a test car running the model safely and smoothly passed through a narrow gap formed by two stone piers on a wide road without lane markings. It also accurately identified and understood all kinds of traffic signs, including those for bus lanes, tidal lanes, and construction lanes, and changed lanes or steered around them on its own. When an ambulance approaches from behind, DriveAGI can likewise reason its way to changing lanes and yielding.

The multimodal large model also makes DriveAGI highly interactive: users can not only ask DriveAGI to explain its decision-making process but also control its autonomous driving behavior through voice or gesture commands.

In the intelligent cockpit, Shangtang Jueying is building a multimodal large model engine product, the “cockpit brain” (CockpitBrain), with the goal of creating a full product matrix of AI large-model cockpit products.

At this year’s WAIC, Shangtang released its first generative interface product, FlexInterface, along with AgentFlow, aiming to use AI to change the way users interact with the vehicle system.

Relying on the AI large model's ability to generate and modify interactive interfaces in real time, FlexInterface uses the large model to analyze user needs and, combined with the framework and paradigms of a design system, generates highly dynamic and personalized interfaces.

Whether the weather, time of day, holidays, anniversaries, or surroundings change, FlexInterface can automatically adjust the interface style to provide the best user experience.

Using the reasoning ability of the large model, AgentFlow can simulate human clicks to operate apps and websites directly.

Users only need to give instructions in natural language for the AI to select multiple tools and complete complex tasks, with no additional R&D adaptation required from the automaker.

For example, users can have AgentFlow automatically search for and book a bar suitable for watching a game, a one-stop service from search to reservation.

In addition, building on the traditional smart car sentry mode, Shangtang Jueying has created a “multimodal sentinel” that can fully understand and respond to a variety of random, potentially dangerous behaviors in the open world that may damage the vehicle, such as scratching or spray-painting the body, slapping or smashing the car, pulling the door handle, prying open the door, and kicking the car, keeping the vehicle safe with no blind spots.

As a complex intelligent mobile terminal, the intelligent vehicle deeply integrates advanced perception technology, AI algorithms, big data, and high-performance computing platforms. Its human-computer interaction also naturally fuses multiple modalities such as touch, vision, and voice, which makes it an excellent scenario for deploying multimodal large models.

However, because of the particular nature of intelligent vehicles and their strict requirements for safety and real-time response, cloud-side models alone can hardly meet the diverse needs of the car, so combining on-vehicle and cloud deployment is imperative.

In response to this trend, Shangtang Jueying has built HyperPPL, a high-performance computing engine for multimodal large models. By supporting large language models, multimodal models, CNN models, and more, it provides a powerful computing foundation for deploying multimodal large models on the vehicle side.

According to Wang Xiaogang, co-founder and chief scientist of Shangtang and president of the Shangtang Intelligent Automobile Group, HyperPPL can adapt to a number of mainstream vehicle computing platforms, including those from Nvidia, Qualcomm, and Intel, and is compatible with a variety of mainstream operating systems.

HyperPPL also supports more than 400 hardware operators, such as flash decode and segment prefill, with performance optimizations for those operators. In addition, it supports int8 and int4 quantization, including post-training quantization, to achieve maximum inference efficiency.
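The article does not describe how HyperPPL quantizes models. To show what int8/int4 post-training quantization means in general, here is a minimal sketch of one common scheme, symmetric per-tensor quantization of already-trained weights. The function names and the scheme itself are illustrative assumptions, not HyperPPL's API or method.

```python
# Hypothetical sketch: symmetric, per-tensor post-training quantization.
# This is NOT HyperPPL's API; names and scheme are illustrative assumptions.

def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Quantize trained float weights to signed integers with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4 (symmetric range)
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from quantized integers."""
    return [v * scale for v in q]


if __name__ == "__main__":
    trained = [0.5, -1.0, 0.25, 0.0, 0.8]
    for bits in (8, 4):
        q, scale = quantize_symmetric(trained, bits)
        approx = dequantize(q, scale)
        err = max(abs(a - b) for a, b in zip(approx, trained))
        print(f"int{bits}: q={q} max_error={err:.4f}")
```

Because quantization happens after training, no retraining is needed; the trade-off is that int4 has only 15 levels instead of 255, so its rounding error per weight is roughly 18 times larger than int8's in this scheme.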

HyperPPL is also specially optimized for scenarios with multiple people in the vehicle, so that when several occupants use it concurrently, the inference efficiency of the on-vehicle multimodal large model is not significantly lower than with a single user.

At this year’s WAIC, Shangtang Jueying demonstrated its adaptability by running 2.1B and 8B on-device multimodal large models on three different computing platforms.

Compared with cloud deployments, which often incur delays of several seconds, Shangtang Jueying's on-vehicle 8B multimodal model reportedly achieves a first-token latency as low as 300 milliseconds and an inference speed of 40 tokens/second.
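The two figures quoted for the on-vehicle 8B model combine into a simple estimate of how long a full reply takes. The sketch below assumes, which the article does not state explicitly, that the 40 tokens/second rate applies to every token after the first; the function name is hypothetical.

```python
# Back-of-envelope latency model (assumption: after the first token arrives,
# the remaining tokens stream at a constant decode rate).

def response_time_s(num_tokens: int,
                    first_token_latency_s: float = 0.3,
                    tokens_per_s: float = 40.0) -> float:
    """Estimated seconds until a reply of `num_tokens` tokens is fully generated."""
    if num_tokens <= 0:
        return 0.0
    # First token costs the first-packet delay; each later token costs 1/rate.
    return first_token_latency_s + (num_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(f"{n:>3} tokens -> {response_time_s(n):.3f} s")
```

Under these assumptions, a 100-token reply would finish in roughly 0.3 + 99/40 ≈ 2.8 seconds, while the first words appear after 0.3 seconds; that first-token figure is what the article contrasts with cloud delays of several seconds.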

It is worth mentioning that, in addition to continuing to push large-model technology and application innovation, Shangtang Jueying has also made solid progress in mass production.

In smart cockpits, Shangtang Jueying's large-model products are already widely used in mass-production models from many car companies.

For example, Shangtang's large model fully supports in-car voice applications of the Xiaoai voice assistant on the Xiaomi SU7.

On June 25, the Yizhen L380 was officially launched.

The car is also equipped with AI large model cockpit products and functions such as AI chat, Mito wallpaper, fairy tale picture book, and AI consultation customized by Shangtang Jueying.

In smart driving, Shangtang Jueying's mass-produced smart driving products have also launched on brands and models including the GAC Aian LX Plus, Nezha S, Haopin GT, and Hongqi, and its current mass-produced smart driving solutions can be upgraded to an end-to-end architecture in the future.


Link to this article: https://evcnd.com/multi-modal-large-model-gets-on-the-bus-shangtang-jueying-welcomes-new-breakthroughs-again/
