Samsung's Big Move? Galaxy S26 Rumored to Feature On-Device LLM, Igniting the Next Smartphone AI War
Over the past year, smartphone manufacturers have been accelerating their AI initiatives. Honor's YOYO now integrates with a growing number of third-party agents, bridging AI capabilities between the system and application layers, while Huawei's Xiaoyi can navigate across apps to complete tasks from a single command.
Despite these impressive advancements, a closer look reveals a fundamental reality: these AI features still rely on an internet connection. In essence, smartphone AI remains in the edge-cloud collaboration stage and has yet to take its next significant leap forward.
Recently, a leak on X from Semi-retired-ing revealed that Samsung is preparing a large language model (LLM) that can run locally on the upcoming Galaxy S26 series and power most of its AI functions. This on-device model is even said to hold elevated system privileges, allowing it to free up memory when necessary so it can respond to user requests at any time.

In fact, Samsung showcased an on-device LLM named "Gauss" back in 2023, which was even rumored to be pre-installed on the Galaxy S25 series. However, for reasons unknown, Samsung has been heavily promoting Google's Gemini and has barely mentioned Gauss since. It's only now that Samsung's on-device model is back in the conversation. As most manufacturers continue to focus on cloud-based solutions, why is Samsung attempting to put the model directly into the phone? Is this a strategy to leapfrog the competition, or does the mobile platform finally have the capability to deploy LLMs locally? Whatever the answer, a new phase of smartphone AI is about to begin.
Smartphone Makers Won't Abandon the Edge-Cloud Hybrid Model
If Samsung truly deploys an LLM locally, does this mean smartphone AI will abandon the edge-cloud hybrid strategy for a purely on-device approach? In reality, this is unlikely to happen in the short term. The edge-cloud hybrid model is currently a near-ideal compromise. The cloud handles model scale, complex reasoning, and rapid iteration, leveraging the superior computing resources of servers for updates, governance, and security. Meanwhile, the device (the edge) handles the front end of a command, including wake-word detection, speech recognition, and basic intent parsing, before passing complex requests to the cloud.
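To make this division of labor concrete, here is a minimal sketch of how a hybrid assistant might route a request, with lightweight intent parsing on the device and anything demanding deferred to the cloud. All names and thresholds here (classify_intent_on_device, the 0.8 confidence cutoff, and so on) are hypothetical illustrations, not any vendor's actual implementation.

```python
# Minimal sketch of edge-cloud request routing in a phone assistant.
# Every name and threshold is hypothetical, for illustration only.

from dataclasses import dataclass

@dataclass
class Intent:
    label: str         # e.g. "local_action" or "needs_reasoning"
    confidence: float  # confidence of the small on-device classifier
    complex: bool      # whether fulfilment needs multi-step reasoning

def classify_intent_on_device(utterance: str) -> Intent:
    """Stand-in for a small on-device model: the wake word is already
    detected and speech already transcribed; this step only decides
    whether the request can be fulfilled locally."""
    simple_verbs = ("set", "open", "call", "play")
    is_simple = utterance.lower().startswith(simple_verbs)
    return Intent(label="local_action" if is_simple else "needs_reasoning",
                  confidence=0.9 if is_simple else 0.6,
                  complex=not is_simple)

def handle_request(utterance: str) -> str:
    intent = classify_intent_on_device(utterance)
    if not intent.complex and intent.confidence > 0.8:
        return f"[device] executed '{intent.label}' locally"
    # Anything needing long-context reasoning goes to the cloud --
    # exactly the dependency that breaks on a weak network.
    return f"[cloud] forwarded '{utterance}' for server-side reasoning"

print(handle_request("Set an alarm for 7 am"))
print(handle_request("Summarize this contract and email the key risks"))
```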

This division of labor works well for users who invoke AI only occasionally, and that is precisely the assumption it rests on. As the direction of smartphone AI becomes clearer, manufacturers are shifting their focus from "answering your questions" to "completing tasks for you." AI is evolving from a chatbot into a system that can understand on-screen content, break down task objectives, and plan execution paths, forming a complete AI agent chain. Once AI enters this high-frequency, continuous, system-level interaction scenario, the shortcomings of the edge-cloud model become glaringly obvious. In weak network conditions, cloud response latency causes noticeable interruptions, and in a sequence of commands, a single network drop can halt the entire process. For users, that kind of unreliability is unacceptable.
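The fragility is easy to demonstrate: if every step of an agent's plan requires a cloud round trip, a single dropped connection abandons everything after it. A toy sketch, where the plan steps and failure rate are invented for illustration:

```python
# Sketch of why a cloud-dependent agent chain is fragile: each step
# of the plan needs a network round trip, so one failure halts the
# rest. The steps and drop rate are invented for illustration.

import random

class NetworkError(Exception):
    pass

def cloud_reasoning_step(step: str) -> str:
    if random.random() < 0.3:  # simulate a weak-network drop
        raise NetworkError(f"connection lost during '{step}'")
    return f"done: {step}"

plan = ["read on-screen content", "extract flight details",
        "open calendar app", "create event", "confirm with user"]

random.seed(7)  # deterministic demo run
for step in plan:
    try:
        print(cloud_reasoning_step(step))
    except NetworkError as err:
        print(f"chain halted, remaining steps abandoned ({err})")
        break
```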
On-Device LLMs: What Are the Hurdles?
Given the drawbacks of the hybrid model, why has it been so difficult to bring on-device LLMs to smartphones? It's not that manufacturers are unwilling to try; the limitations are simply stark. First come the hardware constraints: memory, computing power, and power consumption are the three core conditions for on-device AI. Even a moderately sized model, running constantly in the background, puts continuous strain on system resources; the memory requirement alone has reportedly forced Apple to increase the RAM in its iPhones. Second come stability and maintenance costs. Cloud models can be iterated quickly with instant bug fixes, whereas on-device models can only be improved through system updates. For a system-level AI, that means higher risk and greater testing cost.
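The memory pressure can be quantified with back-of-the-envelope arithmetic: a model's weights alone occupy roughly parameter count × bits per weight / 8 bytes, before any KV cache or activations. A rough sketch, with figures that are illustrative rather than any vendor's actual budget:

```python
# Back-of-the-envelope RAM footprint for on-device LLM weights:
# bytes ~= parameter_count * bits_per_weight / 8. KV cache and
# activations come on top, so real budgets are higher. Figures
# are illustrative, not vendor specifications.

def weights_gib(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

for params in (3, 7):
    for bits in (16, 8, 4):
        print(f"{params}B model @ {bits}-bit: "
              f"~{weights_gib(params, bits):.1f} GiB of weights")
```

Even a 3-billion-parameter model quantized to 4 bits still needs roughly 1.4 GiB resident just for weights, a large slice of a typical phone's usable memory, which is why on-device models lean so heavily on aggressive quantization.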

However, the landscape is changing in 2025, as significant improvements in chip capabilities bring purely on-device LLMs close to reality. Take the Snapdragon 8 Gen 5, for example: Qualcomm has revealed that its Hexagon NPU can reach output speeds of around 200 tokens/s in local generative tasks. This metric matters because it means on-device models can now sustain continuous, natural language generation, a prerequisite for AI to execute complex interactive commands. Similarly, MediaTek's Dimensity 9500 takes a more aggressive approach to power efficiency in its NPU 990. According to official claims, it improves generation efficiency on a 3-billion-parameter on-device model while significantly reducing overall power consumption. This suggests that on-device models are no longer a "run-once" showcase but are becoming viable for continuous operation.
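To put 200 tokens/s in perspective: local generation time scales linearly with response length, while a cloud call pays a fixed round-trip cost on top of server-side decoding. A rough comparison, in which the cloud decoding rate and round-trip overhead are assumptions for illustration, not measurements:

```python
# Rough latency comparison: on-device decoding at ~200 tokens/s vs.
# a cloud call that adds round-trip overhead per request. Only the
# 200 tokens/s figure comes from Qualcomm's claim; the cloud numbers
# are assumptions for illustration.

ON_DEVICE_TOKENS_PER_S = 200   # claimed local generation rate
CLOUD_TOKENS_PER_S = 600       # assumed faster server-side decoding
CLOUD_ROUND_TRIP_S = 0.8       # assumed network + queueing overhead

def on_device_latency(tokens: int) -> float:
    return tokens / ON_DEVICE_TOKENS_PER_S

def cloud_latency(tokens: int) -> float:
    return CLOUD_ROUND_TRIP_S + tokens / CLOUD_TOKENS_PER_S

for n in (50, 200, 800):
    print(f"{n:>4} tokens: device {on_device_latency(n):.2f}s "
          f"vs cloud {cloud_latency(n):.2f}s")
```

Under these assumptions, the short, command-style responses typical of agent steps favor the device, while long-form generation still favors the cloud, which is consistent with the hybrid model persisting alongside on-device capability.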
On-Device AI Won't Be a Revolution, But a Dividing Line for Flagships
Looking at the choices of mainstream manufacturers, the edge-cloud hybrid model remains the most prudent solution. Huawei's Xiaoyi, for instance, is still one of the most comprehensive system-level AI assistants in China, yet its core architecture remains a classic hybrid: the device handles perception and basic understanding, while the cloud takes on complex reasoning. This isn't because manufacturers can't build on-device AI; it reflects a practical trade-off. When AI becomes deeply integrated into the system and its services, stability, efficiency, and resource control always matter more than aggressive deployment.
Ultimately, on-device large language models will not upend the overall direction of smartphone AI in the short term. Whether it's Samsung, Huawei, or the other major Chinese manufacturers, the current choice is the edge-cloud hybrid solution. After all, smartphones were not designed for large models, which forces a balance between performance, power consumption, stability, and security. From this perspective, on-device LLMs may not be the "bombshell" feature at phone launch events, but they will quietly raise the technical bar for flagship devices, creating a tangible experience gap between AI phones with on-device capability and those that are purely cloud-based. That dividing line may be drawn very soon.