Dek: Chinese media say KernelCAT, an AI agent from a five-month-old Shenzhen startup incubated by the Shenzhen Institute of Big Data, automated DeepSeek-OCR-2 deployment and inference verification on Huawei Ascend in 38 minutes. The bigger signal is not chip victory-lap rhetoric, but a faster path for porting advanced AI workloads onto China’s domestic compute stack.
China’s AI-chip conversation is often framed as a hardware race against Nvidia. In practice, one of the hardest barriers sits in the software layer: getting advanced models, frameworks, operators, and dependency chains to run reliably on non-CUDA stacks.
That is why a very specific claim from Shenzhen is worth paying attention to. On March 9, Shenzhen News and Nanfang Plus reported that KernelCAT, an AI agent tool built by a Shenzhen startup incubated by the Shenzhen Institute of Big Data, completed the automated deployment and inference verification of DeepSeek-OCR-2 on Huawei Ascend in just 38 minutes.
The cleanest way to read the story is not as proof that China’s domestic AI stack has already caught up with Nvidia’s full ecosystem. It has not. The stronger interpretation is narrower and more useful: Chinese teams are increasingly trying to use AI agents to compress the software-adaptation work that has long slowed the practical rollout of domestic AI chips.
What local media actually reported
According to the March 9 source chain, the startup behind KernelCAT was incubated by the Shenzhen Institute of Big Data in 2025 and is only about five months old. Local media described KernelCAT as the company’s self-developed AI agent for model migration and optimization on domestic compute platforms.
The headline fact is straightforward. Chinese media reported that KernelCAT completed automated deployment and inference verification of DeepSeek-OCR-2 on a Huawei Ascend platform in 38 minutes.
That number matters because the underlying task is not trivial. DeepSeek-OCR-2 was described in the reporting as a complex multimodal OCR model with demanding operator requirements. In older workflows, porting that kind of system to a different hardware-software stack could mean wrestling with tightly coupled versions of vLLM, PyTorch, and NPU drivers, then manually rewriting or optimizing low-level code until the model finally runs.
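The actual version constraints in these stacks are not public, but the kind of dependency coupling described above is often handled with a pre-flight compatibility check before any porting work begins. The sketch below is purely illustrative: the package names and version ranges are hypothetical examples, not a real Ascend, vLLM, or PyTorch compatibility matrix.

```python
# Illustrative pre-flight dependency check for a model port.
# All version ranges here are hypothetical, not a real compatibility matrix.

# Hypothetical matrix: package -> (min_version inclusive, max_version exclusive)
COMPAT = {
    "torch": ("2.1.0", "2.2.0"),
    "vllm": ("0.4.0", "0.5.0"),
}

def parse(v: str) -> tuple:
    """Turn '2.1.0' into (2, 1, 0) for simple tuple comparison."""
    return tuple(int(p) for p in v.split(".")[:3])

def check(installed: dict) -> list:
    """Return human-readable problems for any missing or out-of-range package."""
    problems = []
    for pkg, (lo, hi) in COMPAT.items():
        got = installed.get(pkg)
        if got is None:
            problems.append(f"{pkg}: not installed")
        elif not (parse(lo) <= parse(got) < parse(hi)):
            problems.append(f"{pkg}: {got} outside [{lo}, {hi})")
    return problems

# Example: one package too new, one missing entirely.
print(check({"torch": "2.3.0"}))
```

In manual workflows, a failed check like this is where the weeks of engineering time begin; the reported value of an agent tool is automating both this diagnosis and the remediation that follows.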
Nanfang Plus added a second metric that should be kept clearly separate from the 38-minute migration claim. Citing Ascend CANN information about an earlier DeepSeek-OCR deployment task, the outlet said a KernelCAT-generated solution delivered up to 139x acceleration over a native Transformers baseline.
That is an attention-grabbing figure, but it should stay tightly attributed and context-bound. The current source chain does not justify flattening the two numbers into one sweeping statement, and it does not justify presenting the 139x result as a universal benchmark across all workloads.
Why the software layer is the real bottleneck
The most valuable part of the local reporting is the diagnosis, not the startup mythology.
Nanfang Plus framed the real domestic-chip bottleneck as the software ecosystem and operator adaptation, not a lack of raw hardware performance. In that telling, the hard part is not merely owning accelerators. It is building enough low-level software support that advanced AI models can be moved, tuned, and reproduced without burning weeks or months of expert engineering time.
That argument will sound familiar to anyone who has followed China’s broader AI infrastructure push. Recent 1M Reviews coverage such as Huawei Launches AI Data Platform to Push Enterprise AI Beyond Model Hype made a similar point from the enterprise side: the next bottleneck is often not model capability alone, but the deployment plumbing around retrieval, inference, memory, and systems integration.
KernelCAT sits one layer deeper in the stack. The story here is not about a chatbot or an agent interface. It is about using an AI-driven tool to reduce the painful engineering work required to make a sophisticated model run on a different compute environment.
That is why the Nvidia comparison keeps surfacing in the source material. The moat is not simply that Nvidia has strong chips. It is that Nvidia spent years building CUDA, operator libraries, frameworks, tooling, and developer habits. Chinese media quoted KernelCAT’s backers as arguing that automated tooling may offer one of the few realistic ways to shorten that catch-up cycle.
Why 38 minutes is a compelling signal — and a limited one
The 38-minute number travels well because it is concrete, current, and easy to understand. It also fits a broader editorial trend in China tech coverage: AI is increasingly being sold not only as an application layer, but as a way to fix underlying industrial bottlenecks.
That same shift is visible in China’s AI+Manufacturing Push Targets 1,000 Industrial Agents by 2027, where the official narrative moved from AI demos to workflow efficiency and factory-floor ROI. It is also visible in Pointer-CAD Shows China’s AI Race Moving Into 3D Design, where large-model capability started showing up inside specialized engineering software rather than only inside chat products.
KernelCAT extends that pattern into the infrastructure layer. Instead of saying “AI can help design things” or “AI can help office workers,” the Shenzhen story says AI can help solve the ugly compatibility, dependency, and optimization work that slows domestic compute adoption.
But that still needs careful boundaries.
A 38-minute deployment-and-verification result does not mean Huawei Ascend has fully matched Nvidia’s software ecosystem. It does not mean every major model can now be migrated just as quickly. And it does not prove that commercial users can already deploy at scale with the same maturity, stability, tooling depth, or developer convenience they get elsewhere.
What it does suggest is that one of the most stubborn parts of the problem may be becoming more automatable.
The engineering angle is what makes the story interesting
The local reports tried to explain why the result could matter technically.
According to Nanfang Plus, KernelCAT combines AI-driven code generation, hardware analysis, mathematical optimization, and a hardware-in-the-loop mechanism that forces testing on real hardware rather than trusting a purely simulated answer. If that description is broadly accurate, the value is not just speed. It is that the tool attempts to automate both migration and validation in a way that reduces trial-and-error cycles.
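KernelCAT's internals are not public, so the generate-and-verify pattern described in the reporting can only be sketched in outline. Everything below is a generic illustration of a hardware-in-the-loop optimization loop under assumed behavior: the candidate generator, the on-device benchmark, and the acceptance threshold are all stand-ins, not the tool's actual design.

```python
# Generic hardware-in-the-loop optimization loop (illustrative only).
# Stubs replace the AI code generator and the real accelerator benchmark.
import random

def generate_candidate(feedback: dict) -> dict:
    """Stand-in for AI-driven code generation. Uses feedback from prior
    hardware runs; quality here is simulated with a random increment."""
    return {"score_hint": feedback["best_score"] + random.random()}

def run_on_hardware(candidate: dict) -> float:
    """Stand-in for executing the candidate on a real accelerator and
    measuring throughput. In a real system this is the step that keeps
    the loop honest: only measured results count, not simulated ones."""
    return candidate["score_hint"]

def hardware_in_the_loop(target: float = 2.0, max_iters: int = 50):
    """Generate -> run on hardware -> keep the best measured result,
    until the target is cleared or the iteration budget runs out."""
    best, feedback = None, {"best_score": 0.0}
    for _ in range(max_iters):
        cand = generate_candidate(feedback)
        score = run_on_hardware(cand)  # real measurement, not a guess
        if best is None or score > feedback["best_score"]:
            best, feedback["best_score"] = cand, score
        if feedback["best_score"] >= target:
            break
    return best, feedback["best_score"]

best, score = hardware_in_the_loop()
print(f"best measured score: {score:.2f}")
```

The design point the reporting emphasizes maps to the `run_on_hardware` step: by gating every candidate on a real measurement, the loop automates validation as well as migration, which is what would reduce trial-and-error cycles.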
That is a more durable editorial angle than simply saying a startup posted a good benchmark.
Chinese media also used a consumer metaphor to explain the challenge: porting a model to a domestic stack can resemble trying to get a Windows game to run smoothly on a Mac, except with much more fragile dependencies and much higher stakes for performance. That metaphor is imperfect, but it gets close to the real story. The hard part is not just whether software launches. It is whether it launches reliably, performs competitively, and can be reproduced without a heroic engineering team.
If AI agents can genuinely shrink that workflow from weeks to tens of minutes or a few hours, the implications go beyond one OCR model. It could make domestic compute platforms more usable for enterprises, research groups, and sector-specific deployments that currently find the migration burden too high.
What not to overstate
This is exactly the kind of China AI story that becomes weaker if it is turned into a geopolitical victory speech.
The current source set supports saying that local media reported a 38-minute automated deployment and inference-verification result for DeepSeek-OCR-2 on Huawei Ascend. It supports saying that local reporting, citing Ascend CANN information about an earlier task, also referenced up to 139x acceleration versus a native Transformers baseline.
It does not support saying China’s domestic chip ecosystem has already closed the full software gap with Nvidia. It does not support saying every model can now be ported this way. And it does not support writing the story as if large-scale commercial deployment outcomes are already settled.
It is also worth remembering that the current public record is still narrow. The evidence available in this workflow comes primarily from Chinese media reports and their descriptions of the company’s tool and test results. That is enough for a carefully framed signal story. It is not enough for a sweeping declaration about industry-wide parity.
Bottom line
The most interesting part of the Shenzhen KernelCAT story is not that a startup said it did something fast. It is that the task in question sits squarely inside one of China’s biggest AI infrastructure pain points: the software-adaptation layer between advanced models and domestic chips.
If the reported 38-minute result holds up as part of a repeatable engineering workflow, it would suggest a more credible path for shortening the migration cycle onto platforms such as Huawei Ascend. That still falls well short of proving full ecosystem catch-up. But it is a meaningful signal that China is trying to use AI agents to attack the infrastructure bottleneck itself, not just build new apps on top of it.
Sources
- Shenzhen News / Shenzhen Special Zone Daily relay: https://www.sznews.com/news/content/2026-03/09/content_31969917.htm
- Nanfang Plus (NF News): https://www.nfnews.com/content/VoQVQXq0y5.html