Alibaba-Backed ShengShu Raises $293 Million as China’s AI Capital Looks Past Video Generation

On April 10, ShengShu Technology, the Chinese startup behind the AI video generator Vidu, said it raised 2 billion yuan ($293 million) in a Series B led by Alibaba Cloud to build a “general world model” that would link digital content generation with robots and other physical-world systems. The round matters because it suggests Chinese AI capital is no longer treating video startups only as tools for marketing clips or entertainment content. It is beginning to fund some of them as possible world-model and embodied-AI infrastructure bets. That does not mean ShengShu has built anything like general intelligence or deployed physical AI at scale. It does mean investors are now willing to finance that roadmap.

This is a funding signal about direction, not proof of delivery

The hard news is straightforward. Reuters, CNBC and MarketWatch all reported that ShengShu closed a 2 billion yuan Series B led by Alibaba Cloud, and public coverage also named TAL Education and China Internet Investment Fund among the other investors. The company did not disclose its valuation. ShengShu is only about three years old, according to CNBC, which makes the size and speed of the financing notable even in China’s crowded AI market.

The timing makes the round more revealing. CNBC said the financing arrived about two months after ShengShu had already raised 600 million yuan in an earlier round. In other words, this is not a startup coming back to market after a long reset. It is a company attracting consecutive large raises in a short period while the market is still trying to decide which parts of generative AI can become durable businesses. That alone turns the ShengShu story into something bigger than a routine company update.

What changes the category of the story is the stated use of the money. ShengShu said the new capital will support a “general world model,” which the company described as a system built on multimodal data such as vision, audio and touch. Reuters and CNBC both framed that as an effort to move beyond text-centric large language models and toward AI systems that can better model perception, interaction and real-world behavior. That is a much broader ambition than making an AI video product faster or prettier.

Still, the safest reading is that this is a funding bet on a direction, not evidence that ShengShu has already delivered a mature world model. The language about perception, action and physical-world understanding comes largely through media reports and company statements, not through an independently documented commercial deployment at scale. The financing is real; the roadmap remains a roadmap.

Alibaba Cloud is backing more than another content tool

Alibaba Cloud’s name matters here because the company is not arriving as a random late-stage investor. CNBC noted that Alibaba has recently expanded its exposure to adjacent areas, including investments in Tripo AI and PixVerse, while also releasing open-source models for video generation and, earlier this year, a model aimed at robotics applications. Seen in that context, ShengShu’s round looks less isolated and more like part of a broader push toward AI systems grounded in physical space rather than text alone.

That does not automatically mean ShengShu will become a core Alibaba platform asset or that integration plans already exist. Public reporting does not go that far. But Alibaba Cloud leading the round does indicate that one of China’s largest cloud and AI infrastructure players is willing to put serious capital behind the idea that a company known for video generation could evolve into something closer to a world-model platform. In venture terms, that is a strong statement about where the next layer of AI value might be built.

The competitive backdrop also helps explain the interest. China’s AI video market is no longer a novelty category. ByteDance, Kuaishou, Alibaba-backed PixVerse and several other players have all pushed deeper into video generation, while benchmark rankings and creator adoption now shift quickly from one model release to the next. In that kind of environment, a startup needs a bigger narrative than “we also generate video.” ShengShu’s new narrative is that video is only the visible front end of a wider attempt to model physical reality.

ShengShu is trying to use video-model know-how as a bridge into world models

That repositioning is believable enough to take seriously because ShengShu already has a real product foothold. Vidu is one of the better-known Chinese AI video products in global coverage, and CNBC said the latest Vidu Q3 Pro model ranks among the top video-generation models tracked by Artificial Analysis. MarketWatch likewise described the company as a recognized name in AI video, not an untested lab project pitching a concept deck with no product history.

ShengShu has also tried to build the commercial case that it can operate beyond a demo culture. In its previous funding announcement, a Series A+ release carried by PR Newswire, the company said it had built out MaaS, SaaS, app and agent products around Vidu, and that it achieved more than 10x growth in users and revenue in 2025. Those figures are company-supplied and should be treated as such, but they still matter because they show how ShengShu wants investors to interpret the path forward: first prove it can ship and monetize multimodal generation, then argue that the same stack can expand toward world models and embodied AI.

This is where the evidence boundary becomes especially important. ShengShu’s description of a “general world model” bridging digital and physical environments is meaningful as strategy language, but it should not be turned into a claim that the company has solved robotics, autonomous driving or general intelligence. The current public record supports a much narrower conclusion: ShengShu believes video and multimodal generation can be a stepping stone toward broader physical-world simulation, and investors including Alibaba Cloud are willing to fund that thesis.

That distinction matters because AI markets are full of category inflation. A company that can generate videos is not automatically a robotics platform. A startup that talks about perception and action is not automatically close to AGI. The right interpretation is more practical. ShengShu is trying to move up the ambition stack while it still has investor attention and product momentum, and the new financing gives it the time and capital to attempt that move.

The wider signal is about where Chinese AI money may go next

The deeper significance of the round is therefore industrial, not just corporate. CNBC framed the ShengShu financing as part of a turn toward world models as developers confront the limits of text-trained large language models. Whether or not that framing becomes the industry consensus, it matches what investors are doing: looking for the next layer of AI capability after chatbots, image tools and first-generation video generators have become crowded fields.

China is a particularly interesting place for that shift because it already has a dense overlap between consumer internet platforms, industrial automation ambitions, robotics startups and local competition in multimodal models. If investors start treating leading video-generation startups as potential suppliers of physical-world simulation infrastructure, then the boundaries between content AI, 3D generation, robotics and embodied AI begin to blur. ShengShu’s round does not complete that transition, but it does put a price on the attempt: 2 billion yuan for a chance to try.

There is a capital-markets angle as well. MarketWatch noted that PixVerse also reached unicorn status after a recent funding round, showing that money is still flowing into Chinese AI video names even after the first hype wave. The difference with ShengShu is that the company is not asking investors to believe only in better clips or faster rendering. It is asking them to believe that the same research base can support a more foundational AI layer. That is a more ambitious, and riskier, pitch.

The risk is obvious. A world-model narrative is easy to say and hard to prove. There is no disclosed timetable for commercial deployment, no public valuation benchmark from this round, and no independently verified evidence that ShengShu has already translated video-model expertise into reliable real-world machine behavior. Investors are funding optionality here. They are not buying proof that the physical-world AI problem is close to solved.

What changed, and what could happen next

What changed on April 10 is that ShengShu stopped looking like just another Chinese AI video startup with a strong product and started looking like a test case for a broader capital thesis. Alibaba Cloud’s lead investment reframed the company from a video-generation competitor into a candidate world-model bet. That is the real news. The money says investors think the next value layer may sit in systems that can connect generated digital scenes with physical perception and action.

What could happen next is more important than the funding headline. If ShengShu can show concrete product milestones that go beyond AI video, such as credible tooling for simulation, robotics partnerships with visible outcomes or developer adoption in physical-world use cases, then this round will look early and strategic. If it cannot, the financing may end up reading as a top-of-cycle wager on fashionable language.

For now, the disciplined conclusion is simple. ShengShu’s Series B does not prove that world models are ready for large-scale commercialization, and it certainly does not prove that AGI is around the corner. It does show that at least some major Chinese investors are prepared to move capital from pure content generation into the harder, less proven territory of world models and embodied AI. In China’s AI market, that is a real change in direction.

Sources

Reuters — Chinese startup ShengShu raises $293 million to advance artificial general intelligence
https://www.reuters.com/world/asia-pacific/chinese-startup-shengshu-raises-293-million-advance-artificial-general-2026-04-10/
CNBC — Alibaba leads $290 million investment for building a new kind of AI model as LLM limits emerge
https://www.cnbc.com/2026/04/10/alibaba-cloud-invests-world-model-ai-shengshu-vidu.html
MarketWatch / Dow Jones — China AI Startup ShengShu Raises Over $290 Million in Latest Fundraising
https://www.marketwatch.com/story/china-ai-startup-shengshu-raises-over-290-million-in-latest-fundraising-3321d4b0
PR Newswire — ShengShu Technology Completes Series A+ Funding of Over RMB 600 Million
https://www.prnewswire.com/news-releases/shengshu-technology-completes-series-a-funding-of-over-rmb-600-million-302679760.html

Longxin, Zhang
