JD.com plans the world’s largest embodied-intelligence data collection center, targeting 10 million hours of real-world video

JD.com (Jingdong), China’s e-commerce and logistics giant, says it will build what it calls the world’s largest and most scenario-complete embodied-intelligence data collection center. The company announced the plan on March 16, 2026, noting that it will leverage real business environments—retail, logistics, healthcare, industrial operations, food delivery, and home services—to generate training and validation data for robots and embodied AI systems. JD says it aims to collect 5 million hours of human real-world video within a year, exceed 10 million hours within two years, and capture 1 million hours of robot body (embodiment) data in parallel. The move positions data scale and scenario coverage as JD’s core advantage in China’s fast-growing embodied-intelligence race.

What JD says it will build

According to state broadcaster CNR and multiple Chinese business outlets, JD’s data collection center is designed to cover a full pipeline of “collection–labeling–training–validation.” That end-to-end framing signals that the company wants more than raw footage; it wants a standardized data factory capable of turning multi-scenario signals into machine-ready datasets. JD’s official messaging highlights coverage across retail, logistics, healthcare, industrial manufacturing, food delivery, and home services—sectors in which it already operates large-scale, real-world workflows in China.

The company positions the center as a response to what it calls a “data shortage” in embodied intelligence. JD argues that real-world, multi-scenario data, rather than simulation or narrow lab environments, is the fuel that can move embodied AI from demos to production systems. Consistent with that argument, the company says live business operations, not staged, synthetic, or limited test sites, will be its primary source of data.

The scale claim: 5 million to 10 million hours, plus 1 million robot-hours

The most concrete numbers in JD’s announcement are the data targets. It says it will accumulate 5 million hours of human real-world video data within one year, exceed 10 million hours within two years, and simultaneously collect 1 million hours of robot embodiment data. Those figures matter because embodied AI, unlike many text-only models, depends on highly diverse, long-horizon, and sensor-rich interaction data. The scale is designed to address the long tail of edge cases (different lighting, surfaces, obstacles, and human behaviors) that robots need to handle in commercial deployments.
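
To put the year-one target in operational terms, the short Python sketch below converts it into an average daily collection rate and a count of parallel capture streams. The 365-day collection window and the 8 usable hours per stream per day are illustrative assumptions, not figures from JD’s announcement, and the function names are hypothetical.

```python
import math

# Targets stated in JD's announcement (hours of human real-world video).
YEAR_ONE_TARGET_HOURS = 5_000_000
TWO_YEAR_TARGET_HOURS = 10_000_000

# Assumptions for illustration only; JD has not published these.
DAYS_PER_YEAR = 365
USABLE_HOURS_PER_STREAM_PER_DAY = 8


def daily_hours_needed(target_hours: int, days: int) -> float:
    """Average hours of footage that must be captured per day."""
    return target_hours / days


def parallel_streams_needed(daily_hours: float, hours_per_stream: float) -> int:
    """Minimum number of simultaneous capture streams, rounded up."""
    return math.ceil(daily_hours / hours_per_stream)


year_one_daily = daily_hours_needed(YEAR_ONE_TARGET_HOURS, DAYS_PER_YEAR)
year_one_streams = parallel_streams_needed(
    year_one_daily, USABLE_HOURS_PER_STREAM_PER_DAY
)

# Year two must add at least the remaining 5 million hours.
year_two_daily = daily_hours_needed(
    TWO_YEAR_TARGET_HOURS - YEAR_ONE_TARGET_HOURS, DAYS_PER_YEAR
)

print(f"Year one: about {year_one_daily:,.0f} hours/day, {year_one_streams:,} streams")
print(f"Year two: at least {year_two_daily:,.0f} hours/day")
```

Under those assumptions, the year-one target works out to roughly 13,700 hours of footage per day, or about 1,700 capture streams running every day of the year. Different assumptions about shift length or the number of collection days change the result proportionally.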

JD’s emphasis on a complete data pipeline—collection, labeling, training, and validation—also hints at the company’s intent to internalize the most cost-intensive part of embodied AI development. Labeling and validation are often the bottlenecks for real-world robotics, especially when the data must be mapped to precise physical actions and safety constraints. JD is effectively positioning itself as both a data producer and a data processor rather than a downstream customer of third-party datasets.

Why real-world scenarios matter for embodied intelligence

Embodied intelligence aims to connect perception, decision-making, and physical action in the real world. That makes data quality and variety more critical than in purely digital AI tasks. Simulations can accelerate early development, but they are limited in their ability to represent messy, shifting real-world conditions—uneven floors, cluttered warehouses, dynamic crowds, and unpredictable interactions with humans.

By anchoring the center in real business scenarios—such as warehouse logistics, retail operations, healthcare workflows, industrial settings, food delivery, and home services—JD is betting that broad scenario coverage will become a competitive moat. The company explicitly lists those sectors as the core sources of data, which implies a focus on both consumer-facing and industrial environments. If the data pipeline works as described, JD’s datasets could help robots generalize across multiple operational contexts instead of learning siloed skills for a single task.

Industry context: China’s market is approaching trillion-yuan scale

JD’s announcement lands at a moment when China’s embodied-intelligence market is entering a high-growth phase. A 36Kr Research report estimates that China’s embodied-intelligence market size reached roughly 915 billion yuan in 2025 and could exceed 1 trillion yuan in 2026. That scale makes data infrastructure a strategic asset rather than a narrow R&D capability. Recent funding rounds, such as the $120M Series B1 raised by Horizon Robotics affiliate Digua Robot, show how capital is consolidating around embodied-intelligence platforms and real-world deployment capabilities.

The data-center narrative also aligns with China’s broader push for “AI + manufacturing,” where robotics and intelligent automation are expected to improve productivity in factories, warehouses, and logistics networks. JD is not positioning itself as a niche robotics company; it is positioning itself as a data and scenario platform that could underpin a wider ecosystem of embodied-intelligence partners, suppliers, and downstream integrators.

Strategic implications for JD and the ecosystem

JD is a China-based company with large-scale, domestically operated supply chains and service networks. That gives it access to real-world scenarios that many AI startups cannot replicate at scale. If it succeeds in building a high-quality data pipeline, JD could become a key data provider for the broader embodied-intelligence ecosystem—supplying data that can be used to train models for warehouse robots, service robots, and industrial automation systems.

At the same time, the effort highlights how embodied AI may be moving away from purely model-centric competition toward infrastructure competition. A company that owns multi-scenario datasets can iterate on models faster, prove safety performance in real settings, and potentially set de facto benchmarks for what “real-world readiness” looks like in China’s robotics market.

Risks and open questions

JD’s claim of building the world’s largest center is ambitious, but it raises several practical questions. First, the quality of the data matters as much as the quantity; 10 million hours of footage only helps if it is accurately labeled and aligned with actionable robot behaviors. Second, data governance and privacy will be crucial in sectors like healthcare and home services, where real-world data can be sensitive. Third, the “collection–labeling–training–validation” loop must be operationally efficient; any bottleneck in labeling or safety validation could slow the entire pipeline.

There is also the question of interoperability. If JD’s datasets are optimized for its own workflows, they might be less transferable to other robotics platforms. Conversely, if the company standardizes its data formats and interfaces, it could become a central hub for embodied-intelligence innovation in China.

What changed and what could come next

What changed is that JD has publicly tied its embodied-intelligence ambitions to a concrete, large-scale data infrastructure plan, complete with timeline targets and multi-scenario coverage. What could come next is a race among China’s major platform companies—logistics, manufacturing, and service giants—to replicate or partner on similar data centers, because data volume and scenario diversity are likely to set the pace for embodied-intelligence deployments.

If JD delivers the 5 million- and 10 million-hour targets and proves that the data pipeline can translate into safer, more capable robots, it could accelerate the shift from simulation-heavy development to real-world, data-driven embodied AI. The outcome will matter not only for JD’s own robotics roadmap, but also for the broader trajectory of China’s emerging embodied-intelligence industry.

Sources

  • CNR (China National Radio) — “JD to build the world’s largest embodied-intelligence data collection center”
    https://www.cnr.cn/tech/techph/20260316/t20260316_527553464.shtml
  • IT Home — “JD to build the world’s largest embodied data collection center”
    https://www.ithome.com/0/929/592.htm
  • 36Kr — “JD to build the world’s largest embodied data collection center”
    https://36kr.com/newsflashes/3725430294166147
  • Yicai — “JD to build the world’s largest embodied data collection center”
    https://www.yicai.com/news/103088116.html
  • 36Kr Research — “2026 Embodied Intelligence Industry Development Report”
    https://eu.36kr.com/zh/p/3660312420180864
