Opt in to receive updates. Unsubscribe anytime.

Experience NEO

We're excited to announce Vikram Kothari as our new VP of Operations.

He joins the 1X team after over eight years at SpaceX leading Dragon, Starship, Raptor, and Launch Avionics supply chain orgs. Preceding his time at SpaceX he was a Director at Microsoft, where he also led hardware supply chain for almost a decade.

“Joining 1X Technologies has been an inspiring step for me. The company’s commitment to building safe, privacy-first humanoids that are designed to coexist meaningfully with humans sets it apart in the robotics space. The team’s depth of talent and shared sense of purpose create an environment that’s both ambitious and grounded. I’m excited to contribute to a mission that places human values at the center of innovation." -Vikram Kothari.

Vikram's tenure at SpaceX streamlined operations and powered historic milestones, including the development of Starship and Starlink’s global expansion-- his expertise in scaling complex supply chains will serve as an even greater accelerant to 1X’s mission to deploy humanoids into homes.

“Vikram joining the team is a game-changer. Manufacturing has yet to find its heroes—but as we bring NEO into the world, Vik’s impact on the re-industrialization of America and the future of humanoids will be a defining part of the story.” -Bernt Bornich

Welcoming Vikram Kothari as VP of Operations

We're excited to announce our new HQ in Palo Alto, California.

The 80,000 sq. foot facility will seat 400 people, with employees from the company's previous offices migrating to the new space over the course of summer 2025.

Our leadership is confident that by positioning the team at the global center of AI and robotics, 1X will be able to attract top talent and foster even closer collaboration with our partners and investors.

As we grow, consolidating our Sunnyvale [CA] and Moss [Norway] team under one roof will accelerate our speed to ramping production and getting NEO into homes near you.

Join us on our mission to create an abundance of labor in the world.

Opening our new HQ in Palo Alto

At 1X, we build robots that help humans in the most diverse environment imaginable: people’s homes. To deploy our Redwood AI model safely and reliably, we need to anticipate its policy behavior across all that it can encounter. From retrieving a rarely-used kitchen gadget tucked away in a cluttered drawer, to navigating a living room unexpectedly rearranged overnight, physically evaluating each policy we've trained across these varied scenarios would take several lifetimes. How can we accelerate the evaluation of generalist robot models?

Today, we share progress on the 1X World Model (1XWM): a bridge between the world of atoms and the world of bits.

The 1X World Model enables NEO to anticipate the outcomes of robot actions and their consequences on the world.



This approach significantly accelerates experimentation, allowing us to evaluate the reliability and effectiveness of robotic policies in a fraction of the time, without requiring intensive, environment-specific engineering within traditional physics-based simulators or mock real-world sets.

Precise action controllability that enables us to compare policies that take different actions given the same observations.

Scaling results when leveraging a specific source of data: autonomous policy rollouts

High correlation between the world model and real world evaluations

Quickly iterate on architectural decisions.

Select the best checkpoint from training runs.

Curate datasets of long-tail scenarios in production and re-evaluate models on them.

Optimize robot policies at scale through efficient training-evaluation cycles.



We train our world model on sequences of video frames, robot observations, and input action trajectories. We encode the inputs to latent representations, and predict the latent encodings of future frames. We also predict the state value of the final frame to evaluate task success and completion.

Specializing Video Generation Models for Action Controllability

Most video generation models are text-to-video (T2V), using a language prompt to generate the video, and optionally, one or more reference frames for guidance. However, world models for simulating robots need to be action-controllable, steered by exact robot trajectories rather than loose directives like “Grab the mug” or “Wipe the countertop.” We demonstrate the action-controllability of 1XWM by providing it with a few initial frames of real footage along with multiple subsequent action trajectories. From this anchor point, the 1XWM simulates the consequences of taking those exact actions, including the physics of objects like doors being opened and cloths being wiped across a countertop.

LAUNDRY

OPEN

CLEAN

Does 1XWM improve as data is scaled up? What kinds of data best improve its understanding of physics?

To study this, we train 1XWM to not only predict future states and images, but also whether the task attempt succeeded or failed at the end of each generation. We see improvements in accuracy across the board as we scale up the number of tasks and the diversity of robot behaviors. We explore this in the following tasks: Airfryer, Arcade, and Shelf.

Air fryer

Arcade

Shelf



We observe a clear improvement in generation quality as we train on task-specific data. When confronted with an unfamiliar task and environment, the WM often struggles to model the object interactions exactly, without having knowledge of their specific properties. Training on task-specific data allows the WM to update based on the subtle dynamics of the task at hand.



For example, when trained on smaller amounts of data, 1XWM hallucinates the air fryer tray and body as a single unit, pulling the entire unit off the counter. After adding interaction data with the air fryer, 1XWM gains a better understanding of how the tray separates from the air fryer, and even learns to model subtle interactions such as the confinement of the tray movement within the base of the air fryer.



We also observe that training on both shelf and arcade data improves accuracy compared to training on shelf alone. This positive transfer of accuracy and task understanding from one task to another reinforces our belief in the capability of 1XWM on scaling.

The more task-oriented robotics data we accumulate, the more accurately we can predict task-level future outcomes. We then turn this capability into an evaluation engine as a novel application of world modeling for robotics development.

An aligned world model can solve the evaluation problem by forecasting the actions of candidate robot policies. Given 1X World Model generations from each policy on datasets of initial states, we can compare their respective performances. Importantly, we can curate datasets of production-setting states and generate counterfactual results from states that an autonomous policy has previously failed in.

For every set of model checkpoint weights, we predict future states and success likelihood that we show are distributionally aligned to actual real-world futures. This gives us insight into model performance at scale and allows us to make model architecture and checkpoint selection decisions with an instant feedback loop.



In the plot below, we ablate the decision to include proprioception (robot joint states) as input to our robot policy, and plot the 1X World Model predicted success rate for each checkpoint. We then run real world evaluation on the most and least promising checkpoints according to 1XWM. We find that there is indeed a correlation between the predicted success rates and the true task scores. This allows us to make a likely forecast that proprioception improves policy performance.

Given a true real-world success rate gap of 15% between two policies, a World Model with 70% accuracy can accurately predict the better policy with 90% success. Given that we see a consistent predicted performance gap across checkpoints, and that we can evaluate policies on identical starting states, we can be even more confident in such a verdict.



As another experiment, we compare using two different image encoders for a policy. We compare the most promising predicted checkpoints from both policies, and see that the predicted better ViT-L model does indeed perform better in the real world. For more experiments, see our technical report.

As we deploy robots in home, we will need to move away from task-specific evaluation and towards production-level evaluation, capable of handling a wider, more ambiguous array of full-body manipulation tasks and objects. Improving the generalization capability and accuracy of 1XWM will be the
first step towards this goal.

LIMITATION #1

LIMITATION #2

We've shown promising results in scaling up the 1X World Model (1XWM) to predict the future. As we increase the amount of training compute and real world NEO data, 1XWM's accuracy of predicting whether NEO accomplishes a task also increases. 

The implications of accurate hallucinated rollouts extend far beyond rigorous evaluation for humanoid robots. Consider the implications of what happens when the data generated by 1XWM – the joint distribution over all the sensor readings and actions observed by the robot – become indistinguishable from the real data. This moment has already happened for LLMs, and we think that it will also soon be true for synthetic robotics data. Data and evals are the cornerstone of solving autonomy, and 1XWM provides a unified path for tackling both challenges.

1X World Model

Having complete and unfettered access to all parts of the world is important for NEO to accomplish tasks in home environments, and to enable our Redwood AI to learn from the broadest set of physical interactions possible. Our latest RL controller provides Redwood with a complete mobility toolkit to access the world, including natural walking in any direction, sitting, standing, kneeling, getting up from the floor, going up and down stairs with stereo vision. Bringing all of these capabilities together for the first time in a unified controller represents an important milestone in unlocking the full potential of humanoid robots.

SIT & STAND

Full stairs

Bridging Natural Movement with Omnidirectional steering

The past decade has seen rapid progress in legged robot locomotion, primarily driven by torque-transparent actuators, deep Reinforcement Learning (RL) algorithms, and GPU-accelerated simulation. Using off-the-shelf software packages and a desktop GPU, it is now possible to train a bipedal robot to stand upright in simulation and follow walking direction commands in under an hour. Even though the policy is trained completely in a simulation environment whose physics are merely an approximation of the real world, it is trained on so many randomized physics parameters – e.g. friction, mass, sensor noise – that the model ends up being robust to the real world’s physics parameters. Once trained, this walking controller can receive walking direction commands from either the teleoperator or the AI model (e.g. Redwood), and then translate those high level directions into the dynamic, contact-aware interactions with the world.



Beyond walking and turning, bipeds can also perform side-stepping. This is useful for navigating the close quarters of a kitchen or the space between the sofa and the coffee table, where the footprint would be too small for a wheeled base robot.



However, these basic walking RL controllers often require additional “shaping rewards” to achieve a natural human-like gait in all directions. These tend to be highly specialized to walking, which means that the same process of hand-tuning rewards must be repeated for every new behavior. Gait patterns can vary based on the direction of walking, so this often requires unique shaping terms based on the direction of motion.

Is there a more scalable way to increase controller capabilities without hand-written shaping rewards for each of them? One method is to collect motion capture references from humans moving naturally, retarget them to NEO’s joints and body, and then train the RL controller to match those kinematic reference trajectories.

Because the reference only specifies where the body should be, the RL controller still needs to figure out how to keep the robot stable, while “keeping tempo” and tracking the reference trajectory as closely as possible.

Using these techniques, it is possible to over-fit a policy to track a single human motion capture trajectory and achieve very dynamic and fluid motions such as dancing or walking. Examples of “single trajectory replay controller” are shown below for natural running and pivoting:

Pivot



These behaviors, while elegant, are not readily useful for general purpose tasks as they only can replay a single trajectory. They do not expose a steerable interface with which a high level policy like the AI model can execute the right actions. It is also not obvious how to transition smoothly from one reference to another, as motion capture datasets rarely include the transition behaviors between arbitrary task-centric motions, e.g. switching from shuffling quickly side-to-side to a skipping motion.

To handle multiple trajectories, we could train the controller to follow multiple mocap references, taking in the encoded kinematic trajectory as an input. However, this approach encounters a teleoperation UX problem: it is not obvious how one provides the high-dimensional kinematic trajectories at test-time using a more limited input device like a gamepad joystick or a VR controller. The model is trained on high-dimensional kinematic trajectories with nuanced rhythm and periodicity, but the commands provided by teleoperation are coarse-grained, which results in the RL controller interpreting it into an unnatural gait.

How does one achieve steerability and robustness, while still achieving the fluidity of motion capture-based RL training? We developed a two-stage controller, consisting of a high level “kinematic planner” to synthesize kinematic targets that resemble human motion capture data, and a low level controller that tries to achieve those plans.



The lower level RL controller takes as input a kinematic reference trajectory of body poses that it must attempt to track while maintaining balance. This is paired with a high level motion generator model that is trained with supervised learning to convert input commands like joystick direction to the richer kinematic trajectories. The generative model also plays the role of smoothing transitions during behavior changes.

An important reason for having robots with legs in the home is to traverse stairs. We develop a “stair mode” in our controller which engages the use of stereo RGB vision to infer the height of the floor around NEO.

To climb up and down stairs in a graceful way, NEO’s RL controller must anticipate the necessary height of each step well in advance of making contact with that step. Unlike most humanoid robots, which employ a time-of-flight depth sensor or a lidar to estimate the floor plane, NEO’s RL controller is purely vision based. Depth is predicted directly from the RGB stereo pair, and this is fused with the proprioceptive history for NEO to figure out how and where to step.



Stairs are not always idealized; through domain randomization in simulation, the controller is also robust enough to support side-stepping and handling stairs of mixed heights.

There are numerous home chores that require NEO to work at floor height for extended periods of time: removing a stain from the carpet, reorganizing the bottom drawer of a cabinet, packing a suitcase, and sorting socks. We extend our RL controller to be able to safely sit, kneel and lie down on the floor, as well as get back up from each of these poses.

Kneel

Sit & stand

The controller provides an “action interface” for which teleoperation or Redwood AI is able to interact in a safe, contact-rich manner with the physical world. To demonstrate the controllability of the natural walking behavior, we fine-tuned the Redwood model to do a soccer ball dribbling task.

Here is Redwood interacting with this new RL controller, where it predicts whole body joint targets and walking pelvis velocities from vision. The controller then translates those intents into the specific forces applied by the leg to walk in the direction of the ball.

We have developed the first general-purpose, fully AI and teleoperation compatible controller that unlocks the full kinematic workspace that is available to a bipedal humanoid robot. This will enable us to train Redwood AI to fully explore the entire state space of the home: every high and low shelf, every nook and cranny, every floor.

We will then use that data to make an AI like no one has ever seen.

Redwood AI: Mobility

We are excited to introduce Redwood, 1X’s breakthrough AI model that we will be deploying to homes. Redwood is a vision-language transformer tailored for the humanoid form factor and capable of performing end-to-end mobile manipulation tasks like retrieving objects for users, opening doors, and navigating around the home. Redwood empowers NEO Gamma to learn from real-world experience, on top of hardware designed for compliance, safety, and resilience.

: Handles variation in tasks—like picking up never-before-seen objects in unfamiliar locations. Trained on a large dataset of teleoperated and autonomous episodes from EVE and NEO, Redwood exhibits emergent behaviors such as choosing

 Redwood is among the first VLAs to control locomotion jointly with manipulation, enabling bracing and leaning behaviors during manipulation.

 Allows NEO to position itself precisely for tasks, perform actions that require movement across space, and manipulate objects while on the move.

Redwood is compute-efficient and runs fully on NEO’s onboard embedded GPU.

In order to power autonomy on both EVE and NEO platforms, Redwood fuses pre-trained language embeddings, vision tokens from a pre-trained vision transformer, and proprioception embeddings from a sequence of joint positions and joint applied forces. These are passed through several more transformer blocks, which extract a latent representation vector. We decode this representation into EVE or NEO actions using a 



Generalizing to manipulating new objects in locations not seen in the training data is crucial for the model to work in home environments, where the home is never in the exact same configuration twice. This is achieved by training on a diverse dataset gathered on NEOs in 1X offices as well as employee homes.

To further improve generalization to new scenarios despite its small size (160M parameters), Redwood is trained not only to predict actions, but a variety of “cognitive” prediction targets like estimating the current location of NEO’s hands and relevant objects in image space. These cognitive tasks help ground NEO’s visual representations and allow it to generalize better to unseen environments, despite having a small model size. Below, we show a continuous take of Redwood being able to grasp unseen bottles from a variety of locations.

Whole Body Control and Multi-contact manipulation

Manipulation and locomotion behaviors are typically decoupled in most robotic systems. However, manipulation in the home necessitates going beyond picking small objects on counter-tops and tables: humans use their legs, hips and spine to bend down to pick toys and clothes off the ground, and lean into heavy doors when pushing them open. These “whole body control” tasks make it impossible to cleanly separate locomotion and manipulation.

To enable similar capabilities on NEO, Redwood predicts not only the arm and hand commands, but also walking, manipulation, and pelvis pose commands simultaneously. This greatly expands the kinematic reach and payload capacity NEO can work with.



Coordinating all parts of the body to engage with the environment also enables multi-contact manipulation, such as bracing a hand against a wall when pulling open heavy doors.

Solving chores requires combining manipulation with navigating across the home. In real home tasks, the objects of interest are rarely all in front of the robot at the start. Furthermore, navigating to and getting close enough to an object to grasp it needs to take into account the way the model will choose to pick up the object. If one does not train navigation skills jointly with those for manipulation, then a separate navigation stack may fail to position the robot in an optimal position to grasp the target object. Vice versa, if manipulation behavior does not take into account navigation behavior that might follow it, this could lead to unwanted collisions or carrying the object across the room in an unsafe way.

To that end, Redwood is trained on a large diverse set of object navigation and pick-and-place demonstrations within the home and is trained to plan navigation and manipulation behaviors jointly. An emergent property of training from these demonstrations is that Redwood can automatically decide to use the left, right, or both hands to pick up an object.

Running Redwood onboard allows NEO to be deployed in more diverse environments: in basements, in the garden, in homes with spotty Internet infrastructure, in wilderness campsites.

To that end, Redwood is a 160M parameter transformer model that runs on NEO’s onboard GPU at around 5hz. To pack as much intelligence as possible into a relatively small number of parameters, we’ve found that the additional cognitive losses help with grounding the representations, especially in unseen environments.

Voice control is an intuitive interface to interact with general-purpose robots in the home. Using an offboard speech-to-speech LLM, we extract the goal the user intends to command NEO with from a conversational context, and then convert the command into a vector offboard using a sentence encoder. This vector is then passed as an input into the Redwood model, which is trained on thousands of such text embeddings.

Large-scale behavior cloning methods typically only imitate successful demonstrations. Redwood is trained to learn from both successful and failure rollouts, allowing NEO to improve from any interaction it has with the world regardless of success. The failure rollouts provide supervision signals on the cognitive prediction heads, which helps prevent overfitting to a relatively narrow distribution of states seen during successful demonstrations. The successful demonstrations supervise both the action diffusion heads and the cognitive predictions.

We think that general purpose autonomous humanoids, with their intelligence incubated in the home, will be a generation-defining technology that reshapes quality of life for the elderly, for busy parents, and for use cases that scarcely cannot be imagined today. We’re looking for driven, high-agency engineers to help us scale up Redwood to the next level, and to deploy a production-grade AI in as many homes as possible this year. If working on Redwood excites you, we have a large number of open roles in Palo Alto:

Research Engineer, Reinforcement Learning

Redwood AI

This move comes as we prepare for large-scale deployment of NEO, your friendly home humanoid. With the aim of having hundreds of NEO’s arrive in homes across the United States in 2025, followed by rapid expansion– Mustally will be instrumental in enabling our growth along the way.

“I am excited to welcome Mustally to the Executive Team at 1X,” said Bernt Børnich, CEO of 1X Technologies. “He will play a key role in helping us accelerate our growth plans from our global HQ in Palo Alto, California.”

Mustally brings a strong track record of leadership across Fortune 500 companies and high-growth startups. He has built finance organizations from the ground up, raised over $50 billion in capital through public and private markets, and led transformative commercial partnerships and M&A initiatives. Prior to joining 1X, Mustally served as Managing Director, Global Treasurer, and Head of Financial Services at Lucid Group, and held senior finance roles at Herc Holdings, Hyundai Capital, and National Grid, following a career in investment banking and management consulting.

“I’m thrilled to join the 1X team, which is well positioned to lead the physical AI and robotics sector,” said Mustally Hussain. “My focus will be on driving financial discipline and strategic growth as 1X scales its manufacturing and commercialization footprint in the U.S., followed by global expansion.”

Welcoming Mustally Hussain as CFO

Developing AI for humanoid robots involves tackling many open research challenges – in safety, dexterity, visual understanding, and much more. It helps to compare notes with other labs tackling similar challenges, in order to accelerate progress towards a future of NEOs doing all the tasks needed to keep your home in order autonomously.

To that end, 1X AI and NVIDIA are pleased to announce our research collaboration effort. As a first step, the teams worked together to prepare an autonomy demo for Jensen Huang’s GTC 2025 Keynote, featuring NEO doing a dish loading task autonomously.

The following is a look into where, how and when we taught NEO to do the dishes with the NVIDIA team.



To make this collaboration possible, the 1X AI Team created a dataset API for NVIDIA to access data collected from 1X offices and employee homes, and an inference SDK to serve model predictions at a continuous 5Hz vision-action loop using an onboard NVIDIA GPU in NEO’s head or an offboard GPU.

A crucial step when onboarding a new learning codebase onto NEO is to verify correctness, i.e., overfitting a baseline model to a small amount of demonstration data and making sure that the time synchronization between images and actions is consistent all the way from data collection to training to runtime inference.

We demonstrate this by working with the NVIDIA GEAR team to train a single end-to-end neural network based on the 

 model to autonomously grasp a cup, hand it over to the other hand, and place it in a dishwasher to showcase how NEO fits compactly into the kitchen space while still having the kinematic reach to carry the cup from sink to dishwasher.

This is a good “first task” to learn because it checks for basic compatibility of an external research codebase with the logging and inference architecture. The obvious next step after verifying correctness is to feed thousands of hours of internally collected NEO data into the model.

Over the course of a week, our teams developed this model at a 1X employee’s home, swapped notes on action spaces, control frequencies, and other imitation learning tricks needed to get good performance on NEO Gamma. Moments like these – where friends are just hanging out in the home while a NEO does dishes in the background – will soon become an everyday occurrence.



When working in homes, the safety of NEO Gamma becomes particularly evident. NEO’s mechanically compliant and safe design allowed engineers to get in extremely close quarters with the robot while testing a variety of experimental architectures.


Our teams are both looking forward to continuing to learn from each other and push the industry forward. We hope that together we can accelerate our path to humanoids living and learning among us and providing a helping hand wherever one is needed.


 team and Jensen Huang for being gracious with their time and having us be a part of the NVIDIA experience at GTC. 

Additionally, thank you to 

Welcoming Vikram Kothari as VP of Operations

Opening our new HQ in Palo Alto

1X World Model

Stories

1X & NVIDIA Research Collaboration

1X Acquires Kind Humanoid

Cooking With NEO Beta and Nick DiGiovanni

Podcast: 1X CEO, Bernt Børnich on the Venture Europe Podcast

NEO Featured in NVIDIA GTC Keynote

Redwood AI

Company

Welcoming Vikram Kothari as VP of Operations

Opening our new HQ in Palo Alto

Welcoming Mustally Hussain as CFO

1X Strengthens Leadership Team with New Hires

1X Attends NVIDIA GTC

ExperienceNEO