We have all seen the early AI-generated videos of humans in motion. Their failure to reproduce authentic human behavior leaves us unsettled in an innately human way. The opposite is true when we watch a well-animated, completely made-up character like those in Pixar movies - it makes us feel warm and human inside. Now a robust AI startup is bringing exactly that deeply human behavior to AI models, and it is leveraging decades of hard work to make it available to everyone: from researchers and indie developers to the biggest global corporations.
Meshcapade is taking a unique approach by focusing on the most complex and nuanced entity of all: the human being. Co-founded by Naureen Mahmood, Talha Zaman and Michael Black, Meshcapade is bridging the gap between AI and human behavior after years of groundbreaking research. Their mission? To create AI avatars that can see, understand, and interact with the world just like real people do, bringing warmth and authenticity to digital human representations.
Meshcapade's journey began at the prestigious Max Planck Institute for Intelligent Systems in Germany. Michael Black, a renowned AI researcher, had been working on human body modeling for years. His experience ranged from the first AI boom in the mid-1980s to groundbreaking work at Xerox PARC and Brown University. Black's vision of creating foundational technology for estimating 3D human body shape and motion led him to co-found BodyLabs in 2013, which was later acquired by Amazon in 2017.
Naureen Mahmood, now CEO of Meshcapade, brings a unique perspective to the table. With a background in interactive systems and computer vision, Mahmood's journey took her from Pakistan to Texas A&M University and finally to the Max Planck Institute, where she became the go-to person for 3D scanning and motion capture projects.
Talha Zaman contributes engineering leadership as Meshcapade's CTO, drawing on years of experience at both big tech companies and tiny startups.
The trio's complementary skills and shared vision led to the founding of Meshcapade, a company dedicated to building embodied intelligence that goes beyond traditional AI capabilities.
At the heart of Meshcapade's innovation is SMPL (Skinned Multi-Person Linear Model), a parametric model of human behavior that compresses complex human actions into just 100 numbers. This model captures motion, soft tissue movement, facial expressions, and more, providing a compact representation of human behavior that large language models can understand.
"SMPL is not just another academic model," explains Black. "We built it using standard computer graphics techniques and then used machine learning to make it as good as possible. It turned out to be better, more accurate, and much faster than previous models, and completely compatible with existing systems."
This compatibility has led to SMPL becoming a de facto standard in both academia and industry. Meshcapade has made SMPL and related technologies freely available for academic use, fostering a growing community of researchers and developers around their technology.
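For a concrete sense of how compact that representation is, here is a minimal sketch using the open-source smplx Python package, the reference implementation released alongside the SMPL family of models. The model directory path below is a placeholder: the actual SMPL model files must be downloaded separately from the official SMPL website after registration.

```python
# Minimal sketch of driving SMPL with the open-source `smplx` package
# (pip install smplx torch). "models/" is a placeholder for a local folder
# containing the separately downloaded SMPL model files.
import torch
import smplx

# Load a neutral-gender SMPL body model.
model = smplx.create(model_path="models/", model_type="smpl", gender="neutral")

# Shape and pose are just small vectors of numbers.
betas = torch.zeros(1, 10)         # body shape coefficients
body_pose = torch.zeros(1, 69)     # 23 body joints x 3 axis-angle values
global_orient = torch.zeros(1, 3)  # root orientation

# The model decodes those few parameters into a full 3D mesh plus joints.
output = model(betas=betas, body_pose=body_pose,
               global_orient=global_orient, return_verts=True)
print(output.vertices.shape)  # (1, 6890, 3): the SMPL mesh has 6890 vertices
print(output.joints.shape)    # 3D joint locations derived from the same parameters
```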
Meshcapade's approach goes far beyond traditional motion capture methods. Instead of relying on specialized suits and controlled environments, their technology can capture 3D human motion from any video, considering the context and surrounding environment.
"We're capturing the behavior of real people doing real things in the real world," Mahmood explains. "This involves speech, music, dance, and all the nuances of human interaction. We think in a very multimodal way - text, audio, video descriptions, and even AI-generated descriptions of what's going on."
This scalable approach has allowed Meshcapade to amass a rich dataset, crucial for training their AI models. The company leverages collaborations with various institutes worldwide to collect motion capture data, ensuring proper consents and permissions for commercial use.
Meshcapade's efforts are paying off. In the past year alone, the company has seen a 130-fold increase in avatar creation on their platform. With no marketing budget and a minimal sales team, they've attracted over 73,000 registered users, including some of the world's largest animation, gaming, and sports companies.
"We've grown purely through word of mouth," Mahmood reveals. "We've been releasing products onto our platform, and it's been growing organically. These are customers from some of the biggest animation, gaming, and sports companies or sports teams in the world right now."
The startup's achievements have not gone unnoticed in the investment community. Last summer, Meshcapade closed a seed round led by Matrix Partners, turning that investment into a state-of-the-art platform that, according to Black, "nobody else has."
As impressive as recent advancements in generative AI have been, Meshcapade's founders argue that there's still a significant gap when it comes to understanding and representing human behavior.
"Generative AI doesn't understand humans very well yet. It makes lots of mistakes," Black emphasizes. "If you notice the woman in the Tokyo street video generated by Sora [OpenAI's text-to-video model], she's got some little problems with her legs. They don't quite work properly."
Mahmood elaborates on this point: "All these video diffusion models really struggle with generating humans doing normal, complex human things. Just walking was hard enough. If you saw any of the Olympics videos from generative models, they were not good."
One of the key challenges in this field is the complexity of human behavior and the difficulty of labeling it effectively. As Black explains, "If we've learned anything about generative AI in the last few years, it's that the quality of your labels matters. Labeled data is key. And when you think about human movement, how do you label it?"
Current video diffusion models often rely on text labels, which Black argues is inadequate for capturing the nuances of human motion. "Most of what humans do has no name at all, and we never needed a name. Because, in a sense, humans are a generative model of human motion."
This is where Meshcapade's SMPL model comes in. By providing a compact, numerical representation of human behavior, it offers a more effective "language" for AI models to understand and generate human motion.
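To illustrate what that "language" looks like in practice (this is a toy sketch, not Meshcapade's pipeline), a motion clip reduces to a small per-frame array of SMPL parameters rather than a free-text caption:

```python
# Illustrative only: why a parametric body model is a convenient "label" for
# motion. Each frame of a clip collapses to a small vector of parameters.
import numpy as np

num_frames = 120              # e.g. 4 seconds of motion at 30 fps
pose_dim, shape_dim = 72, 10  # SMPL: 24 joints x 3 axis-angle, plus 10 betas

# In practice these would come from fitting SMPL to video or motion capture;
# here we just fabricate a placeholder array of the right shape.
poses = np.zeros((num_frames, pose_dim), dtype=np.float32)
betas = np.zeros((shape_dim,), dtype=np.float32)

# The whole clip becomes a dense numeric sequence, roughly "100 numbers per
# frame", that a sequence model can ingest directly; no hand-written caption
# such as "person walks while waving" is needed.
motion_sequence = np.concatenate(
    [poses, np.repeat(betas[None, :], num_frames, axis=0)], axis=1
)
print(motion_sequence.shape)  # (120, 82)
```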
The potential applications of Meshcapade's technology span various industries. In healthcare, for instance, it could enable more empathetic AI-driven interactions.
"Vinod Khosla had a vision in which medicine is widely available and almost free, delivered over the internet," Black recalls. "But if it gets delivered by a chatbot, that's not going to be very effective. There are estimates that 55% of human communication is non-verbal. If you're getting bad news from a doctor, those non-verbal cues, the empathy... that isn't going to come from a chatbot. The AI doctor has to see you, and you have to see them."
This doesn't necessarily mean creating exact digital replicas of human doctors. As Mahmood suggests, "It might be AI actors, but yeah, we have no interest in cloning Hollywood actors. We imagine the creation of really artificial characters. They may not even look entirely human."
The key, according to Black, is the behavior and understanding behind these AI avatars. "What's important is that when we communicate with a person, it's timing. I see how you're reacting, you're nodding your head, and I react. That timing has to be there, otherwise it feels totally wrong."
Training current video diffusion models consumes enormous amounts of compute and energy. Mahmood suggests a solution: "Combining different kinds of methods can bring significant benefits, and being able to recognize where this is possible is very important. Combining video diffusion with SMPL as a control signal is a huge win for companies training video diffusion models. We've internally trained this kind of model very quickly and cheaply just so we could test this hypothesis. We're not a video diffusion company, so now we're using this to help other companies who are training large video diffusion models."
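To make the idea of SMPL as a control signal concrete, here is a deliberately simplified, hypothetical PyTorch sketch in which a noise-predicting denoiser sees, alongside each noisy frame, an extra channel rendered from the SMPL pose (for example a depth or body-part map). Every module, name, and shape here is an assumption for illustration, not a description of Meshcapade's or anyone else's production model.

```python
# Hypothetical sketch: conditioning a per-frame denoiser on a rendered SMPL map.
import torch
import torch.nn as nn

class PoseConditionedDenoiser(nn.Module):
    """Tiny denoiser that conditions on a rendered SMPL control map."""

    def __init__(self, frame_channels: int = 3, control_channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            # Input = noisy RGB frame concatenated with the SMPL control map.
            nn.Conv2d(frame_channels + control_channels, 64, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.SiLU(),
            # Predict the noise to remove from the RGB channels.
            nn.Conv2d(64, frame_channels, 3, padding=1),
        )

    def forward(self, noisy_frames: torch.Tensor, smpl_maps: torch.Tensor) -> torch.Tensor:
        # noisy_frames: (batch, 3, H, W); smpl_maps: (batch, 1, H, W)
        return self.net(torch.cat([noisy_frames, smpl_maps], dim=1))

# One training step of standard noise-prediction diffusion, with the SMPL
# render acting purely as conditioning.
denoiser = PoseConditionedDenoiser()
frames = torch.randn(8, 3, 64, 64)     # placeholder video frames
smpl_maps = torch.rand(8, 1, 64, 64)   # placeholder SMPL renders (e.g. depth)
noise = torch.randn_like(frames)
noisy = frames + noise                 # simplified; real schedules scale by timestep
loss = nn.functional.mse_loss(denoiser(noisy, smpl_maps), noise)
loss.backward()
print(float(loss))
```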
The company is also pushing the boundaries of what's possible with their technology. They've developed what they claim is the world's first 3D human foundation model, a large vision-language model that understands 3D human pose. This model can exploit the rich knowledge that a large vision model has about the world and apply it to the challenging task of estimating and reasoning about 3D human behavior.
Looking ahead, Black sees enormous potential in what he calls "four-dimensional understanding of humans and their behavior." He argues that this goes beyond the current excitement around spatial intelligence and robotics. "There is nothing more important to model in the world than us humans," he asserts.
As AI continues to advance, Meshcapade's human-centric approach could prove crucial in creating more realistic and empathetic artificial agents. Their work underscores the importance of understanding and accurately representing human behavior in the next generation of AI technologies.
The company's customer pipeline already includes top NASDAQ companies and major sports teams, signaling strong industry interest in their technology. However, the founders remain focused on their larger goal. As Black puts it, "We want to change the world, and the world has to change with us. One bit at a time, but it's going well."
Meshcapade's journey from academic research to a promising startup highlights the potential for AI to become more human-like in its understanding and interactions. By focusing on the nuances of human behavior - from motion and facial expressions to non-verbal cues and contextual understanding - they're paving the way for a new generation of AI that could transform industries ranging from entertainment and sports to healthcare and beyond.
As we stand on the brink of this new era in AI, it's clear that companies like Meshcapade will play a crucial role in shaping the future of human-AI interaction. Their work reminds us that as we push the boundaries of what machines can do, we must not lose sight of the uniquely human qualities that make our interactions meaningful. In bringing the human dimension to AI, Meshcapade is not just advancing technology - they're helping to ensure that our AI-augmented future remains profoundly human.