The most natural human-computer interface is your voice.
Pine AI is already working while you're still talking. While Stanley, Pine's CEO, is still asking what needs his attention, it is already searching his inbox and calendar, so the moment he stops, the answer is on screen: a $3,000 invoice, a bug report, an iOS build ready to ship. And it talks through his interjections the way a person would, instead of cutting off at every little sound.
Voice is the highest-bandwidth way to talk to a computer.
Look again at the clip above. Before Stanley finishes his first question, Pine AI is already at work, and by the time he stops, the answer is waiting. That is the whole idea in one exchange: it listens, thinks, and works at the same time, so talking to it feels less like running an app and more like talking to a capable person.
The bet behind the product is simple. You talk about four times as fast as you type, and your tone and urgency come through for free; typing is a translation step between what you mean and what the machine gets, and voice skips it. Say "voice AI" and most people picture one of three things: a bot that takes calls, a companion to chat with, or a voice for a podcast. We're after something bigger: voice as the main way you actually get work done. For as long as we've had computers, voice has played second fiddle to the keyboard. Once the interaction gets good enough, that flips.
There's no app to navigate, no form to fill in, no commands to memorize, and no session to start. The conversation is the interface; you just talk, the way you would with a person. OpenClaw went viral on exactly that feel, sessionless and close to human; Pine AI carries it into voice.
And it isn't voice-only. Talking is the quick way in, but for the way back, reading beats listening, so Pine AI answers with a generative UI and keeps voice for discussion and confirmation. The parts that have to be exact, you type. As in the clips, you're reading the screen, listening, and talking at the same time. That's the two-way, multimodal loop the Knowledge Navigator imagined.

You read the screen, listen, and talk at the same time.
Large volumes of text or structured data are one place where voice is the wrong tool. Reciting a half-dozen booking constraints back for confirmation drowns the conversation, and reading a wall of search results out loud is worse. So Pine AI keeps voice for the discussion and gives the dense parts a home in the app: cards and buttons to scan and tap through as the work happens, plus a task list and wiki that hold the real state of it (what's done, what's pending, what was decided). That's the part a talk-only agent misses; the sessionless feel is great, but when the transcript is the only record, the ground truth gets lost in it.
Tap to confirm; talk to decide. A restaurant booking with tricky constraints — an exact time, a hard no-go window, a couple of special requests — turns into a tap-through confirmation card with the spec laid out, while Pine AI keeps talking the options through instead of reading every one back.
Voice for the gist, card for the rest. Asked about hard-to-book campsites for a holiday weekend, Pine AI doesn't pretend slots exist that don't — the long write-up of reservation rules goes onto a card on screen rather than getting read aloud, and voice carries just the gist while the search keeps running.
Sensitive information is the other place voice is the wrong tool, and we're extremely careful with it. Pine AI will stop you mid-sentence rather than let an SSN, password, or account number land in the voice channel — those go to a private typed field instead, where they aren't overheard or saved in a transcript.
Say it out loud, or type it in private. Booking flights and hotels for the whole team on a budget, Pine AI collects everyone's details the safe way. When it reaches an SSN it stops you from reading it aloud and opens a secure field to type into instead. The booking itself runs as a background task while you keep talking.
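To make the routing idea concrete, here is a minimal sketch of how a partial transcript could be checked before it lands in the voice channel. It is an illustration under our own assumptions, not Pine AI's implementation: the patterns, the function names `route_utterance` and `redact`, and the channel labels are all hypothetical, and a real system would also need to handle digits spoken as words.

```python
import re

# Hypothetical patterns for illustration only; real coverage would be far broader.
SSN_DIGITS = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")
SENSITIVE_CUES = re.compile(
    r"\b(social security|ssn|password|account number)\b", re.IGNORECASE
)


def route_utterance(partial_transcript: str) -> str:
    """Return "typed_field" if the speaker seems to be giving a secret, else "voice"."""
    if SSN_DIGITS.search(partial_transcript) or SENSITIVE_CUES.search(partial_transcript):
        return "typed_field"
    return "voice"


def redact(transcript: str) -> str:
    """Mask SSN-shaped digit runs so they are never stored in a transcript."""
    return SSN_DIGITS.sub("[REDACTED]", transcript)
```

Checking the cue words as well as the digits is what lets the agent stop you before the number is spoken, rather than after.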
The dream is old. The pieces only just arrived.
Ask people what they actually want from AI and you hear the same few names: Samantha from Her, Jarvis, the Knowledge Navigator. It's all one wish. A voice you talk to that goes and does things for you.
Apple's Knowledge Navigator concept, 1987. The 39-year-old vision of a voice you simply talk to that manages your day, places calls, and gets things done.
Apple put that wish on film back in 1987, under then-CEO John Sculley. The Knowledge Navigator showed a professor talking to an assistant that checks his calendar, pulls up research, and rings a colleague, all by voice. It stayed fiction for nearly four decades, because the two halves of it never came together.
- Agents that can act, the ones that use a computer, run tools, and write code, are typed, slow, and silent. No voice.
- Voice assistants that feel alive, the real-time ones you can interrupt, are shallow. A web search here, a reminder there, but no real reasoning, no serious computer use, and no way to direct other tools and agents.
Pine AI closes that gap — one general agent that does both. It answers naturally and in real time, it does the deeper thinking in the background, and it acts: calling tools, doing research, editing files, writing code, and directing other agents like Claude Code and Codex (rolling out soon).
It books, it calls, you go do something else. Pine AI reserves a table for two on Sunday evening, then places a real phone call to invite a friend along. Both run in the background while you do something else, tracked on a reservations card that updates as they land, and it checks in only when it needs a detail like the time.
Listen, think, speak, and act, all at once.
Most voice agents today fall into one of two categories: turn-based or full-duplex.
Turn-based: A turn-based voice agent works like a walkie-talkie. You talk, it waits until you finish, then it answers. This makes the system easier to build, but conversations can feel slow and unnatural. The agent cannot easily interrupt, react while you're speaking, or provide simple acknowledgments like "uh-huh" and "I see" at the right moments.
Full-duplex: A full-duplex voice agent works more like a person. It can listen while it speaks, react before you've finished a sentence, handle interruptions, and adapt continuously as the conversation unfolds. Instead of taking turns, both sides can communicate at the same time.
Our approach is a full-duplex voice agent built around two complementary systems: fast thinking and slow thinking.
Fast thinking: The fast-thinking model powers the live conversation. It listens, thinks, and speaks in real time with very low latency. It understands both the user's speech and its own speech, allowing it to maintain a continuous view of the conversation. It also tracks conversational signals such as pauses, hesitation, interruptions, turn-taking cues, and backchannels like "uh-huh" and "oh", making interactions feel more natural and fluid.
Slow thinking: The slow-thinking model runs alongside the conversation. It observes what is being discussed, reasons about the user's goals, and takes actions on the user's behalf. It can plan, retrieve information, use tools, and manage longer-running tasks while the conversation continues uninterrupted.
Fast thinking keeps the conversation flowing. Slow thinking gets things done. By separating real-time interaction from deeper reasoning and task execution, the agent can respond naturally in the moment while simultaneously working behind the scenes on the user's behalf.
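The split can be pictured with a small concurrency sketch. This is a toy under stated assumptions, not the production architecture: `fast_loop` and `slow_think` are hypothetical names, the "please " prefix is an invented cue for a task request, and the sleeps stand in for real speech and tool latency. What it shows is the shape of the idea: the fast loop replies to every utterance immediately, while slow work runs as a background task and reports back when it lands.

```python
import asyncio

PREFIX = "please "  # hypothetical cue that an utterance is a task request


async def slow_think(request: str, results: asyncio.Queue) -> None:
    """Slow path stand-in: pretend to plan and call tools, then report back."""
    await asyncio.sleep(0.2)  # simulated tool latency
    await results.put(f"done: {request}")


def drain(results: asyncio.Queue, spoken: list) -> None:
    """Move any finished background results into the spoken reply stream."""
    while not results.empty():
        spoken.append(results.get_nowait())


async def fast_loop(utterances: list) -> list:
    """Fast path stand-in: answer each utterance without waiting on slow work."""
    spoken = []
    results = asyncio.Queue()
    tasks = []
    for text in utterances:
        if text.startswith(PREFIX):
            # Hand the request to the slow path and keep talking.
            tasks.append(asyncio.create_task(slow_think(text[len(PREFIX):], results)))
            spoken.append("on it")
        else:
            spoken.append("uh-huh")  # backchannel, no task needed
        drain(results, spoken)
        await asyncio.sleep(0.01)  # simulated gap before the next utterance
    await asyncio.gather(*tasks)  # let outstanding work finish
    drain(results, spoken)
    return spoken


def run_demo() -> list:
    return asyncio.run(fast_loop(["please book a table", "so anyway", "yes"]))
```

In the demo, the conversation never blocks on the booking: the acknowledgment and both backchannels go out while the simulated tool call is still running, and the result is voiced once it arrives.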
We have been running this architecture in production since early 2025, across hundreds of thousands of real phone calls and just as many hours spent helping people get things done. Much of that work is serious: real negotiations, real commitments, and real money on the line.

Frontier intelligence, in real-time voice conversations.
Everything above is the real product, recorded as-is. But a video only shows the calls we chose to record. For an independent, measured check, we ran Pine AI on τ²-bench, a benchmark from Sierra, and specifically its hardest variant, the live voice track. To pass, an agent has to do two hard things at once: reason like an expert, and hold a live conversation while doing it.
What τ²-bench actually is
τ²-bench doesn't quiz an agent on trivia. It drops the agent into an actual customer-service job. On the other end is a simulated customer, itself an LLM, with a goal and a personality. The agent has to work out what the customer needs and actually do it, using the same tools a human rep would (look up an account, change a booking, run a diagnostic, issue a refund) against a simulated business system, all while following that domain's written policy. There are three domains (airline, retail, and telecom) and 278 tasks in total.
And it's graded on the outcome, automatically. Did the database end up in the right state, and were the rules followed? Not "did it sound helpful." It's all-or-nothing.
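All-or-nothing grading is easy to state in code. The sketch below is our own simplification, not τ²-bench's actual evaluator: `Episode`, `grade`, and the dictionary-shaped database are hypothetical stand-ins for the benchmark's real state and policy checks.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical record of one finished task."""
    final_db: dict
    policy_violations: list = field(default_factory=list)


def grade(episode: Episode, expected_db: dict) -> int:
    """Return 1 only if the end state matches and no rule was broken, else 0."""
    state_ok = episode.final_db == expected_db
    rules_ok = not episode.policy_violations
    return int(state_ok and rules_ok)
```

There is no partial credit in this scheme: a call that sounds helpful but leaves one reservation in the wrong state scores the same as a call that did nothing.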
The voice track keeps all of that but runs it as a live phone call instead of typed turns, which forces both demands at once: expert reasoning and live conversation. We'll take them one at a time.
Why it takes frontier-level intelligence
Underneath the conversation, each task is a hard problem in its own right, and this is the part that matters most. Take one airline call. A customer has several flights booked and wants to change or cancel some of them. The agent has to know the airline's policy cold (which fares are refundable, what a change costs, what's allowed on which segment), pull up each reservation, work out which moves are actually permitted, and call exactly the right tools in the right order to make them happen. All while the customer keeps pushing for things the policy doesn't allow.
None of that is a scripted FAQ. Pick the wrong tool, misread a fare rule, or cave to a pushy caller, and the task fails outright. The telecom calls go further: one vague complaint can hide half a dozen separate faults that have to be found and fixed one by one. Navigating this reliably takes frontier-level reasoning and tool use. A weaker model gets lost.
Why it takes human-like voice interaction
The voice track wraps all of that in a live phone call, which brings its own kind of hard. We sat and listened through hundreds of them.
- The caller is impatient, and will give up on you. Leave a stretch of dead air while you "think" and the simulated customer cuts in with "…are you still there?" Keep stalling and it disengages, and the task fails. Latency isn't a nice-to-have here; it's the single most common thing that sinks a call, and the tension between answering fast and thinking enough is most of what the voice track tests.
- The audio is a mess, the way real calls are. The caller talks over the agent and barges in mid-sentence, the line cuts out, the signal is poor and laggy, background noise comes and goes, and now and then the caller sneezes. The agent has to stay on task through it, and handle barge-in cleanly.
- It has to listen accurately. The audio is phone-grade and noisy, and tasks often hinge on catching a name, email, or account ID by ear, spelled out over a bad line. Mishear one character and the action fails.
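The tension between answering fast and thinking enough can be pictured as a tiny decision rule. This is a sketch for illustration only, not how Pine AI or the benchmark works internally: the function `next_action` and the 1.5-second threshold are assumptions we picked to make the tradeoff visible.

```python
def next_action(silence_s: float, answer_ready: bool, filler_after_s: float = 1.5) -> str:
    """Decide what to do on the line given how long it has been silent.

    filler_after_s is a hypothetical budget for dead air, not a measured value.
    """
    if answer_ready:
        return "answer"   # the reasoning finished, so say it
    if silence_s >= filler_after_s:
        return "filler"   # hold the line ("one moment, checking that")
    return "wait"         # a short pause is still natural
```

Set the budget too low and the agent chatters over its own thinking; set it too high and the caller asks whether anyone is still there.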
What the results show

Pine Realtime 1.0 Preview against the other real-time voice agents on the published τ²-bench voice leaderboard.
Pine Realtime 1.0 Preview holds the top score on τ²-bench's voice track, by a wide margin.
How we use it ourselves.
The real test of a product is whether the people who build it live on it. We do. A benchmark call is one bounded task; here is the same capability inside our own work: our research, and our hiring.
Whisper coding
Our own research is where we push Pine AI furthest. The fashionable idea of an autonomous researcher is to hand the agent the whole problem and walk away. We don't. The researcher stays the manager and the advisor, the one with the judgment; the agent does the coding and runs the experiments; and we work the problem together, out loud. We call it whisper coding: talking the work through while the code changes in the background.
We have written more than ten papers with Pine AI. The first, Incompressible Knowledge Probes (IKP), is already public; the rest are in human review and out soon. Given a starting idea from a human researcher, the agent ran every experiment and wrote the paper itself, down to the last line of LaTeX. We treat each agent as a digital co-worker that works around the clock. Every day:
- Morning. We read the report from the night before, then discuss it with the agent by voice: what we learned and what to try next.
- Daytime. It invokes Claude Code and Codex to write the code, run the experiments, and draft the paper, on local GPUs and through LLM APIs.
- Evening. It hands back another report. We discuss it before turning in, and it keeps working through the night.
That cadence turns out a paper in about a week. On a single task, the agent does in half a day what would take a human researcher a week.
The collaboration is what makes it work. Even the strongest agents today lack research taste: they struggle to carry a novel project forward alone, and when an experiment comes back negative, they patch and pivot instead of rethinking from first principles. So the judgment — the redesign, the more elegant idea — stays with us, and whisper coding is how we put it in: talking the agent through the turn it should take while it does the tireless work.
And it isn't just for researchers. Everyone on our technical team now works as a new kind of individual contributor, a manager of agents, each of us running dozens in parallel.
Autonomous recruiter
Our hiring runs on Pine AI as well. We brief a fully autonomous recruiter on the role by voice, and it sources candidates from LinkedIn and elsewhere, runs background checks and matching, screens and filters, and even conducts the first technical interviews.
It all runs on one bet: fast human guidance on top of tireless autonomous work. That is why we built Pine AI, and we're not keeping it to ourselves — agents like these are coming to everyone soon. We've even put it on hardware: a small device with no screen at all, woken by voice and running on its own over nothing but a network connection. Screenless voice is one of the places this is headed.
Come talk it through.
Understanding what you say and speaking back naturally are mostly solved. The hard part is the live interaction in between: staying real-time without going shallow. That's what decides whether you'd actually hand an AI your work, and it's what we build Pine AI for.






