Podonos / OnePin, Gauravi Linjara

Case snapshot

Role

Founding Product Designer · led design, one of two designers, 0→1

Team

10 people: founder (voice researcher), engineering, two designers (I led design)

Timeline

Q1–Q2 2026 · incl. 3 weeks on-site in Seoul

Tools

Figma · Figma Make · Claude Code & Python

What I did

Owned OnePin's production surface end-to-end: research, platform definitions, the chat-agent and node interface, and the automated pipeline. Designed and built by me, shipped live. Engineering built the eval backend the surface drives.

Impact

Collapsed a ~10-day localized-audio pipeline to hours, replaced a five-step manual chain with one decision layer, and removed 9,000+ throwaway QA audio files per script.

Constraints

A 0→1 product with no precedent, built to sit on top of any third-party TTS model across many languages.

02 · Problem

nobody knew if the voice was even right.

Can't judge quality

No shared way to tell if an AI voice even sounds human.

Which API, which language?

Every API wins for a different language. Teams just guessed.

Defensive over-production

One 3,000-line script → ~9,000 files, all sent for slow human eval.

A 10-day pipeline

Five hand-offs before a single line ships.

the signal · interviews + reddit

which API works best for Spanish?

AITubers · Podcasters · Enterprise broadcasters

Asked over and over, and every interview confirmed the same 10-day reality.

03 · The shift

from a 10-day chain to one layer.

A football match in English, headed for a global audience. Before vs. after OnePin.

The bet behind it. The obvious lever is faster generation: produce the ~9,000 files quicker. But the bottleneck wasn't producing audio; it was judging it. Deciding which voice was right was the slow, human part. So OnePin isn't another generator. It's a decision layer that picks and evaluates, and the defensive bulk runs disappear.

a 10-day manual chain →

localized audio in hours, not days.

Ten days of hand-offs and defensive bulk runs, replaced by a layer any team, or agent, can run end-to-end.

the throughline

automate the taste, not just the task.

04 · Process

designed for humans, and for agents.

Straight to the people living the pain, then prototype fast and lock direction in a month.

in their own words

“I can run ten different TTS APIs, but I still can't tell which one actually sounds right for Spanish.”

Diego R. · AITuber · Multilingual

“We over-produce everything defensively, thousands of files, just so a human can catch the bad takes later.”

Hana S. · Audio Producer

“By the time a single line is approved, it's been through five hand-offs and about ten days.”

Marcus L. · Production Lead

the bet

Another human-only TTS dashboard caps the product: it can’t plug into the pipelines customers already automate. So I built a chat-agent + node interface that a person or an agent can drive.

reframe the question

Stopped asking "is this voice good?" and started asking "does it meet these testable criteria?", turning taste into a rubric a team and an agent could both act on.

prototype the decision layer

Designed one surface that picks the model and explains why, making the machine's judgment legible instead of burying it in charts.

validate against the manual chain

Tested the layer against the real 10-day workflow on live multilingual scripts; the decision surface collapsed it to hours.

ship the system, not screens

Wrote platform definitions from scratch and automated the manual steps as one-command skills, so a human or an agent can run a production end-to-end.

05 · Designing the judgment

teaching a machine to hear taste.

OnePin is, at its core, an opinion about AI output. My job was to turn "is this voice good?", a gut call, into a flow a team and an agent could run the same way every time. It's node-based with a chat agent on top, so you can wire the steps by hand or just describe what you want; the same judgment runs underneath either way.

Script or voice note

Drop in text, a script, or a voice note: the raw material to localize.

correct

Fix the audio

Phoneme injection from a 4.4M-word IPA dictionary, normalization (“$1,250” → “one thousand two hundred fifty dollars”), and boosting to even out delivery.

validate

Score it against criteria

Score every line on naturalness, word accuracy, noise, and pronunciation, then flag anything under the bar for a human. No more "sounds fine to me."

out

Ship what you chose

Pick the corrections & validators that matter for this job; OnePin returns the output that meets them.

models: 30+ TTS engines benchmarked per language

pronunciation: 4.4M-word IPA dictionary

validators: naturalness · word accuracy · noise · pronunciation
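To make the normalization example in the "correct" step concrete, here is a minimal Python sketch of the “$1,250” → “one thousand two hundred fifty dollars” rewrite. It is an illustration under stated assumptions, not OnePin's code: the function names (number_to_words, normalize_currency) are hypothetical, it covers only whole US-dollar amounts below one million, and it ignores cents and other currencies.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999,999 (hypothetical helper)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = f" {number_to_words(rest)}" if rest else ""
        return f"{ONES[hundreds]} hundred{tail}"
    thousands, rest = divmod(n, 1000)
    tail = f" {number_to_words(rest)}" if rest else ""
    return f"{number_to_words(thousands)} thousand{tail}"


# A dollar sign followed by digits, with optional thousands separators.
_DOLLARS = re.compile(r"\$(\d+(?:,\d{3})*)")


def normalize_currency(text: str) -> str:
    """Replace whole-dollar amounts with their spoken form."""
    def spell(match: re.Match) -> str:
        amount = int(match.group(1).replace(",", ""))
        if amount >= 1_000_000:
            # Out of scope for this sketch: leave the original text untouched.
            return match.group(0)
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"
    return _DOLLARS.sub(spell, text)


print(normalize_currency("The transfer fee was $1,250."))
```

A production normalizer would also need per-language rules, since the spoken form of a number or currency differs by locale.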

Instead of one hidden, one-size-fits-all pipeline, I exposed every correction and validator as a node the user, or an agent, can switch on or off. An AITuber and an enterprise broadcaster don't share the same bar for "good", so the judgment had to be tunable, not baked in.

the bet

A fixed, black-box pipeline forces everyone to trust one opinion of “good”, and gives an agent nothing to reason about. Instead, corrections and validators are selectable, so the same engine serves a casual creator and a broadcast team.
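One way to picture tunable judgment is the small Python sketch below: the same per-line scores, checked against whichever validators are switched on and whatever bar the team sets. Everything here is assumed for illustration. The LineScores type, the flag_for_review function, the 0-to-1 score scale, and the sample numbers are placeholders, not OnePin's real data model or results.

```python
from dataclasses import dataclass


@dataclass
class LineScores:
    """One script line plus hypothetical validator scores in [0, 1]."""
    text: str
    scores: dict[str, float]


def flag_for_review(lines, enabled, bar):
    """Return (text, failing_scores) for lines below the bar on any enabled validator."""
    flagged = []
    for line in lines:
        failing = {name: line.scores[name]
                   for name in enabled
                   if line.scores[name] < bar}
        if failing:
            flagged.append((line.text, failing))
    return flagged


# Placeholder scores, invented purely to show the mechanics.
script = [
    LineScores("And it's a goal!", {"naturalness": 0.92, "word_accuracy": 0.97,
                                    "noise": 0.90, "pronunciation": 0.95}),
    LineScores("A substitution is coming.", {"naturalness": 0.81, "word_accuracy": 0.96,
                                             "noise": 0.88, "pronunciation": 0.72}),
]

# A casual creator might enable two validators at a lower bar...
casual = flag_for_review(script, ["naturalness", "word_accuracy"], bar=0.70)
# ...while a broadcast team enables all four at a stricter one.
broadcast = flag_for_review(
    script, ["naturalness", "word_accuracy", "noise", "pronunciation"], bar=0.85)

print(len(casual), len(broadcast))
```

The point of the sketch is that the bar and the validator set are inputs, so a person or an agent can change them without touching the scoring itself.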

06 · Solution

the smart layer, made visible.

TTS makes voice. OnePin makes it production-ready.

One surface where a human, or an agent, drives a production end to end: describe the job in chat and watch it resolve as a live node graph, from script in to corrected, validated, and ready to ship. Four design decisions made that judgment trustworthy:

1 · in · script or voice note

2 · decide · pick the model, show why

3 · validate · score, flag below the bar

4 · out · ship what clears

1 · Built for a human and an agent

One input a person types or an agent calls; the surface doesn't care which. That let the pipeline plug into automations customers already run.

principle · conversation as interface · copilot UX

2 · The judgment is legible

It doesn't just pick a model; it shows the score and the why, so a user can trust the call or override it instead of facing a black box.

principle · explainable AI · verdict → reasoning

3 · Uncertainty is surfaced, not hidden

Lines that miss the bar are flagged for a human before they ship: friction placed exactly where it protects quality.

principle · appropriate friction · surface uncertainty

4 · Taste is tunable, not baked in

Every correction and validator is a node you switch on or off, so a casual creator and a broadcast team run the same engine at different bars.

principle · node-based UX · no fixed pipeline
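The "verdict → reasoning" idea from decision 2 can be sketched in a few lines of Python: choose the best-scoring model for a language and return the explanation with the choice. This is a hedged illustration only. The Verdict type, pick_model function, model names, and benchmark numbers are hypothetical placeholders, and a single score per model and language is an assumption made for brevity.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    model: str
    score: float
    reason: str


def pick_model(benchmarks: dict[str, dict[str, float]], language: str) -> Verdict:
    """Pick the top-scoring model for a language and say why.

    benchmarks maps model name -> language code -> score (assumed shape).
    """
    candidates = {model: langs[language]
                  for model, langs in benchmarks.items()
                  if language in langs}
    if not candidates:
        raise ValueError(f"no benchmarked model for {language!r}")
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    best, best_score = ranked[0]
    if len(ranked) > 1:
        runner_up, runner_score = ranked[1]
        reason = (f"{best} scored {best_score:.2f} for {language}, "
                  f"ahead of {runner_up} at {runner_score:.2f}")
    else:
        reason = f"{best} is the only benchmarked model for {language}"
    return Verdict(best, best_score, reason)


# Placeholder numbers, not real benchmark results.
benchmarks = {
    "model_a": {"en": 0.91, "es": 0.82},
    "model_b": {"es": 0.88},
}
verdict = pick_model(benchmarks, "es")
print(verdict.model, "|", verdict.reason)
```

Returning the reason alongside the pick is what lets a user, or an agent, accept the call or override it.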

the real thing, recorded

OnePin agent panel: describe the localization job in plain language and the agent routes the model, injects phonemes and explains its choice

OnePin node graph: script in, corrected, validated against naturalness/word accuracy/noise/pronunciation, ready to ship

the agent driving a real production, end to end

the OnePin production surface · some details redacted out of respect for the team

becoming my user

I learned the problem by living in it. Over three weeks on-site in Seoul, whiteboarding the pipeline with the founder (a voice researcher), I learned to hear what "good" actually means: where a model's accent drifts, where a player's name is mispronounced, where a take is technically clean but lands wrong. Those sessions are why the validators are the ones a real localization lead would trust, not a generic quality score.

Whiteboard sketch of the OnePin flow: input, translate, model, validate, errors → notify, Seoul on-site

Whiteboard working through audio normalization and correction examples, Seoul on-site

behind the build · whiteboarding the OnePin pipeline, 3 weeks on-site in Seoul

The Podonos team together at NAVER D2SF, Seoul

the Podonos team · NAVER D2SF, Seoul

07 · How I built it

from Figma to shipped, with AI.

my process · from idea to shipped

own

ship, not just design

immerse

3 weeks in seoul

scope

judgment, not generation

systemize

build-ready components

build

figma make · claude code

ship

live · onepin.ai

iterate

validators keep learning

For me the Figma file was the starting point, not the deliverable. I designed in clean components and standard layout patterns on purpose, so the interface could be built straight from the design. Then I took it the rest of the way myself.

From there I moved into Figma Make and Claude Code to turn the chat-agent and node interface into working code, and wrote the pipeline (phoneme injection, normalization, and the validators) as Python skills a person or an agent can run. AI removed the execution bottleneck; the judgment and the taste stayed mine.

08 · Impact

what it changes, three ways.

For the user

Days → hours

Validated against the real 10-day manual chain on live multilingual scripts: localized audio in hours, no API guesswork, no defensive bulk runs.

For the business

Eval, as a product

Quality judgment that used to live in one engineer's head becomes a layer any team, or agent, can run, opening markets the manual effort never justified.

For the org

Agent-native 0→1

Platform definitions from scratch; the manual workflow automated so the steps disappear, built and shipped as a real system.

built & shipped · now live at onepin.ai

09 · Reflection

we turned a 10-day chain into one decision.

This wasn't automation for its own sake. It was about giving a small team back their judgment, replacing a five-step manual chain with one clear decision layer they could trust.

Every step was rooted in the real workflow (the throwaway files, the late human reviews) and designed to remove the waste without removing the human where it matters.

The work wasn't linear. It meant sketching, coding, reworking the rubric, and lots of back-and-forth between design and engineering until the bar felt right.

And it keeps sharpening. As teams teach the validators their own taste and edge cases feed back, the system should keep learning long after the first version shipped.

this work is under wraps.

the intelligent layer for global voice.

fluent in english, broken everywhere else.