Clarifying language and naming across Captions, Transcriptions, and Translation to support accessibility and user understanding.
Role
Content designer and strategist
Goal
To clarify language and naming across complex, closely related terms
Stakeholders
Content Design, Product Design, Product Management
When Zoom was preparing to launch a new Captions feature, we encountered significant conflicts in language and naming with existing Transcriptions and Translation features, exposing deeper issues in how product language was used across the platform. I led a cross-functional effort to evaluate, restructure, and clarify Zoom's language framework to support accessibility, accuracy, and user understanding.
The introduction of Captions revealed deep inconsistencies in how terms were used across product surfaces:
These inconsistencies increased cognitive load, reduced predictability, and created confusion for users who depended on clear text representations of spoken content.
Full audit across all relevant surfaces. I mapped every instance of language use (in-meeting UI, cloud recordings, admin settings, docs) to identify patterns of inconsistency. We assessed:
Outcome: The inventory became the baseline for understanding scope and ambiguity.
Accessibility anchored the framework. Working with the Accessibility team, we:
Outcome: The framework was anchored in external conventions and user expectations, not just internal preferences.
Structured naming with three core dimensions:
Outcome: This multidimensional approach ensured every term was distinct, predictable, and scoped—so users could infer meaning from structure rather than guesswork.
The final structure simplified and clarified the full ecosystem. We also standardized the language for manual tools—e.g., Assign manual captioner and Assign manual transcriber—making roles and workflows explicit and reducing ambiguity for users and assistive-tech workflows.
| Final name | Description | When it's used | Why this name |
|---|---|---|---|
| captions | Real-time text displayed visually during a live meeting | Live meetings where users need immediate, on-screen comprehension | Aligns with accessibility standards and user expectations for live, visual text |
| translated captions | Live captions automatically translated into another language | Live meetings with multilingual participants | Preserves the captions mental model while clearly signaling language transformation |
| transcription | Scrollable, live text feed representing spoken audio | Live meetings where users want a readable, referenceable record | Differentiates text-as-record from text-as-visual aid |
| translated transcription | Live transcription translated into a different language | Live meetings requiring readable, translated text | Maintains structural consistency across the framework |
| cloud recording transcript | Post-meeting transcription tied to a saved cloud recording | After a meeting ends, during review or sharing | Anchors the text to its asynchronous, post-meeting context |
| manual captioner | Assigned role for a participant providing human-typed captions | Meetings requiring high-accuracy or compliance-driven captions | Makes human authorship explicit and distinct from automation |
| manual transcriber | Assigned role for a participant providing a human-generated transcription | Meetings requiring manual transcription or editing | Mirrors captioner's language while clearly defining the purpose |
Terminology guide documentation
We added the finalized terms to our product terminology guide so they could be shared with marketing, engineering, and localization. This keeps terminology consistent across teams: whether they're writing copy, building features, or translating the product, everyone references the same source of truth.
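For teams that consume the guide programmatically, the same terms could also live in a small machine-readable glossary. The sketch below is only an illustration of that idea, not how the guide was actually stored or shared: the `Term` class, `GLOSSARY`, `build_index`, and `lookup` are hypothetical names, and the entries are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str         # "Final name" column from the table
    description: str  # "Description" column from the table


# Entries copied from the table above; the storage format is an assumption.
GLOSSARY = (
    Term("captions", "Real-time text displayed visually during a live meeting"),
    Term("translated captions", "Live captions automatically translated into another language"),
    Term("transcription", "Scrollable, live text feed representing spoken audio"),
    Term("translated transcription", "Live transcription translated into a different language"),
    Term("cloud recording transcript", "Post-meeting transcription tied to a saved cloud recording"),
    Term("manual captioner", "Assigned role for a participant providing human-typed captions"),
    Term("manual transcriber", "Assigned role for a participant providing a human-generated transcription"),
)


def build_index(terms):
    """Map each lowercased name to its Term, rejecting duplicate names."""
    index = {}
    for term in terms:
        key = term.name.lower()
        if key in index:
            raise ValueError(f"duplicate term: {term.name}")
        index[key] = term
    return index


INDEX = build_index(GLOSSARY)


def lookup(name):
    """Return the approved Term for a name, or None if it is not in the guide."""
    return INDEX.get(name.strip().lower())
```

A writer or a build step could call `lookup("Captions")` to confirm a term is approved and read its definition. Rejecting duplicate names in `build_index` mirrors the framework's goal that every term stays distinct.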
Grounding definitions in how people experience language—not just internal specs—drives clarity.
Language aligned with assistive contexts supports everyone, not only the users who rely on it most.
Even micro-level naming decisions can dramatically affect how intuitive a feature feels.
A framework that anticipates new feature touchpoints avoids future naming collisions.
Reduced confusion
Eliminated overlapping or misleading terms across product surfaces.
Source of truth
Cross-team alignment on naming and language.
Improved accessibility
Language defined by function and presentation.
Future foundation
Scalable framework for naming work and taxonomy efforts.