Platform language framework

Clarifying language and naming across Captions, Transcriptions, and Translation to support accessibility and user understanding.

Background

Role
Content designer and strategist

Goal
To clarify language and naming across similar, easily confused terms

Stakeholders
Content Design, Product Design, Product Management

Overview

When Zoom was preparing to launch a new Captions feature, we encountered significant conflicts in language and naming with existing Transcriptions and Translation features, exposing deeper issues in how product language was used across the platform. I led a cross-functional effort to evaluate, restructure, and clarify Zoom's language framework to support accessibility, accuracy, and user understanding.

The challenge

The introduction of Captions revealed deep inconsistencies in how terms were used across product surfaces:

  • Transcription vs. Translation — Used interchangeably in UI and documentation
  • Subtitles, Captions, Live Transcript — Applied inconsistently depending on surface and context
  • Accessibility users — Unclear about what each term meant and when each one applied

These inconsistencies increased cognitive load, reduced predictability, and created confusion for users who depended on clear text representations of spoken content.

The approach

Language audit

Full audit across all relevant surfaces. I mapped every instance of language use (in-meeting UI, cloud recordings, admin settings, docs) to identify patterns of inconsistency. We assessed:

  • Scope — Where language was most ambiguous
  • Overlap — Terms used interchangeably
  • Gaps — Missing or unclear distinctions

[Figure: Language matrix, first iteration]

Outcome: The inventory became the baseline for understanding scope and ambiguity.
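
As an illustration only, an inventory like this could be partly automated. The sketch below is not the process used in this project: the sample strings, the surface labels, and the helper names `inventory` and `ambiguous_surfaces` are hypothetical, and it assumes the UI and documentation strings can be exported as (surface, text) pairs.

```python
import re
from collections import defaultdict

# Hypothetical sample data; the real audit inventory is not reproduced here.
SAMPLE_STRINGS = [
    ("in-meeting UI", "Enable Live Transcript"),
    ("in-meeting UI", "Show subtitles"),
    ("admin settings", "Allow captions"),
    ("docs", "Turn on translation to view the transcription"),
]

# Terms under audit, taken from the inconsistencies listed earlier.
TERMS = ["captions", "subtitles", "live transcript", "transcription", "translation"]


def find_terms(text, terms):
    """Return the tracked terms that appear as whole words in the text."""
    lowered = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", lowered)]


def inventory(strings, terms):
    """Map each term to the sorted list of surfaces where it appears."""
    found = defaultdict(set)
    for surface, text in strings:
        for term in find_terms(text, terms):
            found[term].add(surface)
    return {term: sorted(surfaces) for term, surfaces in found.items()}


def ambiguous_surfaces(strings, terms):
    """Return surfaces that use more than one tracked term."""
    per_surface = defaultdict(set)
    for surface, text in strings:
        per_surface[surface].update(find_terms(text, terms))
    return {s: sorted(ts) for s, ts in per_surface.items() if len(ts) > 1}


if __name__ == "__main__":
    print(inventory(SAMPLE_STRINGS, TERMS))
    print(ambiguous_surfaces(SAMPLE_STRINGS, TERMS))
```

A surface that shows up in `ambiguous_surfaces` is only a candidate for review. Deciding whether two terms genuinely overlap still takes a human read of the context.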

Partnering with accessibility

Accessibility anchored the framework. Working with the Accessibility team, we:

  • Defined precise language distinctions driven by functional differences and user needs.
  • Clarified Captions vs. Transcriptions — Visual placement, live vs. post-meeting contexts.
  • Aligned with industry standards — Real-world assistive technology patterns.

Outcome: The framework was anchored in external conventions and user expectations, not just internal preferences.

Framework development

Structured naming with three core dimensions:

  • Language origin — Translated vs. original-language text.
  • Presentation surface — Scrollable vs. in-meeting visual text.
  • Input method — Automated vs. manual content sources.

[Figure: Transcriptions rename and framework]

Outcome: This multidimensional approach ensured every term was distinct, predictable, and scoped—so users could infer meaning from structure rather than guesswork.
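
One way to picture how the dimensions combine is a small sketch that derives the live term names from their parts. This is an illustrative model under stated assumptions, not a tool from the project: the enum names, `MANUAL_ROLE`, `term_name`, and `all_live_terms` are hypothetical, and the mapping from dimensions to names is read off the final names table in this case study.

```python
from enum import Enum


class Surface(Enum):
    # Presentation surface; each value is the base term for that surface.
    IN_MEETING_VISUAL = "captions"
    SCROLLABLE = "transcription"


class Origin(Enum):
    # Language origin of the text.
    ORIGINAL = "original"
    TRANSLATED = "translated"


# Input method: automated output keeps the base term, while manual input
# is named as an assigned role. Assumption inferred from the final table.
MANUAL_ROLE = {
    Surface.IN_MEETING_VISUAL: "manual captioner",
    Surface.SCROLLABLE: "manual transcriber",
}


def term_name(surface: Surface, origin: Origin) -> str:
    """Compose a live term name from its surface and language origin."""
    prefix = "translated " if origin is Origin.TRANSLATED else ""
    return prefix + surface.value


def all_live_terms() -> list[str]:
    """Every surface and origin combination, in declaration order."""
    return [term_name(s, o) for s in Surface for o in Origin]


if __name__ == "__main__":
    print(all_live_terms())
    print(MANUAL_ROLE[Surface.IN_MEETING_VISUAL])
```

Because each name is built from the same parts in the same order, no two combinations collide. The post-meeting term, cloud recording transcript, sits outside these three dimensions and is not modeled here.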

Final language framework

The final structure simplified and clarified the full ecosystem. We also standardized the language for manual tools, such as Assign manual captioner and Assign manual transcriber, which made roles explicit and reduced ambiguity for users and assistive-technology workflows.

Final names

captions
  • Description: Real-time text displayed visually during a live meeting
  • When it's used: Live meetings where users need immediate, on-screen comprehension
  • Why this name: Aligns with accessibility standards and user expectations for live, visual text

translated captions
  • Description: Live captions automatically translated into another language
  • When it's used: Live meetings with multilingual participants
  • Why this name: Preserves the captions mental model while clearly signaling language transformation

transcription
  • Description: Scrollable, live text feed representing spoken audio
  • When it's used: Live meetings where users want a readable, referenceable record
  • Why this name: Differentiates text-as-record from text-as-visual aid

translated transcription
  • Description: Live transcription translated into a different language
  • When it's used: Live meetings requiring readable, translated text
  • Why this name: Maintains structural consistency across the framework

cloud recording transcript
  • Description: Post-meeting transcription tied to a saved cloud recording
  • When it's used: After a meeting ends, during review or sharing
  • Why this name: Anchors the text to its asynchronous, post-meeting context

manual captioner
  • Description: Assigned role for a participant providing human-typed captions
  • When it's used: Meetings requiring high-accuracy or compliance-driven captions
  • Why this name: Makes human authorship explicit and distinct from automation

manual transcriber
  • Description: Assigned role for a participant providing a human-generated transcription
  • When it's used: Meetings requiring manual transcription or editing
  • Why this name: Mirrors captioner's language while clearly defining the purpose

Product terminology guide

Terminology guide documentation

We added the finalized terms to our product terminology guide so they could be shared with marketing, engineering, and localization. Whether teams are writing copy, building features, or translating the product, everyone references the same source of truth.

[Figure: Product terminology guide spreadsheet showing terms, definitions, and publication status]

Key learnings

Terms defined for users > terms defined for engineers

Grounding definitions in how people experience language—not just internal specs—drives clarity.

Accessibility as a design constraint strengthens the system

Language aligned with assistive contexts supports everyone, not only the users who depend on it most.

Small changes = large user impact

Even micro-level naming decisions can dramatically affect how intuitive a feature feels.

System language should scale with product complexity

A framework that anticipates new feature touchpoints avoids future naming collisions.

Why this work matters

Reduced confusion

Eliminated overlapping or misleading terms across product surfaces.

Source of truth

Cross-team alignment on naming and language.

Improved accessibility

Language defined by function and presentation.

Future foundation

Scalable framework for future naming and taxonomy work.
