Voice-to-Text for Lawyers: A Complete Guide

This article is for general information purposes only and does not constitute legal advice. You should seek independent legal advice relevant to your specific circumstances.

Lawyers have been dictating for decades. Before digital tools, solicitors dictated into cassette recorders and handed the tapes to secretarial staff for transcription. The technology has changed, but the underlying principle remains sound: speaking is faster than typing, and for professionals who spend their days communicating orally with clients, courts, and colleagues, voice is a natural input method for record-keeping.

What has changed is the capability of the technology. Modern voice-to-text tools do not just transcribe words. The best of them can structure output, recognise domain-specific vocabulary, and produce formatted documents from spoken input. For lawyers, this matters because the output of dictation is not a transcript. It is a professional document: a file note, a letter, or a memorandum.

This guide covers how voice-to-text technology works, why general-purpose tools often fall short for legal work, the privacy considerations that practitioners need to understand, and a comparison of the main tools available in 2026.


How voice-to-text technology works

At its core, voice-to-text (also called speech-to-text or automatic speech recognition) converts spoken audio into written text. The process involves several stages:

  1. Audio capture: A microphone records the speaker's voice. Quality matters here. Background noise, poor microphone placement, and low-quality hardware all reduce accuracy.
  2. Signal processing: The raw audio is cleaned and normalised. This step filters out background noise and adjusts for volume variations.
  3. Speech recognition: A machine learning model analyses the processed audio and converts it to text. Modern systems use deep neural networks trained on millions of hours of speech data. They handle natural speech patterns, accents, and varying speeds.
  4. Language modelling: The recognised words are refined using language models that understand context. If the audio sounds like "the plaintiff filed a motion," the language model understands that "plaintiff" is more likely than "plane tiff" in a legal context, provided it has been trained on legal text.
  5. Post-processing (in advanced tools): Some tools add a final layer where AI structures and formats the raw text. This can include adding punctuation, organising content under headings, removing filler words, and converting spoken language into professional written prose.

The distinction between basic and advanced voice-to-text tools lies primarily in steps 4 and 5. A basic tool gives you a wall of text that approximates what you said. An advanced tool gives you a structured document that captures what you meant.
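To make step 4 concrete, here is a minimal sketch of how a language model might choose between two candidate transcriptions. It uses a toy unigram model; the word counts are invented for illustration, the function names are hypothetical, and real systems rely on far larger neural models rather than anything this simple.

```python
import math

# Hypothetical word counts from a legal-text corpus (illustrative numbers only).
LEGAL_COUNTS = {
    "the": 1000, "a": 900, "plaintiff": 120, "filed": 80,
    "motion": 60, "plane": 2, "tiff": 1,
}
TOTAL = sum(LEGAL_COUNTS.values())
VOCAB = len(LEGAL_COUNTS)


def log_prob(sentence):
    """Sum of add-one-smoothed unigram log-probabilities for each word."""
    return sum(
        math.log((LEGAL_COUNTS.get(word, 0) + 1) / (TOTAL + VOCAB))
        for word in sentence.lower().split()
    )


def pick_best(candidates):
    """Return the candidate transcription the toy model finds most likely."""
    return max(candidates, key=log_prob)


best = pick_best([
    "the plane tiff filed a motion",
    "the plaintiff filed a motion",
])
print(best)
```

Because "plaintiff" is common in the assumed legal corpus while "plane" and "tiff" are rare, the legal reading scores higher. A model trained only on everyday text would have no such preference, which is the point the list above makes.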

Why generic dictation tools fall short for legal work

General-purpose voice-to-text tools are designed for everyday language. They perform well with common vocabulary and standard sentence structures. Legal work, however, involves specialised terminology, formal phrasing, and domain-specific concepts that generic tools handle poorly.

Terminology errors

Legal English contains thousands of terms that do not appear in everyday speech. Generic dictation tools routinely misrecognise them:

  • "Estoppel" becomes "a stopple" or "he stopple"
  • "Plaintiff" becomes "plane tiff"
  • "Laches" becomes "latches"
  • "Indemnity" becomes "in dem nity"
  • "Caveat" becomes "cave eat"
  • "Subpoena" becomes "sub peena" or "sub pina"
  • "Tortfeasor" becomes any of a variety of creative misspellings

These errors are not merely cosmetic. A file note that records "the client has a strong latches defence" is confusing at best and misleading at worst. A solicitor reviewing the note months later, or a practitioner picking up the file after a handover, may not immediately recognise the intended term.
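One way to catch errors like these is a correction pass over the raw transcript. The sketch below is illustrative only: the correction table is a hypothetical one built from the examples above, and a real tool would need a far larger, context-aware vocabulary. Note that "latches" is a genuine English word, so the sketch flags it for review rather than replacing it automatically.

```python
import re

# Hypothetical table of misrecognitions mapped to the intended legal term.
CORRECTIONS = {
    "a stopple": "estoppel",
    "he stopple": "estoppel",
    "plane tiff": "plaintiff",
    "in dem nity": "indemnity",
    "cave eat": "caveat",
    "sub peena": "subpoena",
    "sub pina": "subpoena",
}

# Real words that may stand in for a legal term; these need a human decision.
AMBIGUOUS = {"latches": "laches"}


def correct_terms(text):
    """Replace known misrecognitions, matching whole words case-insensitively."""
    for wrong, right in CORRECTIONS.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(right, text)
    return text


def flag_ambiguous(text):
    """Return (heard, possibly_meant) pairs found in the text."""
    return [
        (heard, meant)
        for heard, meant in AMBIGUOUS.items()
        if re.search(r"\b" + re.escape(heard) + r"\b", text, re.IGNORECASE)
    ]


print(correct_terms("The plane tiff served a sub peena"))
print(flag_ambiguous("the client has a strong latches defence"))
```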

No structured output

When you dictate a file note using a generic tool, you receive a block of unformatted text. It captures approximately what you said, complete with filler words ("um," "so," "you know"), false starts, and spoken punctuation cues that may or may not have been interpreted correctly.

A professional file note requires structure: headings, paragraphs, a clear separation between what was discussed and what action is required. Generic tools provide none of this. The solicitor must take the raw transcript and manually restructure it into a proper document, which often takes as long as writing the note from scratch would have.
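The manual restructuring described above can be pictured as two small steps: strip the filler, then separate discussion from action. The following is a deliberately naive sketch under stated assumptions: the filler pattern and the action cue phrases are hypothetical, and AI-based tools use language models rather than keyword matching.

```python
import re

# Assumed filler words; "so" is left alone because it is often meaningful.
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)

# Hypothetical cue phrases that suggest a sentence records an action item.
ACTION_CUES = ("need to", "must", "follow up")


def structure_note(raw):
    """Remove fillers and sort sentences under two headings."""
    cleaned = FILLERS.sub("", raw)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", cleaned) if s.strip()]

    note = {"Discussion": [], "Action required": []}
    for sentence in sentences:
        is_action = any(cue in sentence.lower() for cue in ACTION_CUES)
        heading = "Action required" if is_action else "Discussion"
        # Restore the capital letter lost when a leading filler was removed.
        note[heading].append(sentence[:1].upper() + sentence[1:])
    return note


raw = "Um, the client called about the lease. You know, I need to send the draft by Friday."
for heading, items in structure_note(raw).items():
    print(heading)
    for item in items:
        print("  -", item)
```

Keyword matching like this breaks down quickly on real dictation, which is why the gap between raw transcription and a usable file note is larger than it first appears.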

No matter organisation

Legal work is organised by matter. Every file note belongs to a specific matter and a specific client. Generic dictation tools have no concept of matters, clients, or file organisation. The output is a text file or clipboard entry with no metadata, no matter reference, and no connection to your broader filing system.
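To show what matter-level metadata might look like, here is a minimal sketch of a file note record. The field names, the matter reference format, and the filename convention are all assumptions made for illustration; they do not describe any particular product or practice management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FileNote:
    """A dictated note tied to a matter and client (hypothetical schema)."""
    matter_ref: str
    client: str
    author: str
    body: str
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def filename(self):
        """Build a sortable filename that keeps the matter reference visible."""
        return f"{self.matter_ref}_{self.created:%Y%m%d-%H%M}_filenote.txt"


note = FileNote(
    matter_ref="2026-0142",
    client="Example Pty Ltd",
    author="JS",
    body="Call with client re lease renewal.",
    created=datetime(2026, 3, 5, 14, 30, tzinfo=timezone.utc),
)
print(note.filename())
```

A generic dictation tool produces only the equivalent of the body field; everything else has to be added by hand.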

No legal formatting conventions

Legal documents follow formatting conventions that generic tools do not understand. References to legislation, case citations, section numbers, and party names all have specific formatting requirements in Australian legal practice. A generic tool treats "Section 52 of the Competition and Consumer Act 2010 (Cth)" as an ordinary sentence, not as a legislative reference that should be formatted in a particular way.
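As a rough illustration of what recognising a legislative reference involves, the sketch below uses a regular expression to pull out the section number, Act title, and jurisdiction. The pattern is an assumption that handles only the simple "Section N of the ... Act YYYY (Jurisdiction)" form; real citations vary far more than this.

```python
import re

# Assumed pattern for references of the form
# "Section 52 of the Competition and Consumer Act 2010 (Cth)".
LEG_REF = re.compile(
    r"\b[Ss]ection\s+(?P<section>\d+[A-Za-z]*)\s+of\s+the\s+"
    r"(?P<act>[A-Z].+?\bAct\s+\d{4})\s+"
    r"\((?P<jurisdiction>Cth|NSW|Vic|Qld|WA|SA|Tas|ACT|NT)\)"
)


def find_legislation(text):
    """Return a dict of components for each legislative reference found."""
    return [match.groupdict() for match in LEG_REF.finditer(text)]


refs = find_legislation(
    "He relied on Section 52 of the Competition and Consumer Act 2010 (Cth)."
)
print(refs)
```

Once the components are isolated, a tool can render them in whatever citation style the firm follows, rather than leaving the reference as an ordinary sentence.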


Privacy and security concerns

For any profession, privacy matters when using voice-to-text. For lawyers, it is a professional obligation. Solicitors owe duties of confidentiality to their clients, and those duties extend to every tool used to process client information.

When evaluating any voice-to-text tool, solicitors should ask the following questions:

  • Where is the audio processed? Is it processed on-device or sent to a cloud server? If cloud-based, where are those servers located?
  • Is the audio stored? Some services retain audio recordings to improve their models. For legal dictation containing privileged communications, this is unacceptable.
  • Who has access to the transcripts? Are transcripts stored in the provider's systems? Can the provider's employees access them?
  • What encryption is used? Look for strong encryption both at rest (AES-256 is the common benchmark) and in transit (TLS). Anything less is insufficient for sensitive legal content.
  • Is the data used for model training? If your dictation is fed back into the provider's training data, your client's confidential information could influence outputs for other users.
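The questions above can be recorded as a simple checklist when comparing vendors. The sketch below is one hypothetical way to do that; the field names and the choice of what counts as a red flag are assumptions, and the real assessment is a matter of professional judgment and firm policy.

```python
from dataclasses import dataclass


@dataclass
class VendorAnswers:
    """A vendor's answers to the questions above (hypothetical fields)."""
    processed_on_device: bool
    retains_audio: bool
    staff_can_access_transcripts: bool
    encrypted_at_rest_and_in_transit: bool
    trains_on_customer_data: bool


def red_flags(answers):
    """List the answers that warrant further enquiry before adoption."""
    flags = []
    if answers.retains_audio:
        flags.append("audio is retained")
    if answers.staff_can_access_transcripts:
        flags.append("provider staff can access transcripts")
    if not answers.encrypted_at_rest_and_in_transit:
        flags.append("encryption is incomplete")
    if answers.trains_on_customer_data:
        flags.append("dictation is used for model training")
    if not answers.processed_on_device:
        flags.append("cloud processing: confirm server location")
    return flags


example = VendorAnswers(
    processed_on_device=False,
    retains_audio=True,
    staff_can_access_transcripts=False,
    encrypted_at_rest_and_in_transit=True,
    trains_on_customer_data=False,
)
print(red_flags(example))
```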

These are not hypothetical concerns. In 2019, media reports revealed that several major tech companies had used human reviewers to listen to voice assistant recordings that users believed were private. Lawyers cannot afford to discover after the fact that client-privileged communications were stored, reviewed, or used for training purposes by a third-party service provider.

Tool comparison

The following comparison covers the main voice-to-text options available to Australian lawyers in 2026. Each has different strengths, and the right choice depends on your specific needs and workflow.

Apple/iOS Dictation

Price: Free (built into iOS/macOS)
Accuracy: Good for everyday vocabulary; poor for legal terms
Legal vocabulary: No legal-specific training
Structured output: No. Raw text only
Privacy: On-device processing for most requests (post-iOS 17)

Apple's built-in dictation is free and readily available. Since iOS 17, most dictation processing happens on-device, which is a meaningful privacy advantage. However, it has no awareness of legal terminology and produces only raw, unstructured text. The output goes to whatever text field is active, with no matter organisation or document formatting. Suitable as a basic input method for short entries, but not as a file noting tool.

Dragon Legal

Price: $500+ (perpetual licence) or subscription
Accuracy: Excellent, especially after voice training
Legal vocabulary: Extensive built-in legal dictionary
Structured output: No AI structuring. Produces raw dictated text
Privacy: On-premise processing available

Dragon Legal (by Nuance, now part of Microsoft) has been the industry standard for legal dictation for over a decade. Its accuracy with legal terminology is excellent, particularly after the initial voice training period where the software learns your speech patterns. The legal dictionary handles most Australian legal terms without difficulty.

The limitations are practical rather than technical. Dragon is expensive, desktop-only (no mobile app), and requires significant initial setup. It produces accurate transcription but not structured documents. You still need to take the raw text and format it into a proper file note. There is no AI-powered structuring, no automatic action item extraction, and no follow-up letter generation. Dragon is a transcription tool, not a document generation tool.

Otter.ai

Price: Free tier available; Pro from $16.99/mo
Accuracy: Good for general English; inconsistent with legal terms
Legal vocabulary: No legal-specific training
Structured output: Some AI summarisation; no legal document structure
Privacy: Cloud-based; audio stored on Otter's servers

Otter.ai is popular for meeting transcription and general-purpose dictation. It offers real-time transcription at a reasonable price point and includes some AI-powered features like automatic summaries and speaker identification. For non-legal professionals, it is a solid tool.

For legal work, the gaps are significant. Otter has no legal vocabulary training, so specialised terms are frequently misrecognised. Its summaries are generic, not formatted as legal file notes. Audio is processed and stored on Otter's cloud servers, which raises confidentiality concerns for privileged communications. It is a transcription and meeting tool, not a legal documentation tool.

Lex Protocol

Price: Free tier (12 notes/month); Premium from $45.99/mo
Accuracy: High, with legal terminology recognition
Legal vocabulary: Recognises Australian legal terminology and conventions
Structured output: Yes. AI structures dictation into formatted file notes with headings, action items, and professional prose
Privacy: AES-256 encryption; audio processed and not retained

Lex Protocol takes a different approach from the tools above. Rather than recording live meetings or calls, it is designed for post-event dictation. You finish the consultation, hang up the phone, or leave court, then open the app and speak naturally about what occurred. The AI structures your dictation into a complete legal file note with appropriate headings, key points, client instructions, and action items.

This distinction matters. Because the recording happens after the interaction, clients are never recorded. There are no consent issues, no awkward "do you mind if I record this?" conversations, and no risk of a client holding back because a microphone is running. The solicitor-client relationship stays natural.

The additional capability that distinguishes Lex Protocol from transcription-only tools is document generation. From a single file note, the tool can generate follow-up correspondence, extract key dates and deadlines, and summarise matters across multiple notes. Audio is encrypted with AES-256, processed, and not retained after transcription.

It is available on iOS, Android, and desktop, which means the same tool and the same file note structure are available whether you are at your desk or recording a note on your phone after leaving court.


Best practices for voice-to-text in legal work

Regardless of which tool you use, the following practices will improve your results with voice-to-text dictation.

Record immediately after the event

The single most important habit for effective dictation is timeliness. Dictate your file note within five minutes of the relevant interaction ending. Your recall of specific details, exact phrases used, and the sequence of discussion points degrades rapidly with time. A note dictated immediately after a phone call will be more accurate and more detailed than one written from memory at the end of the day.

This is particularly important for matters where the precise content of a conversation may later become relevant, such as settlement discussions, client instructions on litigation strategy, or advice about limitation periods. "I dictated this note immediately after the call" is a far stronger foundation for a file note than "I wrote this up from memory several hours later."

Speak naturally

A common mistake when dictating is trying to speak in formal written prose. Lawyers will attempt to dictate perfectly punctuated sentences with subordinate clauses and precise paragraph breaks. This produces stilted, awkward speech that is harder for the recognition engine to process and harder for you to sustain.

Instead, speak as you would if you were briefing a colleague on what just happened. Use natural sentence structures. Do not worry about perfect grammar or formal phrasing. If you are using a tool with AI structuring (such as Lex Protocol), the AI will convert your natural speech into professional written language. If you are using a basic transcription tool, you will need to edit the output regardless, so there is no benefit to labouring over your spoken phrasing.

Use a quiet environment

Background noise remains the primary cause of transcription errors across all voice-to-text tools. Open-plan offices, busy corridors, and outdoor environments with wind or traffic noise all degrade accuracy significantly.

If you frequently need to dictate in noisy environments, consider using a directional microphone or a headset with noise cancellation. Most modern smartphones perform well in moderate noise, but for consistently high accuracy, a quiet room is the most reliable approach.

Review before finalising

No voice-to-text tool is 100% accurate. Every dictated file note should be reviewed before it is finalised and filed. Check for misrecognised terms (particularly names, numbers, and legal terminology), verify that the structure accurately reflects what was discussed, and ensure that action items have been correctly captured.

This review step is non-negotiable. A file note is a professional document that may be relied upon in future proceedings, audits, or complaints. The efficiency gain from dictation comes from faster initial drafting, not from eliminating the review process.
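A small helper can make the review step more systematic by listing the details most worth checking against your recollection. The sketch below is illustrative only: the patterns are assumptions that catch simple figures and titled names, and it is an aid to human review, not a substitute for it.

```python
import re

# Assumed patterns: digits with optional thousands separators and decimals,
# and names introduced by a common title.
FIGURE = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?")
TITLED_NAME = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+")


def review_checklist(note):
    """Collect the figures and names a reviewer should verify."""
    return {
        "figures": FIGURE.findall(note),
        "names": TITLED_NAME.findall(note),
    }


checklist = review_checklist("Ms Nguyen agreed to settle for $150,000 by 14 March.")
print(checklist)
```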

State key details clearly

When dictating, make a conscious effort to state names, dates, monetary figures, and reference numbers clearly and at a measured pace. These are the details most likely to be misrecognised and most consequential if recorded incorrectly. Saying "the settlement amount was one hundred and fifty thousand dollars" is more likely to be accurately transcribed than mumbling "a hundred and fifty grand" in the middle of a long sentence.
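To see why a clearly stated figure is easier for software to handle, consider a minimal sketch of converting spoken number words into digits. The parser below is a simplified assumption that handles whole numbers up to the millions; it is not how any particular tool works. A formal phrase parses cleanly, while slang such as "grand" is simply not in its vocabulary.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(phrase):
    """Convert a spoken whole number such as 'one hundred and fifty thousand'."""
    total = 0    # completed groups, e.g. the thousands already counted
    current = 0  # the group being built, always below one thousand
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {word!r}")
    return total + current


amount = words_to_int("one hundred and fifty thousand")
print(f"${amount:,}")
```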


Choosing the right tool

The right voice-to-text tool depends on what you need from it. If you need pure transcription with maximum accuracy and you are prepared to invest in setup and training, Dragon Legal remains an excellent option. If you need a free, zero-setup solution for quick dictation and you will handle formatting yourself, Apple's built-in dictation is adequate.

If you want to dictate a client interaction and receive a structured, professional file note with action items and the option to generate follow-up correspondence, that is the specific problem Lex Protocol was built to solve.

For a broader comparison of legal note-taking tools beyond voice-to-text, see our review of the best legal note-taking apps in Australia for 2026.
