
Metadata Leaks in Legal Documents: A Microsoft 365 Security Checklist for Law Firms

· Infonaligy

Law firm documents carry hidden metadata that can expose privileged information. A practical M365 checklist for managing partners.

Every Word document, Excel spreadsheet, and PDF your firm sends outside the building carries invisible data that has nothing to do with the content your attorneys drafted. Author names, tracked changes from prior revisions, embedded comments, internal file paths, printer names, and even GPS coordinates from photos pasted into exhibits all travel with the file. When that metadata reaches opposing counsel, a regulator, or a journalist, it can expose privileged strategy, client identities, and internal deliberations that your team never intended to share.

This is not a theoretical problem. Courts have addressed metadata disclosure in discovery disputes for over two decades, and the ABA has issued specific guidance on a lawyer’s obligation to prevent it. For managing partners running Microsoft 365, the gap between “we tell people to be careful” and “we have policies that enforce it” is where ethics violations, malpractice exposure, and cyber insurance complications live.

What Metadata Actually Travels with Legal Documents

The term “metadata” covers a broad category of hidden information, and the specific types vary by file format. Understanding what your firm’s documents carry is the first step toward controlling it.

Microsoft Word files are the most common offenders. A typical .docx file can contain the document author and last editor (pulled from the user’s Microsoft 365 profile), tracked changes and revision history (including deleted text from prior drafts), embedded comments and annotations, the template used to create the file, total editing time, the file server path where the document was last saved, and custom properties that may reference internal matter numbers or client codes. The revision history is particularly dangerous in litigation contexts. A settlement demand letter that still carries three rounds of internal edits showing the client’s actual bottom line discloses privileged strategy to the other side.
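For IT staff who want to see what a given file actually carries, a .docx is a ZIP package of XML parts, so a first-pass check needs nothing beyond the Python standard library. The sketch below is illustrative rather than a replacement for Document Inspector: the function name inspect_docx and the shape of the report are assumptions made for this example, while the part names (docProps/core.xml, word/document.xml, word/comments.xml) and the w:ins, w:del, and w:delText elements come from the Office Open XML format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the Office Open XML and Dublin Core specifications.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def inspect_docx(source):
    """Summarize author fields, revision markers, and comments in a .docx.

    `source` is a file path or a binary file-like object. The function name
    and report layout are hypothetical, chosen for this illustration.
    """
    report = {
        "author": None,
        "last_modified_by": None,
        "insertions": 0,
        "deletions": 0,
        "deleted_text": [],
        "comments": 0,
    }
    with zipfile.ZipFile(source) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            report["author"] = core.findtext("dc:creator", namespaces=NS)
            report["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS
            )
        if "word/document.xml" in names:
            body = ET.fromstring(package.read("word/document.xml"))
            # w:ins / w:del mark tracked insertions and deletions.
            report["insertions"] = len(body.findall(".//w:ins", NS))
            report["deletions"] = len(body.findall(".//w:del", NS))
            # Deleted wording is still stored in the file as w:delText.
            report["deleted_text"] = [
                node.text or "" for node in body.findall(".//w:delText", NS)
            ]
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            report["comments"] = len(comments.findall("w:comment", NS))
    return report
```

Calling inspect_docx("demand-letter.docx") on a draft with unaccepted changes would return non-zero revision counts and the deleted wording itself, which is the point: text an attorney "removed" remains in the package until the changes are accepted or rejected. The sketch reads only the main document part, so headers, footers, footnotes, and custom properties would need the same treatment.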

Excel spreadsheets carry similar author and revision metadata, but they add hidden worksheets, named ranges, data connections to external sources (which may reveal internal database names or server addresses), and cell-level comments. Financial models shared during M&A due diligence are a frequent source of unintended disclosure.
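The same package-level approach applies to .xlsx files. As a minimal sketch, assuming a standard Office Open XML workbook, the following lists worksheets whose state is hidden or veryHidden, the defined (named) ranges, and whether the package contains data connection or external link parts. The function name inspect_xlsx and its return layout are hypothetical; the part paths and the state attribute are part of the file format.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def inspect_xlsx(source):
    """Report hidden sheets, defined names, and external data references.

    `source` is a file path or a binary file-like object. The function name
    and report layout are hypothetical, chosen for this illustration.
    """
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
        workbook = ET.fromstring(package.read("xl/workbook.xml"))

    hidden_sheets = []
    for sheet in workbook.iter(f"{{{MAIN_NS}}}sheet"):
        # A missing state attribute means the sheet is visible.
        state = sheet.get("state", "visible")
        if state != "visible":
            hidden_sheets.append((sheet.get("name"), state))

    defined_names = [
        node.get("name") for node in workbook.iter(f"{{{MAIN_NS}}}definedName")
    ]

    return {
        "hidden_sheets": hidden_sheets,
        "defined_names": defined_names,
        # Data connections to databases or services live in this part.
        "has_data_connections": "xl/connections.xml" in names,
        # Links to other workbooks, which can reveal internal file paths.
        "external_links": [
            n for n in names
            if n.startswith("xl/externalLinks/") and n.endswith(".xml")
        ],
    }
```

A veryHidden sheet does not appear in Excel’s Unhide dialog, so a reviewer working only in the Excel interface can miss it even though it ships with the file. Cell-level comments are stored in separate parts and are not covered by this sketch.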

PDF files are often treated as “clean” because converting to PDF feels like a final step. That assumption is wrong. PDFs retain XMP metadata (an Adobe standard that includes author, creation application, timestamps, and sometimes GPS data), embedded fonts with licensing information that can identify the originating machine, form field data that has been flattened but remains extractable, and layer data from documents assembled by merging multiple sources. A recent Adobe Acrobat zero-day vulnerability also highlighted that PDF reader exploits can extract data beyond what the document was intended to expose.
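A quick way to test the “PDFs are clean” assumption is to look at the raw bytes of a file before it goes out. The sketch below is a deliberately rough heuristic, not a PDF parser: it looks for an XMP packet marker and for simple literal strings after the /Author, /Creator, and /Producer keys of the document information dictionary. The function name scan_pdf_bytes is hypothetical. Metadata stored in compressed object streams, in hex-encoded strings, or in encrypted files will not be seen, so a clean result here does not prove the file is clean.

```python
import re

# Keys of the PDF document information dictionary worth checking first.
INFO_KEYS = (b"Author", b"Creator", b"Producer")


def scan_pdf_bytes(data):
    """Heuristically flag metadata visible in the raw bytes of a PDF.

    Hypothetical helper for illustration. It does not decompress streams
    or decode escaped or hex strings, so it can miss metadata.
    """
    findings = {
        # XMP packets are wrapped in <?xpacket ...?> and <x:xmpmeta> markers.
        "has_xmp_packet": b"<x:xmpmeta" in data or b"<?xpacket begin" in data,
    }
    for key in INFO_KEYS:
        # Match "/Key (simple literal string)" with no nested parentheses.
        match = re.search(rb"/" + key + rb"\s*\(([^()\\]*)\)", data)
        if match:
            findings[key.decode("ascii")] = match.group(1).decode("latin-1")
    return findings
```

In practice this is useful as a tripwire in an outbound workflow: if scan_pdf_bytes(open(path, "rb").read()) reports an author name or an XMP packet, the file has not been through a sanitizing step and should be routed back for cleanup with a proper tool.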

Embedded images deserve separate attention. Photos pasted into documents or exhibits may carry EXIF data, including the camera model, date and time of capture, and GPS coordinates. A whistleblower’s photograph embedded in a legal filing that contains GPS data pointing to their home address is an extreme but real example of the stakes involved.
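For JPEG photographs, EXIF data (including any GPS tags) lives in an APP1 segment near the start of the file, ahead of the compressed image data. As a minimal sketch of what removing it involves, the function below walks the segment headers and copies everything except APP1 segments that begin with the Exif identifier. The name strip_exif is hypothetical, and the code assumes a well-formed baseline JPEG. It leaves XMP packets (which also use APP1, with a different identifier) and other formats such as PNG or HEIC untouched.

```python
import struct


def strip_exif(data):
    """Return a copy of a JPEG byte string with APP1/Exif segments removed.

    Hypothetical helper for illustration; assumes a well-formed JPEG.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(data[:2])  # keep the start-of-image marker
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: copy and move on
            out.append(0xFF)
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        segment = data[pos:end]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos = end
    out += data[pos:]  # copy the compressed image data unchanged
    return bytes(out)
```

Comparing strip_exif(data) with data also works as a detector: if the two differ, the photo carried an EXIF block. Images already pasted into a Word file sit under word/media/ inside the package, so the same check can be run on each of those entries.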

The ABA’s Position: This Is an Ethics Obligation

Metadata scrubbing is not just a best practice. The ABA has addressed it directly, and state bars have followed.

ABA Model Rule 1.6(c) requires lawyers to make “reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client.” Metadata embedded in outgoing documents falls squarely within this obligation. A firm that routinely sends documents with tracked changes, internal comments, or author information visible to recipients is failing to make reasonable efforts under this rule.

ABA Formal Opinion 477R (2017) extended confidentiality obligations to electronic communications, requiring lawyers to assess the sensitivity of information being transmitted and act accordingly. The opinion recognizes that different methods of communication carry different risks and that lawyers must match their protective measures to the sensitivity level. Sending a merger agreement with three rounds of tracked changes to opposing counsel over unencrypted email fails this standard on two separate fronts.

ABA Formal Opinion 483 (2018) established obligations around data breach detection and response, including the duty to monitor for unauthorized access to client information. If your firm discovers that metadata from a privileged document was exposed, Opinion 483 requires prompt assessment and potentially client notification. Firms that lack data protection controls to detect these exposures in the first place have a gap in their compliance posture.

Several courts have addressed metadata disputes directly. In Williams v. Sprint/United Management Co. (D. Kan. 2005), the court examined whether metadata in electronically produced documents was discoverable, establishing that metadata can constitute substantive evidence. The SCO Group v. IBM litigation involved disputes over metadata in source code documents where hidden revision data became relevant to establishing authorship timelines. In the pharmaceutical space, metadata from internal documents has surfaced in product liability cases where tracked changes revealed that companies edited safety disclosures, with the original language becoming evidence of what the company knew and when.

Texas attorneys should also consider the Texas Disciplinary Rules of Professional Conduct Rule 4.04, which addresses a lawyer’s obligation when receiving materials that were inadvertently sent. The receiving attorney’s obligations are relevant, but the sending attorney’s failure to scrub metadata in the first place is the ethics problem that triggers the entire chain.

Microsoft 365 Controls That Prevent Metadata Leaks

Microsoft 365 includes several tools specifically designed to address document metadata, but most are not enabled by default. A firm running M365 without configuring these controls is relying entirely on individual attorneys remembering to scrub documents manually, which is not a sustainable compliance strategy.

Document Inspector. Built into Word, Excel, and PowerPoint, the Document Inspector scans for and removes hidden metadata including comments, revisions, personal information, custom XML data, headers/footers, and invisible content. It works, but it requires the user to run it manually before every external send. Training your attorneys to use it is a minimum baseline, not a complete solution.

Sensitivity Labels (Microsoft Purview Information Protection). This is the strongest document protection control available in M365. Sensitivity labels allow you to classify documents by confidentiality level (Public, Internal, Confidential, Highly Confidential/Attorney-Client Privileged) and attach policies that travel with the document. A label can apply encryption so that only authorized recipients can open the file, restrict forwarding, copying, or printing, and add visual markings (headers, footers, watermarks) that indicate classification. Labels classify and protect a file; they do not by themselves remove tracked changes, comments, or author fields, so they work alongside Document Inspector and DLP rather than replacing them. Sensitivity labels are included in Microsoft 365 E5 and Microsoft 365 E3 with the Compliance add-on. Firms on Business Premium have access to a more limited set of manual labels. Configuring these properly requires understanding your firm’s matter classification structure, which is why a Microsoft 365 consulting engagement typically starts with a data classification workshop.

Data Loss Prevention (DLP) Policies. M365 DLP can scan outbound emails and attachments for sensitive content patterns (Social Security numbers, financial account numbers, case-specific keywords) and block or quarantine messages that match. For metadata specifically, DLP policies can be configured to flag emails where attachments contain tracked changes, comments, or embedded objects. The DLP policy engine operates across Exchange Online, SharePoint Online, OneDrive for Business, and Teams, which covers the four primary channels through which attorneys share documents externally. Configuring DLP for a law firm requires rules tuned to legal document patterns. Generic DLP templates generate too many false positives in environments where discussing Social Security numbers, financial data, and medical records is the actual work product. This tuning is a core part of how email security services should be configured for legal environments.

Information Rights Management (IRM) and Azure Rights Management. IRM restricts what recipients can do with documents and emails after they receive them. An IRM-protected document can prevent the recipient from printing, forwarding, or taking screenshots. For law firms, IRM is most useful when sharing documents with co-counsel, expert witnesses, or clients where you need to retain control after the file leaves your M365 tenant. IRM protections persist even if the file is saved locally, forwarded, or uploaded to another system.

SharePoint and OneDrive External Sharing Policies. By default, Microsoft 365 allows generous external sharing from SharePoint and OneDrive. For a law firm, these defaults should be restricted. At minimum, configure external sharing to require authentication (no anonymous links), set link expiration for externally shared files, limit external sharing to specific security groups (not all users), and log all external sharing events for audit purposes.

The Metadata Security Checklist for Law Firms

This checklist covers the Microsoft 365 configurations, process controls, and training requirements that a law firm IT services program should address. Each item is specific enough to verify during an internal audit or a cyber insurance renewal review.

M365 Admin Center Configuration

  • [ ] Sensitivity labels created and published for at least three classification levels (Internal, Confidential, Privileged)
  • [ ] Auto-labeling policies configured for documents containing attorney-client privilege markers
  • [ ] DLP policies active on Exchange, SharePoint, OneDrive, and Teams targeting tracked changes, comments, and embedded objects in outbound files
  • [ ] External sharing restricted to authenticated recipients with link expiration enforced
  • [ ] Mailbox audit logging enabled for all user mailboxes (verify that it is on, and note that mailbox audit records are retained for only 90 days by default)
  • [ ] Unified audit log retention extended to at least one year for compliance
  • [ ] Conditional Access policies requiring compliant devices for access to SharePoint and OneDrive

Document Handling Procedures

  • [ ] Firm policy requiring Document Inspector review before any external file transmission
  • [ ] PDF conversion workflow that strips XMP metadata and flattens form fields (Adobe Acrobat Pro’s “Remove Hidden Information” action or an automated equivalent)
  • [ ] Standard operating procedure for removing EXIF data from photographs before embedding in legal documents
  • [ ] Template library with pre-configured sensitivity labels, so new documents inherit the correct classification
  • [ ] Prohibition on using personal Microsoft accounts for any firm-related document work

Staff Training

  • [ ] Annual training on metadata risks specific to legal documents, with completion records retained
  • [ ] Practical demonstrations showing attorneys how to inspect a document for hidden metadata
  • [ ] Training on sensitivity label application as part of new-hire onboarding
  • [ ] Periodic test exercises (sending a “test” document with metadata to a monitored mailbox to verify that staff catch it)

Monitoring and Response

  • [ ] DLP incident reports reviewed weekly by IT or the managing partner
  • [ ] Alerts configured for external sharing of files labeled Confidential or Privileged
  • [ ] Incident response procedure for metadata exposure events, aligned with ABA Opinion 483 notification obligations
  • [ ] Quarterly review of sensitivity label usage metrics to identify attorneys or practice groups that are not using labels consistently

Cyber Insurance and Texas Privacy Considerations

Metadata leaks intersect with two additional compliance areas that affect Texas law firms.

Cyber insurance applications increasingly ask about data classification and DLP controls. Carriers that audit law firm cyber insurance readiness want to see documented policies for handling sensitive documents, not just endpoint protection and MFA. A firm that can demonstrate sensitivity labels, DLP policies, and metadata scrubbing procedures is in a materially stronger position during underwriting. A firm that cannot may face exclusions for data handling failures, which is exactly the scenario a metadata leak would trigger.

The Texas Data Privacy and Security Act (TDPSA), which took effect July 1, 2024, creates obligations for businesses that process personal data of Texas residents. Law firms that handle client PII in documents, including Social Security numbers, financial account information, and medical records in litigation files, may fall within the Act’s scope unless they qualify for its small-business exemption. Metadata that reveals client PII (an author field showing a client’s name on a confidential document, or tracked changes that expose personal details from an earlier draft) could constitute an unauthorized disclosure under the TDPSA. Firms should treat metadata controls as part of their broader data protection compliance framework, not as a separate IT hygiene task.

The Microsoft 365 security settings that SMBs commonly misconfigure are often the same ones that create metadata exposure risks. External sharing defaults, audit logging gaps, and missing DLP policies appear on both the general M365 security checklist and the legal metadata checklist. Addressing them together is more efficient than treating them as separate projects.

Getting Started

Metadata leaks are a solvable problem, but the solution requires configuration, not just awareness. A policy that tells attorneys to “be careful with document metadata” without enforcing that policy through M365 controls will fail the first time someone is under deadline pressure and forgets to run Document Inspector. The firms that handle this well treat metadata scrubbing as an infrastructure control, enforced by sensitivity labels, DLP policies, and automated workflows rather than individual memory.

If your firm has not audited its M365 tenant for metadata exposure risks, that audit is the right starting point. Map what metadata your documents currently carry, identify which M365 controls are configured and which are still at defaults, and build a remediation plan that addresses the highest-risk gaps first. The Dallas law firm cybersecurity case study on this blog walks through what that assessment looks like in practice for a mid-size Texas firm.

Need a Document Security Audit for Your Firm?

We help law firms configure Microsoft 365 to prevent metadata leaks and meet ABA ethics requirements.
