In the previous posts, we covered identity, identity risk, devices, applications, and the network as the dimensions Zero Trust uses to make access decisions. Each one helps answer the same question from a different angle: should this request be allowed, and under what conditions?
This post closes out the access-decision pillars with the one that matters most once everything else has been decided: the data itself. Access controls determine whether someone gets in. They say very little about what happens to the data after they do. A user with a valid token, a compliant device, and a sign-in from the right network can still forward a confidential file to a personal address, paste it into an AI app, or download it to an unmanaged laptop.
Zero Trust treats data as something to be protected wherever it travels, not just at the point of access.
This post walks through how Microsoft Purview classifies and protects data, how data loss prevention enforces rules on its use, and how session-aware controls extend that protection beyond the moment of sign-in.
Licensing note
Microsoft Purview
The most comprehensive option is Microsoft 365 E5 / A5 / G5, which includes the full Purview feature set. Microsoft 365 E3 includes the foundations: manual sensitivity labelling, basic data loss prevention, basic retention, and eDiscovery Standard. The advanced capabilities (automatic and machine-learning-based labelling, Endpoint DLP, Teams chat DLP, and Insider Risk Management) require Microsoft 365 E5 or one of the add-ons: the Microsoft Purview Suite (formerly E5 Compliance), or the narrower Microsoft 365 E5 Information Protection and Governance and Microsoft 365 E5 Insider Risk Management add-ons. In general, users who are protected by or benefit from a Purview capability need the appropriate licence, and for broad policies applied to SharePoint, Teams, or groups you should validate the licensing scope before rollout. Purview licensing is feature-specific and should always be checked against the Microsoft Purview service description and your own licensing agreement before deployment.
Microsoft Defender for Cloud Apps
Session policies through Conditional Access App Control require Microsoft Defender for Cloud Apps and Microsoft Entra ID P1 for the Conditional Access policy that routes sessions to Defender for Cloud Apps. Defender for Cloud Apps is included in Microsoft 365 E5 and Microsoft 365 E5 Security, and is available as a standalone add-on.
Why Data Comes Last in the Access Decision
The order of this series is deliberate. Identity, devices, applications, and network are all controls on the act of access. They decide who gets a token, from what device, to which application, over which network path. They are the front door, and most of a Zero Trust programme is spent getting that front door right.
But access controls share a blind spot. Once a legitimate user is holding legitimate data, the access decision has already been made. If that user copies a customer list into a personal OneDrive, emails a contract to the wrong recipient, or uploads source code to an unmanaged AI app, traditional sign-in Conditional Access policies will not stop them on their own, because none of those actions is a sign-in. They are things done with data that has already been accessed legitimately.
This is why data protection sits at a different layer. It is not about deciding whether to issue a token. It is about understanding what the data is, attaching protection that travels with it, and enforcing rules on how it can be used and moved after access has been granted. The three capabilities in this post map to three questions: what is this data (classification), can it leave this context (data loss prevention), and is someone behaving in a way that suggests it is about to (insider risk and session controls).
Knowing What You Have: Classification and Sensitivity Labels
You cannot protect what you have not identified. The foundation of the Data pillar is classification: working out which data is sensitive, and marking it so that other controls can act on it.
In Microsoft Purview, the marking mechanism is the sensitivity label. A label is a piece of metadata applied to a document, email, Teams message, or container (a SharePoint site, a Microsoft 365 group, a Teams team). Crucially, the label is stored in the content's own metadata, so it travels with the file wherever it goes. A document labelled Highly Confidential carries that label when it is downloaded, emailed, or copied to another location, and any system that understands Purview labels can read it and act accordingly.
Labels do two distinct kinds of work. The first is visual marking: headers, footers, and watermarks that tell a human reader how to treat the content. The second is protection: encryption and usage rights that restrict who can open the content and what they can do with it. A label can do either, both, or neither. A label that only applies a watermark is a guideline. A label that applies encryption with usage rights is an enforced control that follows the file even outside your tenant, because an unauthorised recipient simply cannot decrypt it.
Sensitivity label with content marking and restrictions configured
Labels can be applied three ways. Users can apply them manually, which depends on training and good will. Default labels can be set on containers and document libraries, so that everything created in a sensitive location inherits a baseline. And automatic labelling can apply labels based on content inspection: a document containing patterns that match a credit card number, a national identifier, or a custom sensitive information type can be labelled without any user involvement. Automatic and machine-learning-based labelling is the most powerful of the three, because it does not depend on users remembering to label, but it is also the one that requires the higher licence tiers described in the licensing note.
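To make the idea of content inspection concrete, here is a minimal Python sketch of pattern-based detection: a regular expression finds candidate card numbers and a Luhn checksum filters out random digit runs. It is a conceptual illustration only, not how Purview implements its sensitive information types. The names `find_card_numbers` and `suggest_label`, and the choice of label, are assumptions made for this example.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return the normalised card-like numbers in text that pass the checksum."""
    found = []
    for candidate in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", candidate.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            found.append(digits)
    return found


def suggest_label(text: str, current_label: str = "Internal") -> str:
    """Suggest a stricter label when card-like data is present (illustrative rule)."""
    return "Highly Confidential" if find_card_numbers(text) else current_label
```

The checksum step shows why a pattern match alone is rarely enough: without it, any sixteen-digit order number would trigger the rule.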
Labels also apply to containers, not just files. A sensitivity label on a SharePoint site, a Microsoft 365 group, or a Teams team controls settings such as external sharing, guest access, and whether access is allowed from unmanaged devices. Container labels and file labels work together: the container label governs the boundary of the workspace, while file labels travel with the individual documents that leave it. Labelling your most sensitive sites is often a faster win than labelling every file inside them, because it sets a guardrail on the whole workspace at once.
The classic deployment mistake is to design an elaborate taxonomy of fifteen labels before applying a single one. A practical starting configuration is four labels: Public, Internal, Confidential, and Highly Confidential. Configure visual marking on Confidential and encryption with usage rights on Highly Confidential. This covers most organisations, and label adoption matters far more than label granularity. A label that nobody applies protects nothing. Start narrow, set default labels on your most sensitive sites, and expand once people are actually using them. Publish the set to a pilot group first, and then expand to the high-value departments (legal, finance, HR, and the teams handling regulated data) before going tenant-wide. This keeps the taxonomy small enough that users can reason about it while putting real protection on the content that needs it most.
Two tools in the Purview portal help you see the state of your data as you go. Content Explorer shows where labelled and sensitive content actually lives across your estate, which is useful for understanding exposure before you write enforcement policies. Activity Explorer shows what is happening to that content over time: label changes, downgrades, and the activities DLP would act on. Both are worth checking early, because they turn classification from a blind rollout into something you can measure.
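A label downgrade is easy to reason about once the taxonomy is treated as an ordered scale. The sketch below models the four-label starting set as an ordered enumeration and filters a list of label changes down to the downgrades. It is a plain Python illustration, not an Activity Explorer API. The `LabelChange` record and its field names are hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class Sensitivity(IntEnum):
    """The four-label starting taxonomy, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


class LabelChange(NamedTuple):
    """Hypothetical record of one label change on one file."""
    file_name: str
    old: Sensitivity
    new: Sensitivity


def downgrades(changes: list[LabelChange]) -> list[LabelChange]:
    """Keep only the changes that moved a file to a less sensitive label."""
    return [change for change in changes if change.new < change.old]
```

For example, a change from `CONFIDENTIAL` to `INTERNAL` is returned, while a change from `INTERNAL` to `CONFIDENTIAL` is not.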
The reason labels come first in the Data pillar is the same reason identity comes first in the series as a whole: everything downstream depends on it. Data loss prevention rules can use a label as a condition. Session policies can allow or block a download based on a label. Without labels, those controls fall back to inspecting content on the fly every time, which is slower and less reliable than reading a label that was applied once and travels with the file.
Labels also matter increasingly for generative AI. Microsoft 365 Copilot respects existing permissions and sensitivity labels when it retrieves content and generates responses, so the classification work done here helps preserve data context in AI-assisted workflows. Labels do not replace permissions hygiene, but they are an important foundation for AI governance.
Enforcing Usage: Data Loss Prevention
Classification tells you what the data is. Data loss prevention (DLP) acts on it. A DLP policy answers a single question: can this data leave this context? Where a sensitivity label is a property of the content, a DLP policy is a rule about movement, evaluated at the moment data is about to cross a boundary.
Microsoft Purview DLP operates across several locations. It can inspect email in Exchange Online, files in SharePoint Online and OneDrive, messages in Teams, and, with the higher licence tiers, activity directly on endpoints (Endpoint DLP) and inside the browser. The conditions a policy can match on include built-in and custom sensitive information types (a credit card number, an IBAN, a national identifier, a custom regex pattern), trainable classifiers, and, importantly, sensitivity labels. This is where classification pays off: a DLP policy can say "block any document labelled Highly Confidential from being sent to an external recipient" without having to re-inspect the content, because the label already carries the verdict.
When a policy matches, it can take a set of actions. It can simply audit the activity for visibility. It can show the user a policy tip that warns them what they are about to do and why it may be a problem. It can require a business justification before allowing the action. Or it can block the action, with a message explaining how to proceed. This graduated model matters: starting in audit or warn mode lets you understand real usage before you enforce, which avoids the classic failure where an overly aggressive block policy interrupts legitimate work and gets switched off in frustration.
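The graduated model can be sketched as a small decision function. The Python below is an illustrative model, not the Purview policy engine or its API. The `Rule` and `ShareEvent` types, the mode names, and the messages are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AUDIT = "audit"      # log only
    WARN = "warn"        # show a policy tip, allow the action
    JUSTIFY = "justify"  # allow only with a business justification
    BLOCK = "block"      # stop the action


@dataclass(frozen=True)
class Rule:
    label: str            # sensitivity label the rule matches on
    external_only: bool   # if True, match only external recipients
    mode: Mode


@dataclass(frozen=True)
class ShareEvent:
    label: str
    recipient_is_external: bool
    justification: str = ""


def evaluate(rule: Rule, event: ShareEvent) -> tuple[bool, str]:
    """Return (allowed, message) for one event evaluated against one rule."""
    matches = event.label == rule.label and (
        event.recipient_is_external or not rule.external_only
    )
    if not matches:
        return True, "no match"
    if rule.mode is Mode.AUDIT:
        return True, "logged for review"
    if rule.mode is Mode.WARN:
        return True, "policy tip shown"
    if rule.mode is Mode.JUSTIFY:
        if event.justification.strip():
            return True, "allowed with justification"
        return False, "business justification required"
    return False, "blocked by policy"
```

Moving a rule from `Mode.AUDIT` to `Mode.BLOCK` changes only the outcome, not the match condition. That is what makes it possible to tune a policy against real usage before enforcing it.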
One capability worth calling out specifically, because it has become urgent rather than theoretical, is DLP for the browser in Microsoft Edge for Business. It can help prevent sensitive content from being shared with unmanaged AI apps such as ChatGPT, Gemini, or DeepSeek, including scenarios involving copy and paste or file sharing, depending on the policy configuration and the supported browser and client state. This addresses a genuinely new exfiltration path: an employee pasting a confidential document into a chatbot to summarise it is not a download, an email, or an upload in the traditional sense, and older DLP coverage did not see it. As generative AI apps become part of daily work, this is a control worth understanding early.
DLP policy rule that detects externally shared files with the Internal Confidential label and blocks access for external users:
DLP policy rule to detect external sharing of Internal Confidential files
DLP-enforced restrictions
Internal Confidential file cannot be shared externally
Watching Behaviour: Insider Risk Management
Where DLP acts on the data, Insider Risk Management (IRM) watches the behaviour around it. The two are complementary: DLP asks whether a specific action on specific data is allowed, while Insider Risk Management looks at patterns of activity over time and flags when they suggest elevated risk.
The scenarios it is built for are the ones that access controls and single-action DLP rules miss. An employee who has resigned and begins downloading unusually large volumes of files in their final weeks. A user who suddenly accesses and exfiltrates data outside their normal pattern. A sequence of actions, each individually unremarkable, that together look like staged data theft. IRM uses machine learning to correlate signals across Microsoft 365 (file activity, label downgrades, exfiltration indicators, and, where configured, HR signals such as a resignation date) and surfaces the cases that warrant a human review.
It is worth being precise about what this is and is not: IRM is a detection and investigation capability, not a blocking one. It does not stop an action in the moment the way DLP does. Instead it builds a risk picture that a security or HR team can act on. It also operates with deliberate privacy controls, including pseudonymisation of usernames by default, because the capability is sensitive by nature and most organisations deploying it have works council, privacy, or regulatory considerations to satisfy. In a European context in particular, deploying Insider Risk Management is as much a governance and legal exercise as a technical one, and it should be scoped with that in mind.
Insider Risk Management is an advanced capability and requires the higher licence tiers (Microsoft 365 E5, the Purview Suite, or the dedicated Insider Risk Management add-on). For many organisations it is a later-stage investment, deployed once classification and DLP are mature, and once the legal groundwork for monitoring employee behaviour has been done properly.
Protecting Data Beyond Login: Session-Aware Controls
The controls so far protect data within Microsoft's own services and on managed endpoints. The harder problem is the session that happens in a browser, on a device you do not manage, after a perfectly valid sign-in. A contractor on a personal Mac, a director checking OneDrive from a hotel business centre, an employee on a personal phone: each has authenticated legitimately, but the data they can now reach is sitting one download away from leaving your control entirely.
This is the gap that session-aware controls close, and the mechanism will be familiar from earlier posts in this series: Conditional Access App Control in Microsoft Defender for Cloud Apps. We referenced it in the Applications and Network posts as a way to enforce conditions after sign-in. For the Data pillar, it is the tool that lets you allow access while restricting what can be done with the data during the session.
A Conditional Access policy routes the user's session through Defender for Cloud Apps, which then sits inline for the duration of that session and can act on individual activities. Instead of the binary choice between allowing and blocking access, you get a third option: allow the session, but control what happens inside it. Defender for Cloud Apps can block a download, block an upload, block copy, paste, and print, block uploads of sensitive files that do not have the required sensitivity label, or apply a sensitivity label to a file on download.
The actions available in a session policy map directly onto the data controls already discussed. A session policy can block an action and notify the user. It can protect a file by applying a sensitivity label as it is downloaded. It can require step-up authentication (Preview) before a sensitive action completes. And the filters that decide when a policy fires can use the same building blocks as DLP: file type, file name, content inspection for sensitive information types, and, again, sensitivity labels.
Practical configuration: block downloads to unmanaged devices
A common starting scenario: allow users to access SharePoint Online or OneDrive from unmanaged devices, so they remain productive, but prevent them from downloading files to those devices.
This requires two pieces working together. First, a Microsoft Entra Conditional Access policy that routes the relevant sessions to Defender for Cloud Apps by enabling Conditional Access App Control and selecting "Use custom policy". The role of this policy is primarily to hand the matching sessions over to the Defender for Cloud Apps proxy. Second, a session policy in Defender for Cloud Apps that does the actual enforcement, including the granular device-state filtering.
The session policy is created in the Microsoft Defender portal under Cloud Apps → Policies → Policy management → Session policy. The configuration in outline:
- Session control type: Control file download (with inspection).
- Activity source: filter on the device being unmanaged, for example Device tag does not equal Intune compliant or Microsoft Entra hybrid joined.
- File filters: optionally narrow to files carrying a specific sensitivity label, or matching a sensitive information type, so the policy only acts on data that matters.
- Action: Block, with a customised message that tells the user why the download was prevented and what to do instead.
Session policy to block file download from unmanaged devices
A few practical points are worth knowing. Session controls apply to browser-based sessions, so to prevent users bypassing them you typically pair the session policy with an access policy that blocks legacy or native clients for the same scenario. Also, routing a session through the reverse proxy introduces a small amount of latency and rewrites URLs during the session (the address picks up an .mcas.ms suffix), which is visible to users and worth mentioning in your rollout communications. And session controls work across Microsoft 365 apps and a list of supported third-party SaaS applications, which makes them one of the few ways to apply consistent data controls to non-Microsoft cloud apps.
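The logic of the download policy outlined above can be sketched as a small decision function. This is an illustrative Python model of the decision, not the Defender for Cloud Apps API or its policy schema. The `SessionPolicy` and `DownloadRequest` types and their field names are hypothetical. The two device tags are the ones named in the configuration outline.

```python
from dataclasses import dataclass
from typing import Optional

# Device tags that mark a device as managed, taken from the outline above.
MANAGED_TAGS = frozenset({"Intune compliant", "Microsoft Entra hybrid joined"})


@dataclass(frozen=True)
class SessionPolicy:
    protected_labels: frozenset  # empty set means the policy covers every file
    block_message: str


@dataclass(frozen=True)
class DownloadRequest:
    device_tags: frozenset
    file_label: Optional[str]    # None when the file carries no label


def decide(policy: SessionPolicy, request: DownloadRequest) -> tuple[str, str]:
    """Return (action, message) for one download attempt in a proxied session."""
    if request.device_tags & MANAGED_TAGS:
        return "allow", ""       # the activity-source filter does not match
    if policy.protected_labels and request.file_label not in policy.protected_labels:
        return "allow", ""       # the file filter narrows the policy to labelled data
    return "block", policy.block_message
```

The optional file filter is the main design choice here. Leaving `protected_labels` empty blocks every download from an unmanaged device. Populating it limits the block to the data that matters, at the cost of relying on labels being applied.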
Holding Less: Data Governance and Minimisation
Every control discussed so far protects data you are keeping. The cheapest data to protect is the data you no longer hold. This is the part of the Data pillar that is easiest to ignore, because deleting things feels less like security work than configuring labels and policies, but it is one of the most effective ways to reduce exposure: a breach cannot leak a document that was disposed of two years ago when it stopped being useful.
Microsoft Purview handles this through data lifecycle and records management: retention labels and policies that keep data for as long as it is needed and then dispose of it, applied automatically based on type, location, or classification. The same labelling foundation that drives protection can drive retention, so a Highly Confidential contract can carry both an encryption policy and a retention schedule. The goal is to stop sensitive data accumulating indefinitely in mailboxes, SharePoint sites, and Teams chats long after it has served its purpose.
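The arithmetic behind a retention schedule is simple, and a short sketch makes it explicit. The Python below computes a disposal date from a start date and a retention period in years. The function names and the periods used in the example are assumptions for illustration. They are not Purview defaults or a Purview API.

```python
from datetime import date


def disposal_date(start: date, years: int) -> date:
    """Return the date on which content becomes eligible for disposal."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        return start.replace(year=start.year + years, day=28)


def is_due_for_disposal(start: date, years: int, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= disposal_date(start, years)
```

With a hypothetical seven-year period, a contract whose retention clock started on 30 June 2019 becomes eligible for disposal on 30 June 2026.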
Minimisation also means reducing duplication. Every copy of a sensitive file is another thing to protect and another potential leak point. Favouring in-place access and sharing over making copies, and cleaning up the duplicates and stale sites that accumulate over time, shrinks the surface that all the other controls have to defend. For organisations with regulatory retention obligations, this is not optional housekeeping; it is the difference between holding data because you must and holding it because nobody ever decided to delete it.
Where Data Protection Fits in the Bigger Picture
Data is the fifth and final access-decision dimension in this series, and it occupies a particular position: it is the pillar that assumes the others may fail. Identity can be phished, a device can be compromised, a token can be stolen, a network path can be spoofed. When one of those controls fails, data protection is what limits the damage. An attacker who has compromised a session still cannot open a file encrypted by a sensitivity label they have no rights to. A stolen account still trips DLP rules when it tries to move data in bulk. A download to an attacker-controlled device is still blocked by a session policy.
This is the layering principle the whole series has been building towards. No single control is sufficient, and the data controls are the last line precisely because they keep working when the access decision turns out to have been wrong. They also depend on the others: a sensitivity label is most useful when identity and device controls mean the label is usually being read in a trustworthy context, and session controls build directly on the Conditional Access foundation from the earlier posts.
The practical sequence reflects this dependency. Classification comes first, because DLP and, where integrated and supported, session controls can both consume labels. DLP comes next, because it is usually the broadest enforcement layer and often the most practical starting point before session controls and Insider Risk Management. Session controls and Insider Risk Management come later, as the protection matures and the licensing and governance groundwork is in place.
Baseline Recommendations
A practical starting point for the Data pillar:
- Start with classification, not with a fifteen-label taxonomy. Define a small set of labels, apply default labels to your most sensitive SharePoint sites and libraries, and prioritise adoption over granularity.
- Use encryption on your most sensitive labels. A label that only watermarks is guidance; a label that encrypts with usage rights is a control that travels with the file.
- Deploy your first DLP policy in audit or simulation mode. Pick one label or one sensitive information type, observe what it catches for two weeks, tune, then move to warn and only then to block.
- Address the generative AI exfiltration path explicitly. If your users have access to consumer AI apps, DLP for the browser can help prevent sensitive content from being pasted into them, depending on policy configuration and supported browser state.
- Use sensitivity labels as DLP and session-policy conditions. The classification work pays off when other controls can act on the label instead of re-inspecting content every time.
- Treat session controls as the answer to the unmanaged-device question. Block or protect downloads to unmanaged devices rather than choosing between full access and no access.
- Plan Insider Risk Management as a later-stage, governance-led project. The technical setup is the easy part; the privacy, legal, and works council groundwork is what makes it deployable, particularly in Europe.
- Reduce what you hold. Use retention and lifecycle policies to dispose of sensitive data when it is no longer needed, and favour in-place sharing over making copies. The most secure data is the data you no longer have.
Wrapping Up
Access controls decide who gets in. Data controls decide what happens to the information once they are inside, and they are the controls that keep working when an access decision turns out to have been wrong. Classification tells you what you have, data loss prevention enforces rules on how it moves, insider risk management surfaces the behaviour that precedes a leak, and session-aware controls extend all of this into the browser sessions that access controls alone cannot reach. Underneath all of it, holding less data in the first place is the quietest and most durable control of the lot.
This post closes the five access-decision pillars: identity, devices, applications, network, and data. Each adds a different kind of context to the same underlying question, and Zero Trust is the discipline of layering them so that no single failure is catastrophic.
From here, the series shifts focus. The next posts move from the individual pillars to how Conditional Access ties them together: policy architecture, how to structure and layer policies without creating an unmanageable sprawl, break-glass design, and how to operate Conditional Access at scale as the tenant and the organisation change over time. The pillars are the signals; Conditional Access is where those signals become decisions.