While Microsoft’s AI technology in Microsoft 365 Copilot & Azure OpenAI Service is based on OpenAI’s large language models, both solutions apply their own explicit sets of guardrails when generating content for users. Some of this is done through typical “system prompting,” where pre-prompts are prepended to user inputs to keep the generated content safe; however, Microsoft’s approach goes well beyond that to provide industry-leading content safety.
The following is an important excerpt taken from the Microsoft 365 Copilot documentation:
How does Copilot block harmful content?
Azure OpenAI Service includes a content filtering system that works alongside core models. The content filtering models for the Hate & Fairness, Sexual, Violence, and Self-harm categories have been specifically trained and tested in various languages. This system works by running both the input prompt and the response through classification models that are designed to identify and block the output of harmful content.
- Hate and fairness-related harms refer to any content that uses pejorative or discriminatory language based on attributes like race, ethnicity, nationality, gender identity and expression, sexual orientation, religion, immigration status, ability status, personal appearance, and body size. Fairness is concerned with making sure that AI systems treat all groups of people equitably without contributing to existing societal inequities.
- Sexual content involves discussions about human reproductive organs, romantic relationships, acts portrayed in erotic or affectionate terms, pregnancy, physical sexual acts, including those portrayed as an assault or a forced act of sexual violence, prostitution, pornography, and abuse.
- Violence describes language related to physical actions that are intended to harm or kill, including actions, weapons, and related entities.
- Self-harm language refers to deliberate actions that are intended to injure or kill oneself.
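For developers calling Azure OpenAI Service directly, these filters surface in the API itself. The sketch below is a minimal illustration, assuming the official `openai` Python SDK (v1.x) with its `AzureOpenAI` client; the endpoint, API key, API version, and deployment name are placeholders. It shows the two outcomes you would typically handle: a prompt blocked by the input filter raises an HTTP 400 error, while a completion removed by the output filter is reported through the choice’s `finish_reason`.

```python
# Minimal sketch: observing Azure OpenAI content-filter outcomes.
# Endpoint, key, API version, and deployment name are placeholders.
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

def ask(prompt: str) -> str:
    try:
        response = client.chat.completions.create(
            model="<your-deployment-name>",  # name of your chat model deployment
            messages=[{"role": "user", "content": prompt}],
        )
    except openai.BadRequestError as err:
        # The input filter blocked the prompt before any generation happened.
        return f"Prompt rejected by content filter: {err}"

    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        # The output filter stopped or withheld the model's response.
        return "Response withheld by content filter."
    return choice.message.content

print(ask("Summarize our content-safety policy in two sentences."))
```

In practice, the filter results returned by the service also typically break down by category (hate, sexual, violence, self-harm) with a severity level, which can be logged for auditing or used to give users a more specific explanation of why a request was blocked.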
Read the original here, along with Microsoft’s statements about other benefits of these controls within Microsoft 365 Copilot & Azure OpenAI Service, such as “Microsoft’s Copilot Copyright Commitment for customers,” “protected material detection,” “blocking prompt injections (jailbreak attacks),” and “a commitment to responsible AI”:
- How does Copilot block harmful content?
https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-privacy#how-does-copilot-block-harmful-content
