Posted by: kurtsh | February 23, 2024

RELEASE: Introducing PyRIT, Microsoft’s red teaming security toolkit for Artificial Intelligence

imageToday we are releasing an open automation framework, PyRIT (Python Risk Identification Toolkit for generative AI), to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

At Microsoft, we believe that security practices and generative AI responsibilities need to be a collaborative effort. We are deeply committed to developing tools and resources that enable every organization across the globe to innovate responsibly with the latest artificial intelligence advances. This tool, and the previous investments we have made in red teaming AI since 2019, represents our ongoing commitment to democratize securing AI for our customers, partners, and peers.  

WHAT IS PyRIT?
PyRIT is a library developed by the AI Red Team for researchers and engineers to help them assess the robustness of their LLM endpoints against different harm categories such as fabrication/ungrounded content (e.g., hallucination), misuse (e.g., bias), and prohibited content (e.g., harassment).

PyRIT automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft).​

The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

Additionally, this tool allows researchers to iterate and improve their mitigations against different harms. For example, at Microsoft we are using this tool to iterate on different versions of a product (and its metaprompt) so that we can more effectively protect against prompt injection attacks.

Read the announcement here:


Categories