CHATtacker-Exploring Prompt Hacking with Large Language Models 💡

April 12, 2024

LLM prompt engineering

This project Large Language Models (LLMs) have demonstrated remarkable text generation capabilities across diverse domains but are susceptible to producing unsafe content, particularly when prompted with sensitive topics. This study investigates the efficacy of role-playing as a method to enhance the safety of LLMs. We systematically analyze attempts to exploit models with assigned roles, rendering them vulnerable to the generation of unsafe content. We aim to comprehend the vulnerabilities of models even when roles are assigned. Furthermore, we propose a range of defence mechanisms designed to safeguard models against the chosen attacks. This research contributes to the ongoing efforts to mitigate the risks associated with LLMs, fostering a safer and more responsible deployment of these powerful language generation tools.

Key features include:

Role-based prompt engineering
Attack and defence strategies while LLM prompting
Evaluation with different roles with various attack and defence strategies

🔗 View Code