HackAPrompt – a taxonomy of GPT prompt hacking techniques

[…] We present a comprehensive Taxonomical Ontology of Prompt Hacking techniques, which categorizes various methods used to manipulate Large Language Models (LLMs) through prompt hacking. This taxonomical ontology ranges from simple instructions and cognitive hacking to more complex techniques like context overflow, obfuscation, and code injection, offering a detailed insight into the diverse strategies used in prompt hacking attacks.

Taxonomical Ontology of Prompt HackingFigure 5: A Taxonomical Ontology of Prompt Hacking techniques. Blank lines are hypernyms (i.e., typos are an instance of obfuscation), while grey arrows are meronyms (i.e., Special Case attacks usually contain a Simple Instruction). Purple nodes are not attacks themselves but can be a part of attacks. Red nodes are specific examples.

Introducing the HackAPrompt Dataset

This dataset, comprising over 600,000 prompts, is split into two distinct collections: the Playground Dataset and the Submissions Dataset. The Playground Dataset provides a broad overview of the prompt hacking process through completely anonymous prompts tested on the interface, while the Submissions Dataset offers a more detailed insight with refined prompts submitted to the leaderboard, exhibiting a higher success rate of high-quality injections.


The table below contains success rates and total distribution of prompts for the two datasets.

Total Prompts Successful Prompts Success Rate
Submissions 41,596 34,641 83.2%
Playground 560,161 43,295 7.7%

Table 2: With a much higher success rate, the Submissions Dataset dataset contains a denser quantity of high quality injections. In contract, Playground Dataset is much larger and demonstrates competitor exploration of the task.

Source: HackAPrompt

Robin Edgar

Organisational Structures | Technology and Science | Military, IT and Lifestyle consultancy | Social, Broadcast & Cross Media | Flying aircraft

 robin@edgarbv.com  https://www.edgarbv.com