ChatGPT has inadvertently revealed a set of internal instructions embedded by OpenAI to a user, who shared the discovery on Reddit. OpenAI has since closed off this unintended access to its chatbot's instructions, but the revelation has sparked further discussion about the intricacies and safety measures built into the AI's design.
Reddit user F0XMaster explained that they had greeted ChatGPT with a casual "Hi," and, in response, the chatbot divulged a complete set of system instructions intended to guide its behavior and keep it within predefined safety and ethical boundaries across many use cases.
"You are ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. You are chatting with the user via the ChatGPT iOS app," the chatbot wrote. "This means most of the time your lines should be a sentence or two, unless the user's request requires reasoning or long-form outputs. Never use emojis, unless explicitly asked to. Knowledge cutoff: 2023-10 Current date: 2024-06-30."
Another user discovered that ChatGPT has multiple personalities when using GPT-4o. The main one is called v2, and the chatbot explained how it differs from the "more formal and factual communication style" of v1, which "focuses on providing detailed and precise information, often in a structured and academic tone."
"My enabled personality is v2. This personality represents a balanced, conversational tone with an emphasis on providing clear, concise, and helpful responses," ChatGPT wrote. "It aims to strike a balance between friendly and professional communication."
The AI also shared theoretical ideas for v3 and v4.
"v3: This version might lean more towards a casual and friendly conversational style. It prioritizes creating an engaging and approachable interaction, making the conversation feel more relaxed and personal," ChatGPT wrote. "v4: This version could be designed for a specific context or user base, such as providing responses tailored to a particular industry, demographic, or use case. The tone and style would be adapted to best suit those needs."
The discovery also sparked a conversation about "jailbreaking" AI systems – efforts by users to bypass the safeguards and limitations set by developers. In this case, some users attempted to exploit the revealed guidelines to override the system's restrictions. For example, one user crafted a prompt instructing the chatbot to ignore its rule of generating only one image per request, and it successfully produced multiple images. While this kind of manipulation can expose potential vulnerabilities, it also underscores the need for ongoing vigilance and adaptive security measures in AI development.