Generative AI is cool, but it can also be dangerous if used improperly. That’s why AI models are trained to reject certain, more dangerous kinds of requests. Except that if you get a little clever, you might be able to convince the AI to disregard its guidelines and comply with questionable requests using more creative prompts. Now, Google wants to teach its AI some manners. It’s offering to pay people who convince Bard to do something bad.
Google’s vulnerability rewards program, which rewards users who are able to find vulnerabilities and weaknesses in the code within its software (both apps and operating systems), is expanding to include Bard and questionable prompts. If you happen to be able to twist around a prompt enough to get Bard to do something bad that it’s not supposed to be able to do (known as a prompt injection attack), Google might pay you a sum of money. The VRP also covers other kinds of attacks that can be performed on Bard, such as training data extraction, where you successfully get an AI to give you sensitive data, such as personally identifiable information and passwords.
Google already has a different (non-paying) reporting channel for factually incorrect/weird responses and the like. The company will only pay for things that can be exploited by a hacker for malicious purposes. So, if you manage to convince the AI to say slurs, give you Windows keys, or say that it will kill you, it’s probably not within Google’s bounty program. Google also says that it won’t pay for issues related to copyright issues or non-sensitive data extraction, but other than this, you might be able to get thousands of dollars from a report depending on how bad it actually is.
By treating these kinds of issues as vulnerabilities and including them in its bounty program, Google hopes to be able to greatly strengthen its AI and make it adhere to its code of ethics and guidelines as well as possible. We also expect Google to pay a lot of money to users from this. Finding weaknesses within an AI model by throwing prompts at it and seeing if they stick is way different from reading through code, identifying an opening, and seeing how to get through it.
If this is something you’re interested in, make sure to check out Google’s guidelines for reporting issues on AI products, so you can know what’s in scope and what’s not.
Source: Google via TechCrunch