It’s fair to say that ChatGPT has caught our attention – it is one of the fastest growing consumer applications ever, reaching 100 million users a mere two months after launch. It has generated headlines in the tabloid and tech press alike predicting its impact on the way we work, learn and search the internet. Those headlines also discuss how it could be used by cyber criminals to increase an already heightened cyber threat. But what does the UK’s National Cyber Security Centre (NCSC) think? On 14th March, it published a blog looking at some of the cyber security issues of large language models (LLMs) and AI chatbots.
ChatGPT and LLMs – what are they and how do they work?
ChatGPT, and competitors like Google’s Bard, are AI chatbots that use LLM technology. They are not ‘artificial general intelligence’, but their speed is impressive and their outputs look convincing.
ChatGPT is based on a language model called GPT-3 that uses deep learning to produce human-like text. An LLM is an algorithm that has been trained on a large amount of text-based data. This is often scraped from the internet (web pages, online content etc.) and may also include other sources such as social media posts, books and (in some cases) scientific research. The algorithm analyses the relationships between different words and turns them into a probability model. When the algorithm is asked a question (a prompt), it answers based on the relationships of the words in its model. At present, the data in the model is typically static after it has been trained. It can, however, be refined using ‘fine-tuning’ (training on additional data) and ‘prompt augmentation’ (e.g. “Taking into account this information [paste relevant document], how would you describe….”).
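To make the “word relationships turned into a probability model” idea concrete, here is a deliberately toy sketch in Python. It builds a bigram model (how often each word follows another) from a tiny made-up corpus and predicts the most likely next word. Real LLMs use neural networks trained on vastly more data, so this is an illustration of the underlying principle only, not of how ChatGPT actually works; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word (a bigram model).
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(prompt_word):
    """Return the most probable next word and its probability."""
    followers = counts[prompt_word]
    total = sum(followers.values())
    word, n = followers.most_common(1)[0]
    return word, n / total

# "cat" follows "the" in 2 of the 4 occurrences of "the".
print(next_word("the"))  # ('cat', 0.5)
```

The same principle, scaled up enormously and applied over whole sequences rather than single word pairs, is what lets an LLM produce fluent answers to a prompt.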
Issues to consider
As with any technology, LLMs create both opportunities and risks, and it is important to be aware of the latter. For example, the NCSC discusses issues with:
- Accuracy: they can get things wrong, ‘hallucinate’ incorrect facts and appear comprehensive while missing key bits of information.
- Bias: bias is one of the big concerns around AI and remains an issue for LLMs. This is not surprising given their training data – the sheer volume of data used in training makes it impossible to filter out offensive, biased or inaccurate content, which can lead to some ‘controversial’ content ending up in the model. The chatbots can also be gullible when asked a leading question and can be coaxed into creating toxic content.
- Confidentiality: the blog mentions that, at present, LLMs do not automatically add information from queries into their models for other queries (although recent technological updates may have changed this position). However, even if this is still the case, the NCSC’s blog discusses a number of concerns around including confidential or sensitive information in a query. It therefore recommends that you do not include such information in queries to public LLMs or submit queries that would lead to issues if they were made public (e.g. a CEO asking “how best to lay off an employee”). That said, private LLMs (e.g. offered by a cloud provider) that can be self-hosted may offer less risk here. The advice then is to check the relevant T&Cs and understand how the data is used and shared (e.g. how the data used for fine-tuning or prompt augmentation is managed). It is also important to carry out a thorough security assessment (which, the NCSC says, should refer to its principles for the security of machine learning and guidance on securing your infrastructure and data supply chains).
- Cyber security: LLMs could be problematic from a cyber perspective in a number of ways. For example:
- Phishing emails: LLMs are likely to help cyber criminals write more convincing phishing emails in multiple languages (helping technically capable hackers pursue targets in other jurisdictions, even where they lack the relevant language skills).
- Malware: they can also write malware. One big concern is that LLMs may help less skilled cyber criminals write highly capable malware. At present this is deemed a ‘low risk’, as the technology is currently more suited to simple tasks, but that analysis is likely to change as the technology improves. A skilled cyber criminal will also be able to use an LLM to save time, is “likely to be able to coax an LLM into writing capable malware” and may also use it to help in other ways (e.g. to advise on technical problems accessing data/systems once inside an organisation). There is, therefore, a risk it could allow cyber criminals to execute attacks that are beyond their current capabilities.
The buzz around ChatGPT and its ever-growing list of competitors is unlikely to die down anytime soon, and organisations are undoubtedly already considering potential use cases. However, it is important to understand the risks as well as the benefits. For example:
- Do you need to introduce guidelines around its use? The NCSC’s blog looks at concerns around inputting sensitive or confidential information, bias and accuracy, but you may also want to consider other risks – e.g. could the output infringe a third party's IP?
- How are you planning to combat the risk of more convincing phishing and social engineering attacks?
- Have you built sufficient security checks into your procurement process when exploring private LLM options?
Note: since this blog was published, the ICO has also published a blog on the risks around ChatGPT and other generative AI - see Generative AI: eight questions that developers and users need to ask | ICO, and ChatGPT has been banned in Italy by its data regulator over privacy and security concerns.
For more information on the risks and opportunities around AI, explore the different publications and podcasts from our Regulating AI series.