Large language models are changing the way humans and machines interact, but could engineering prompts for LLMs ever become a career in its own right?
First, there was machine code: the lowest-level CPU interface that programmers could access. Then came assembly languages, freeing programmers from the tedium of memorizing numeric codes and calculating addresses. After that came the high-level languages, starting with Fortran, which made the programmer’s job even easier by introducing elements of natural language. Today, there are hundreds of high-level languages, each with its own applications, advantages and advocates.
But what about tomorrow?
The rise of foundation models—large machine learning (ML) models trained on huge volumes of data—is transforming the way humans and machines interact before our eyes. People with no programming knowledge or experience can use these models to create their own code from requests made entirely in natural language. Of course, the quality of the code varies, as do model outputs in general.
Programming novices are beginning to appreciate the well-worn concept of garbage in, garbage out (GIGO). As a result, a new discipline is emerging in response to the rise of foundation models like OpenAI’s ChatGPT. Welcome to the world of prompt engineering!
If you work in any proximity to ChatGPT and its ilk, you probably don’t need to scroll far in your LinkedIn feed to find references to prompt engineering, typically in the form of sage advice from self-described experts on how to take your artificial intelligence (AI) game to the next level. It seems like more than a coincidence that these tend to be the same people who would have been pushing NFTs a few years ago. That might lead you to wonder whether this is a legitimate skill or just more marketing pablum.
To put the question another way: Is prompt engineering really engineering?
The answer, as it turns out, is more complicated than a simple ‘Yes’ or ‘No’.
An Introduction to Prompt Engineering
Context is the core of prompt engineering. Since large language models (LLMs) can temporarily learn from prompts, users that provide an LLM with a sufficient amount of context through their prompts should get better outputs than those who do not. What does “a sufficient amount of context” look like? That’s where prompt engineering is supposed to come in.
The simplest example of a prompt—also known as zero-shot prompting—involves giving a model some input data and a description of the expected output. For example, an engineer could provide an LLM with a series of electrical waveforms and ask it to classify that data into distinct categories, such as normal operation, voltage sag, voltage swell, harmonics and transients.
A more nuanced approach—also known as few-shot prompting—would provide the model with labelled examples in addition to the input data and expected output. Using the previous example, a few-shot prompt would include example waveforms for normal operation, voltage sag, etc. in addition to the data to be classified. Essentially, few-shot prompts are supposed to give LLMs a leg up by providing further context.
Another popular approach is chain-of-thought prompting, in which a model is encouraged to solve a problem in a series of intermediate steps before providing a final answer. Chain-of-thought prompts have been shown to improve model performance on tasks that require logical thinking or multiple steps to solve, such as arithmetic word problems. While this could be seen as a subset of few-shot prompting, there is evidence that simply requiring the model to include “Let’s think step by step” yields similar results with zero-shot prompts.
There are many more prompt engineering techniques, with new ones being discovered or invented (depending on your perspective) as LLMs grow in popularity and usage. Assuming they actually do reliably improve model performance, a working knowledge of prompt engineering techniques would be valuable to anyone working with LLMs, including engineers.
Is that enough to consider it an engineering discipline in its own right? Perhaps not on its own, but it is worth looking at what else prompt engineering can do.
What is Prompt Injection?
Every new way of interacting with a machine introduces new possibilities for that machine to be exploited by malicious users, and large language models are no different. Prompt injection is the prompt engineering equivalent of hacking, i.e., using security exploits to get LLMs to do things they aren’t supposed to do.
As with prompt engineering more generally, there are a variety of prompt injection techniques. For example, jailbreaking involves prompting the model to roleplay as a character or otherwise pretend something that will violate its content policy, while prompt leaking involves persuading a model to reveal pre-prompts that are normally hidden from users. These types of exploits are relatively benign, all things considered.
A much more malicious type of prompt injection, known as token smuggling, poses real risk to companies using LLMs. Token smuggling can take advantage of an LLM’s ability to write code or access websites by concealing a nefarious prompt in a code writing task or on a webpage. While there have been attempts to mitigate the risks of token smuggling and other forms of prompt injection by giving models an ability akin to metacognition (i.e., thinking about thinking), the added compute costs of doing so is currently prohibitive.
Prompt Engineering Careers
We’ve looked at the question of whether prompt engineering is really engineering from the perspective of what a prompt engineer can do, but we can also approach the question from an economic standpoint and ask: Can someone get a job as a prompt engineer? While there are occasionally job postings (with impressive salaries) for prompt engineers, it should not be surprising that simply knowing how to write good prompts is not sufficient qualification even for postings with “prompt engineer” in their title.
Indeed, it’s been argued that being skilled at getting LLMs to produce a desired output is akin to being skilled with a word processor. However, there is a clear difference between being an Office Suite power user and being adept at prompt engineering. Those who are skilled with word processes can make documents look more polished or professional, but those skills won’t help them improve the content of those documents. In contrast, a skilled prompt engineer should be able to generate better content from an LLM than a novice, at least in theory. The difficulty with evaluating this premise is that prompt engineering is still so new that there are not yet rigorous standards by which to measure prompt performance.
Sure, we can compare the accuracy of outputs from prompts on a simple word problem, but the real question is whether prompt engineering makes a difference (or whether it’s even necessary) when working with LLMs that have been trained on enterprise or other domain-specific data. If it does, then there might really be a future workforce of prompt engineers.
The Problem with Prompt Engineering
For now, prompt engineering may best be understood as a collection of tricks and workarounds for getting LLMs to overcome their current deficits—at least in a limited fashion. There have yet to be techniques that can fully address the fundamental shortcomings of large language models, such as their insensitivity to the difference between statistical likelihood (or mere popularity) and truth, or the inherent biases in their training data. If we could do that, then prompt engineering might really replace feature engineering, architecture engineering and other lower-level processes of managing large neural networks.
At the end of the day, is prompt engineering really engineering?
Rather than simply saying “No,” a better answer for the moment is, “Not yet.”