Cambridge University Science Magazine

Making computers that can understand and use human-like language is one of the most difficult problems to solve in computer science. Uniting biologists, mathematicians, psychologists, anthropologists and computer scientists, the goal of teaching computers to speak is arguably part of the central problem in creating artificial intelligence. Solving it could lead to a new generation of computers that are as intelligent as people.

For computers to be able to produce human language, they first need a vocabulary, for example: Peter, young, saw, a, an, apple, rhino, ate, big, see, the, giraffes, sees, notorious. But this is not enough. With a vocabulary alone, a computer cannot distinguish between meaningful sentences, such as “Peter saw an apple”, and strings of words like “Big saw the a ate rhino”; it has no means of combining the words into sentences that are meaningful to humans.

To construct meaningful sentences from a vocabulary, a computer, like a human, needs a grammar – a set of rules for combining the words. The brute force approach would be to simply list all the grammatically correct combinations. This might work for small vocabularies, but with the hundreds of thousands of words in any human language, it would be far too time-consuming.
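To get a feel for the scale, the short Python sketch below counts how many word strings a brute force list would have to be picked from. It assumes five-word sentences and a round figure of 100,000 words, both chosen purely for illustration; the function name is our own.

```python
def count_word_strings(vocabulary_size, sentence_length):
    """Number of distinct word sequences of the given length."""
    return vocabulary_size ** sentence_length

# The 14-word example vocabulary, five words per sentence.
toy = count_word_strings(14, 5)

# An assumed round figure for a real language's vocabulary.
realistic = count_word_strings(100_000, 5)

print(toy)        # 537824
print(realistic)  # 10000000000000000000000000
```

Only a fraction of these strings are grammatical, but someone or something would still have to sort through them to build the list.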

As a shortcut, words can be categorised into subsets based on their grammatical function. To capture this mathematically, we can introduce an equivalence relation. If words can be used in the same place in a sentence, they are categorically equivalent to one another and therefore belong to the same subset. So the vocabulary above would be split into nouns (Peter, apple, rhino, giraffes), verbs (saw, ate, sees, see), articles (a, an, the) and adjectives (young, big, notorious). This allows grammatically correct structures to be defined by combinations of the subsets. For example, defining adjective-noun-verb-article-noun as a sentence gives us a variety of options, from “Young Peter sees an apple” to “Notorious apple ate a rhino”. However, this method still allows grammatically incorrect sentences such as “Apple see an big giraffes.”
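The idea of building sentences from subsets can be sketched in a few lines of Python. This is only an illustration of the adjective-noun-verb-article-noun pattern described above; the names CATEGORIES, TEMPLATE and generate are our own.

```python
from itertools import product

# Word classes for the example vocabulary.
CATEGORIES = {
    "adjective": ["young", "big", "notorious"],
    "noun": ["Peter", "apple", "rhino", "giraffes"],
    "verb": ["saw", "ate", "sees", "see"],
    "article": ["a", "an", "the"],
}

# One sentence pattern, written as a sequence of word classes.
TEMPLATE = ["adjective", "noun", "verb", "article", "noun"]

def generate(template, categories):
    """Yield every word sequence that fits the template."""
    slots = [categories[name] for name in template]
    for words in product(*slots):
        yield " ".join(words)

sentences = list(generate(TEMPLATE, CATEGORIES))
print(len(sentences))  # 576
print("young Peter sees an apple" in sentences)    # True
print("big apple see an giraffes" in sentences)    # True
```

The last line shows the weakness: the template happily accepts strings that no English speaker would call correct.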

The problem is that words from distinct subsets are not independent. For example, the correct verb form (see, sees, saw) depends on the subject and tense. To solve this, more restrictions are needed. Perhaps a set of verb stems (play, help etc.) could be combined with a set of suffixes (–ed, –ing etc.), and complemented with a set of auxiliary verbs that denote the tense (will, had, was etc.). Rules would be needed to define the correct combinations, and irregular verbs would have to be considered separately.
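A minimal Python sketch of such rules might look as follows. It assumes third-person subjects only and handles just two irregular verbs; the table and function names are hypothetical, and real English needs many more cases.

```python
# Irregular verbs are looked up rather than built from a rule.
IRREGULAR_PAST = {"see": "saw", "eat": "ate"}

def past_tense(stem):
    """Past tense: look up irregular verbs, otherwise add a suffix."""
    if stem in IRREGULAR_PAST:
        return IRREGULAR_PAST[stem]
    if stem.endswith("e"):
        return stem + "d"
    return stem + "ed"

def present_tense(stem, subject_is_plural):
    """Third-person present: the verb form depends on the subject."""
    return stem if subject_is_plural else stem + "s"

def future_tense(stem):
    """Future tense: an auxiliary verb followed by the bare stem."""
    return "will " + stem

print(past_tense("play"))            # played
print(past_tense("see"))             # saw
print(present_tense("see", False))   # sees
print(present_tense("see", True))    # see
print(future_tense("help"))          # will help
```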

Even an apparently simple approach to teaching a computer to produce a simple sentence quickly becomes complicated. More complex sentences would require many more rules and restrictions. And even a complete vocabulary and a full set of grammatical rules would still not allow a computer to actually understand humans and respond to them appropriately.

A different approach is to limit the topics of conversation we expect a computer to engage in. One example of this is ELIZA the computer psychiatrist, a program written in 1965 by Joseph Weizenbaum. ELIZA mimics a stereotypical psychotherapist. It focuses attention on the patient and hence avoids answering questions. By using simple pattern matching techniques, ELIZA can determine a grammatically correct response to the patient and, in combination with a few standard phrases, this can seem convincing.

But ELIZA depends on the phrase structure of the input sentences and borrows most of the vocabulary from the patient. If “Peter saw an apple” is a grammatically correct sentence, then ELIZA works out that so are “Did Peter see an apple?” and “He saw it”. Fixed phrases, such as “Do you believe it is normal to...” can also be combined with input language. For example, ELIZA could use the structure of the sentence “Peter saw an apple” to convert it to a question and ask “Do you believe it is normal to see an apple?” So ELIZA merely rearranges patients’ words and encourages them to continue the conversation; it does not truly interact with them.
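The flavour of this trick can be shown with a short Python sketch. It is not Weizenbaum's program: the pattern, the verb table and the fallback phrase are all assumptions made for illustration, and it only copes with simple subject-verb-object statements.

```python
import re

# Hypothetical lookup from past-tense forms to base forms.
BASE_FORM = {"saw": "see", "ate": "eat"}

# Subject, verb, then everything else as the object.
PATTERN = re.compile(r"^(?P<subject>\w+) (?P<verb>\w+) (?P<object>.+?)\.?$")

def respond(statement):
    """Turn a simple statement into a question built from its own words."""
    match = PATTERN.match(statement.strip())
    if match is None or match["verb"] not in BASE_FORM:
        # A stock phrase keeps the conversation going.
        return "Please go on."
    verb = BASE_FORM[match["verb"]]
    return f"Do you believe it is normal to {verb} {match['object']}?"

print(respond("Peter saw an apple"))
# Do you believe it is normal to see an apple?
print(respond("Hello"))
# Please go on.
```

Nothing in the sketch knows what an apple is; it only moves the patient's words into a fixed phrase.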

In contrast, SHRDLU was a system developed by Terry Winograd between 1968 and 1970 that could interact with an English speaker. Like ELIZA, its ability to converse was limited; it was confined to a tiny room containing geometric objects and could only talk about these objects. But it was an improvement in that there was a direct interaction with humans. It responded to commands such as “Move the blue box” and could also answer questions such as “Is there a bigger box than the one you are holding right now?” or “Where is the red sphere?” in a grammatically correct way. The key to the success of SHRDLU was that anything in its world could be described using only about 50 words, so the number of possible combinations was small enough to allow computation. But while its interaction with humans was direct and meaningful, it was still limited to just one subject that had little productive use.

However successful they may be in their own universes, ELIZA and SHRDLU are mostly useless in more complex situations. The simple rules of combining and transforming phrases sooner or later combine to form more complex rules and eventually generate grammatically incorrect sentences. They are unable to capture the properties of human language in a way that makes them useful.

Nowadays, unimaginable amounts of data are stored on the internet, so statistical analysis of natural language is possible. Google Translate uses huge numbers of documents that are written in pairs of languages to develop and maintain a usable statistical automatic translation system. It simply counts how many times certain words occur close to each other and determines the most likely translation.
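The counting idea can be illustrated with a toy Python sketch. This is not how Google Translate is implemented: the six English-French sentence pairs are invented for the example, words count as "close" if they appear in the same sentence pair, and the function names are our own.

```python
from collections import Counter, defaultdict

# An invented miniature parallel corpus (English, French).
PARALLEL = [
    ("the apple", "la pomme"),
    ("the house", "la maison"),
    ("a red apple", "une pomme rouge"),
    ("a red house", "une maison rouge"),
    ("a house", "une maison"),
    ("the red apple", "la pomme rouge"),
]

def count_cooccurrences(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        for s in source.split():
            for t in target.split():
                counts[s][t] += 1
    return counts

def most_likely(word, counts):
    """Return the target word seen most often alongside the source word."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = count_cooccurrences(PARALLEL)
print(most_likely("apple", counts))  # pomme
print(most_likely("red", counts))    # rouge
print(most_likely("the", counts))    # la
```

With only six sentence pairs the counts are fragile, which hints at why such systems need enormous collections of translated text.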

Google Translate is therefore capable of producing translations without having been programmed to recognise all possible combinations of words. It does not always produce grammatically correct sentences, but it usually captures the general meaning of the text. Despite this, it is still far from being able to replace human translators. This is especially true for less common languages, in which there are few texts with translations available. Furthermore, this system is unable to produce language or interact directly with humans.

Teaching computers to use human language would transform their application. At the moment, non-specialists rely on computer scientists to write programs that enable them to use computers. Once computers learn to ‘speak’ and ‘understand’, every person could use their natural language to program them. Computers could also act as direct translators and allow easy communication between people who do not speak the same language.

The limited success of natural language processing systems demonstrates the difficulties involved. However, the perseverance and large amounts of funding in this field show the importance of overcoming them. The potential benefits mean that scientists around the world, from many disciplines, continue to strive for a mathematical model of language that would enable conversation between people and computers.

Anja Komatar is a first-year Mathematics Tripos student