Computers that understand, complete, and even produce 'ordinary' human texts from scratch: this no longer belongs to the future. Recent developments show that the best models perform phenomenally well.
Natural Language Processing (NLP) is a field that focuses on the interplay between computers and human language. Drawing on computer science and artificial intelligence, NLP algorithms can extract information from texts, analyze them, and even generate new ones.
Development in NLP
NLP dates back to the 1950s, but the field has matured drastically in recent decades. In the past, the approach to NLP was strongly academic and linguistic, focusing on language structures and on ways in which computers could understand those structures. Today, the linguistics of language matters less, thanks to the use of Big Data and modern types of neural networks. These neural networks are models that, if large enough, can approximate virtually any relationship in the input data. Moreover, neural networks can learn tasks such as classification, prediction, and visualization simply by considering examples.
Recent developments in NLP are a direct result of the use of neural networks and Deep Learning methods. Deep Learning has emerged over the last decade and over the last five years has become the basis for innovations in all areas of artificial intelligence.
Innovations in Deep Learning
Deep Learning essentially divides a particular problem into several layers. Each layer represents a specific function and builds a more abstract representation, and each added layer can use the information from the previous layers. So imagine that you want to teach an algorithm to recognize an image of a dog. In this case, the first layer may be one that recognizes shapes (circles, triangles, and so on). The second layer may be one that can identify eyes (two oval shapes next to each other). The third layer may be one that recognizes a face, and so on. Ultimately, the algorithm can recognize the image of a dog. The same principle can be applied to text sources such as sentences.
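As an illustration only, the stacked-layers idea can be sketched as a tiny feedforward network in which each layer consumes the output of the previous one. The layer "meanings" in the comments are hypothetical labels, and the weights are random placeholders rather than anything trained:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)

# Three stacked layers: each one builds on the previous layer's output.
# In the dog example, layer 1 might detect simple shapes, layer 2
# combinations of shapes (eyes), layer 3 whole faces.
W1 = rng.normal(size=(64, 32))   # raw input  -> "shapes"
W2 = rng.normal(size=(32, 16))   # "shapes"   -> "parts" (e.g. eyes)
W3 = rng.normal(size=(16, 2))    # "parts"    -> "dog" / "not dog"

def forward(x):
    h1 = relu(x @ W1)   # first abstraction level
    h2 = relu(h1 @ W2)  # second abstraction level
    return h2 @ W3      # final class scores

x = rng.normal(size=(1, 64))     # one fake input, flattened
scores = forward(x)
print(scores.shape)              # one score per class
```

In a real system the weights would be learned from labelled examples; the sketch only shows how information flows through the layers.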
Recently, the world has become acquainted with Transformer models (e.g. BERT, T5 and GPT-3), revolutionary Deep Learning models that no longer have to process data sequentially from start to finish. Instead, these models use a mechanism known as attention to process a large text as a whole at once. These innovations have dramatically improved the linguistic understanding of the latest models and enable them to outperform previous models in a variety of tasks.
An example of such a task is predicting a missing word. Predicting missing words is useful because it makes it easy to create a large dataset: simply take a large amount of text and mask some of the words. To build a model for a concrete application, such as answering questions about a text, researchers then retrain the model on several smaller datasets, a process known as fine-tuning. The AI community was amazed to see that BERT surpassed all existing AI models on a wide range of NLP tasks!
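The masking trick can be illustrated in a few lines of code: take any raw text, hide a fraction of the words, and keep the hidden words as the answers the model must learn to predict. This is a simplified sketch of the idea, not BERT's actual preprocessing:

```python
import random

def mask_words(sentence, mask_rate=0.15, seed=42):
    """Turn raw text into a (masked_sentence, answers) training pair
    by hiding a fraction of the words. No hand-labelled data is
    needed: the original words themselves are the labels."""
    rng = random.Random(seed)
    words = sentence.split()
    masked, answers = [], []
    for i, w in enumerate(words):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            answers.append((i, w))   # position and hidden word
        else:
            masked.append(w)
    return " ".join(masked), answers

text = "the quick brown fox jumps over the lazy dog"
masked, answers = mask_words(text, mask_rate=0.4)
print(masked)
print(answers)
```

Applied to a large text corpus, this turns plain text into millions of self-supervised training examples.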
New chapter with GPT-3
But the latest revolution comes from the GPT-3 model (Generative Pre-trained Transformer), an extremely powerful model developed by OpenAI, the research lab co-founded by Elon Musk, which consists of a massive 175 billion parameters. It can understand English prompts and generate texts without a single example. Jelmer Wind, a computer scientist at Machine Learning Company, experimented with the GPT-3 model by asking it to generate text that opposes a human political argument. Without a single example (zero-shot learning), the GPT-3 model was able to generate a coherent text representing a counter-argument to that political argument. This ability is a direct result of improved language comprehension.
Because of their massive capabilities, these latest innovations in NLP can also have a negative impact when used for unethical purposes. For example, the GPT-3 model can easily be enticed to advocate for anything, no matter how unethical, so convincingly that its output is virtually indistinguishable from a human's. Newer models are thus able to generate human-like texts that do not necessarily contain the truth. Access to models such as GPT-3 is therefore limited, and a balance must be struck between technological innovation and the risk of unethical use.
GPT-3 opened a new chapter in Machine Learning, primarily because of its general-purpose nature. Until now, neural networks were built for specific tasks (such as translation), but GPT-3 is not task-specific and no longer needs specialized datasets.
Do you also want access to GPT-3? You are not the only one. The hype surrounding the new Deep Learning model is huge, and to access the private beta you first end up on a long waiting list. As mentioned, there are no plans for general availability yet. Nevertheless, its predecessor, GPT-2, is open source, and that version can already be used by anyone.
About the Author: Guus van de Mond is the founder of Squadra Machine Learning Company.