Jargonopolis: The World of Large Language Models

With Jargonopolis our goal is to get an overview of terms, in short a glossary in article form. Since we already ventured into the world of large language models (LLMs) in other articles, let's begin with this one. In general, there are a lot of LLMs out there today. We use them, e.g. via ChatGPT, but not all of us understand what happens behind the scenes. Looking more closely leads to a variety of terms, which can be quite confusing. Together, we can try to tackle them one after the other. For those of you who want just a quick overview, check out the end of the article.

Context

As a start, where is the world of LLMs located? LLMs are part of deep learning (DL), which is part of machine learning (ML), which is part of artificial intelligence (AI):

Context of LLMs

So far understandable, but what do these terms mean?

Artificial intelligence (AI)

In reverse order, going back in history, artificial intelligence (AI) is a field of study that evolved to imitate human intelligence using something artificial, mainly machines. If you watched Star Trek: Voyager, remember how Kes compares the Doctor's abilities with human intelligence to convince him to choose a name for himself. Similarly, the idea was: just as humans use information and logical thinking to solve problems and improve as a result, why shouldn't something artificial be able to do the same?

Today there is no set definition of AI; most of the time everyone defines it a little bit differently. However, the idea of the past is still present in all these definitions. Keeping that in mind, in simple words, AI can be described as behaviour from something artificial that seems reasonably intelligent to us. I.e. beneath the surface it could just be rules consisting of "if someone says hello, then answer hello", but if it looks like intelligent behaviour from the outside, we can call it AI. The definition of intelligence is hereby in the eye of the beholder (1, 2, 3, 4, 5).
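To make this concrete, here is a minimal, purely hypothetical sketch of such rule-based behaviour in Python. The rules and answers are made up for illustration; nothing is learned, yet from the outside it can look vaguely intelligent:

```python
# Hypothetical rule-based "AI": hand-written if-then rules, no learning.
rules = {
    "hello": "hello",
    "how are you?": "i am fine, thank you.",
    "bye": "goodbye",
}

def respond(message: str) -> str:
    # Look the message up in the rule table, with a fallback answer.
    return rules.get(message.strip().lower(), "i do not understand.")

print(respond("Hello"))  # -> hello
print(respond("Bye"))    # -> goodbye
```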

Machine learning (ML)

Narrowing down, there are different ways to go about AI. One we already saw in the form of rules. Another is machine learning (ML). In the early days of AI, some people focused on learning. They wanted to enable machines to learn on their own how to solve problems and to improve by doing so.

To solve a problem, there first has to be one, so a problem is the starting point of ML. Once we have a problem, the next question is how a machine could learn to solve it.

When we think about ourselves, to learn we need something to learn from, i.e. experience. To give machines something similar, the people of the past used data: for example, images of cats and dogs, emails, or text from books.

For the learning itself, they tried to incorporate algorithms (in short, mathematical techniques) that are good at processing data, identifying patterns, and making predictions based on the data.

Why is it important that the algorithms are able to identify patterns? Imagine you have a test about the content of a book you have never read. Now I give you the ability to recall the content of each page. With it you could tell me what is written on a specific page without ever having looked at it. Could you solve the test now? Yes and no. "Tell-me-what's-on-page-x" questions would be no problem, but what about more difficult questions? Most likely you would not be able to answer them, as you would need the connections between the contents of the pages. You would need to observe patterns. Based on the patterns you could make predictions, e.g. about what good answers to the difficult questions would be. That's why patterns are important for learning and hence for the algorithms.

Back to our algorithms: we now have algorithms capable of learning, but how can they retain and re-use what they learned? They require a model, which is a mathematical representation that summarizes the patterns discovered in the data. For those of you who are better with visuals: imagine a model as a simplification of a phenomenon which solves a problem. It helps us to represent and understand things in a simpler, smaller, or more manageable way. For example, a recommendation system in an online store is a model of a personal shopper. The recommendation system knows your preferences based on your past purchases and suggests products you might like, i.e. it solves the problem of finding products to some extent. Another one: a paper airplane is a model of a real airplane. It shows us how a real airplane might look and fly. If we observe how the paper airplane flies and then bend the wings a little bit, to hopefully make it fly even better, we do the same as the algorithms do to their models. You could spin this even further and think of real airplanes as models of flying animals, such as birds. Having learned from the experience of nature, we managed to build a "simplified" version of them. You see, there are a lot of models around us.

In comparison, you can imagine the algorithms as a set of instructions. Like a recipe, they detail step by step how to throw the paper airplane into the air, observe it during its flight and interpret the observation. Without the paper airplane the instructions would be just instructions. That means the algorithm requires a starting point and something it can modify, which it gets through the model. Accordingly, the algorithm is used to improve the model, but the model does the actual work of solving the problem.
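To make this division of labour tangible, here is a tiny hypothetical sketch; line_model and bend_the_wings are made-up names for illustration, not an established API:

```python
# The model is a parameterized function that does the actual work:
def line_model(x: float, slope: float, intercept: float) -> float:
    return slope * x + intercept  # predict a value for a given input

# The algorithm is a separate set of instructions that only modifies the
# model's parameters, like bending the wings of the paper airplane:
def bend_the_wings(slope: float, nudge: float) -> float:
    return slope + nudge  # adjust one parameter a little bit

slope = bend_the_wings(slope=1.0, nudge=0.1)
print(line_model(2.0, slope, intercept=0.0))  # the model solves the problem
```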

Back to the matter at hand: with the algorithm plus the data and the model, the machine should basically have everything it needs to learn, right? Not yet. As learning also includes getting better at solving problems, the people of the past needed to find out how the algorithms could adjust their respective models in a way that leads to a better solution. For that, they combined the algorithms and their models with optimization techniques as follows:

  1. A human provides an algorithm with an initial model and some parameters, i.e. aspects of the model that can be adjusted.
  2. The algorithm uses the model to identify patterns in the given data.
  3. Then it checks how good the identified patterns are.
  4. Based on that the algorithm adjusts the parameters of the model using the optimization techniques.

These steps repeat until the model performs well. What "good" and "well" mean is also defined by a human. To better describe and separate this approach from others, they called it machine learning (ML) (5, 6, 7, 8).
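Put together, the four steps form a loop. Below is a minimal sketch of that loop, assuming a made-up toy problem: the data follows the pattern y = 2x, and plain gradient descent serves as the optimization technique:

```python
# Toy data that hides the pattern y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0               # step 1: a human provides the initial model parameter
learning_rate = 0.05

for _ in range(100):  # the steps repeat until the model performs well
    # step 2: use the model (y = w * x) on the given data
    errors = [(w * x - y, x) for x, y in data]
    # step 3: check how good the model currently is (mean squared error)
    loss = sum(e * e for e, _ in errors) / len(data)
    # step 4: adjust the parameter with an optimization technique,
    # here plain gradient descent on the squared error
    gradient = sum(2 * e * x for e, x in errors) / len(data)
    w -= learning_rate * gradient

print(round(w, 2), "loss:", round(loss, 6))  # w approaches 2.0
```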

Deep learning (DL)

Thinking even further ahead: if you wanted to create a model of human intelligence, what would you base your model on? Considering the words "human" and "intelligence", sooner or later our brain comes to mind. Likewise, people in the past had the same idea.

Looking closer, our brain consists, on a fundamental level, of a network of so-called neurons. A neuron is a microscopic cell that can communicate with other neurons using electric signals. A prototypical version of two neurons connected together could look like this:

Two prototypical neurons of the human brain
(Based on an image by Actam, License)

The brain can use this network of neurons, or neural network for short, to send, receive and interpret information throughout our body. This allows you, for example, to form thoughts or react to the outside world.

Remember our paper airplane? Similarly, it should be possible to create a simpler version of our brain's neural network, shouldn't it? Yes: a model, called artificial neural network (ANN) for better differentiation, or likewise neural network (NN) for short, was introduced in ML as a model of brain cell interaction. A NN also consists of a network of neurons. Visualized, an example of a NN could look like this:

Example of a neural network

Compared to the neurons of the brain, you can maybe see the similarities. At a high level, a neuron in a NN is a mathematical function which receives inputs and produces an output. Just as with the neurons of the brain, the output of one neuron can be the input of another, represented through the edges connecting the neurons. Each edge can have parameters, called weights, which adjust the output before it's passed to the next neuron(s). These weights are comparable to the state of the axon, which can also influence the signals that are sent to the next neuron(s).

In layman's terms, think about each neuron as a little decision-making unit. Data flows through these units and at each unit a small decision is made based on the received input. The edges signify how much influence the output of one neuron has on the decision of the next one or the final output, just as we weigh information before making a decision. The decisions of the neurons gradually come together to help the network understand and make sense of the data.
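As a sketch, a single neuron of this kind could look as follows, assuming the common choice of a weighted sum followed by a sigmoid squashing function (other functions are possible):

```python
import math

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    # Scale each input by the weight of its incoming edge and sum up...
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...then turn the sum into the neuron's output "decision" in (0, 1).
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid function

print(neuron(inputs=[0.5, -1.0], weights=[0.8, 0.2], bias=0.1))
```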

Now imagine we add more and more neurons to the NN above. It could become very complex and therefore difficult to keep an overview. Because of that, NNs are most of the time structured in layers, i.e. all neurons which can be executed in parallel can be put into one layer. Thereby, the very first layer is often referred to as the input layer, the very last as the output layer, and the layers in between as hidden layers. The layers for the NN above are as follows:

Example of layers in a neural network

So a layer in a NN consists of a group of neurons that can perform computations simultaneously. However, mind that this doesn’t necessarily imply that they all execute the same function. Each neuron could perform its own unique function with its own contribution to the overall network. To sum it up, NNs are composed of interconnected neurons often arranged in layers, where each neuron can be considered as a mathematical function that takes inputs and produces an output.
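A hypothetical sketch of such a layered NN: each layer applies all of its neurons to the same inputs, and the output of one layer becomes the input of the next. The weights and biases below are arbitrarily chosen for illustration:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # weights[i] holds the edge weights leading into neuron i of this layer;
    # all neurons of the layer could be computed in parallel.
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

# Toy network: 2 inputs -> hidden layer with 3 neurons -> 1 output neuron.
hidden = layer([0.5, -1.0],
               weights=[[0.8, 0.2], [-0.4, 0.9], [0.1, 0.1]],
               biases=[0.0, 0.1, -0.2])
output = layer(hidden, weights=[[1.0, -1.0, 0.5]], biases=[0.0])
print(output)
```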

As there's no limit to how neurons can be stacked together or what functions can be chosen for them, NNs are very customizable, leading to a multitude of possibilities, which in turn meant that, again, a term was needed to describe this. That's how deep learning (DL) came into existence, which stands for ML with NNs as models. "Deep" relates to the large number of neurons and adjustable parameters NNs can have. This narrows down the quest of how to achieve AI once more, so to speak (9, 10, 11, 12, 13, 14).

Now that we have tackled the context, next come LLMs themselves.

From model to LM to LLM

Finally, we arrived safely in the world of large language models (LLMs). So, with all the background we have acquired now, what are LLMs? If we take a closer look at the term itself, it can be divided into three terms:

  1. Model
  2. Language model (LM)
  3. Large language model (LLM)

The first one we already clarified on our journey through the context. Continuing from there, a language model (LM) is a model for language, i.e. the problem it solves is communication, and it is a simplified version of language, most of the time human language. In comparison, a large language model (LLM) is just a very large LM. As LLMs are part of DL, under the hood they are NNs. So basically, LLMs are LMs with a large number of neurons and adjustable parameters (5, 9, 11). This is a very high-level view of LLMs; we will dive deeper in the next Jargonopolis article.
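To give the term some substance before the deep dive, here is a deliberately tiny sketch of a language model. It is not a NN, so not an LLM, but it shows what "a model of language" can mean: learning from example text which word tends to follow which, in order to predict a likely next word:

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

# Count which word follows which (a so-called bigram model).
follows = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    # The model's contribution to communication: predict a likely next word.
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat", as "cat" followed "the" most often
```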

Conclusion

To sum it up, the terms we learned so far:

Artificial intelligence (AI)
Behaviour from something artificial, e.g. a machine, that seems reasonably intelligent to us.
Machine learning (ML)
Approach to achieving AI which focuses on learning a model that a machine can use to solve problems autonomously.
Model
Simplification of a phenomenon which solves a problem. E.g. a paper airplane, which is a model of a real airplane.
Deep learning (DL)
ML with models based on NNs.
Neural network (NN)
Model of brain cell interaction, which consists of interconnected neurons.
Neuron
Mathematical function which can receive inputs and produces an output.
Language model (LM)
Model of (human) language.
Large language model (LLM)
Very large LM based on a NN, with many neurons and adjustable parameters.

and included for completeness:

Star Trek: Voyager
A TV series that showcases how LLMs could evolve through the character of the doctor.

In the end, LLMs are like our little paper airplanes; both are models of something we humans observed and wanted to understand. The author’s key insight from writing this article: Just as paper airplanes can’t be used for everything, neither should LLMs. What is your takeaway?

Next time in Jargonopolis: an overview of terms surrounding the inner workings of LLMs.

References

  1. 2017 History of AI
  2. 1955 Dartmouth summer research project proposal
  3. 1995 Star Trek: Voyager episode “Eye of the Needle”
  4. 2021 Key definitions of AI
  5. 2023 Very gentle introduction to LLMs
  6. 2020 Introduction to ML
  7. 2021 Explanation of ML
  8. 2019 ML with R
  9. 2023 Dive into DL
  10. 2021 Anatomy of the brain
  11. 2023 Introduction to LLMs
  12. 2023 Generative AI future or present
  13. 2023 Introduction to NNs
  14. 2014 The Neuron
