Types of AI and Input Models Explained

ChatGPT is one of the fastest-growing artificial intelligence (AI)-powered apps of all time, but it is likely just the tip of the generative AI iceberg. With AI technology evolving unbelievably quickly, it can be difficult to keep up with the latest terminology and understand how it may impact your day-to-day life, both personally and professionally. In this post, the first of several in a forthcoming series, we’ll break down the fundamentals by defining key AI buzzwords and walking through a typical AI fine-tuning pipeline.

Top AI Use Cases

AI is not always a brainstorming partner or writing assistant. Gartner notes that “enterprise uses for generative AI are far more sophisticated” and identifies five main use cases for generative AI:

  1. Drug design
  2. Material science
  3. Chip design
  4. Synthetic data
  5. Parts design

In addition, Gartner highlights seven sectors for which these use cases will be of particular relevance:

  1. Automotive and vehicle manufacturing
  2. Media
  3. Architecture and engineering
  4. Energy and utilities
  5. Healthcare
  6. Electronic product manufacturing
  7. Pharmaceuticals

Other analysts, such as the UK Office for National Statistics, offer a slightly different perspective on the trend: improving cybersecurity, creating efficiencies, and offering more personalized services are among the most frequent AI use cases across industries. But regardless of how AI is used, it is usually discussed using one or more of these three terms: generative AI, LLMs, and foundation models.

The AI Triad: Generative AI, LLMs, & Foundation Models

Generative AI

Put simply, “generative AI is a broad term that can be used for any AI system whose primary function is to generate content,” as explained here. It is a subdomain of the larger AI space. AI broadly covers automated or semi-automated machine processes that perform complex tasks usually carried out by humans. This can include a variety of functions that are not necessarily related to content creation, such as classifying or grouping data, modeling trends, or executing actions.

Generative AI systems include those that generate images (think Midjourney or Stable Diffusion), audio (VALL-E or resemble.ai), code (Copilot), and language (LLaMA, GPT-4, PaLM, or Claude).

Large Language Models (LLMs)

Large Language Models (LLMs) are one type of generative AI. A language model is a statistical model of language that estimates the probability of a sequence of words appearing within a particular context. This context could be spoken dialog, written text, computer code, or a summarization of text, to name a few. In recent years, LLMs have become increasingly sophisticated, approximating real-world language ever more closely. This is driven in part by the increase in the number of parameters used to approximate the language model. GPT-4 is the latest version of OpenAI’s Generative Pre-trained Transformer (GPT) LLM, the model family behind ChatGPT. Over time, the size of these models has grown from approximately 1.5 billion parameters with GPT-2, to 175 billion parameters with GPT-3, and is rumored to have grown to 100 trillion parameters with GPT-4.
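The idea of "estimating the probability of a sequence of words" can be illustrated with a toy bigram model. This is a didactic sketch only: real LLMs use neural transformer networks with billions of parameters rather than simple word-pair counts, but the underlying notion of assigning a probability to a word sequence is the same.

```python
from collections import defaultdict

def train_bigram(corpus):
    """Count word-pair transitions and convert them to conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]  # sentence boundary markers
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(nexts.values()) for cur, n in nexts.items()}
        for prev, nexts in counts.items()
    }

def sequence_probability(model, sentence):
    """P(sentence) = product of P(word | previous word)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        prob *= model.get(prev, {}).get(cur, 0.0)  # unseen pairs get probability 0
    return prob

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)
# P(the|<s>) * P(cat|the) * P(sat|cat) * P(</s>|sat) = 1 * 2/3 * 1/2 * 1 ≈ 0.333
print(sequence_probability(model, "the cat sat"))
```

Sequences the model has never seen ("the dog ran") get probability zero here; large neural models instead generalize, assigning plausible probabilities even to unseen sequences.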

Foundation Models

Foundation model is a term coined by the Center for Research on Foundation Models (CRFM) at Stanford University. Although there is no consensus on its definition, the term is generally understood to cover AI systems that are generic enough to be adapted to a variety of tasks, which may or may not have been anticipated by the model’s developer. Foundation models are sometimes called general-purpose AI. Most of the notable LLMs, such as GPT-4, BLOOM, and PaLM, are usually described as foundation models. In reality, the line between foundation models (or general-purpose AI) and other models is rather blurry: any model can become an input model and receive additional training rounds or undergo specialization. So we will use the term “input model” to denote a prepackaged model that can be either used directly or tuned for a more specific purpose.

The emergence of input models is the reason why many in the industry are convinced that AI adoption will greatly accelerate in the coming years. Why? Input models reduce curation requirements and centralize model training. They are pre-trained on uncurated, or at most lightly curated, data mined from huge quantities of information on public websites such as Wikipedia, Reddit, and other social media sites. The data is considered uncurated because it is often untagged, lacking metadata that describes what the data is (e.g., sensitive vs. non-sensitive, or part of speech). Input models help offset the immense cost of training LLMs by centralizing model training and then licensing the trained models.

Of note, input models such as GPT-4 have also been trained on curated input, such as specific examples of instructions and response scores.

AI Model Training Phases

Now that we know AI models can become input models and undergo specialization processes, let’s take a closer look at how AI models are developed and trained.

Pre-training vs. Fine-Tuning

What’s the difference between pre-training and fine-tuning? Fine-tuning is the process of adapting pre-trained models to specialized applications or narrower domains. LLMs, for example, are pre-trained on a diverse data set that represents a generalized language model. Fine-tuning, on the other hand, applies more specialized data to that general model to serve a narrower purpose. Essentially, fine-tuning is a pre-trained LLM’s college education.

The benefit of specialization is reduced data demands and cost compared with developing a model from scratch. Whereas an LLM requires terabytes of data, fine-tuning may require only megabytes or gigabytes. An LLM may require a few months to train; fine-tuning could be done in days. In effect, fine-tuning LLMs democratizes their utilization. The expectation is that we will see a handful of input models developed by a relatively small number of organizations, and that the majority of deployed LLMs will be fine-tuned refinements of these input models.

How to Fine-Tune an AI Model

The fine-tuning process usually requires defining a curated data set. A curated data set is one which categorizes and annotates the data in a way that simplifies its analysis. This could be marking parts of speech, categorizing entities within the text, defining sources, or assessing the veracity of the information.
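To make the notion of curation concrete, here is what a single record in a curated data set might look like. The field names and values below are illustrative assumptions, not a standard schema; real data sets define their own annotation conventions.

```python
# One illustrative record from a hypothetical curated fine-tuning data set.
# Every field name here is an assumption for illustration, not a standard.
record = {
    "text": "Acme Corp reported record revenue in Q3.",
    "entities": [{"span": "Acme Corp", "label": "ORG"}],     # entity categorization
    "part_of_speech": ["NNP", "NNP", "VBD", "NN", "NN", "IN", "NNP", "."],
    "source": "example.com/press-release",                   # source attribution
    "verified": True,                                        # veracity assessment
    "sensitivity": "public",                                 # sensitive vs. non-sensitive tag
}

# The annotations make downstream analysis simple, e.g. filtering out
# anything not cleared for training:
safe_for_training = record["verified"] and record["sensitivity"] == "public"
```

It is precisely this metadata layer (tags, labels, provenance) that distinguishes a curated data set from the raw web text used in pre-training.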

Using a model and a curated data set as inputs to the fine-tuning process, it’s possible to sketch a notional training and deployment flow. A typical fine-tuning workflow looks like this:

  1. We start by selecting an LLM (notable examples include GPT, LLaMA, PaLM, and BLOOM, and the list is currently expanding).
  2. We then take our curated data set and extract meaningful chunks of words in a process known as tokenization, capturing the semantic and syntactic structure of the text.
  3. The tokens for both the prompt and response are fed into a training process using the input model as a warm starting point. In traditional modeling, the starting parameters are often guesses, either confident (e.g. acceleration due to gravity is roughly 10 m/s²) or purely random. Starting with an input model gives us an informed starting point for these parameter values.
  4. The training process modifies these parameters based upon the new data.
  5. The resulting model is evaluated, and if performance is acceptable, deployed.
  6. Once deployed, the fine-tuned model can be queried by external users, and it will return responses based on user prompts. Throughout the process, even after deployment, results and performance need to be monitored.
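The "warm starting point" idea in step 3 can be illustrated with a toy one-parameter model. This is a didactic sketch, not an LLM training loop: we fit y = w·x by gradient descent and compare how many epochs a cold (uninformed) start needs versus a warm start near the "pre-trained" value. The learning rate, tolerance, and data are arbitrary choices for the illustration.

```python
def epochs_to_converge(start_w, data, target, lr=0.1, tol=1e-3, max_epochs=10_000):
    """Gradient descent on squared error for the model y = w * x.

    Returns the number of epochs until w is within tol of target.
    """
    w = start_w
    for epoch in range(1, max_epochs + 1):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of (w*x - y)**2 w.r.t. w
        if abs(w - target) < tol:
            return epoch
    return max_epochs

# Narrow-domain "fine-tuning" data generated by y = 3x
data = [(x / 10, 3 * x / 10) for x in range(1, 6)]

cold_epochs = epochs_to_converge(0.0, data, target=3.0)  # uninformed cold start
warm_epochs = epochs_to_converge(2.9, data, target=3.0)  # "pre-trained" warm start
print(cold_epochs, warm_epochs)  # the warm start converges in fewer epochs
```

The same intuition scales up: starting fine-tuning from pre-trained parameters, rather than random ones, is what lets days of training substitute for months.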

As the workflow shows, governing access to the curated data set used during the fine-tuning phase is key to mitigating the risk of sensitive information ending up in the model or its responses. This is an important first step in incorporating security into AI workflows, which we’ll start to explore in this blog on AI risks.
