Generative Artificial Intelligence
Generative AI is hugely popular today, with ChatGPT from OpenAI and Bard from Google providing simple, easy-to-use interfaces that can generate high-quality text, images, and video in a matter of seconds.
General
Generative AI, or GenAI, has been used in chatbots since the 1960s. With the introduction of GANs (Generative Adversarial Networks) by Ian Goodfellow and his colleagues in June 2014, GenAI became capable of creating convincingly realistic images and audio of real people. Deepfakes are one result of GenAI.
What is GenAI?
Area of GenAI
GenAI is a type of Artificial Intelligence technology that can, as the name suggests, generate (or produce) various types of content such as text, images, audio, and synthetic data.
Almost every one of us has seen the image below.
Artificial Intelligence (AI) is a field of study in computer science that develops and studies intelligent machines: systems that can reason, learn, and act autonomously.
Machine Learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can effectively generalize and thus perform tasks without explicit instructions.
Deep learning is the subset of machine learning methods based on artificial neural networks with representation learning. The adjective “deep” refers to the use of multiple layers in the network. Learning can be supervised, semi-supervised, or unsupervised.
The above image shows that Deep Learning is a subset of Machine Learning which in turn is a subset of Artificial Intelligence.
Since GenAI is a subset of Deep Learning, it uses artificial neural networks and can process both labeled and unlabeled data using supervised, semi-supervised, or unsupervised methods. Large Language Models are also a subset of Deep Learning.
Generative vs Discriminative Model
Any model that we create as part of Machine Learning is one of two types: discriminative or generative. A Discriminative Model is used to classify data points or predict their labels and is trained on a dataset of labeled data points. It classifies or predicts by learning the relationship between the features and the labels. A Generative Model learns the probability distribution of the data points themselves and can generate new data points conforming to that learned distribution.
Let us take the example of images of cats. A Discriminative Model learns the probability of a label given the features and can then predict whether a new image is of a cat or not. A Generative Model learns the joint probability distribution of both features and labels and can then generate a new picture of a cat. A Generative Model is sensitive to outliers in the data.
The applications of discriminative and generative models are therefore also very different. Discriminative models are generally used in text classification, object detection, and similar tasks; any model that outputs a number, a class, or a probability is a discriminative model. Generative models are used in image generation, inpainting, and generating natural language text and speech (audio).
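To make the distinction concrete, here is a minimal sketch in Python, assuming scikit-learn is installed; the two toy features and all values are made up for illustration. A logistic regression learns only the boundary between the classes, while a Gaussian Naive Bayes model fits a per-class distribution that we can also sample new points from.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression  # discriminative: learns P(label | features)
from sklearn.naive_bayes import GaussianNB           # generative: models P(features | label) * P(label)

rng = np.random.default_rng(0)
# Toy features (e.g. "ear pointiness", "whisker length"); class 1 = cat, 0 = not cat
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)), rng.normal(0.0, 0.5, (100, 2))])
y = np.array([1] * 100 + [0] * 100)

# Discriminative model: directly predicts the label for a new data point
clf = LogisticRegression().fit(X, y)
print("P(cat):", clf.predict_proba([[1.8, 2.1]])[0, 1])

# Generative model: fits a distribution per class, so we can sample a new "cat"
gen = GaussianNB().fit(X, y)
cat = list(gen.classes_).index(1)
new_cat = rng.normal(gen.theta_[cat], np.sqrt(gen.var_[cat]))  # var_ is sigma_ in older scikit-learn
print("Sampled cat-like features:", new_cat)
```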
So what is Generative AI?
GenAI is a type of deep learning model (artificial intelligence) that creates new content based on what it has learned from existing data. The process of learning is called training, and it results in a statistical model (also called a foundation model). This foundation model is invoked when given a prompt (text entered in a user interface). Based on the probability distribution of the underlying data, the GenAI model generates new content by finding the next tokens that have the maximum probability given the prompt. That is why the prompt is so important to the final output. Large Language Models are one type of generative AI model; they generate text using pattern matching.
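As a small illustration of prompt-driven generation, the sketch below uses the Hugging Face transformers library with the publicly available GPT-2 model; any other text-generation model would work the same way. The model repeatedly picks likely next tokens given the prompt and everything generated so far.

```python
from transformers import pipeline

# Load a small, publicly available language model (downloads weights on first run)
generator = pipeline("text-generation", model="gpt2")

# The prompt steers which continuations the model considers most probable
prompt = "Generative AI is"
result = generator(prompt, max_new_tokens=20, do_sample=True)
print(result[0]["generated_text"])
```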
What makes a generative AI model good?
First and foremost, both the volume and the variety of the training data are very important. Going deeper into the layers of the model, the power comes from transformers. Transformers process an entire sequence at once (be that a sentence, a paragraph, or an entire article), analyzing all its parts and not just individual words. This allows the software to capture context and patterns better, and to translate (or generate) text more accurately.¹ Although precursors of the transformer idea date back to the early 1990s, the breakthrough in contextual representations came later: in 2018, in the ELMo paper, an entire sentence was processed before an embedding vector was assigned to each word in it. A bidirectional LSTM was used to calculate such deep, contextualized embeddings for each word, improving upon the line of research from bag-of-words and word2vec.² In this way, transformers produced a revolution in Natural Language Processing.
Architecture of the Transformer
A transformer consists of an encoder and a decoder. There are multiple encoding layers that process the input iteratively, and multiple decoding layers that iteratively process the encoder's output as well as the decoder's own output tokens so far. The function of each encoder layer is to generate contextualized tokens. Each decoder layer has two attention sublayers: (1) cross-attention, for incorporating the output of the encoder (the contextualized input token representations), and (2) self-attention, for “mixing” information among the input tokens to the decoder (i.e., the tokens generated so far during inference). Both the encoder and decoder layers also have a feed-forward neural network for additional processing of the outputs, and contain residual connections and layer normalization steps.³
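The following is a minimal NumPy sketch of scaled dot-product self-attention, the core operation inside these encoder and decoder layers. The dimensions and random inputs are illustrative only; the real architecture adds multiple heads, residual connections, and layer normalization around this step.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every token scores every other token in the sequence at once,
    # which is how transformers capture context beyond individual words
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # contextualized token vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)          # (4, 8)
```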
Issues in transformers
Sometimes the words or sentences generated by a transformer are nonsensical or grammatically incorrect. These issues are called hallucinations. Hallucinations can make the output text difficult to understand, or can introduce incorrect or misleading information. Hallucinations can be caused by:
- Not enough training data
- Noisy data (preprocessing not done correctly)
- Not enough context given to the model
- Not enough constraints on the model
Hence, to make a good GenAI model, both volume and variety of training data should be provided, preprocessing should ensure that the data is clean (remember, outliers have an adverse effect on generative models), and sufficient context as well as constraints (e.g. grammar rules, or pronunciation in the case of speech output) should be given to the model.
To use a GenAI model, a wrapper with a user interface should be designed around it that accepts input in the form of text, called a prompt. The prompt can be used to control the output of the model. Prompt design is the process of creating prompts that elicit the desired response from language models, and writing well-structured prompts is an essential part of ensuring accurate, high-quality responses from a language model.⁴
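As a simple illustration of prompt design, the sketch below builds a structured prompt from a task, some context, and explicit constraints; the template and field names are just one possible convention, not a prescribed format.

```python
def build_prompt(task: str, context: str, constraints: str) -> str:
    """Assemble a structured prompt; clearer structure tends to yield better responses."""
    return (
        f"Task: {task}\n"
        f"Context: {context}\n"
        f"Constraints: {constraints}\n"
        "Answer:"
    )

print(build_prompt(
    task="Summarize the text below in one sentence.",
    context="Transformers process an entire sequence at once...",
    constraints="Use plain English and avoid jargon.",
))
```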
Types of GenAI models
Several types of GenAI models are possible:
- Text to text
- Text to image
- Text to video
- Text to 3D
- Text to task
All of these models need a wrapper around them to serve as a user interface.
Conclusion
In this article, the main aspects of GenAI were discussed, from its history and scope to its implementation.
References
¹ Generative AI exists because of the transformer