
Generative AI Engineering

 
September 25, 2023 by Denis Vrdoljak
Category: Big Data

These course chapters have been adapted from the Web Age course WA3309: Generative AI Training to introduce you to the basics of Generative Artificial Intelligence (AI), its applications, and the techniques used to develop and engineer these systems.  Contact us for the full version of this live, hands-on course taught by an expert instructor.

Introduction to Generative AI

  • Agenda
    • Introduction to Machine Learning
    • Historical overview of Machine Learning and Generative AI
    • Understanding Generative Models
    • Comparison between generative and discriminative models
    • The original LLMs – from BERT to GPT
    • Conclusion
  • Machine Learning and Generative Models
    • What is ML? Machine Learning (ML) is a subset of Artificial Intelligence (AI) that allows computers to learn from data without being explicitly programmed.
    • ML algorithms use data to identify patterns and learn from them, making it possible to automate tasks and predict outcomes.
  • A Brief History of Machine Learning
    • 1950s–1960s: Birth of ML
    • 1959 – Arthur Samuel coins the term “Machine Learning.” His checkers-playing program was an early example of a self-learning program.
    • 1958 – Frank Rosenblatt creates the Perceptron, the building block of today’s neural networks.
    • 1970s–1980s: The AI Winter
    • The AI Winter and the emergence of rule-based systems
    • Reduced funding for ML research; focus shifts to rule-based systems
    • 1980s–1990s: Probabilistic Models
    • Researchers started exploring probabilistic models for generative tasks
    • Hidden Markov Models (HMMs) and Bayesian networks were used for speech recognition, language modeling, and natural language processing.
    • 1990s–2000s: Revival of ML
    • ML algorithms (like Decision Trees, K-Nearest Neighbors, and Support Vector Machines) revive interest and funding in ML
    • Ensemble methods like Random Forests and Boosting emerge.
    • 2000s–2010s: Big Data and Deep Learning
    • With the Internet and Big Data, ML becomes crucial for generating insight
    • Neural network advances: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)
    • Deep Learning develops
    • Variational Autoencoders (VAEs) were introduced, marking a significant advancement in generative AI.
    • VAEs provided a probabilistic framework for generating new data samples, enabling various applications in image generation and data compression
    • 2010s–Present: Rise of AI and Generative Models
    • Advances in computational power enable ML breakthroughs
    • Models like GPT and BERT revolutionized NLP
    • Generative Adversarial Networks (GANs) pushed the boundaries of what’s possible in image generation (i.e., Generative AI)
    • Advances in Transformer models led to LLMs (Large Language Models)

Understanding Generative Models

What are Generative Models?

Generative models are a class of machine learning models that aim to generate new data samples that resemble the distribution of a given training dataset.

These models learn the underlying patterns and structures present in the training data and then use that knowledge to produce new, synthetic data samples that are similar to the real data (as opposed to classifying or predicting).
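
To make this concrete, here is a minimal sketch of the generative idea in Python: estimate the training data's distribution, then draw new samples from it. The Gaussian "model" and the toy data are illustrative stand-ins for the far richer distributions real generative models learn.

```python
import numpy as np

# A minimal sketch of the generative loop: fit the training data's
# distribution, then sample new, similar data from it. Here the "model"
# is just a fitted Gaussian; the data are hypothetical heights in cm.
rng = np.random.default_rng(0)
train_data = rng.normal(loc=170.0, scale=8.0, size=1000)

mu, sigma = train_data.mean(), train_data.std()   # "learn" the distribution
synthetic = rng.normal(mu, sigma, size=5)          # generate new samples
print(synthetic)
```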

Transformer Models: Attention-based models, common in LLMs, that understand context very well. Also used in Image GPT.

Generative Adversarial Networks (GANs): Consist of two models: a Generator creates new data instances, while a Discriminator evaluates them for authenticity and quality.
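
Below is a minimal, illustrative GAN training loop on one-dimensional toy data, assuming the generator-versus-discriminator setup described above; the network sizes, data, and training length are arbitrary choices for the sketch, not a production recipe.

```python
import torch
import torch.nn as nn

# Toy setup: the "real" data is 1-D Gaussian noise centered at 2.0, and
# the generator learns to turn random noise into samples that fool the
# discriminator. All sizes here are illustrative.
real_data = lambda n: torch.randn(n, 1) * 0.5 + 2.0   # target distribution
noise = lambda n: torch.randn(n, 2)                    # generator input

G = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: push real samples toward 1, generated toward 0
    real, fake = real_data(64), G(noise(64)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make D label its samples as real
    fake = G(noise(64))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Samples should cluster near the real mean of 2.0 after training
print(G(noise(5)).detach().squeeze())
```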

Variational Autoencoders (VAEs) – models that use encoder and decoder neural networks to learn a compact representation of input data in a lower-dimensional latent space, enabling the generation of new data samples with similar characteristics to the training data.
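
As a rough sketch of the VAE idea (with deliberately tiny linear networks and toy 1-D data, so every dimension and constant here is illustrative): the encoder outputs a latent mean and log-variance, a latent sample is drawn via the reparameterization trick, and the loss combines reconstruction error with a KL term pulling the latent toward a standard normal prior.

```python
import torch
import torch.nn as nn

# Encoder maps 1-D input to [mu (2 dims), logvar (2 dims)]; decoder maps
# a 2-D latent back to 1-D data. Linear layers keep the sketch minimal.
enc = nn.Linear(1, 4)
dec = nn.Linear(2, 1)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

x = torch.randn(256, 1) * 0.5 + 2.0   # toy training data

for _ in range(500):
    h = enc(x)
    mu, logvar = h[:, :2], h[:, 2:]
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
    recon = dec(z)
    recon_loss = ((recon - x) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).mean()  # KL to N(0, 1)
    loss = recon_loss + 0.1 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# Generate new samples: draw latents from the prior and decode them
print(dec(torch.randn(5, 2)).detach().squeeze())
```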

Normalizing Flow Models – models that use invertible transformations to map a simple distribution (e.g., Gaussian) to a complex data distribution, enabling efficient sampling and exact likelihood computation.
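
The change-of-variables idea can be shown with the smallest possible flow: a single invertible affine transform with hand-picked parameters (a real flow would learn many such layers). Sampling pushes Gaussian noise forward; exact log-likelihood inverts the transform and subtracts the log of the Jacobian determinant.

```python
import numpy as np

# One invertible affine "flow": x = a*z + b, with z ~ N(0, 1).
# The parameters are fixed here purely for illustration.
a, b = 2.0, 5.0

def sample(n, rng=np.random.default_rng(0)):
    z = rng.standard_normal(n)
    return a * z + b                      # forward: simple -> complex

def log_likelihood(x):
    z = (x - b) / a                       # inverse transform
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))  # standard normal log-density
    return log_pz - np.log(abs(a))        # change of variables: -log|det J|

xs = sample(5)
print(xs)
print(log_likelihood(xs))                 # exact, not approximated
```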

Energy-Based Models (EBMs) – models that assign an energy score to each input sample; a sample’s probability decreases as its energy increases (p(x) ∝ exp(−E(x))), allowing the model to capture complex dependencies and patterns in the data.
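
A tiny numeric illustration of the energy-to-probability relationship, using an arbitrary quadratic energy over a discrete grid: probabilities follow p(x) ∝ exp(−E(x)), so the lowest-energy point is the most likely.

```python
import numpy as np

# Assign an illustrative quadratic energy to each grid point; the lowest
# energy (and hence the highest probability) sits at x = 1.
xs = np.linspace(-3, 3, 7)
energy = (xs - 1.0) ** 2

unnorm = np.exp(-energy)
probs = unnorm / unnorm.sum()   # normalizing constant Z = sum of exp(-E)
for x, e, p in zip(xs, energy, probs):
    print(f"x={x:+.1f}  E={e:.2f}  p={p:.3f}")
```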

AutoRegressive models (AR): These models generate sequences by modeling the probability of each subsequent item based on the items that preceded it.
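
Here is a minimal autoregressive sketch: a bigram model that conditions each next word only on the single previous word (a drastic simplification of the idea), then generates a sequence one token at a time. The toy corpus is, of course, illustrative.

```python
import random
from collections import Counter, defaultdict

# Count bigram transitions: how often each word follows each other word.
corpus = "the cat sat on the mat and the cat slept".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(start, length=6, rng=random.Random(0)):
    # Sample each next token from the distribution conditioned on the
    # previous token -- the autoregressive step.
    out = [start]
    for _ in range(length):
        options = counts[out[-1]]
        if not options:
            break
        words, weights = zip(*options.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

print(generate("the"))
```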

DCGAN (Deep Convolutional Generative Adversarial Network) – a variant of Generative Adversarial Networks that uses deep convolutional neural networks to generate high-resolution and realistic images.

PixelRNN and PixelCNN – autoregressive models capable of generating images pixel by pixel, where each pixel’s color depends on previously generated pixels.

GPT (Generative Pre-trained Transformer) – uses the transformer architecture to generate coherent and contextually appropriate text; often used for tasks like text generation and completion.

DRAW (Deep Recurrent Attentive Writer) – uses recurrent neural networks and attention mechanisms to generate images in a step-by-step manner.

Flow++ – a flow-based model used for density estimation and image generation, capable of producing high-quality samples.

BERT (Bidirectional Encoder Representations from Transformers) – a transformer encoder pre-trained with a masked language modeling objective; primarily used for language understanding tasks rather than open-ended generation.

Applications of Generative Models

Image Generation and Synthesis – create realistic images of objects, scenes, and even people, with applications in the art, design, and entertainment industries for generating visual content.

Text Generation and Language Modeling – language models and transformers can generate human-like text, which has applications in natural language processing, chatbots, and content creation.

Code Generation and Language Modeling – language models are also able to generate code from text prompts or from other code; they can be trained on programming languages, not just human languages.

Data Augmentation – generative models can augment training datasets by generating new data samples, helping improve the performance and generalization of machine learning models.

Drug Discovery and Molecule Design – generative models are used in pharmaceutical research to design and discover new molecules and drugs, accelerating drug development processes.

Difference between Generative and Discriminative Models

Discriminative Models – focus on modeling the conditional probability distribution of the labels given the input data. They are primarily used for making predictions and classifying new data samples.

Generative Models – aim to model the joint probability distribution of the input data and labels. They learn the underlying data distribution and can generate new data samples.

Discriminative Models – generally simpler than generative models because they focus on modeling the decision boundary between classes rather than the complete data distribution.

Generative Models – tend to be more complex than discriminative models because they need to model the entire data distribution. This complexity can make them computationally expensive and require more data for training.

Discriminative Models – commonly used for tasks that involve classification and prediction, such as image recognition, sentiment analysis, and natural language processing.

Generative Models – often used for tasks that involve generating new data samples, such as image synthesis, text generation, and music composition. They can also be used in unsupervised and semi-supervised learning scenarios.

Discriminative Models – may struggle with imbalanced data since they focus on learning the decision boundary, and the model’s performance may be biased towards the majority class.

Generative Models – can handle imbalanced data well since they model the entire data distribution, making them less sensitive to imbalanced class distributions during training.
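
A small scikit-learn sketch contrasting the two families on the same classification task: Gaussian Naive Bayes fits a per-class data distribution (generative), while logistic regression fits only p(y | x) (discriminative). The dataset is synthetic, and the attribute names assume a recent scikit-learn version (per-class variances exposed as var_).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Same task, two model families: Gaussian Naive Bayes models the joint
# distribution p(x, y) via per-class Gaussians (generative), while
# logistic regression models p(y | x) directly (discriminative).
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

gen = GaussianNB().fit(X, y)             # learns per-class means/variances
disc = LogisticRegression().fit(X, y)    # learns only a decision boundary

print("generative accuracy:    ", gen.score(X, y))
print("discriminative accuracy:", disc.score(X, y))

# Only the generative model can also *sample* new feature vectors,
# by drawing from its fitted per-class Gaussians:
rng = np.random.default_rng(0)
synthetic = rng.normal(gen.theta_[1], np.sqrt(gen.var_[1]))  # a class-1 sample
print("synthetic class-1 sample:", synthetic)
```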

The original LLMs – from BERT to GPT

BERT

  • Released in 2018 by Google AI.
  • Introduced bidirectional context to pre-training language models.
  • Utilized a masked language model (MLM) pre-training objective (illustrated in the sketch below).
  • Pre-trained on massive amounts of text data.
  • State-of-the-art performance on various NLP tasks.

GPT-2

  • Released in 2019 by OpenAI.
  • Left-to-right, unidirectional training objective.
  • 1.5 billion parameters.
  • Autoregressive decoding during generation.
  • Impressive text generation capabilities.

GPT-3

  • Released in 2020 by OpenAI.
  • Scaled up the model size to 175 billion parameters.
  • Largest publicly known LLM at the time.
  • Few-shot and zero-shot learning capabilities.
  • Strong performance across a wide range of tasks.

GPT-3.5 and GPT-4

  • Continuations of the GPT-3 series, with improvements in language understanding and generation capabilities.
  • The latest release in the series is GPT-4, launched in 2023, continuing the pattern of increasing model size and performance.
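
To illustrate BERT's masked language model objective, here is a small example using the Hugging Face transformers library to fill in a masked token; the sentence is illustrative.

```python
from transformers import pipeline

# BERT's MLM pre-training teaches it to predict masked tokens from
# bidirectional context; the fill-mask pipeline exposes this directly.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("Generative models can [MASK] new images from text."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.3f}")
```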

Trends in LLM development:

  • Exponential growth in parameter number
  • Model complexity and exponential scaling
  • New players emerging with very large models

Conclusion

In this chapter, we:

  • Discussed the historical path of Generative AI
  • Acquired a comprehensive grasp of the essence of Generative AI
  • Discussed different types and examples of Generative models
  • Highlighted differences between Generative models and Discriminative models

Case Studies and Real-World Applications in Software and Data Engineering

Agenda

  • Generative AI for Text
  • Generative AI for Media
  • Generative AI for Code

Generative Text App Examples
Example Applications:

  • Copy.ai for writing
  • Jasper for marketing
  • Bing for search + GenAI LLM
  • ChatGPT

Copy AI:

  • Generates blog posts and other copy
  • Uses Prompt Templates
  • Uses Prompt Engineering for context
  • Chat feature adds the chat conversation as context to each subsequent prompt API call (see the sketch below)
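
The chat-as-context pattern can be sketched as follows, using the OpenAI Chat Completions API as exposed by the openai Python library at the time of writing; the model name, system prompt, and key are illustrative, and Copy.ai's actual implementation is not public.

```python
import openai

openai.api_key = "sk-..."  # placeholder; supply a real key

# Running conversation: each user turn and model reply is appended, so
# every new API call carries the full chat history as context.
messages = [{"role": "system", "content": "You are a helpful copywriting assistant."}]

def chat(user_input):
    messages.append({"role": "user", "content": user_input})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})  # remember the reply
    return reply
```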

JASPER

  • Built on top of custom, proprietary AI language models and other third-party models like Cohere, OpenAI, and Anthropic
  • Fine-tuned for marketing content
  • Trained by reading approximately 10% of published information online
  • Pros:
    • Reliable platform
    • Jasper generally passes AI detectors
    • Export reports
  • Cons:
    • Not great for longer content
    • No fact-checking
    • No free version
  • Combines multiple user inputs
  • Constructs the prompt API call with these inputs
  • Uses Prompt Templates to construct calls (see the sketch below)
  • Based on OpenAI’s GPT-3 LLM
  • Also provides an auto-complete feature that uses Prompt Engineering to add context to each prompt API call.
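
A prompt template is ultimately just a parameterized string that combines multiple user inputs into one API prompt. A minimal sketch (the template wording and fields are hypothetical, not Jasper's actual templates):

```python
# Hypothetical marketing prompt template: slots are filled from several
# user inputs to form a single prompt for the underlying LLM.
TEMPLATE = (
    "Write a {tone} marketing {content_type} for the product below.\n"
    "Product: {product}\n"
    "Audience: {audience}\n"
)

def build_prompt(product, audience, tone="friendly", content_type="blog post"):
    # Fill the template's slots with the user's inputs
    return TEMPLATE.format(
        product=product, audience=audience, tone=tone, content_type=content_type
    )

print(build_prompt("noise-cancelling headphones", "remote workers"))
```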

Bing

  • Incorporates Chat style responses
  • Also provides links to source material
  • Done using Retrieval-Augmented Generation (RAG) – see the sketch below
  • Uses OpenAI GPT as underlying LLM
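
A minimal RAG sketch: retrieve the most relevant documents for a query, then stuff them into the prompt so the LLM can answer with citations. The toy corpus and word-overlap "retriever" are illustrative stand-ins; a real system like Bing uses a full search index.

```python
# Toy retriever: rank documents by word overlap with the query. A real
# system would use a search engine or vector similarity instead.
def retrieve(query, corpus, k=2):
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

corpus = [
    "Doc A: GPT-4 was released by OpenAI in 2023.",
    "Doc B: BERT was released by Google AI in 2018.",
    "Doc C: GANs were introduced in 2014.",
]

query = "When was BERT released?"
context = "\n".join(retrieve(query, corpus))

# Stuff the retrieved sources into the prompt, asking for cited answers;
# this prompt would then be sent to the underlying LLM.
prompt = f"Answer using only these sources, and cite them:\n{context}\n\nQuestion: {query}"
print(prompt)
```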

ChatGPT

  • Offers common users free access to basic AI content
  • Key Features:
    • Natural Language Understanding
    • Conversational Context
    • Open-Domain Conversations
    • Language Fluency
  • Pros
    • Provide more natural interactions and accurate responses
    • Free tool for the general public
  • Cons
    • Prone to errors and misuse
    • Cannot access data or current events after September 2021 (its training cutoff)
  • Uses GPT-3.5 (free version) or GPT-4 (paid version) as the underlying LLM
  • GPT-4 was trained on Microsoft Azure AI supercomputers
  • Chatbot-style question-and-answer
  • Uses Prompt Engineering to pass the session/conversation as context to the LLM API call
  • New feature, “Custom Instructions,” uses prompt templates to add context/instructions to each new prompt (see the sketch below)
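
The idea behind Custom Instructions can be sketched as a fixed instruction block prepended (as a system message) to every new prompt, again using the openai Python library of the time; the instruction text and model choice are illustrative, not OpenAI's internal implementation.

```python
import openai

# A fixed instruction block that accompanies every request, so each new
# prompt inherits the same context and preferences. Wording is hypothetical.
CUSTOM_INSTRUCTIONS = (
    "I am a data engineer. Prefer concise answers with Python examples."
)

def ask(question):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": CUSTOM_INSTRUCTIONS},  # prepended every call
            {"role": "user", "content": question},
        ],
    )
    return response["choices"][0]["message"]["content"]
```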

Image Synthesis

Image Synthesis is about creating new images from scratch or transforming existing ones. Examples include DALL-E by OpenAI which creates images from text descriptions, and StyleGAN, a model by Nvidia capable of creating highly realistic human faces.

  • Techniques:
    • Generative Adversarial Networks (GANs): Models that can generate novel, realistic images by learning from a dataset of existing images.
    • Variational Autoencoders (VAEs): Models that can generate new images by learning the underlying distribution of training data.
  • Applications:
    • Art and Design: AI can generate unique artwork and design elements.
    • Advertising: AI can create eye-catching and unique visual content for ads.
    • Medical Imaging: AI can help visualize medical conditions or predict patient outcomes.

DALL-E

  • Generates images based on text prompts
  • The application uses prompt templates to select different styles of art (see the sketch below)
  • This is a nascent technology, not as mature as LLMs
  • Has issues with mistakes in images, including some common ones:
    • Hands and feet in the wrong place, or not to scale
    • Objects floating or not in a realistic location
  • Subsequent prompts can refine these errors, but not eliminate them completely
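
Style selection via prompt templates can be sketched like this, using the image endpoint of the openai Python library circa 2023; the styles and template wording are hypothetical.

```python
import openai

# Hypothetical style templates: each wraps the user's subject in extra
# prompt text that steers the image toward a particular art style.
STYLE_TEMPLATES = {
    "watercolor": "{subject}, soft watercolor painting, pastel palette",
    "pixel-art": "{subject}, retro 8-bit pixel art",
    "photo": "{subject}, photorealistic, natural lighting",
}

def generate(subject, style="watercolor"):
    prompt = STYLE_TEMPLATES[style].format(subject=subject)
    response = openai.Image.create(prompt=prompt, n=1, size="512x512")
    return response["data"][0]["url"]   # URL of the generated image

print(generate("a lighthouse at dawn", style="pixel-art"))
```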

Image Synthesis

  • Opportunities:
    • Custom Art Creation: Artists can use AI to generate unique pieces of art.
    • Image Enhancement: AI can enhance low-quality images.
  • Challenges:
    • High computational requirements: Generating high-quality images requires significant computational power.
    • Avoiding illegal or harmful image generation: Ensuring AI doesn’t generate offensive, inappropriate, or copyrighted images.
  • Future Prospects:
    • With advancements in technology, AI could create high-resolution, photorealistic images for any given text description.
    • AI could be used to create custom designs for everything from furniture to clothing.

Music Composition

  • AI can create new music pieces or enhance existing ones by learning from a dataset of music.
  • Examples include MuseNet by OpenAI, capable of composing music with 10 different instruments, and Jukin Media’s Jukin Composer, which generates royalty-free music for videos.
  • Techniques:
    • LSTM (Long Short-Term Memory) networks: Used for sequence prediction problems, such as predicting the next note in a melody (see the sketch after this list).
    • Transformers: Used to understand relationships between different elements of a piece of music.
    • Autoencoders: Can learn efficient representations of music data and generate new pieces based on those representations.
  • Applications:
    • Background Scores: AI can create background music for videos or games.
    • Personal Music Creation: Individuals can create their own unique music with the help of AI.
    • Music Education: AI can assist in teaching music composition and theory.
  • Opportunities:
    • Democratizing Music Creation: AI can help amateur musicians create high-quality music.
    • Endless Customization: AI can create endless variations of a single piece of music.
  • Challenges:
    • Creating Emotive and Culturally Nuanced Compositions: Music is deeply tied to emotion and culture, which can be challenging for AI to fully grasp.
  • Future Prospects:
    • As AI continues to improve, we might see AI-created music becoming more common in mainstream media.
    • AI could also help musicians by taking care of routine tasks, such as mixing and mastering, allowing artists to focus on the creative process.
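
A minimal PyTorch sketch of next-note prediction with an LSTM, as mentioned in the techniques above: notes are toy integer pitch classes, the training melody is a short arpeggio, and all sizes are illustrative; a real system would train on a large corpus of music.

```python
import torch
import torch.nn as nn

# Predict the next note (pitch class 0-11) from the notes so far.
class NoteLSTM(nn.Module):
    def __init__(self, n_notes=12, embed=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_notes, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_notes)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)   # next-note logits at every step

melody = torch.tensor([[0, 4, 7, 4, 0, 4, 7, 4]])   # toy training melody
model = NoteLSTM()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):   # teach the model to predict each next note
    logits = model(melody[:, :-1])
    loss = loss_fn(logits.reshape(-1, 12), melody[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Generate a continuation note by note (autoregressive sampling)
seq = melody[:, :2]
for _ in range(6):
    next_logits = model(seq)[:, -1]
    nxt = torch.distributions.Categorical(logits=next_logits).sample()
    seq = torch.cat([seq, nxt[:, None]], dim=1)
print(seq.tolist())
```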

Video Creation

  • AI can create new video content or modify existing videos.
  • Examples include DeepArt’s video creation tool, which turns simple videos into moving digital paintings, and Deepfake technology, which can superimpose existing images or videos onto source images or videos.
  • Techniques:
    • GANs and VAEs: Used to generate or modify video frames.
    • LSTM: Used to maintain coherence across video frames.
  • Applications:
    • Advertising: AI can create engaging video ads
    • Video Editing: AI can automate routine editing tasks.
    • Film Making: AI can assist in creating special effects or generating background scenes.
  • Opportunities:
    • Automated Video Editing: AI can take care of routine editing tasks, speeding up the video production process.
    • Personalized Video Creation: AI can create custom video content based on user preferences.
  • Challenges:
    • High computational requirements: Creating or editing videos requires significant computational resources.
    • Maintaining Quality Across Frames: Ensuring AI maintains consistent quality and coherence across video frames.
  • Future Prospects:
    • AI might be used to create high-quality video content, including short films or advertisements.
    • AI could also assist in live video production, such as sports events or news broadcasts.

Generative AI for Code

Code Generation

  • Automatically generating code snippets, templates, or even complete programs based on high-level specifications or natural language instructions (see the sketch below)
  • Accelerate the development process
  • Reduce the need for repetitive coding tasks
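
Code generation from a natural-language spec can be sketched as a single LLM API call, here via the openai Python library of the time; the spec, system prompt, and model choice are illustrative.

```python
import openai

# A hypothetical high-level specification supplied by the developer.
spec = "Write a Python function that returns the n-th Fibonacci number iteratively."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a coding assistant. Reply with code only."},
        {"role": "user", "content": spec},
    ],
)
# The model's reply is the generated code snippet
print(response["choices"][0]["message"]["content"])
```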

Code Generation: GitHub Copilot

  • Based on OpenAI Codex, an LLM built on GPT-3
  • Trained on both natural language and code
  • Users accepted on average 26% of all completions shown
  • Integrated in Visual Studio Code, JetBrains IDEs…
  • Designed to generate the best code possible given the context
  • Can only hold a very limited context
  • May not make use of helpful functions defined elsewhere, or even in the same file
  • Does not test code
  • Code may not always work
  • May suggest deprecated libraries or APIs
  • Non-English comments in code come with performance disparities
  • Certain languages perform better than others
  • May pose privacy or security risks
