Table of Contents
- Where to Find Value?
- What is Generative AI?
- What is a Generative AI App?
- Types of Generative AI Models
- Generative AI Tech Stack
- Proprietary or Closed Source Foundation Models (OpenAI, Google Bard) Pre-trained Models (connecting with APIs)
- Model Hubs (platforms to host and share models)
- Compute Hardware GPUs TPUs (accelerator chips for model training)
- Building Generative AI Apps
- Key Takeaway
Generative AI is in the early stages of development. Most players still lack differentiation and struggle with user retention, so it is unclear how these generative AI applications will generate value. But as their technical capabilities advance, some will emerge successfully and consolidate their AI end products.
Enterprises with established business models and large customer bases are adopting generative AI to quickly enhance their current end-user applications and improve their processes.
In this paper, we will discuss generative AI concepts and details on how the technology works, how the tech stack is composed, and other aspects for clients interested in discussing their AI development path.
Where to Find Value?
Companies adopting generative AI apps are raising the standard by improving their operational performance and building advanced products and services.
What is Generative AI?
What is a Generative AI App?
These application types represent different ways of applying generative AI techniques, and they all have their unique potential benefits and challenges.
Here are some types of generative AI applications:
- Text Generation Apps: These applications generate human-like text. They are widely used in various fields, including creative writing, journalism, content marketing, customer support (in chatbots), etc. Apps based on models like GPT-3 and GPT-4 fall into this category.
- Image Generation Apps: These applications generate realistic images. They can be used to create art, design elements, or even fake but plausible images, for example by transforming the style of an image or changing the time of day in a scene. Generative Adversarial Networks (GANs) are often used in these applications.
- Music and Audio Generation Apps: These applications can generate music or other forms of audio content. They can create background scores, jingles, or sound effects for various purposes. OpenAI’s MuseNet is an example of an app capable of composing music in different styles.
- 3D Model Generation Apps: These applications generate 3D models that are especially useful in fields like video game development, architectural design, or virtual reality.
- Video Generation Apps: These applications can generate new video clips. They are still at an early stage of development, but they have the potential to revolutionize fields like film, advertising, and social media.
- Data Augmentation Apps: These applications generate synthetic data to augment existing datasets. This can be particularly useful when data is limited and expensive to collect. For example, synthetic patient data in healthcare can be used for research without compromising privacy.
- Style Transfer Apps: These applications apply the style of one data set (e.g., an artist’s painting style) to another (e.g., a photograph). This allows for a lot of creativity and customization.
Types of Generative AI Models
The following are the most modern types of neural networks currently used for generating high-quality results:
Variational Autoencoders (VAEs): VAEs are a type of autoencoder, a neural network that learns to reconstruct its input at its output. They differ from traditional autoencoders in that they constrain the encoded representation of the input: the encoder compresses each input into a probability distribution over a latent space rather than a single point, and the decoder reconstructs the input from a sample drawn from that distribution. Sampling new points from the latent space and decoding them generates new data similar to the training data.
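To make the encode, sample, and decode steps concrete, here is a minimal sketch in plain Python. It is not a trained model: the linear encoder and decoder, their weights, and the one-dimensional latent space are all assumptions chosen for illustration. It only shows how a VAE samples a latent value and how the constraint on the encoded representation is commonly measured, as a KL divergence to a standard normal distribution.

```python
import math
import random

def encode(x, w_mu, w_logvar):
    """Toy linear encoder: maps an input vector to the mean and
    log-variance of a 1-D latent Gaussian (weights are hypothetical)."""
    mu = sum(wi * xi for wi, xi in zip(w_mu, x))
    logvar = sum(wi * xi for wi, xi in zip(w_logvar, x))
    return mu, logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps drawn from N(0, 1)."""
    sigma = math.exp(0.5 * logvar)
    return mu + sigma * rng.gauss(0.0, 1.0)

def decode(z, w_dec):
    """Toy linear decoder: maps the latent value back to input space."""
    return [wi * z for wi in w_dec]

def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, sigma^2) and N(0, 1): the penalty
    that constrains the encoded representation."""
    return 0.5 * (math.exp(logvar) + mu * mu - 1.0 - logvar)

rng = random.Random(0)
x = [0.5, -1.0, 2.0]
decoder_weights = [1.0, -0.5, 0.8]  # hypothetical, untrained

# Reconstruction path: encode, sample a latent value, decode.
mu, logvar = encode(x, w_mu=[0.2, 0.1, 0.3], w_logvar=[0.0, 0.1, -0.1])
z = reparameterize(mu, logvar, rng)
reconstruction = decode(z, decoder_weights)

# Generation path: skip the encoder and decode a fresh latent sample.
new_sample = decode(rng.gauss(0.0, 1.0), decoder_weights)
```

In a real VAE the encoder and decoder are neural networks, and training minimizes the reconstruction error plus the KL term.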
Generative AI Tech Stack
Apps (end users) Without Proprietary Models
End-user-facing generative AI applications interact with the end user, using generative AI models to create new content (text, images, audio) or solutions based on user input. Apps without proprietary models use open-source or other publicly available AI models rather than developing or owning the models themselves.
These apps are easy to use, affordable (often free), scalable, and secure. Their main challenges are that they may be biased depending on the training data used, may collect user data, and may not be accurate enough for some tasks. In addition, the generated content may not be truly original and may need revision to fit the context.
- Jasper: text generation
- GitHub Copilot: code generation
- Midjourney: image generation
Proprietary or Closed Source Foundation Models (OpenAI, Google Bard) Pre-trained Models (connecting with APIs)
OpenAI’s GPT-3, short for “Generative Pre-trained Transformer 3,” is an autoregressive language model that uses deep learning to produce human-like text. With 175 billion parameters, it was trained on a diverse compilation of internet text. As a result, GPT-3 can generate text, translate languages, produce creative content, and answer questions informatively.
BART, short for “Bidirectional and Auto-Regressive Transformers,” is a denoising autoencoder for pre-training sequence-to-sequence models. It was introduced by Facebook AI rather than Google, and it should not be confused with Bard, Google’s conversational AI service. Particularly efficient at text generation and rewriting tasks, BART showcases the versatility of foundation models.
Notably, models such as GPT-3 and Bard fall under the “Closed Source” category, implying that while they can be accessed and used via APIs or hosted services, their core code, specific training data, and process details are not public. This measure is intended to prevent misuse, safeguard intellectual property, and manage the resources required for such extensive model releases.
For instance, using OpenAI’s GPT-3 entails making API calls where a prompt is sent and a generated text is returned. Users leverage the trained model without having access to or the ability to alter the code used for its training or the specific data on which it was trained.
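The prompt-in, text-out pattern can be sketched as follows. This is an illustration only: the endpoint URL, the JSON field names, and the helper functions are hypothetical and do not reproduce any provider's actual API. The network call is replaced by a stand-in function so the sketch runs offline.

```python
import json
from typing import Callable

# Hypothetical endpoint and field names, for illustration only; a real
# provider defines its own URL, authentication, and request schema.
API_URL = "https://api.example.com/v1/generate"

def build_request(prompt: str, max_tokens: int = 64) -> str:
    """Serialize the prompt into a JSON request body."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens})

def generate(prompt: str, send: Callable[[str, str], str]) -> str:
    """Send a prompt through `send(url, body)` and return the generated text."""
    raw_response = send(API_URL, build_request(prompt))
    return json.loads(raw_response)["text"]

def fake_send(url: str, body: str) -> str:
    """Stand-in for the network call so the sketch runs offline."""
    prompt = json.loads(body)["prompt"]
    return json.dumps({"text": f"(generated continuation of: {prompt})"})

result = generate("Write a tagline for a coffee shop.", fake_send)
```

The caller only ever sees the request and the response. The model's code, weights, and training data stay on the provider's side, which is the defining trait of the closed-source approach.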
The benefits of using closed-source foundation models are their high accuracy, the production of high-quality content, scalability to meet the needs of many users and security against unauthorized access. However, they present challenges too. For example, their development and maintenance can be costly, and there can be bias based on the training data. Additionally, there is potential for misuse, such as generating harmful content like hate speech or misinformation.
Closed-source foundation models also extend to image generation, as demonstrated by DALL-E 2 and Imagen. Both are trained on datasets of images and text to create realistic images from text descriptions. Despite the challenges above, closed-source foundation models offer accuracy, scalability, and security, which signals their considerable potential in AI.
Closed source foundation models can be connected with APIs through a direct connection, which is efficient but potentially expensive; indirect connection via third-party services, which can be less expensive but less efficient; and hybrid connections, which combine both methods for optimal efficiency and cost-effectiveness.
Model Hubs (platforms to host and share models)
Several renowned Model Hubs are currently available, providing developers with a wealth of resources:
- Hugging Face Model Hub: A collaborative platform catering to Natural Language Processing (NLP) tasks, where developers can upload, annotate, and use machine learning models. It houses a broad spectrum of models, such as BERT, GPT-2, and RoBERTa.
- TensorFlow Hub: Serving as a repository for TensorFlow models, this platform includes pre-trained models for diverse tasks, which can be reused in TensorFlow programs with minimal effort.
- PyTorch Hub: A platform akin to TensorFlow Hub but specifically designed for PyTorch. It houses models for many tasks, including image recognition and natural language processing.
- ONNX Model Zoo: This hub offers a collection of pre-trained, state-of-the-art models in the ONNX (Open Neural Network Exchange) format.
These hubs provide easy access to a broad range of pre-trained models, ready for immediate use, significantly reducing the time and resources required to get a model operational. Their interfaces allow users to conveniently search for models based on criteria such as task or language, ensuring an efficient user experience. They are designed to scale and meet the needs of a large number of users, ensuring reliable performance. Moreover, they often adhere to stringent security measures to protect user data.
Model Hubs also bolster community collaboration. By facilitating the sharing of models within a shared space, they foster a sense of community where developers can learn from each other and collaborate on enhancing existing models or creating new ones. They also support model versioning akin to code repositories, allowing for the accessibility of previous versions of models even as they are updated and improved. This can be particularly beneficial for reproducing academic research or ensuring stability in production environments.
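The versioning idea can be illustrated with a toy in-memory registry. This is a sketch, not the interface of any real hub: the `ModelRegistry` class, its methods, and the stored "weights" are all hypothetical. It shows why pinning a version matters when a model keeps being updated.

```python
class ModelRegistry:
    """Toy in-memory stand-in for a model hub with versioning."""

    def __init__(self):
        self._versions = {}  # model name -> list of artifacts, oldest first

    def publish(self, name, artifact):
        """Store a new version and return its 1-based version number."""
        self._versions.setdefault(name, []).append(artifact)
        return len(self._versions[name])

    def load(self, name, version=None):
        """Return a pinned version, or the latest if none is given."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

hub = ModelRegistry()
hub.publish("sentiment", {"weights": [0.1, 0.2]})  # version 1
hub.publish("sentiment", {"weights": [0.3, 0.4]})  # version 2

latest = hub.load("sentiment")             # follows updates
pinned = hub.load("sentiment", version=1)  # stays fixed for reproducibility
```

A production system or a research paper would typically load the pinned version, so that later uploads do not silently change its behavior.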
The models hosted on these platforms typically follow standardized formats like ONNX, PMML, etc., making them readily usable across different programming languages and machine learning frameworks. Further, these models usually come with detailed documentation and usage examples, assisting developers in understanding and deploying them effectively.
However, while Model Hubs offer numerous benefits, they also present certain challenges. Models can carry bias from the data they were trained on, so users should evaluate a model for bias before relying on it. Privacy concerns may also arise, as these hubs may collect and use user data in ways users do not fully understand. Finally, the accuracy of a model varies with the task, so it should be tested on the intended use case rather than assumed. Nonetheless, Model Hubs remain invaluable tools for generative AI, promising a wealth of possibilities for future development and innovation.
Hugging Face Model Hub and Replicate are two leading platforms for hosting and sharing pre-trained models, catering to a wide array of tasks, including natural language processing, image classification, and speech recognition.
Hugging Face Model Hub is a specialized platform focusing on natural language processing tasks. The platform is popular for sharing and utilizing Transformer models, a neural network architecture particularly effective for natural language processing tasks. In addition, it functions as a collaborative community where developers can upload, annotate, and employ a diverse range of machine learning models such as BERT, GPT-2, and RoBERTa, among others. The Hub’s comprehensive library of pre-trained models is easily accessible and comes with in-depth documentation and usage examples to facilitate understanding and efficient deployment.
On the other hand, Replicate is a versatile Model Hub that enables developers to share, discover, and reproduce machine learning projects across various domains. Despite being newer than Hugging Face Hub, it has been growing rapidly, offering several features that make it an excellent choice for sharing and using pre-trained models.
One of Replicate’s key features is private sharing, which allows users to share their models with a selected group of users. This attribute can be crucial for collaboration or for safeguarding sensitive data. It also provides version control for machine learning models, enabling users to track changes over time. This could be particularly beneficial for debugging or tracking the performance improvements of your models. Furthermore, Replicate enables monitoring metrics such as accuracy and latency, which are crucial for evaluating model performance. These features and a Docker-based environment to streamline model deployment collectively contribute to Replicate’s objective of promoting reproducibility and transparency in machine learning research.
Open-Source Foundation Models (Trained models) (Stable diffusion) (Stability)
Open-source foundation models are large-scale machine learning models that are publicly accessible. They offer free access to their codebase, architecture, and often even model weights from training (under specific licensing terms). Developed by various research teams, these models provide a platform anyone can adapt and build upon, thus fostering an innovative and diverse AI research environment. This open-source nature is instrumental in product development, service innovation, and exploring new ideas.
These foundational models undergo pre-training on enormous datasets encompassing text, code, and images. This extensive training process, which can span weeks or even months, equips these models to comprehend and reproduce a vast array of language patterns, structures, and information. Upon completion of the training, these models can generate novel content in multiple formats, including text, images, and music.
Notable open-source foundation models include Google’s BERT and T5, OpenAI’s GPT-2, RoBERTa (Robustly Optimized BERT Pretraining Approach), Transformer-XL, and DistilBERT. These models encompass various design approaches, from transformer-based architectures like BERT, which understand the context of a word by considering the words around it, to autoregressive language models like GPT-2, which generate human-like text. T5 treats every NLP task as a text-to-text task, while RoBERTa, a BERT derivative, enhances performance with a distinct training approach and larger data batches. Transformer-XL incorporates a recurrence mechanism to retain a longer memory of past inputs, and DistilBERT reproduces BERT’s functionality in a smaller, less resource-intensive design.
A recent entrant into the realm of open-source foundation models is Stable Diffusion, an image generation model backed by Stability AI. Diffusion models apply a process akin to natural diffusion: during training, noise is gradually added to data, and the model learns to reverse that process. To generate new data, the model starts from random noise and gradually transforms it into meaningful data, such as an image that matches a text prompt. Despite their computational intensity, recent improvements have made these models increasingly accessible and applicable across various domains. Because generation proceeds step by step, intermediate results can be inspected at any point during the diffusion process, ranging from abstract to realistic as the noise is removed.
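The noising half of a diffusion process can be sketched in a few lines. The text above does not give a formulation, so this assumes the commonly used DDPM-style forward process with a linear noise schedule. The schedule values and the four-number "image" are illustrative assumptions. Generation runs the learned reverse of this process, which requires a trained network and is not shown.

```python
import math
import random

def linear_beta_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (values assumed from common DDPM setups)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1)
            for t in range(steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over all steps s <= t of (1 - beta_s)."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars

def add_noise(x0, alpha_bar_t, rng):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * value + noise * rng.gauss(0.0, 1.0) for value in x0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for image pixels

slightly_noisy = add_noise(x0, alpha_bars[0], rng)      # still close to x0
almost_pure_noise = add_noise(x0, alpha_bars[-1], rng)  # signal nearly gone
```

As the step index grows, the share of the original signal shrinks toward zero. That is why a model trained to undo each step can start from pure noise.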
Frameworks like Hugging Face Transformers, PyTorch Lightning, and TensorFlow Hub significantly improve the accessibility and usability of these models. In addition, they offer libraries of open-source foundation models for various tasks such as text classification, text generation, question answering, and more.
Leveraging open-source foundation models brings several advantages, including high accuracy, the ability to generate high-quality content, scalability to large user bases, and transparency. This transparency allows users to comprehend the workings of these models and make necessary improvements. However, challenges exist, including their complexity, potential bias in the training data, and the risk of misuse for generating harmful content, such as hate speech or misinformation.
Open-source foundation models find applications across a diverse array of domains. These include text generation for news articles, blog posts, and books; image generation for realistic portraits, landscapes, and abstract art; question answering for factual queries, open-ended questions, and creative queries; natural language processing tasks, such as text classification, sentiment analysis, and machine translation; and speech recognition tasks, including transcribing audio recordings, translating speech to text, and voice command device control.
End-to-end apps (end-user-facing applications with proprietary models)
End-to-end applications in the realm of generative AI are comprehensive software solutions that employ generative models to provide specific services to end users. Such applications typically include proprietary machine learning models that a particular company has developed and owns. They encapsulate these models within a user-friendly interface, concealing the intricate technicalities of the underlying AI.
The term “end-to-end” signifies that the application manages all process aspects, from the initial data input to the final output or action. This is especially pertinent to generative AI, where applications can take user inputs, process them via a proprietary AI model, and deliver an output within a single, seamless application.
Midjourney and Runway ML exemplify end-to-end applications built on proprietary models in the generative AI context. Midjourney is a text-to-image service: users describe the image they want in a text prompt, the service processes it with its own model, and the generated images are delivered directly back to the user. No expertise in machine learning or data science is required.
The same end-to-end pattern applies to other generative AI products, such as an AI-powered design tool, an automatic content generator, or a predictive text application. All of these are considered end-to-end because they handle the entire workflow: acquiring the user’s input, processing it with a proprietary AI model, and delivering the generated output back to the user.
Runway ML, on the flip side, is a creative toolkit driven by machine learning, aiming to democratize access to machine learning for creators from diverse backgrounds, such as artists, designers, filmmakers, and more. The platform offers an intuitive interface that lets users experiment with pre-trained models and machine-learning techniques without needing extensive technical knowledge or programming skills. Users can browse and select from a vast assortment of models, including generative models like GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders), to incorporate them directly into their projects. For example, in the scope of end-to-end applications, a user could employ Runway ML to build a generative art project, where the user provides a source image or a set of parameters, and the application generates an art piece based on that input. This entire process is managed within Runway ML’s interface, forming an end-to-end application for creating generative art.
End-to-end apps using proprietary generative AI models present numerous benefits. They are easy to use, providing user-friendly interfaces for content generation. They are often affordable or even free to use, scale to accommodate many users, and incorporate strong security measures for user data protection. However, there are challenges as well. These applications may exhibit bias, depending on the data they were trained on, and there could be privacy concerns, as these apps may collect and use user data in ways unknown to users. The generated output may not always be accurate, depending on the task at hand. Additionally, these applications may not match human creativity levels and may fall short of generating truly original content. As generative AI technology continues to evolve, we can anticipate even more innovative and exciting applications.
Infrastructure: Cloud Platforms (cloud deployment model and how it runs model training and inference workloads)
Generative AI models are developed to generate new content based on the patterns they learn from vast training datasets. However, given the size and complexity of these datasets, the process of training generative AI models is both computationally intensive and storage demanding. To overcome these challenges, AI practitioners leverage the power of cloud computing platforms, which provide the necessary resources without substantial investment in local hardware.
A range of cloud computing platforms facilitates the training and deployment of generative AI models, including prominent ones such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Each platform has a suite of services tailored for AI applications. For instance, AWS’s SageMaker, Azure’s Machine Learning Studio, and GCP’s Vertex AI (the successor to Cloud ML Engine) provide managed environments for effectively training and deploying machine learning models.
The following steps form the typical machine learning pipeline for training and deploying a generative AI model on a cloud platform:
- First, choose a cloud computing platform.
- Then, set up a development environment.
- Prepare the data.
- Train the model.
- Deploy the model.
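The last three steps of the pipeline above (prepare the data, train, deploy) can be sketched with a deliberately tiny generative model. Everything here is an assumption made for illustration: the model is a word-level bigram table rather than a neural network, the corpus is one sentence, and "deployment" is just wrapping the model in a callable. Choosing a platform and setting up the environment are not represented in code.

```python
import random
from collections import defaultdict

def prepare_data(raw_text):
    """Data preparation: lowercase and split into word tokens."""
    return raw_text.lower().split()

def train(tokens):
    """Training: record which words follow each word (a bigram model)."""
    followers = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        followers[current].append(following)
    return dict(followers)

def deploy(model, seed=0):
    """Deployment: wrap the trained model in a callable 'endpoint'."""
    rng = random.Random(seed)

    def generate(start, max_words=8):
        words = [start]
        # Keep sampling a follower until the limit or a dead end is reached.
        while len(words) < max_words and words[-1] in model:
            words.append(rng.choice(model[words[-1]]))
        return " ".join(words)

    return generate

corpus = ("the model learns patterns and the model generates text "
          "and the model improves")
endpoint = deploy(train(prepare_data(corpus)))
sample = endpoint("the")
```

On a cloud platform each function would map to a managed stage: a data-processing job, a training job on accelerator hardware, and a hosted endpoint. The hand-offs between the stages look much the same.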
Once deployed, these models serve various purposes depending on the application. A model trained to generate images might be used to create realistic graphics for advertising, while a model that generates text could be tasked with creating believable dialogue for video games or films.
Cloud-based infrastructure offers multiple advantages for running model training and inference workloads:
- Scalability: Cloud platforms can instantly scale resources to meet the demands of large datasets and intense workloads, optimizing cost and efficiency.
- Parallelism: Cloud infrastructure supports concurrent processing, allowing multiple training or inference tasks to run simultaneously, speeding up overall processes, and facilitating efficient hyperparameter tuning.
- Storage and Data Management: Cloud platforms provide robust storage solutions and data management services, simplifying tasks like data cleaning, transformation, and secure storage.
- Accessibility: These platforms are accessible from anywhere, encouraging global collaboration and providing access to cutting-edge hardware like GPUs and TPUs.
- Managed Services: Many cloud platforms provide managed AI services, abstracting away the details of infrastructure management and allowing developers to concentrate on crafting and refining their AI models.
Once trained, models are ready for inference – generating predictions based on new data. Cloud platforms offer services that host the model, provide an API for applications to interact with it, ensure scalable handling of multiple requests, and allow for monitoring and updates as needed.
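A hosted inference service boils down to a handler that accepts a request, runs the model, and returns a response while recording a few metrics. The sketch below shows only that core loop. The `InferenceService` class, the JSON field names, and the uppercase "model" are hypothetical placeholders, and no web server or cloud SDK is involved.

```python
import json
import time

class InferenceService:
    """Toy stand-in for a hosted model endpoint (all names hypothetical)."""

    def __init__(self, model):
        self.model = model        # any callable: input text -> output text
        self.request_count = 0    # simple monitoring counters
        self.total_latency = 0.0

    def handle(self, request_body: str) -> str:
        """Parse a JSON request, run the model, and return a JSON response."""
        started = time.perf_counter()
        try:
            prompt = json.loads(request_body)["prompt"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return json.dumps(
                {"error": "request must be JSON with a 'prompt' field"})
        output = self.model(prompt)
        self.request_count += 1
        self.total_latency += time.perf_counter() - started
        return json.dumps({"output": output})

# The lambda stands in for a trained model.
service = InferenceService(model=lambda text: text.upper())
response = service.handle(json.dumps({"prompt": "hello"}))
```

Managed cloud services add the parts left out here: load balancing across replicas, authentication, and rolling out updated model versions.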
However, the use of cloud computing platforms for training and deploying generative AI models comes with its challenges:
- Security: These platforms are susceptible to security breaches.
- Privacy: Cloud platforms may collect and use user data in unanticipated ways.
- Compliance: Not every cloud platform or service is certified for every regulation, such as HIPAA or GDPR, so compliance must be verified for the specific services used.
Compute Hardware GPUs TPUs (accelerator chips for model training)
GPUs, initially designed for rapid rendering of images and videos, primarily for gaming applications, have been found to be well-suited for the types of calculations necessary for training machine learning models. They can perform many operations simultaneously due to their design which supports a high degree of parallelism. This is particularly beneficial for generative AI models, which often deal with large amounts of data and require complex computations. In these models, GPUs can concurrently execute typical operations like matrix multiplication, resulting in a significantly faster training process than a traditional CPU (Central Processing Unit).
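Matrix multiplication parallelizes well because each output row can be computed without looking at any other output row. The sketch below illustrates that independence by handing each row to a separate worker thread. It is a conceptual illustration only: Python threads will not produce a GPU-like speedup, and real frameworks dispatch this work to optimized kernels.

```python
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(row, matrix):
    """Compute one output row; it depends on no other output row."""
    inner = range(len(matrix))
    return [sum(row[k] * matrix[k][j] for k in inner)
            for j in range(len(matrix[0]))]

def parallel_matmul(a, b, workers=4):
    """Multiply a by b, handing each row of `a` to a separate worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))

a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
product = parallel_matmul(a, b)  # [[25, 28], [57, 64], [89, 100]]
```

A GPU applies the same idea at a much finer grain, computing thousands of output elements at once.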
On the other hand, Tensor Processing Units (TPUs), a type of processor developed by Google, are built to expedite machine learning workloads. They excel in accelerating tensor operations, a key component of many machine learning algorithms. TPUs possess a large amount of on-chip memory and high memory bandwidth, which allows them to handle large volumes of data more efficiently. As a result, they are especially proficient in deep learning tasks and can outperform GPUs on some workloads.
Given these hardware capabilities, when planning to build generative AI applications, some key aspects to consider include:
- Dataset Size and Complexity: The size and complexity of the dataset will determine the necessary computing power required to train the model.
- Model Type: The model type will also impact the computing power required. For instance, recurrent neural networks (RNNs) process sequences one step at a time, which makes their training harder to parallelize than that of convolutional neural networks (CNNs).
- Desired Accuracy: Higher accuracy typically requires more training data and more computing power.
- Performance Requirements: The computational requirements of your model dictate the choice between CPUs, GPUs, and TPUs. Generative models, such as GANs (Generative Adversarial Networks), often demand a lot of computational power, and using GPUs or TPUs can significantly accelerate the training process.
- Cost: While GPUs and TPUs can be costly, they may reduce the training time of your models, potentially leading to long-term cost savings. Balancing the initial cost of these units against their potential to expedite your development process and reduce costs over time is crucial.
- Ease of Use: Certain machine learning frameworks facilitate the use of GPUs or TPUs. For example, TensorFlow, developed by Google, supports both. When selecting your hardware, the ease of integrating it with your chosen software stack should be considered.
- Scalability: As your application expands, you may need to augment your computational resources. GPUs and TPUs support distributed computing, enabling you to utilize multiple concurrent units to process larger models or datasets.
- Energy Efficiency: The energy consumption of training machine learning models can be high, which may lead to substantial costs and environmental impacts. TPUs, designed for energy efficiency, could be advantageous if extensive training is planned.
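The cost trade-off in the list above comes down to simple arithmetic: total cost is the hourly rate multiplied by the hours needed. The sketch below works through one comparison. All rates and durations are hypothetical placeholders, not real prices or benchmarks.

```python
def training_cost(hourly_rate, hours):
    """Total cost of one training run on rented hardware."""
    return hourly_rate * hours

def break_even_speedup(cheap_rate, fast_rate):
    """Speedup the faster hardware needs before it is cheaper per run."""
    return fast_rate / cheap_rate

# All figures below are hypothetical placeholders.
cpu_cost = training_cost(hourly_rate=0.50, hours=200)  # cheap per hour, slow
gpu_cost = training_cost(hourly_rate=3.00, hours=20)   # costly per hour, fast

required_speedup = break_even_speedup(cheap_rate=0.50, fast_rate=3.00)
```

With these made-up numbers the accelerator costs six times more per hour but finishes ten times faster, so the run is cheaper overall (60 versus 100). Any speedup above the break-even factor of 6 favors the faster hardware.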
Lastly, selecting compute hardware is one facet of building a generative AI application. Other considerations include the choice of your machine learning framework, data pipeline, and model architecture, among other factors. Also, remember to factor in the cost, availability, and expertise required to use compute hardware effectively, as these elements can also impact the successful implementation of generative AI apps.
Building Generative AI Apps
- Mobile Apps
- Desktop Apps
- Web Apps
Desktop apps designed for personal computers can also be improved by generative AI. For example, it can be used to create custom graphics in a design tool based on user input or generate transitions, effects, or even entire scenes in a video editing tool. Furthermore, generative AI can be utilized in productivity tools to automate tasks, such as generating email responses or creating meeting agendas based on past meeting data. The advantage of using generative AI in desktop apps is that it can handle more complex tasks and larger datasets due to the increased processing power of desktop computers, facilitating more intricate and sophisticated generation tasks.
Web Apps: Web apps are accessible via browsers and are device agnostic, which gives them broad reach. Their ease of development and deployment makes them a strong fit for generative AI. With generative AI, they can create dynamic and personalized content tailored to user behavior, extending to product recommendations, predictive text, and customized news. Additionally, they can provide text translation, image generation, and even virtual assistants. This integration leads to a more interactive and personalized user experience and also enhances efficiency through the automation of content generation and data analysis.
Plugins are software add-ons (modules or components) that extend the functionality of existing software. Generative AI can enhance these plugins, improving a wide range of software, including web browsers, word processors, and image editors. For instance, a music production plugin might use generative AI to create new melodies or harmonies, while a web browser plugin might generate summary notes of a webpage. The advantage of employing generative AI in plugins is that it allows users to amplify their preferred software with advanced AI capabilities without switching to a new platform.
Extensions, much like plugins, modify or enhance software applications but are predominantly designed for web browsers. Generative AI can be used to develop extensions that elevate the functionality of a web browser in various ways, such as blocking ads, translating text, or generating images. For instance, an extension could use generative AI to recommend personalized content based on a user’s browsing history or generate dynamic themes based on the time of day or season. Generative AI in extensions leads to a personalized web browsing experience, assisting users in navigating the vast amount of online information more effectively.
Bots are software programs that automate tasks. Generative AI can create bots capable of performing various tasks, such as customer service, marketing, and data analysis. For example, a customer service bot could use generative AI to generate responses to customer inquiries, while a social media bot could use it to create posts or tweets. In addition, gaming bots could employ generative AI to form dynamic behaviors based on human players’ actions. The advantage of generative AI in bots is its ability to automate tasks responsively and adapt to specific contexts, decreasing the workload for human operators and delivering a more engaging user experience.
APIs, or Application Programming Interfaces, are pivotal in improving the functionality and user experience of a wide array of applications, predominantly by acting as the backend. Generative AI can be utilized to create APIs that deliver various services such as personalized content, text translation, image generation, data analysis, and more, enhancing the capabilities of other applications without requiring each to develop these intricate functions from scratch.
For instance, an API that generates personalized content can assist apps in providing more relevant and engaging content to users, thereby improving user engagement and experience. Likewise, an API that translates text can help apps broaden their user base by catering to an international audience and eliminating language barriers. Similarly, an API that generates images can enable apps to create visually captivating content to attract and retain users.
One of the key advantages of APIs, especially those powered by generative AI, is the abstraction of intricate AI functionalities. This allows developers without extensive AI training to seamlessly integrate AI into their applications, consequently enhancing their functionality and user experience. It’s important to note, however, that, unlike other application types such as mobile apps, desktop apps, and bots, APIs are primarily used by developers as a tool to create these applications and are typically not directly interacted with by end-users.
An effective entry strategy is to enrich your current apps with AI capabilities, thus strengthening your core business offerings. Owning proprietary data can be advantageous for refining your machine learning model, but this path may require more substantial capital expenditure. Striking a balance between leveraging existing resources and investing in new assets is therefore key to achieving success in generative AI.
Unleash the potential of your business with a Krasamo MLOps team! Our experts offer comprehensive machine learning consulting, starting with a discovery call assessment to identify your needs and opportunities and map the path to your success.