Google Search has been a torrential downpour of cash flow for over 20 years.
With this in mind, Google had no incentive to disrupt the financial dominance of Search… until OpenAI’s ChatGPT launch.
That’s when the world changed for Google, and the rest of us.
An incredible new way to learn and create emerged — you can “talk” to information with ChatGPT and other LLMs.
Soon user behaviors started to shift.
We bypassed Google and interacted with AI instead.
Accordingly, Google’s incentives shifted. That’s when we met Google Bard — the first conversational AI Google released to the public.
While Bard was not successful in replacing ChatGPT, it did give Google a vital window into how users interact with conversational intelligence. Engineers at Google DeepMind studied the natural language interactions happening in Google’s search product, and compared those to conversations in Bard.
In parallel to the Bard experiment, a massive stealth project was rapidly staffed in Google’s California and New York offices — Gemini.
Hypercomputers, Pathways & TPUs
Google began working on various problems in machine learning in the early 2000s. A decade ago, they turned the field of image understanding on its head when their deep learning models shattered benchmarks in object recognition. It wasn't just better; it was a paradigm shift.
The world of language hasn't been spared either. Transformer-based behemoths like BERT and LaMDA have pushed natural language processing to almost eerie levels of fluency. And don't forget AlphaGo, the system that defeated a world champion in Go, a game so complex it was once thought impossible for a machine to master — famously, there are more possible board positions in Go than atoms in the observable universe.
Here’s a rapid timeline to help us understand the progression. You can clearly see the roadmap, specifically how these breakthroughs align and amplify each other.
Google’s AI Timeline
Early 2000s: Google begins integrating machine learning concepts into its products to improve search and other functionalities.
2012: A pivotal moment as Google researchers demonstrate the power of deep learning for image recognition and speech recognition. This marks a shift towards modern neural network approaches.
2013-2014: Google applies deep learning across a range of products:
Google Photos: Improves image search and classification.
Google Translate: Revolutionizes machine translation.
2015:
Deep learning is integrated into Google's core search ranking algorithms, leading to significant improvements in result quality.
TensorFlow: Google open-sources its popular machine learning framework, accelerating AI research and democratizing access.
2016:
AlphaGo: Google's DeepMind AI defeats a top Go player, a feat considered extremely difficult due to the game's complexity.
TPU (Tensor Processing Unit): Google reveals its specialized AI chips, designed to run deep learning models significantly faster, with later generations built to accelerate training as well.
2018-2020: Google AI pushes the boundaries in natural language processing, with models like BERT demonstrating exceptional language understanding.
2021: Introduction of Pathways, a next-generation AI architecture designed for greater flexibility and the ability to handle multiple tasks simultaneously.
Before we explore Google’s Pathways, we need to talk tensors. Tensors are incredibly important in the current approach to AI.
To understand the power of Google’s TPU you need to grok the tensor.
What is a tensor?
Tensors offer a “way to extend familiar mathematical concepts into higher dimensions” — don’t worry, I’ll provide a multi-step example in plain English below:
Scalar: A single number (e.g., temperature: 25 degrees Celsius). This is a tensor of rank 0.
Vector: An ordered list of numbers (e.g., coordinates: [3, 2, -1]). This is a tensor of rank 1.
Matrix: A 2D array of numbers (e.g., image pixel values). This is a tensor of rank 2.
Beyond: Tensors can extend to even higher dimensions, capturing complex relationships between elements.
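To make those ranks concrete, here's a minimal sketch in Python using jax.numpy (one of the open-source frameworks Google supports); any array library would behave the same way, and the values are purely illustrative.

```python
import jax.numpy as jnp

# Rank 0: a scalar, a single number (e.g., a temperature reading)
temperature = jnp.array(25.0)

# Rank 1: a vector, an ordered list of numbers (e.g., coordinates)
coordinates = jnp.array([3.0, 2.0, -1.0])

# Rank 2: a matrix, a 2D grid of numbers (e.g., tiny image pixel values)
pixels = jnp.array([[0.1, 0.5],
                    [0.9, 0.3]])

# Rank 3: stack matrices along a new axis (and keep going for rank 4, 5, ...)
stack = jnp.stack([pixels, pixels])

for name, t in [("temperature", temperature), ("coordinates", coordinates),
                ("pixels", pixels), ("stack", stack)]:
    print(f"{name}: rank {t.ndim}, shape {t.shape}")
```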
Let's imagine building a house to understand how tensors allow us to extend mathematical concepts into higher dimensions:
Foundation: Scalar (The Plot of Land)
A Single Value: Think of a scalar as the size of your plot of land. It's a single number (e.g., 500 square meters) that gives you basic information.
Dimension: 0D (a point)
1st Floor: Vector (Blueprint Outlining the Rooms)
Direction and Magnitude: A vector is like a basic house blueprint. It tells you the dimensions of each room (length and width). It has direction (which way the rooms extend) and magnitude (how big they are).
Dimension: 1D (a line)
2nd Floor: Matrix (Adding Walls and Detail)
Grid of Values: A matrix is like adding walls, windows, and doors to your blueprint. It's a grid where each cell represents a specific detail (e.g., the pixel brightness of an image or the relationship between two words in a sentence).
Dimension: 2D (a plane)
3rd Floor and Up: Higher-Order Tensors (The House in Reality)
Beyond Flat Surfaces: Now imagine stacking blueprints on top of each other. You could represent a 3D cube of data (like a video – height, width, and frames over time) or even more abstract relationships.
Dimensions: 3D, 4D, and even higher
Real-world data is rarely just a single number or a simple line.
Images, language, and many other kinds of data are multi-layered and have complex relationships.
Tensors let us capture that complexity — just like you can calculate the area of a room (multiplying values from a vector), we can perform advanced mathematical operations on tensors.
This is how AI models learn and manipulate data.
Tensors are the backbone of deep learning and many other AI applications because they provide a flexible and powerful way to represent and manipulate data.
Deep learning models involve a massive amount of mathematical computation (matrix multiplications, convolutions, etc.). These operations are naturally expressed and optimized as tensor operations. Tensors also help us represent complex data, such as:
Images: An image can be represented as a 3D tensor (height, width, color channels).
Text: Sequences of words can be encoded as tensors, where each word is represented by a vector.
Time-series data: Sensor readings over time can be represented as a tensor where one dimension represents time.
The structure of tensors themselves can encode relationships. For example, in image processing, the proximity of values in a tensor corresponds to the proximity of pixels in an image. This allows neural networks to learn spatial patterns.
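As a rough sketch of what those representations look like in code (still jax.numpy; the shapes and dimension sizes here are invented for illustration, not any particular model's convention):

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_img, k_txt, k_ts = jax.random.split(key, 3)

# Image: height x width x color channels, a rank-3 tensor
image = jax.random.uniform(k_img, (224, 224, 3))

# Text: a sequence of words, each encoded as an embedding vector,
# giving sequence length x embedding size, a rank-2 tensor
sentence = jax.random.normal(k_txt, (12, 256))

# Time series: time steps x measurements (temperature, humidity, pressure)
sensors = jax.random.normal(k_ts, (1000, 3))

print(image.ndim, sentence.ndim, sensors.ndim)  # 3 2 2

# Nearby values in the image tensor are nearby pixels, which is exactly
# the spatial structure a neural network can learn patterns from.
patch = image[100:103, 100:103, :]
print(patch.shape)  # (3, 3, 3)
```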
The Power of Higher Dimensions
Let's consider different types of relationships that can be represented with higher-dimensional tensors:
Spatial Relationships:
Images: In a 3D image tensor (height, width, color channels), the proximity of values within the tensor reflects the proximity of pixels. Neural networks then learn patterns in how those values relate to each other across different dimensions.
3D Data: Medical scans or scientific simulations with spatial dimensions (x, y, z) can also be represented as higher-order tensors.
Temporal Relationships
Video: A video can be a 4D tensor (height, width, color channels, frames over time). This structure allows AI models to learn patterns of movement and change across time.
Sensor Data: Imagine tracking temperature, humidity, and air pressure over time. Those relationships between different measurements at different points in time can be captured by a tensor.
Linguistic Relationships
Word Embeddings: Words represented by vectors (think of these as 1D tensors) can be combined into higher-order tensors to capture how they relate within a sentence or larger piece of text.
Machine Translation: Relationships between words in different languages can be modeled with tensors, helping power translation systems.
Multimodal Relationships
Image and Text: Imagine a tensor combining image features and text descriptions. This could allow an AI model to understand the relationship between an image and its caption.
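Here's a short sketch of two of those cases, a video tensor and a crude image-plus-text pairing; as before, the sizes are made up and the point is only to show the shapes involved:

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_video, k_img, k_txt = jax.random.split(key, 3)

# Video: frames x height x width x channels, a rank-4 tensor
video = jax.random.uniform(k_video, (30, 64, 64, 3))

# Averaging over the frame axis collapses time into a single image-like
# summary, one very simple temporal operation on a 4D tensor.
summary = video.mean(axis=0)  # shape (64, 64, 3)

# Multimodal: an image-feature vector paired with a caption-embedding vector.
image_features = jax.random.normal(k_img, (512,))
caption_embedding = jax.random.normal(k_txt, (512,))

# Their outer product is a rank-2 tensor holding every pairwise interaction
# between image features and caption features.
interactions = jnp.outer(image_features, caption_embedding)

print(video.shape, summary.shape, interactions.shape)
```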
Now that we have a solid understanding of tensors, we can more easily break down TPUs vs. GPUs.
What is a TPU - Tensor Processing Unit?
Purpose-built for AI: TPUs are specialized chips designed by Google primarily to accelerate the training and execution of machine learning models, particularly deep neural networks.
Focus on matrix operations: TPUs are optimized for the vast number of matrix multiplications and linear algebra computations that lie at the heart of neural network calculations.
Targeted reduced precision: Many AI workloads don't require extremely high-precision calculations. TPUs excel at lower-precision arithmetic, leading to faster speeds and improved energy efficiency.
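To see what reduced precision means in practice, here's a small sketch comparing the same matrix multiplication in float32 and bfloat16, the 16-bit format TPUs are built around. It runs fine on a CPU; the point is the dtype, not the hardware.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_a, k_b = jax.random.split(key)

a = jax.random.normal(k_a, (1024, 1024))
b = jax.random.normal(k_b, (1024, 1024))

# Full-precision result (float32).
full = jnp.matmul(a, b)

# The same multiplication with inputs cast to bfloat16: half the memory
# traffic, and the kind of arithmetic TPU matrix units are optimized for.
low = jnp.matmul(a.astype(jnp.bfloat16), b.astype(jnp.bfloat16))

# The results differ only slightly, which neural networks usually tolerate well.
print("mean absolute difference:", jnp.abs(full - low.astype(jnp.float32)).mean())
```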
What is a GPU - Graphics Processing Unit?
Origin in graphics: GPUs were initially created to handle the massive parallel computations needed for real-time, high-quality graphics rendering.
General-purpose versatility: The parallel nature of GPUs also made them well-suited for various computationally intensive tasks, including scientific simulations and, later, machine learning.
Flexibility: GPUs offer greater precision and more flexibility in the types of computations they can perform compared to TPUs.
TPU vs. GPU for AI
Benefits of TPUs for AI
Training Speed: TPUs accelerate the time-consuming process of training large neural networks, enabling researchers and developers to experiment more quickly.
Inference Efficiency: They provide fast and energy-efficient execution of trained AI models, making them well-suited for real-time applications.
Scaling: TPUs are designed for cloud-scale machine learning, often used in large clusters (called Pods) to handle massive computational demands.
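From a developer's seat, the same few lines of JAX run on a laptop or on a Cloud TPU VM; on TPU hardware, jax.devices() reports the attached TPU cores instead of a CPU. A minimal sketch (the layer sizes are arbitrary):

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this lists TPU cores; elsewhere it falls back to CPU or GPU.
print(jax.devices())

@jax.jit  # compile once with XLA, then run on whatever accelerator is present
def layer(x, w):
    # A dense matrix multiply plus activation, the core operation
    # that TPUs are built to accelerate.
    return jax.nn.relu(x @ w)

x = jnp.ones((8, 4096))
w = jnp.ones((4096, 4096))
print(layer(x, w).shape)  # (8, 4096)
```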
So clearly for AI purposes, access to lots of high-grade TPUs is a competitive advantage.
Google invented the TPU.
They are on v5p or v6, depending on which set of documentation you reference — and you'd better believe their R&D is not only a full generation ahead on TPUs but is actively cultivating new branches in the technology tree.
The TPU paves the way for Google's biggest hardware edge in the AI race — the Hypercomputer.
While tech has a passion for adding "hyper" and similar prefixes to falsely add gravitas… Google's AI Hypercomputer is worthy of actual gravitas.
The Hypercomputer doesn't exist in isolation. It's a key part of Google's AI infrastructure, closely integrated with Google Cloud and their research and development initiatives.
Google stresses that the Hypercomputer is designed as a complete, optimized system, rather than simply a collection of powerful components. This attention to how the software and hardware work in concert is crucial to achieving the promised performance.
Here's a breakdown of the key components of Google's Hypercomputer, along with a brief explanation of their significance:
Hardware
TPU Pods: The central compute power comes from massive clusters of Cloud TPUs. You know all about these now. The latest generation, TPU v5p, offers significant performance improvements over predecessors.
High-Density Footprint: The Hypercomputer occupies a relatively small area thanks to dense packing of compute and storage, reducing overhead compared to traditional supercomputers.
Networking: Jupiter, Google's very high-bandwidth data center networking technology, connects the components within the Hypercomputer, ensuring rapid communication between nodes.
Liquid Cooling: The system uses advanced liquid cooling to manage the heat generated by the dense configuration and powerful processors.
Software
Optimized Orchestration: Specialized software manages the distribution of AI workloads across the vast network of TPUs, ensuring efficient resource usage and high performance.
Open Software Support: Google emphasizes compatibility with popular open-source machine learning frameworks like TensorFlow, PyTorch, and JAX. This lowers the barrier to entry and makes the architecture broadly accessible.
Flexible Consumption Models: The Hypercomputer is designed to support different use cases with variations in how the resources are deployed, scaling from individual users working on small models to major research initiatives using the full power of the system.
Google is putting the power of the Hypercomputer to good use: it would not be possible to train and operationalize Google's extremely secretive Pathways architecture without it.
Pathways → Google’s Improved Brain
Traditionally, AI models are highly specialized. A model that excels at image classification is very different from a model built for translating languages. This specialization leads to the need to train a vast number of separate models.
Training each of these task-specific models requires significant time, data, and computational resources.
Even 18 months ago, specialized models often struggled to adapt or apply their knowledge to even slightly different tasks outside the domains they were trained for.
How Pathways is Different
Multi-Tasking: Pathways aims to train a single model that can handle thousands or even millions of different tasks, both within and across domains (image understanding, language, etc.).
Sparsely Activated Networks: Pathways doesn't utilize the entire model at once. It intelligently activates only the relevant portions of the model depending on the task at hand, leading to greater efficiency.
Learned Routing: The Pathways system includes trainable components that learn how to "route" information – deciding which parts of the model are most useful for specific tasks as it encounters them.
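A quick back-of-the-envelope illustration of why sparse activation matters; the expert count and sizes below are invented, not Pathways' actual configuration:

```python
# Hypothetical model: 64 experts of 2 billion parameters each.
num_experts = 64
params_per_expert = 2_000_000_000

dense_params = num_experts * params_per_expert  # a dense model touches everything
sparse_params = 2 * params_per_expert           # top-2 routing touches only 2 experts

print(f"dense:  {dense_params:,} parameters used per input")
print(f"sparse: {sparse_params:,} parameters used per input")
print(f"reduction: {dense_params // sparse_params}x")
```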
Benefits of Pathways
Efficiency: Training a single multi-tasking model reduces redundant computation and parameter storage compared to maintaining many separate models.
Generalization: A Pathways model can potentially learn from various tasks it's exposed to, improving its ability to generalize and potentially adapt to new tasks more easily.
Data Augmentation: The model's exposure to diverse tasks can act as a form of data augmentation (creating variations of data), improving its robustness.
Reduced Bias: Potentially, by training on a vastly wider variety of examples, biases that emerge in more specialized models might be reduced.
Technically Speaking… WTF is a Pathway?
Let's dive into the technical aspects of how Pathways works.
Pathways is an active research area, so some implementation details are subject to evolution.
Key Components of Pathways
Large-Scale Model: At its core, Pathways builds upon the concept of large-scale neural network models. However, the sheer size of the model alone doesn't enable multi-tasking. Pathways introduces the following elements to manage this complexity:
Sparse Activation: Instead of using the entire neural network for every task, Pathways selectively activates only the relevant portions of the model. Think of this as having a massive library of knowledge, but only consulting the necessary sections for each specific task.
Mixture of Experts (MoE): Pathways can be seen as an ensemble of many smaller 'expert' sub-networks. Each expert specializes in a particular type of task or domain.
Learnable Routing Mechanism: A key innovation in Pathways is a trainable component that determines which experts should be activated and how information should flow between them for a given task. This is similar to a librarian figuring out which resources to direct you to based on your research question.
How it Works (Simplified)
Input: A task is presented to the Pathways system. This could be an image, a piece of text, or some other form of data.
Routing: The routing mechanism analyzes the input and determines which combination of experts is most likely to be relevant.
Sparse Activation: Only the selected experts are activated, saving computational resources.
Processing: The activated experts process the input data, potentially interacting with each other if needed.
Output: The Pathways system produces output appropriate for the task, such as an image classification, a translation, or an answer to a query.
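To make those steps concrete, here's a toy sketch of sparse, learned routing written in plain JAX. It's a drastically simplified mixture-of-experts layer, not Google's actual Pathways implementation; the expert count, dimensions, and router are all invented for illustration.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS, TOP_K, DIM = 8, 2, 64

key = jax.random.PRNGKey(0)
k_router, k_experts, k_x = jax.random.split(key, 3)

# One small weight matrix per "expert", plus a router that scores experts.
expert_weights = jax.random.normal(k_experts, (NUM_EXPERTS, DIM, DIM)) * 0.02
router_weights = jax.random.normal(k_router, (DIM, NUM_EXPERTS)) * 0.02

def moe_layer(x):
    # 1. Routing: score every expert for this input, keep only the top-k.
    scores = x @ router_weights                        # shape (NUM_EXPERTS,)
    top_scores, top_idx = jax.lax.top_k(scores, TOP_K)
    gates = jax.nn.softmax(top_scores)                 # weight for each chosen expert

    # 2. Sparse activation + processing: run only the selected experts.
    outputs = jnp.stack([x @ expert_weights[i] for i in top_idx])

    # 3. Output: blend the chosen experts' results by their gate weights.
    return (gates[:, None] * outputs).sum(axis=0)

x = jax.random.normal(k_x, (DIM,))
print(moe_layer(x).shape)  # (64,)
```

In a real system the expert dispatch is vectorized and sharded across many chips rather than looped in Python, and the router is trained jointly with the experts; the loop here just keeps the routing, sparse activation, and blending steps readable.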
Challenges & Areas of Research
Scaling: Training and efficiently managing such a massive, sparsely activated model poses significant computational challenges.
Routing Efficiency: The routing mechanism is crucial for Pathways' success. Research focuses on making it highly accurate and computationally fast.
Task Representation: Understanding how to best represent different kinds of tasks to feed into the Pathways system for optimal routing and processing is an active area of study.
Gemini
What is Google Gemini?
Gemini isn't a single AI model, but rather a family of multimodal language models developed by Google DeepMind. Think of it like a set of siblings, each with different strengths:
Gemini Ultra: This is the largest, most ambitious of the Gemini models. It's designed for highly complex tasks that require reasoning and chaining together different types of information.
Gemini Pro: This model focuses on being a generalist, a reliable go-to for a broad range of tasks, from writing to generating code, and it's built to scale efficiently.
Gemini Nano: Lean and mean, Nano is the smallest of the family. Its focus is on running efficiently on devices like smartphones and wearables, bringing a significant boost to AI capabilities even on resource-limited hardware.
Key Capabilities
What makes Gemini stand out isn't just its models, but what they can do:
Multimodal Understanding: The name of the game is versatility. Gemini models seamlessly handle text, images, code, audio, and video. This allows them to tackle tasks that would stump traditional text-only language models. Imagine asking for an email summary, but also getting a chart visualizing key points, or a short video clip highlighting important moments in a podcast transcript.
Flexible Assistance: Gemini aims to move AI from merely reacting to your requests to becoming more proactive helpers. It could be summarizing a complex document for you, offering creative suggestions, or even flagging potential issues with your code. It's like having a super-intelligent assistant at your fingertips.
The goal behind Gemini Ultra is not merely to slightly improve upon existing models, but to take a substantial leap in AI capabilities.
Think about the difference between a search engine listing websites and an AI that can synthesize information from across the web and answer complex questions directly.
Gemini has the potential to fundamentally change how we interact with technology, from making our smartphones and wearables significantly more intelligent and helpful to revolutionizing productivity in creative and knowledge-driven fields.
Part of Gemini's design is to work at different scales. This means the same core technology can fuel large-scale, cloud-based AI tasks for researchers, while also empowering smaller models within individual devices.
How Gemini Fits in the Landscape
Let's discuss how Gemini could be integrated into both work streams and creation streams, as that seems to be the core of maximizing its potential.
Understanding Work Streams and Creation Streams
Work Streams: These are the processes, tasks, and workflows that keep our work lives running smoothly. Think of things like communication (emails, reports, meetings), project management, data analysis, and decision-making.
Creation Streams: These focus on generating new ideas, content, and solutions. This might include writing, designing, brainstorming, problem-solving, coding, or artistic expression.
Folding Gemini into Work Streams
Here's how Gemini's capabilities could enhance typical work patterns:
Augmented Communication: Imagine Gemini streamlining your emails: summarizing lengthy threads, drafting responses that fit your style, suggesting relevant information, and even detecting the tone of an email to gauge urgency.
Intelligent Research Assistant: Gemini could transform how you gather information: summarizing complex reports, identifying key trends in data, suggesting relevant external resources, and even offering potential counterarguments for balanced decision-making.
Proactive Problem Solving: Issues could be flagged, solutions proposed, and knowledge gaps pinpointed by Gemini. Imagine being alerted about potentially conflicting schedules, subtle errors in a financial spreadsheet, or alternative approaches for a project that seems off-track.
Folding Gemini into Creation Streams
Let's look at how Gemini might boost creativity:
Thought Partner: Gemini could serve as a brainstorming buddy, offering unique perspectives, suggesting alternative directions, and even generating initial drafts of creative text or code snippets.
Unbiased Critic: Gemini could help identify potential weaknesses or biases in your work, offer suggestions for improvement based on broader analysis, and even propose revisions.
Inspiration Engine: Imagine Gemini finding relevant and visually stimulating references for a design project, pulling out interesting patterns across a large dataset to inspire insights, or even helping you translate a vague concept into a more structured form.
The vision is a world where AI feels less like a smart piece of software, and more like a versatile, intuitive helper.
A world where our technology adapts to us, rather than the other way around. Imagine asking your phone, "Summarize my work email, write a funny reply to my friend's photo, and what's the fastest way to get home considering road closures?"
Google Leading AGI Race
We’ve outlined how hardware, software & data are driving Google toward winning the AGI race.
We haven’t yet mentioned Google’s largest advantages:
Google’s talent
Google’s massive customer base
Google AI and DeepMind attract some of the brightest minds in the field. Leading figures like Demis Hassabis, and for many years Geoff Hinton, have driven its cutting-edge research.
Google fosters an environment where researchers are encouraged to publish, share ideas, and build upon each other's work.
This open culture accelerates progress.
Google's products like Search, YouTube, and Google Maps provide access to enormous amounts of real-world data. This data is crucial for training large, generalizable models. The variety of data Google collects exposes potential AGI models to the complexity and richness of real-world information, fostering wider generalizability.
Google's track record includes major AI breakthroughs in image recognition, language modeling (BERT, LaMDA), and game-playing (AlphaGo, AlphaZero).
Google has a proven ability to integrate AI into real-world products and services at scale. This practical orientation aids in testing and shaping potential AGI technologies in real-life contexts.
I suspect Google could have launched something like Gemini five years ago, but only at significant CAPEX, delivering a worse experience that would also have cannibalized Search.
Now that the dynamics have shifted, look for Google to quickly take the top spot in model evals with Gemini Ultra and stay at the top of the leaderboards.
Until AGI starts telling us which version of itself is the most potent.