
Following is my quick summary of this week's Google Generative AI and other major ML announcements at Google I/O 2023. The breadth and depth of those announcements is quite impressive!
I wrote a few notes comparing PaLM 2/Bard to GPT-4/ChatGPT, and pointing out where some technical details are missing or unclear in the announcements.
PaLM 2 in a nutshell:
Overview:
- Autoregressive model that uses a decoder-only architecture like PaLM and GPT-3
- 340 billion parameters (PaLM had 540 billion parameters), trained on 3.6 trillion tokens (PaLM was trained on 780 billion tokens)
- Will be available in four sizes from smallest to largest: Gecko (for mobile devices), Otter, Bison and Unicorn
- Fine-tuned on two application domains: Med-PaLM 2 for medical knowledge and Sec-PaLM for enterprise security use cases
- APIs available for developers with MakerSuite to prototype Generative AI applications, and on Google Cloud Vertex AI – Model Garden with other models (Codey, Imagen, and Chirp) – and with Generative AI Studio for application development (see the API sketch after this list)
- Gemini will be Google’s next foundation model after PaLM 2 and T5, developed from the ground up to be multimodal, with various sizes and capabilities, and to be further announced
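As a side note on the API bullet above, here is a minimal sketch of what calling the PaLM 2 text API from MakerSuite looks like with the google.generativeai Python client; the model name, parameters, and response field are my assumptions from the public client and may differ from what you see in your project.

```python
# Minimal sketch of calling the PaLM 2 text API (assumed google.generativeai client).
# The model name "models/text-bison-001" and the response field "result" are my
# assumptions based on the public MakerSuite/PaLM API client; verify against the docs.
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # key obtained from MakerSuite

response = palm.generate_text(
    model="models/text-bison-001",   # Bison-sized PaLM 2 text model
    prompt="Explain the difference between PaLM 2 and PaLM in two sentences.",
    temperature=0.2,                 # low temperature for a factual answer
    max_output_tokens=256,
)
print(response.result)               # generated text
```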
Notes:
- PaLM 2 has not achieved Jeff Dean’s Pathways vision:
- A single model trained on millions of tasks, i.e., massive multi-task learning
- A model that is multimodal, i.e., that handles multiple types of inputs
- And it will have sparse architecture
- PaLM 2 is not multimodal. It takes text as an input and generates text as an output. GPT-4 is multimodal: it takes both text and images as inputs
- If the information from CNBC about the number of parameters of PaLM 2 (340 billion) is correct, PaLM 2 follows the recent trend of models from DeepMind (Chinchilla) and Meta (LLaMA) that showed good performance on a number of tasks with a lower number of parameters
- Note that Google provides two ways to access the PaLM 2 APIs – one for Google developers, and the other for Google Cloud users
Capabilities:
- Multilingual: trained on 100 languages, and captures text nuances better, including idioms, poems and riddles
- Improved reasoning and logic (?), as the training data sets include scientific papers and Web pages that contain mathematical expressions (see the Technology section below)
- Generates code in 20 programming languages – popular ones such as C++, Go, Java, JavaScript, Python and TypeScript, but also legacy ones such as Prolog, Lisp, Fortran, and COBOL
Notes:
- PaLM 2 powers 25 new Google products, but I could not find the exact list of those products
- I also could not find the complete list of the 20 programming languages, but I tried some of the legacy languages and it works!
- Google mentions that reasoning and logic have been improved but does not really define what that means in its technical report
- PaLM 2, like PaLM, makes use of chain-of-thought prompting, which decomposes a problem into intermediate reasoning steps to improve the correctness of the answer
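To make the chain-of-thought note concrete, here is a minimal, model-agnostic sketch; the few-shot example is adapted from the chain-of-thought literature, not from the PaLM 2 technical report, and `generate` is a placeholder for any text-completion API.

```python
# Chain-of-thought prompting: the few-shot example's answer spells out intermediate
# steps, so the model decomposes the new question the same way before answering.
# Works with any text-completion endpoint; generate() below is a placeholder.

COT_PROMPT = """\
Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more.
   How many apples do they have?
A: The cafeteria started with 23 apples. After using 20, it had 23 - 20 = 3.
   After buying 6 more, it had 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each.
   How many tennis balls does he have now?
A:"""

def generate(prompt: str) -> str:
    """Placeholder for a call to a text model (e.g., the PaLM or OpenAI API)."""
    raise NotImplementedError

# print(generate(COT_PROMPT))
# Expected style of answer: "Roger started with 5 balls. 2 cans of 3 balls is
# 2 * 3 = 6 balls. 5 + 6 = 11. The answer is 11."
```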
Technology:
- Compute-optimal scaling: validates that data and model size should be scaled roughly 1:1 to achieve the best performance for a given amount of training compute (as opposed to past trends, which scaled the model 3 times faster than the dataset) – see the sketch after this list
- Improved dataset mixtures: multilingual and diverse pre-training data sets, which extends across hundreds of languages and domains (e.g., programming languages, mathematics, and parallel multilingual documents)
- Uses a tuned mixture of different pre-training objectives based on UL2 (UL2 blog article, UL2 paper)
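To see what the 1:1 scaling claim implies in practice, here is a small back-of-the-envelope sketch using the reported (and, for PaLM 2, unconfirmed) parameter and token counts; the training-compute estimate C ≈ 6·N·D is the usual approximation, not a figure from the report.

```python
# Rough comparison of how far the reported numbers move toward compute-optimal
# (roughly 1:1) scaling. C ~= 6 * N * D is the usual training-compute approximation.
# The PaLM 2 parameter count below is the figure reported by CNBC, not confirmed by Google.

def tokens_per_parameter(params: float, tokens: float) -> float:
    return tokens / params

palm_1 = {"params": 540e9, "tokens": 780e9}    # PaLM (2022)
palm_2 = {"params": 340e9, "tokens": 3.6e12}   # PaLM 2 (reported figures)

for name, m in [("PaLM", palm_1), ("PaLM 2", palm_2)]:
    compute = 6 * m["params"] * m["tokens"]    # ~ training FLOPs
    ratio = tokens_per_parameter(m["params"], m["tokens"])
    print(f"{name}: {ratio:.1f} tokens per parameter, ~{compute:.2e} FLOPs")

# PaLM:   ~1.4 tokens per parameter (model scaled much faster than data)
# PaLM 2: ~10.6 tokens per parameter (much closer to Chinchilla-style scaling)
```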
PaLM 2 training data as described in the technical report:
- “The PaLM 2 pre-training corpus is composed of Web documents, books, code, mathematics, and conversational data. PaLM 2 is trained on a dataset that includes a higher percentage of non-English data than previous large language models…
- In addition to non-English monolingual data, PaLM 2 is also trained on parallel data covering hundreds of languages in the form of source and target text pairs where one side is in English…
- PaLM 2 uses several data cleaning and quality filtering methods, including de-duplication, removal of sensitive-PII and filtering…
- For a small fraction of pre-training data, PaLM 2 uses special control tokens marking the toxicity of text, using signals from a fixed version of the Perspective API…
- PaLM 2 was trained to increase the context length of the model significantly beyond that of PaLM. This improvement is crucial for enabling capabilities such as long dialog, long-range reasoning and comprehension, summarization, and other tasks that require the model to consider a large amount of context”
Notes:
- Note how much PaLM 2 training leverages the concepts proposed in UL2
- Google did not disclose the exact “context length”. For OpenAI’s GPT-4 models, it is 8,192 tokens for gpt-4-0314 and 32,768 tokens for gpt-4-32k-0314 (see the token-counting sketch after these notes)
- Google also did not disclose the architecture of the data pipeline for PaLM 2. PaLM 2 is trained with TPU v4, but it is not clear whether the Pathways system used to train PaLM has been modified
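Since context length is measured in tokens rather than characters, here is a small sketch that checks whether a prompt fits a given window using OpenAI’s tiktoken tokenizer; the limits used are the GPT-4 ones quoted above, and PaLM 2’s tokenizer and limit are not public, so this is illustrative only.

```python
# Check whether a prompt fits in a model's context window, measured in tokens.
# Uses OpenAI's tiktoken tokenizer as an example; PaLM 2's tokenizer is not public.
import tiktoken

def fits_in_context(prompt: str, context_length: int = 8192,
                    reserved_for_output: int = 512) -> bool:
    enc = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(enc.encode(prompt))
    return n_tokens + reserved_for_output <= context_length

long_doc = "lorem ipsum " * 10_000  # stand-in for a long document to summarize
print(fits_in_context("Summarize:\n" + long_doc))          # gpt-4-0314 window
print(fits_in_context("Summarize:\n" + long_doc, 32768))   # gpt-4-32k-0314 window
```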
Bard:
- Now powered by PaLM 2 (Bard previously used LaMDA), available in over 180 countries, and in the future in the 40 most spoken languages
- Bard can retrieve information from the Internet (when the user clicks the “Google it” button) by querying Google and Bing search, and can crawl Web sites not indexed by the Google and Bing search engines (see the sketch after this list)
- Output includes text and images (based on Google Lens), code generation with source citations, a dark theme, and an “export” button to Google Colab
- Will be integrated with Google apps and services (Docs, Drive, Gmail, Maps and others) and with other apps (Kayak, OpenTable, ZipRecruiter, Instacart, Wolfram and Khan Academy)
- Will be integrated with Adobe Firefly and Adobe Express for generating images
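To illustrate the general pattern behind grounding a response in search results (as Bard’s “Google it” does), here is my own generic sketch of retrieval-augmented prompting; it is not Google’s implementation, and web_search and generate are hypothetical placeholders for a search API and an LLM API.

```python
# Generic retrieval-augmented prompting: fetch search snippets first, then ask the
# model to answer only from those snippets and cite them. This is a sketch of the
# general pattern, not Bard's actual implementation.
from typing import List

def web_search(query: str, k: int = 3) -> List[dict]:
    """Hypothetical search call returning [{'url': ..., 'snippet': ...}, ...]."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Hypothetical LLM call returning generated text."""
    raise NotImplementedError

def grounded_answer(question: str) -> str:
    results = web_search(question)
    sources = "\n".join(f"[{i+1}] {r['url']}\n{r['snippet']}"
                        for i, r in enumerate(results))
    prompt = (f"Answer the question using only the sources below and cite them "
              f"as [1], [2], ...\n\nSources:\n{sources}\n\n"
              f"Question: {question}\nAnswer:")
    return generate(prompt)
```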
Notes:
- It is quite impressive that Google was able in a few months to migrate Bard from LaMDA to PaLM 2
- Google did not say why it switched from LaMDA to PaLM 2, but the key reason is likely that LaMDA was trained on dialogue data and targeted conversation use cases, while PaLM 2 is a more generic LLM that Google can use not only for Bard but also for many other applications
- As of today, you can only use Bard in US English, Japanese, and Korean. ChatGPT is available today in many languages: French, Spanish, German, Italian, Portuguese, Dutch, and more
- ChatGPT does not retrieve information from the Internet (e.g., from search engines or Web sites) as Bard does
- The ChatGPT available from OpenAI does not output images, but ChatGPT Plus and Bing Chat powered by GPT-4 do
- Bing Chat shares the sources for its responses. That is not the case for OpenAI’s ChatGPT. Bard seems to share the source of its responses when the response comes from a well-known source such as Wikipedia
- It is not clear if/how other apps will be integrated with Bard, as is the case with ChatGPT plugins
- Bard based on LaMDA, and probably Bard based on PaLM 2, fine-tunes the model first with FLAN (Finetuned Language Models Are Zero-Shot Learners – blog post, paper) and then improves it with RLHF, in a similar way to OpenAI’s InstructGPT and ChatGPT’s RLHF.
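Here is a schematic outline of that two-stage recipe (instruction fine-tuning first, RLHF second) as described in the FLAN and InstructGPT papers; all functions are placeholders, and this is not Google’s or OpenAI’s training code.

```python
# Schematic of the two-stage alignment recipe: (1) supervised instruction
# fine-tuning on (instruction, response) pairs (FLAN-style), then (2) RLHF, where a
# reward model trained on human preference rankings guides further tuning (e.g., PPO).
# All functions are placeholders; this is a sketch of the recipe, not real training code.

def supervised_finetune(base_model, instruction_dataset):
    """Stage 1: fine-tune on (instruction, response) pairs across many tasks."""
    ...

def train_reward_model(sft_model, preference_rankings):
    """Stage 2a: learn a scalar reward from human rankings of model outputs."""
    ...

def rlhf_optimize(sft_model, reward_model, prompts):
    """Stage 2b: optimize the policy against the reward model (e.g., with PPO)."""
    ...

def align(base_model, instruction_dataset, preference_rankings, prompts):
    sft_model = supervised_finetune(base_model, instruction_dataset)
    reward_model = train_reward_model(sft_model, preference_rankings)
    return rlhf_optimize(sft_model, reward_model, prompts)
```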
PaLM 2 Applications:
- Gmail: “help me write”, which uses the e-mail thread context for drafting e-mail responses
- Duet AI for Google Workspace:
- Google Docs: generates a draft based on a topic, including smart chips (integration of people, files, or events data into Google Docs) for information like location and status, and variables for details that the user wants to customize
- Google Sheets: automates data classification based on the context of a cell, and creates custom plans for tasks, projects, or any activity that the user wants to track or manage
- Google Slides: generates images from text
- Google Meet: generates backgrounds for video calls
- Duet AI for Google Cloud:
- Code assistance: code generation for cloud applications, identification of code vulnerabilities, and bug fixes
- Chat assistance: assistance for cloud apps development
- AppSheet: creates new workflows in Google Workspace without any coding
Other Google Generative AI and ML applications introduced at Google IO 2023:
- Search Labs:
- Search generative experience:
- Snapshot of key information to consider when searching
- Product information when shopping based on Google shopping graph
- Code tips: for writing code (C, C++, Go, Java, JavaScript, Kotlin, Python, TypeScript), using tools (Docker, Git, shells), and algorithms
- Project Tailwind: notebook, powered by the user’s notes and sources
- MusicLM: generate music clips from text requests
- Google Maps: immersive views for routes
- Google Photos: Magic Editor
- Universal Translator: a new translation service that dubs video into a new language while also synchronizing the speaker’s lips with words they never spoke
- Project Starline: makes it feel like the remote person is facing you
References:
- “Google I/O 2023: Making AI more helpful for everyone“
- “Google IO 2023“
- “100 things we announced at I/O 2023“
Note: The picture above is the courtyard of the Boston Public Library.