Aug 30, Issue #1 | Singularity Gallery

0. Introduction

In this post and future entries titled "Issue #N", I'll highlight AI news that catches my interest. There is an abundance of information and hype in today's AI world, and I don't need to document all of it, so careful selection matters. These posts aren't meant to be the polished, comprehensive newsletters you'd find elsewhere. They're simply collections of things that intrigue me, along with some personal thoughts. It's a rough, unstructured approach: more casual exploration than formal report.

1. Google

1-1 New in models

Google has added three experimental models to Google AI Studio, where you can try them out:

A new smaller 8B variant model: gemini-1.5-flash-8b-exp-0827

A new, more powerful Gemini 1.5 Pro model: gemini-1.5-pro-exp-0827

A new, enhanced flash model: gemini-1.5-flash-exp-0827

According to Logan Kilpatrick, the gemini-1.5-pro-exp-0827 model performs better on coding tasks and complex prompts. Here's the source:

Logan Kilpatrick on Twitter / X

Today, we are rolling out three experimental models:
- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model
Try them on https://t.co/fBrh6UGKz7, details in 🧵
Logan Kilpatrick (@OfficialLoganK), August 27, 2024

https://x.com/OfficialLoganK/status/1828480081574142227

I tested the gemini-1.5-pro-exp-0827 model and found that its output often repeats the same content several times, which makes it hard to rely on for now. Below is my test output:

I tested it in both the Google AI Studio (left) and the API version (right). The results were identical.
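For anyone who wants to put a rough number on this kind of repetition instead of judging it by eye, here is a minimal sketch. It is not how I ran the test above; the function name `repeated_ngram_ratio` and the choice of word n-grams are my own assumptions, and the sample strings are made up for illustration, not real model output.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 5) -> float:
    """Fraction of word n-grams in `text` that repeat an earlier n-gram.

    0.0 means no n-gram occurs twice; values near 1.0 suggest the text loops.
    """
    words = text.split()
    if len(words) < n:
        return 0.0
    # Slide a window of n words across the text.
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


# Made-up examples: one looping string and one without repetition.
looping = "The answer is 42. " * 10
varied = "one two three four five six seven eight nine ten"

print(repeated_ngram_ratio(looping, n=4))
print(repeated_ngram_ratio(varied, n=4))
```

A high ratio on a long response is only a hint that the model is stuck in a loop; legitimate outputs such as tables or code can repeat phrases too, so any threshold is a judgment call.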

1-2 New in Gemini App

Imagen 3 and Gems are rolling out in the Gemini App. Gemini Advanced subscribers will soon be able to create custom Gems. Moreover, all users will gain access to enhanced image capabilities powered by the latest Imagen 3 model.

Here's a demonstration:

Personally, I prefer Midjourney and Flux over Imagen 3. Midjourney emphasizes aesthetics, while Flux, being open-source, allows me to train a LoRA model and explore more possibilities.

2. Zhipu AI

Zhipu AI also launched several new models this week at KDD:

New LLM: GLM-4-Plus

New image generation model: CogView-3-Plus

New image/video understanding model: GLM-4V-Plus

New video generation model: CogVideoX

Below are my test results for GLM-4-Plus:

I tested three very tricky questions, but glm-4-plus answered all of them incorrectly. This didn't meet my expectations.

Another update from Zhipu AI is that they've made glm-4-flash free. However, there may be some rate limits for free users. It's advisable to read their documentation carefully.
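If you do run into rate limits on a free tier, a common way to cope is to retry with exponential backoff. The sketch below is generic and not tied to Zhipu AI's SDK: `RateLimitError` and `call_with_backoff` are hypothetical names I made up, and you would need to swap in whatever error the provider's client raises for "too many requests".

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a provider's client raises when rate limited."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Out of retries: let the caller see the error.
            # Wait 1x, 2x, 4x, ... the base delay, plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Demo with a fake request that fails twice before succeeding.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("too many requests")
    return "ok"


print(call_with_backoff(flaky_request, base_delay=0.01))
```

In real use, `fn` would wrap the actual API call, and the retry count and base delay should follow whatever limits the provider documents.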

3. Qwen

The Qwen team announced the release of Qwen2-VL this week. They've open-sourced Qwen2-VL-2B and Qwen2-VL-7B under the Apache 2.0 license and provided the API for Qwen2-VL-72B. Here are some resources to learn more:

Blog: https://qwenlm.github.io/blog/qwen2-vl/

GitHub: https://github.com/QwenLM/Qwen2-VL

HF: https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d…

ModelScope: https://modelscope.cn/organization/qwen…

4. Runway

Runway has removed Stable Diffusion 1.5 from Hugging Face. You can verify this at the following link:

https://huggingface.co/runwayml/stable-diffusion-v1-5

5. Claude System Prompt

Anthropic has published the system prompts it uses for Claude, which are excellent material for learning how to write prompts. Check them out here: https://docs.anthropic.com/en/release-notes/system-prompts#july-12th-2024

Anthropic has also published prompting tutorials that are worth working through. Find them here:

https://github.com/anthropics/courses/tree/master/real_world_prompting