What is Google's new "Whisk" tool, and how can you use it to generate images without text prompts?
Writing a precise text prompt to generate an image with AI is a real challenge: the results are often disappointing and need repeated adjustment. Google's new Whisk tool makes this process much simpler by letting you use images instead of detailed text to get modified or reimagined images.
But how does Whisk use images to generate new images, how can you use it, and how will it open up new horizons for artistic creativity?
How does Whisk work?
Whisk is the latest experimental tool on the Google Labs platform, and it is built on Google's Gemini and Imagen 3 AI models. It does not copy the original images; instead, it extracts key elements from them to create the new image, including:
- Subject: The main element in the photo, such as a person, pet, or object.
- Scene: The background or setting that frames the subject, such as a quiet beach or a bustling city.
- Style: The aesthetic style of the image, such as watercolor, cartoon, or futuristic.
The process begins with the Gemini model automatically analyzing the input images. This analysis goes beyond identifying the elements in each image to understanding its context and fine details. Gemini then writes a detailed text description of each image. The description aims to capture the essence of the image (the main elements and characteristics that make it stand out) rather than to provide an exact copy of it. It covers the main subject, background, colors, lighting, and any other relevant details.
The detailed text descriptions are then used as input to Imagen 3, Google’s latest image generation model, to guide the image generation process. With this process, it becomes easy to remix different elements of images—including subjects, scenes, and styles—in new and innovative ways. For example, a subject from one image can be combined with a background from another image in a specific artistic style from a third image, creating an entirely new image that has a unique combination of these elements.
It is worth noting that the tool captures the essence of the input images rather than reproducing them exactly: it identifies their main elements and uses them as the basis for a new image that expresses a specific idea or concept.
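The two-stage flow described above (describe each image in text, then generate from the combined text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Whisk's internals are not public, so `describe_image`, `build_prompt`, `generate_image`, and the canned descriptions are hypothetical stand-ins, not Google's API or its actual prompt format.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    role: str  # "subject", "scene", or "style"
    text: str  # text description standing in for the image


# Canned descriptions so the sketch runs without any model (assumption).
CANNED = {
    "dog.jpg": "a small brown dog with floppy ears",
    "beach.jpg": "a quiet beach at sunset",
    "watercolor.jpg": "soft watercolor with pastel tones",
}


def describe_image(image_name: str, role: str) -> Caption:
    """Hypothetical stand-in for the captioning step (Gemini's role in Whisk)."""
    text = CANNED.get(image_name, f"an image named {image_name}")
    return Caption(role=role, text=text)


def build_prompt(subject: Caption, scene: Caption, style: Caption) -> str:
    """Combine the three descriptions into one text prompt (format is assumed)."""
    return f"{subject.text}, in {scene.text}, in the style of {style.text}"


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for the text-to-image step (Imagen 3's role)."""
    return f"<generated image for: {prompt}>"


def remix(subject_img: str, scene_img: str, style_img: str) -> str:
    """Run the full flow: three images in, one new image out."""
    subject = describe_image(subject_img, "subject")
    scene = describe_image(scene_img, "scene")
    style = describe_image(style_img, "style")
    return generate_image(build_prompt(subject, scene, style))


if __name__ == "__main__":
    print(remix("dog.jpg", "beach.jpg", "watercolor.jpg"))
```

Because only the text descriptions reach the generation step in this sketch, the output can echo the inputs but cannot copy their pixels, which mirrors the "essence, not exact copy" behavior described above.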
How do you use this tool to generate creative images?
- To get started, head to the Whisk homepage and sign in with your Google account.
- After logging in, you can choose from three basic image generation templates, each with a different visual effect, including:
  - Sticker: This template produces flat, 2D images, similar to the digital stickers used in messaging apps. It is characterized by the simplicity and clarity of its elements.
  - Enamel Pin: This template adds some depth and sparkle to your image, and suits photos where you want to highlight details in an elegant way.
  - Plushie: This template turns your photo into a 3D doll-like shape, adding a playful touch to your photos.
- By default, Whisk automatically selects a style image based on the template you choose; you can change this style manually later.
- Choose the image you want to use for the subject. You can pick one of the images the tool provides or upload any image from your device, which lets you bring personal or specific elements into your creations.
- After you select a subject, the Gemini model analyzes the input images to identify the style and the subject, then combines them to create a new image.
- If you are not satisfied with the initial result, you can easily change the subject image and regenerate for a different result.
Advanced Creative Control (Starting from Scratch):
In addition to the template-based method above, Whisk offers a "Start from scratch" option, which gives you complete control over the creative process.
When you choose this option, you can follow these steps to generate images:
- Choose inputs for your subject, scene, and style, either by uploading images from your device or by typing traditional text prompts that describe what you want to appear in the image. Text input is available even though the tool's primary focus is on using images.
- Once you've selected all the elements (subject, scene, style), ask Whisk to create the new image, and the tool will display a set of different images based on the selections you've entered.
- You can improve the results by clicking on the (Refine) option that appears in the upper left corner of the resulting image. This option allows you to change the images used to create the image, or modify the text prompts.
- All the images you create are saved automatically to your Whisk library, where you can delete any you don't want and download the ones you like. Downloads are saved in JPG format, so you can easily use them in other apps and services.
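As a rough mental model of the "start from scratch" workflow above, each of the three slots can be thought of as holding either an image or a text prompt, and refining as swapping one slot while keeping the others. The Python sketch below is purely illustrative: `SlotInput`, `Recipe`, and the placeholder description string are hypothetical names chosen for this example and do not correspond to any published Whisk interface.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SlotInput:
    """One slot (subject, scene, or style): an image OR a text prompt."""
    image_path: Optional[str] = None
    text: Optional[str] = None

    def describe(self) -> str:
        if self.text is not None:
            return self.text  # text prompts are used as typed
        if self.image_path is not None:
            # Placeholder for the automatic image-description step (assumption).
            return f"description of {self.image_path}"
        raise ValueError("a slot needs an image or a text prompt")


@dataclass(frozen=True)
class Recipe:
    subject: SlotInput
    scene: SlotInput
    style: SlotInput

    def prompt(self) -> str:
        slots = (("subject", self.subject), ("scene", self.scene), ("style", self.style))
        return "; ".join(f"{name}: {slot.describe()}" for name, slot in slots)


# First attempt: an uploaded photo for the subject, text for the other two slots.
first = Recipe(
    subject=SlotInput(image_path="family.jpg"),
    scene=SlotInput(text="a snowy mountain"),
    style=SlotInput(text="vintage postcard"),
)

# "Refine": change only the scene and keep the subject and style as they were.
refined = replace(first, scene=SlotInput(text="a quiet beach"))

if __name__ == "__main__":
    print(first.prompt())
    print(refined.prompt())
```

Keeping the recipe immutable and producing a new one on each refinement means earlier attempts stay available for comparison, much like the saved results in the library.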
What are the practical uses of Whisk?
Whisk is more than a fun experimental tool; it offers wide possibilities in various fields, including:
- Graphic Design: Artists can create rapid prototypes by incorporating inspirations from different images.
- Marketing: Allows brands to create unique advertising images by integrating product elements with customer lifestyles and creative themes.
- Content Creation: Allows influencers and bloggers to create engaging and unique images.
Imagine creating a holiday greeting card combining a family photo with a snowy mountain scene and vintage postcard style in seconds!
Looking to the future:
Whisk maintains a delicate balance between creativity and control. Unlike tools that rely heavily on pre-defined algorithms, Whisk lets you actively participate in shaping the outcome. The mix of visual and text prompts caters to both intuitive creators and those who prefer detailed customization.
Whisk is still in beta, but it highlights Google’s commitment to generative AI. As it evolves, it could become an essential tool for artists, designers, and anyone looking to expand their creative horizons. By integrating technology with imagination, Whisk offers a glimpse into a future where visual storytelling knows no bounds.