Developer finds AI that generates images is great for compressing photos
Emerson Rosemary

Stable Diffusion is a machine learning model that generates images from text. But one developer found that the tool can also compress an image to a smaller size than standards like JPEG and WebP achieve. The resulting file may still show visual artifacts, but only a few.

On the web, where image optimization is so important that Google encourages it, this discovery could be useful to online stores, social networks and other websites.

Explaining this Stable Diffusion thing

Let’s say Stable Diffusion is the toy of the moment. Like DALL-E, it can generate images from instructions given as text, in a matter of minutes. Many of them are impressive.

Cool, but what about image compression?

While tools like Stable Diffusion have been bothering artists by flooding communities with artificially generated images, they have also caught the attention of many artificial intelligence enthusiasts.

Software engineer Matthias Bühlmann is one of them. While testing Stable Diffusion, he noted that the tool relies on three artificial neural networks. One of them is the variational autoencoder (VAE), which encodes an image into a latent space and decodes it back again.

Think of the latent space as an intermediate representation of the expected result. As a rough comparison, it is like a sketch or a reduced version of the image.

More precisely, the latent space holds a lower-resolution representation of the original image in which each value is stored with higher precision. This lets the image be expanded again without losing its essential character.
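
To put rough numbers on how much smaller a latent representation can be, here is a minimal sketch. The shapes are assumptions that do not come from the article: a 512×512 RGB image at 8 bits per channel, and a 64×64 latent with 4 channels stored as 32-bit floats, which are the figures commonly cited for Stable Diffusion v1. The helper name `raw_size_bytes` is made up for illustration.

```python
def raw_size_bytes(width, height, channels, bytes_per_value):
    """Uncompressed size, in bytes, of a dense width x height x channels array."""
    return width * height * channels * bytes_per_value


# Assumed shapes (not stated in the article).
IMAGE_SHAPE = (512, 512, 3)   # pixels: width, height, RGB channels
LATENT_SHAPE = (64, 64, 4)    # latent: width, height, latent channels

image_bytes = raw_size_bytes(*IMAGE_SHAPE, bytes_per_value=1)    # 8 bits per value
latent_bytes = raw_size_bytes(*LATENT_SHAPE, bytes_per_value=4)  # 32-bit floats

# Ratio of the number of stored values, and of the raw storage they need.
value_ratio = (512 * 512 * 3) / (64 * 64 * 4)
byte_ratio = image_bytes / latent_bytes

print(f"image:  {image_bytes} bytes")
print(f"latent: {latent_bytes} bytes")
print(f"values: {value_ratio:.0f}x fewer, bytes: {byte_ratio:.0f}x fewer")
```

Under these assumptions the latent has far fewer values than the image, but each one takes more bytes, so the raw storage saving is smaller than the drop in resolution alone would suggest.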

The latent space is where a second neural network, the U-Net, comes into play. Random noise is added to the latent representation, and the U-Net makes predictions about what it “sees” there. It is as if the algorithm were a person trying to identify shapes in clouds.

This process removes the noise in a way that is consistent with the expected result. It works together with the third neural network, the text encoder, which guides what the U-Net should try to “see”.

This is a very simplified explanation. What matters is knowing that all this is combined so that the image requested by the user is generated.

During this process, the image has “impurities” removed. This is not necessarily to make it smaller, but to make the result more accurate.

Now, compression

During his experiment, Bühlmann discovered that the Stable Diffusion algorithms can be adapted to do image compression alone. To do so, he removed the text encoder but kept the image-processing steps.

In testing, he found the results quite convincing. A photo of a llama came to 6.74 KB when compressed with WebP and 5.66 KB with JPEG. With Stable Diffusion, the file was 4.97 KB and still showed more detail than the other two.
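
The relative savings implied by those three file sizes can be worked out with a few lines of arithmetic. This is only a sketch over the numbers quoted above. The function name `savings_percent` is made up for illustration, and a single test image says nothing about how the method performs in general.

```python
def savings_percent(baseline_kb, candidate_kb):
    """How much smaller the candidate is than the baseline, as a percentage."""
    return (baseline_kb - candidate_kb) / baseline_kb * 100


# File sizes for the llama photo, in KB, as quoted in the article.
sizes_kb = {"WebP": 6.74, "JPEG": 5.66, "Stable Diffusion": 4.97}

sd_kb = sizes_kb["Stable Diffusion"]
savings = {
    name: savings_percent(sizes_kb[name], sd_kb)
    for name in ("WebP", "JPEG")
}

for name, pct in savings.items():
    print(f"Stable Diffusion file is {pct:.1f}% smaller than {name}")
```

For this one image, the Stable Diffusion file is roughly a quarter smaller than the WebP file and roughly an eighth smaller than the JPEG file.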

The result is not perfect, let’s be clear. Bühlmann noticed that faces and text in images may come out less legible. But the mechanism can presumably be adapted and trained to overcome these limitations.

For those who want to experiment or even collaborate with the project, Bühlmann has published the source code of his work on Google Colab.


