How computers represent images & motion

A non-technical overview

Jun 20, 2023

Some time ago, I started writing what I had intended to be a short note on the history of NVIDIA. However, I couldn't avoid getting into some technical details that exceeded what I wanted my note on NVIDIA to be. I split that note in two, and soon it was divided again. This one is the first (or the third if enumerating chronologically) of those notes.

With this note, I intend to explain, as simply as I can, how a computer represents still images and motion. I will keep technical details to a minimum, sacrificing technical accuracy for clarity whenever I consider it necessary, so please keep that in mind if you are knowledgeable on the subject. In any case, I will appreciate any feedback that helps me improve this note.

Representing an image

An image in a computer is represented by pixels: colored squares, each with a specific position within the image. More pixels translate to better quality and richer detail. If you have ever watched a movie on Netflix over a slow Internet connection, you have surely noticed the pixels: they are the little squares you see when a low-quality image is on your screen. If you're wondering why they become visible on a slow Internet connection, please bear with me a little longer, and I will answer that question. But first, pixels.
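If you're curious what "colored squares with a position" could look like in code, here is a minimal Python sketch. It assumes each color is stored as three numbers (red, green, blue) from 0 to 255, which is a common convention but only one of several; the names `image`, `pixel_count`, and `color_at` are made up for this illustration.

```python
# A tiny image: a list of rows, each row a list of (red, green, blue) pixels.
# Storing a color as three 0-255 numbers is an assumption made for this sketch.
WHITE = (255, 255, 255)
BLUE = (0, 0, 255)
RED = (255, 0, 0)

image = [
    [WHITE, BLUE, WHITE],  # top row
    [RED, RED, BLUE],      # bottom row
]

def pixel_count(img):
    """Total number of pixels: rows times columns."""
    return len(img) * len(img[0])

def color_at(img, row, column):
    """Color of the pixel at a given position (row and column start at 0)."""
    return img[row][column]

print(pixel_count(image))     # 6
print(color_at(image, 1, 2))  # (0, 0, 255)
```

The position of a pixel is simply its row and column in the grid, and its color is whatever is stored there.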

To better understand what pixels are, I'll use a picture I took of the beautiful José Ignacio lighthouse [1] in Uruguay.

That is a high-quality picture. It is made up of more than 5 million pixels. The number of pixels is so high that we can't even notice them. Let's see what happens to the image when we reduce the number of pixels.

That is the same image, but it is composed of 26,400 pixels, which are now visible. They're visible because we need bigger pixels to have a picture with the same width and height as the previous one. You can still see the lighthouse, the rocks surrounding it, and the sea, but this second image is of much lower quality and lacks detail.

The following image has 2,688 pixels in total. As expected, pixels are even bigger than before and more visible. In this case, we lost so much quality and detail that it's impossible to understand the picture.

With the previous example, we can intuitively deduce that the higher the number of pixels, the smaller each pixel, the more detail, and the higher the quality. There is, of course, a drawback to having more pixels.
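The pixel reduction shown with the lighthouse pictures can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the image is a grid of gray values (0 to 255), and each group of neighboring pixels is averaged into one bigger pixel. The function name `downsample` is hypothetical, and I'm not claiming this is how the lighthouse pictures were actually reduced.

```python
def downsample(img, factor):
    """Shrink a grayscale image by averaging each factor x factor block
    into a single pixel.

    `img` is a list of rows of integers (0-255). Height and width are
    assumed to be multiples of `factor`.
    """
    height, width = len(img), len(img[0])
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Collect every pixel inside the current block.
            block = [
                img[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) // len(block))
        result.append(row)
    return result

original = [
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [100, 100, 50, 50],
    [100, 100, 50, 50],
]
print(downsample(original, 2))  # [[0, 200], [100, 50]]
```

Sixteen pixels became four. Any detail that lived inside a block is gone, which is why the smaller lighthouse pictures look so coarse.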

Each pixel is information (it specifies color and position, remember?). Having more pixels is more “expensive.” To understand why, imagine that 1 pixel equals 1 box, any box. It could be this box.

Now think of the lighthouse images. The third image would require 2,688 of those boxes. Imagine the space those 2,688 boxes would take up. Imagine yourself having to move all 2,688 of them. Supposing that moving each one takes only 1 second, you could complete that job in about 45 minutes. Not bad, right?

Let's move on to the second image of the lighthouse. You would now have 26,400 boxes, or roughly ten times as many boxes as before. Can you picture the space those boxes would occupy and the work you would need to move each one? Whatever you imagined for the previous example, multiply it by 10. It's getting challenging to picture that, right?

Finally, take the first image, and you end up with more than five million of those boxes. Could you think of a place large enough to store them all? I bet you can't! As to the time required to move them, you would spend more than 57 whole days in that endeavor...
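The box arithmetic above is easy to check with a short Python sketch. The one-second-per-box figure comes from the example; the 5,000,000 used for the first image is an assumed round number standing in for "more than five million", so the real result would be a bit higher. The function name `moving_time` is made up for this illustration.

```python
SECONDS_PER_BOX = 1       # from the example: moving one box takes one second
SECONDS_PER_DAY = 86_400  # 24 hours * 60 minutes * 60 seconds

def moving_time(boxes):
    """Time needed to move `boxes` boxes, in minutes and in days."""
    seconds = boxes * SECONDS_PER_BOX
    return {"minutes": seconds / 60, "days": seconds / SECONDS_PER_DAY}

box_counts = [
    ("third image", 2_688),
    ("second image", 26_400),
    ("first image", 5_000_000),  # assumed lower bound for "more than 5 million"
]
for label, boxes in box_counts:
    t = moving_time(boxes)
    print(f"{label}: {boxes:,} boxes -> "
          f"{t['minutes']:.1f} minutes ({t['days']:.2f} days)")
```

That gives about 44.8 minutes for the third image, 440 minutes (a little over 7 hours) for the second, and about 57.87 days for the first.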

The problem we have presented is the same one that computers must cope with when storing and processing images. More pixels mean more image quality but demand more storage and processing power.

Introducing motion

Do you know flipbooks? They're those little books with drawings or pictures that seem to move when you flip through the pages. In case you don't, have a look at the following video.

For a flipbook to work, each image needs to change slightly with respect to the previous one. If the changes are tiny, the movement will feel more natural; conversely, if the changes are significant, the movement will be clunkier. In addition, the speed at which you turn the pages also makes the movement seem more natural or more clunky. Motion in computer graphics is based on the same idea: a succession of images shown in quick succession. The speed at which the images are shown is measured in images per second, referred to as frames per second (FPS). 24 is considered the minimum number of frames per second for a video that flows smoothly. Higher frame rates are usually required for action videos, such as those in sports or games, in which objects move quickly.
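To make frames per second a bit more concrete, here is a small Python sketch that computes how long each image stays on screen and how many images a clip needs. The 60 FPS value is only an illustrative example of a "higher" frame rate, and the function names are hypothetical.

```python
def frame_duration_ms(fps):
    """How long each frame stays on screen, in milliseconds."""
    return 1000 / fps

def frames_in(seconds, fps):
    """Number of frames needed for a clip of the given length."""
    return seconds * fps

print(round(frame_duration_ms(24), 2))  # 41.67
print(round(frame_duration_ms(60), 2))  # 16.67 (60 FPS is an example value)
print(frames_in(60, 24))                # 1440 frames for one minute at 24 FPS
```

At 24 FPS, each "page" of the flipbook is shown for roughly 42 thousandths of a second.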

Video quality

We have already discussed that resolution (the number of pixels in an image) determines the quality of the image [2]. If a video is a series of images shown quickly, one after another, we can correctly deduce that resolution also affects video quality. The other important factor is, as previously mentioned, the number of images (or frames) shown per second. The higher the frames per second, the more smoothly a video flows. There is a problem, though. Can you imagine what it is?

If we need a minimum of 24 images per second for a smooth video, the challenge that higher image resolution creates (more pixels, more boxes) is now multiplied by 24... Per second...

Want some numbers to grasp the dimension of the problem? Assuming that each pixel requires 1 byte to be stored/represented [3], the following table [4] compares image and video sizes for the three resolutions of the lighthouse pictures [5].
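Under the same assumptions (1 byte per pixel, 24 frames per second), the numbers behind such a comparison can be recomputed with a short Python sketch. The 5,000,000 figure is an assumed round value for the "more than 5 million" pixels of the first picture, and the names used here are made up for the illustration.

```python
FPS = 24             # minimum smooth frame rate mentioned in this note
BYTES_PER_PIXEL = 1  # simplification; real images use more than this

# Pixel counts of the three lighthouse pictures.
# 5_000_000 is an assumed round figure for "more than 5 million".
PIXEL_COUNTS = {
    "first image": 5_000_000,
    "second image": 26_400,
    "third image": 2_688,
}

def image_bytes(pixels):
    """Size of one still image, in bytes."""
    return pixels * BYTES_PER_PIXEL

def video_bytes_per_second(pixels, fps=FPS):
    """Size of one second of video: one image per frame, `fps` frames."""
    return image_bytes(pixels) * fps

for name, pixels in PIXEL_COUNTS.items():
    print(f"{name}: image = {image_bytes(pixels):,} bytes, "
          f"1 s of video = {video_bytes_per_second(pixels):,} bytes")
```

One second of video at the highest resolution needs about 120,000,000 bytes, against 633,600 and 64,512 bytes for the two smaller ones.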

Having gotten this far, you might already understand why we see the pixels when watching Netflix with a slow Internet connection. If not, let me add (Internet) bandwidth to the previous table.

In the first case, a video would require a very fast Internet connection, one that only a few people have at home. But even with a 1 Gbps connection, if other people are sharing it at any given moment, you might end up with less than 1 Gbps available to you. When that happens, the resolution is lowered to match the available bandwidth, so your movie keeps playing instead of freezing.
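The idea of lowering the resolution to fit the available bandwidth can be sketched in Python as well. This is a deliberately naive illustration, not how Netflix or any real streaming service works: it keeps the 1 byte per pixel and 24 FPS assumptions from this note, ignores compression entirely, and uses 5,000,000 as an assumed round figure for the first picture. The function names are hypothetical.

```python
FPS = 24
BYTES_PER_PIXEL = 1  # simplification used throughout this note
BITS_PER_BYTE = 8

# Pixel counts of the three lighthouse pictures, highest quality first.
# 5_000_000 is an assumed round figure for "more than 5 million".
PIXEL_COUNTS = [5_000_000, 26_400, 2_688]

def required_bps(pixels):
    """Bits per second needed to stream uncompressed video at this size."""
    return pixels * BYTES_PER_PIXEL * FPS * BITS_PER_BYTE

def pick_resolution(available_bps):
    """Return the largest pixel count that fits the available bandwidth,
    or None if even the smallest one does not fit."""
    for pixels in PIXEL_COUNTS:
        if required_bps(pixels) <= available_bps:
            return pixels
    return None

print(required_bps(5_000_000))         # 960000000, just under 1 Gbps
print(pick_resolution(1_000_000_000))  # 5000000
print(pick_resolution(500_000_000))    # 26400
print(pick_resolution(100_000))        # None
```

With the full 1 Gbps, the best picture fits. Halve the available bandwidth and, among these three options, the sketch falls back to the 26,400-pixel version, which is exactly when the little squares show up.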

Having explained how computers represent images and motion, my next note will be a high-level analysis of how GPUs work and how they compare to CPUs.

Did you like this note? Why not share it with a friend?

[1] José Ignacio is a beautiful and quiet town in Uruguay, on the Atlantic coast. It's 33 kilometers east of Punta del Este.

[2] Resolution is not the only factor affecting image quality, but we won't get into the details in this note.

[3] Bear in mind that 1 pixel = 1 byte is not what happens in reality. We assume that to simplify our analysis.

[4] For some reason I don’t understand, Substack does not support tables. As the information I want to show is easier to understand when presented in a table, I’ve created the tables somewhere else and uploaded pictures instead.

[5] The differences are extreme, but please keep in mind that there are lots of possible resolutions between the ones presented here.