While a GPU can do all the operations a CPU can (which means that you could, in theory, use it like a CPU), its architecture isn’t optimized for that, so it would be very inefficient at it.

While CPUs and GPUs are basically the same thing (processors), they have different goals: the CPU is optimized for latency, and the GPU is optimized for throughput. (The goal of the CPU is to do any sequence of operations in the smallest possible amount of time, while the goal of the GPU is to do the maximum amount of work per unit of time.)

To do this, they use very different layouts: the CPU has a few very big, fast cores, and the GPU has hundreds or thousands of tiny, slow “cores”.
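
If you have a CUDA-capable machine, you can actually see this asymmetry yourself. Here’s a minimal sketch (assuming device 0 exists) that asks both sides how much parallel hardware they expose:

    #include <cstdio>
    #include <thread>
    #include <cuda_runtime.h>

    int main() {
        // CPU side: a handful of big cores (hardware threads, really).
        printf("CPU hardware threads: %u\n", std::thread::hardware_concurrency());

        // GPU side: dozens of multiprocessors, each of which can keep
        // hundreds of threads resident at once.
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // device 0, assumed present
        printf("GPU multiprocessors: %d\n", prop.multiProcessorCount);
        printf("Max threads per multiprocessor: %d\n",
               prop.maxThreadsPerMultiProcessor);
        return 0;
    }

A typical desktop reports something like 8-32 CPU hardware threads versus tens of GPU multiprocessors that can each hold over a thousand resident threads.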

So to make an analogy:

  • The CPU is a supercar: two seats, 200 mph top speed.

  • The GPU is an articulated bus: 400 seats, 30 mph top speed.

If you want to do one (or two) things really fast, the CPU wins. If you want to do the same thing over and over and over again a billion times (and don’t care how long it takes to do it just once), the GPU wins.
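
To make that concrete, here is a minimal CUDA sketch (names are mine) of the “same thing over and over” workload the GPU is built for: adding two big arrays, one thread per element. No single addition is fast; the win is that huge numbers of them are in flight at once.

    #include <cstdio>
    #include <cuda_runtime.h>

    // One tiny task per thread: c[i] = a[i] + b[i].
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;                     // ~1M elements
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));  // unified memory, for brevity
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        // Launch one thread per element: far more threads than physical cores.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);               // prints 3.000000
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }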


So what does this look like on the chip, then?

To really understand this, you need to know that in a CPU, the circuit that does the actual computation (let’s call it the ALU) is already incredibly fast. The most important thing for CPU speed isn’t making the ALU faster, but keeping it fed with work to do and data to work on.

For this reason, CPUs have a ton of extra circuitry whose only job is to keep the ALU busy (caches, branch predictors, schedulers, buffers, …). GPUs don’t do that nearly as much: they are designed to process pixels or triangles, and there are millions of those on a screen.
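
You can feel those predictors from ordinary code. A classic host-side sketch (plain C++; exact numbers vary, and an aggressive optimizer may flatten the gap by replacing the branch with a conditional move): the loop does the same additions either way; only the predictability of the branch changes.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        std::vector<int> v(1 << 24);
        std::mt19937 rng(42);
        for (int& x : v) x = rng() % 256;

        auto sum_if_big = [&](const char* label) {
            long long s = 0;
            auto t0 = std::chrono::steady_clock::now();
            for (int x : v)
                if (x >= 128) s += x;   // the branch the predictor must guess
            auto t1 = std::chrono::steady_clock::now();
            std::chrono::duration<double, std::milli> ms = t1 - t0;
            printf("%s: sum=%lld, %.1f ms\n", label, s, ms.count());
        };

        sum_if_big("shuffled");         // random pattern: frequent mispredictions
        std::sort(v.begin(), v.end());
        sum_if_big("sorted");           // predictable pattern: much faster
        return 0;
    }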

The repetitive nature of the work done on a GPU means that most cores will be working on the same kind of thing at the same time, so the circuitry that feeds them with instructions and data can be shared across cores. And since you don’t care how long a single pixel takes, only how long the whole screen takes, each GPU core can afford to juggle several pixels in parallel to amortize wait times: if the computation for one pixel has to wait for data from memory, the core just switches to another pixel.
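
A rough CUDA sketch of that latency hiding (timings vary by GPU; the buffers are left uninitialized because only the timing matters): the same memory-bound kernel launched once starved of threads, and once with enough threads that the hardware can swap in other work whenever some of it stalls on memory.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Grid-stride copy: almost all of this kernel's time is spent
    // waiting on memory, which is exactly what we want to hide.
    __global__ void copy(const float* in, float* out, int n, int total) {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += total)
            out[i] = in[i];
    }

    static float timeLaunch(int blocks, int threads,
                            const float* in, float* out, int n) {
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0); cudaEventCreate(&t1);
        cudaEventRecord(t0);
        copy<<<blocks, threads>>>(in, out, n, blocks * threads);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        cudaEventDestroy(t0); cudaEventDestroy(t1);
        return ms;
    }

    int main() {
        const int n = 1 << 24;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));

        // Starved: few small blocks; every memory stall is dead time.
        printf("64 x 32 threads:   %.2f ms\n", timeLaunch(64, 32, in, out, n));
        // Oversubscribed: many threads per multiprocessor; while some
        // wait on memory, the scheduler runs others.
        printf("4096 x 256 threads: %.2f ms\n", timeLaunch(4096, 256, in, out, n));

        cudaFree(in); cudaFree(out);
        return 0;
    }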

The resulting architecture is very different: instead of big cores, each with its own ALU and a huge control circuit to keep that ALU happy, the GPU has groups of cores that share the same control circuit. This means it can fit a lot more ALUs (because it doesn’t need as much control circuitry), but the cores aren’t all independent: cores within a group have to work on the same thing, which is fine when doing graphics but can lead to atrocious performance when you try to do one single thing.
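
A sketch of that last failure mode (on NVIDIA hardware the shared-control group is a “warp” of 32 threads; the loop counts here are just illustrative): when threads in a warp disagree on a branch, the hardware runs both sides one after the other with threads masked off, so the divergent kernel below typically runs about twice as slow as the uniform one despite doing the same total work.

    #include <cuda_runtime.h>

    // Worst case: even/odd threads split inside every warp, so each warp
    // executes BOTH loops back to back, with half its threads idle each time.
    __global__ void divergent(float* out, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float x = 0.0f;
        if (i % 2 == 0) { for (int k = 0; k < iters; k++) x += 1.0f; }
        else            { for (int k = 0; k < iters; k++) x -= 1.0f; }
        out[i] = x;
    }

    // Same total work, but the branch flips only every 32 threads, so
    // whole warps agree and nothing gets serialized.
    __global__ void uniform(float* out, int iters) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float x = 0.0f;
        if ((i / 32) % 2 == 0) { for (int k = 0; k < iters; k++) x += 1.0f; }
        else                   { for (int k = 0; k < iters; k++) x -= 1.0f; }
        out[i] = x;
    }

    int main() {
        const int n = 1 << 20;
        float* out;
        cudaMalloc(&out, n * sizeof(float));
        divergent<<<n / 256, 256>>>(out, 10000);  // pays for both branches
        uniform<<<n / 256, 256>>>(out, 10000);    // pays for one
        cudaDeviceSynchronize();
        cudaFree(out);
        return 0;
    }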