Concurrency and Parallelism: Dancing at the Edge of Computation
Concurrency and parallelism are two concepts that are often placed side by side, and just as often conflated. Since the multicore era took hold in the mid-2000s, algorithm design and architectural reasoning have moved well beyond the classical single-threaded frame; even Introduction to Algorithms, a canonical text of the field, added a chapter on multithreaded algorithms in its third edition, though some still argue that both the length and depth of that treatment fall well short of what the topic deserves.
According to the definition I am more inclined to trust, concurrency means having multiple tasks started and in flight over the same period; parallelism means executing multiple tasks at the same instant. Since the kernel thread (or lightweight process) is the basic unit of CPU scheduling, counting processes alone is not a precise enough criterion for telling the two apart. This is also why, setting aside isolation and runtime overhead, multiprocessing and multithreading often feel remarkably similar at the application layer.
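To make that "similar at the application layer" point concrete, here is a minimal C sketch, assuming a POSIX system: the same background work is run once in a kernel thread via pthreads and once in a child process via fork. The spawn, work, wait shape is nearly identical; what differs is isolation and overhead.

```c
/* Minimal sketch: the same "run work in the background" shape, expressed
 * once with a kernel thread and once with a process.
 * Illustrative only; build on a POSIX system with: cc -pthread demo.c */
#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg) {
    printf("thread %s: doing work\n", (const char *)arg);
    return NULL;
}

int main(void) {
    /* Multithreading: the kernel schedules the new thread independently. */
    pthread_t tid;
    pthread_create(&tid, NULL, worker, "A");
    pthread_join(tid, NULL);

    /* Multiprocessing: fork() gives an isolated address space, but the
     * application-level shape (spawn, do work, wait) looks the same. */
    pid_t pid = fork();
    if (pid == 0) {
        printf("child process: doing work\n");
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```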
Concurrency has always been especially painful territory in practice. Interrupts and scheduling events that fire at unpredictable moments easily give rise to bizarre race conditions; object lifetimes become ambiguous, as if everything were held together only by the programmer's fragile mental model and the team's conventions. And when highly privileged kernel code is weaving in and out on top of it all, the whole scene acquires an almost cruel kind of beauty.
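A minimal sketch of the kind of race in question, assuming pthreads on a POSIX system: two threads increment a shared counter with no synchronization, so the non-atomic read-modify-write sequences interleave and the final value usually falls short of the expected total. Wrapping the increment in a pthread_mutex_t, or using a C11 atomic, removes the race.

```c
/* Two threads race on an unprotected counter; the increment is a
 * load-modify-store sequence that can be interrupted between any two steps,
 * so the printed value is usually less than 2000000. Illustrative only. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;             /* shared, deliberately unprotected */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                    /* read-modify-write: not atomic */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}
```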
Parallelization, by contrast, is one of the great currents of the age. From OpenMP and pthreads to the later arrival of CUDA, what stands behind it is really a reinterpretation of computational power: the traditional RAM model and asymptotic complexity analysis guided algorithm design effectively for decades, but in the multicore era they inevitably show their limits. Models such as PRAM and LogP emerged in response. These formal tools have real value for practical engineering, because they describe the boundaries of what a machine can do and the frameworks within which that capability can be used.
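As a small illustration of such a framework in practice, here is a minimal OpenMP sketch, assuming a compiler built with OpenMP support (e.g. -fopenmp): a single pragma states where the parallelism and the reduction live, while the loop body keeps its sequential form.

```c
/* Minimal OpenMP sketch: the pragma partitions the iterations across
 * threads and combines per-thread partial sums via the reduction clause. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1 << 20;
    double sum = 0.0;

    /* Each thread accumulates a private partial sum; OpenMP merges them. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1.0);

    printf("partial harmonic sum: %f (max threads: %d)\n",
           sum, omp_get_max_threads());
    return 0;
}
```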
Of course, if we keep asking where this computational power ultimately comes from, the answer is still hardware progress. Multicore parallelism is not the same thing as instruction-level parallelism, superscalar execution, or vectorization; it is more a vast landscape drawn together by hyper-threading, push and pull task migration, CPU affinity, symmetric and heterogeneous multiprocessing, and related mechanisms. These are especially worth attending to when optimizing, because they are not merely background knowledge: they are the performance boundary itself.
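To show how close to the surface that boundary sits, here is a Linux-specific sketch, assuming glibc and _GNU_SOURCE: it pins the calling thread to one CPU with sched_setaffinity, taking migration decisions for that thread out of the scheduler's hands. On other systems the API differs.

```c
/* Pin the calling thread to CPU 0; the kernel will no longer migrate it.
 * Linux-specific; requires _GNU_SOURCE for the cpu_set_t macros. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                 /* allow this thread only on CPU 0 */

    /* pid 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0; currently on CPU %d\n", sched_getcpu());
    return 0;
}
```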
Even today, when AI has opened an entirely new territory, distributed systems, systems programming, and high-performance computing remain the three great mountains looming overhead. To put it with a little personal color and philosophical inclination: these are the fields where computers dance at the boundary of "the approximation of computation," and where the tension between formal methods and real systems shows itself with particular intensity.