C++ Fundamentals: Hardware Brutalism and Zero-Cost Abstraction
Time for some C++ fundamentals. Today, let us feel again the core temperament of C++: hardware brutalism and zero-cost abstraction.
There is a function dedicated to floating-point multiply-add: std::fma, whose name comes from Fused Multiply-Add. You give it x, y, and z, and it returns x * y + z. Someone may ask: “How is that different from just writing x * y + z?” The key is precisely that word: Fused.
The ordinary expression x * y + z usually proceeds in two steps. First it computes temp = x * y. During this process, the result is rounded to the precision representable by the floating-point format, such as the 53 significant bits of double. Precision has already been lost once. Then it computes result = temp + z, where another rounding occurs. In total, there are two rounding errors.
By contrast, under IEEE 754, std::fma(x, y, z) must compute the exact value of x * y + z as if it had infinite intermediate precision, and only then perform a single rounding when storing the result back into the floating-point format. This means std::fma is usually more accurate than ordinary multiply-add. It is especially valuable when x * y and z are close in magnitude but opposite in sign, producing catastrophic cancellation; std::fma can preserve more significant digits.
Modern CPUs, such as Intel architectures since Haswell and ARM Cortex-A series cores, usually support FMA instructions directly at the ISA level, such as x86 FMA3 or the historical FMA4 instruction set. std::fma is often compiled into a single assembly instruction, for example vfmadd213sd. That means it is not only more accurate, but also extremely fast, often taking only 4-5 clock cycles while offering high throughput.
But if the hardware does not support it, the compiler may call a software library to emulate infinite intermediate precision. At that point it can become very slow. The standard defines three optional macros, FP_FAST_FMA, FP_FAST_FMAF, and FP_FAST_FMAL, to report the ground truth: whether FMA is actually “fast” on this machine.
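To feel the single rounding concretely, here is a small sketch (the function name prod_error is my own): it uses one fma to recover, exactly, the error that an ordinary rounded multiply threw away, the classic "error-free transformation" trick.

```cpp
#include <cmath>

// prod_error(a, c) returns the exact value of a*c - round(a*c).
// p = a * c is rounded to 53 bits; fma(a, c, -p) computes the full-precision
// residual a*c - p and rounds it once (here the residual is exactly representable).
double prod_error(double a, double c) {
    double p = a * c;
    return std::fma(a, c, -p);
}
```

For a = 1 + 2^-27, the exact square 1 + 2^-26 + 2^-54 needs 55 significand bits, so `a * a` rounds the low bit away. The naive residual `a * a - p` is 0.0, since the error cancels invisibly, while `prod_error(a, a)` returns exactly 2^-54.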
Imagine writing a std::list<int> and passing in std::allocator<int> as the allocator. Internally, however, a linked list does not store lonely int objects directly; it stores nodes, say Node<int>. The allocator only knows how to allocate memory of int size, so where do the next / prev pointer fields inside Node go? This is where allocator rebinding is needed. In the early model, it was implemented through a rebind struct.
Before C++11, for example in C++98, the standard required every allocator to manually provide a rebind struct. In C++11 and later, std::allocator_traits uses SFINAE techniques to detect whether the user has written rebind: if so, it uses the user-defined version; if not, it automatically substitutes the template parameter and generates the corresponding allocator type.
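A minimal sketch of the modern form (the names Node, IntAlloc, and NodeAlloc are mine): you hand std::allocator_traits an allocator for int and ask for "the same allocator, but for Node".

```cpp
#include <memory>
#include <type_traits>

struct Node;  // stand-in for a container's internal node type

// Rebind an int allocator into a Node allocator via allocator_traits
using IntAlloc  = std::allocator<int>;
using NodeAlloc = std::allocator_traits<IntAlloc>::rebind_alloc<Node>;

static_assert(std::is_same_v<NodeAlloc, std::allocator<Node>>);
```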
What is SFINAE? It stands for Substitution Failure Is Not An Error: if invalid code appears while substituting template parameters, such as accessing a type that does not exist, the compiler does not immediately issue an error. Instead, it treats that overload as non-viable and continues looking for another candidate. One caveat matters: SFINAE only protects substitution failures in the immediate context, including return types, function parameter types, and default template parameters. It does not protect code inside the function body.
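To make the immediate-context rule concrete, here is a small sketch (the function name has_size is mine): an invalid expression in the trailing return type merely discards the overload, whereas the same expression inside the body would be a hard compile error.

```cpp
#include <vector>

// SFINAE probe: the decltype in the return type is the immediate context.
// If t.size() is invalid for T, this overload silently drops out.
template <typename T>
auto has_size(const T& t) -> decltype(t.size(), true) {
    return true;
}

// Catch-all fallback, chosen only when the template above is discarded
bool has_size(...) { return false; }
```

`has_size(std::vector<int>{})` picks the template and yields true; `has_size(42)` discards it and falls back to false.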
The first-generation and most classical SFINAE tool is std::enable_if. Its mechanism relies on partial specialization: if the condition is true, it has a type member; if the condition is false, it has no type member, thereby triggering SFINAE and making the function disappear from the candidate set. Here is example code Gemini generated for me:
#include <type_traits>
#include <iostream>

// Version 1: exists only when T is a floating-point type
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, void>::type
process(T t) {
    std::cout << "Processing floating point: " << t << std::endl;
}

// Version 2: exists only when T is an integer type
template <typename T>
typename std::enable_if<std::is_integral<T>::value, void>::type
process(T t) {
    std::cout << "Processing integer: " << t << std::endl;
}

int main() {
    process(3.14); // Matches version 1; version 2 is discarded by SFINAE
    process(42);   // Matches version 2; version 1 is discarded by SFINAE
    // process("Hello"); // Both fail; only now is it a real compile error
}
The typename in there is purely a C++ syntax issue. Before template instantiation, when a qualified name depends on a template parameter, the compiler by default assumes that whatever follows the scope-resolution operator is a variable or value, not a type. So why is typename sometimes necessary and sometimes absent? The core issue is dependent names: whether the compiler can eliminate the ambiguity at the current stage.
For example, std::vector<int>::iterator is not a dependent name. Because std::vector<int> is fully determined, the compiler can look it up and know that iterator is a type, so typename is unnecessary. But T::iterator is a dependent name, because the meaning of iterator depends on what T actually is. For dependent names, the compiler assumes a value by default, so typename must be used to explicitly mark it as a type. There is another similar syntactic trap, called .template, or ->template:
template <typename T>
void call_foo(T& t) {
    // Wrong! The compiler may parse this as: (t.foo < 3) > (5)
    // t.foo<3>(5);
    // Correct! Tell the compiler that < begins a template argument list
    t.template foo<3>(5);
}
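And the typename half of the same trap, as a runnable sketch (the function name sum is mine): C::value_type is a dependent name, so every use of it as a type must be marked.

```cpp
#include <vector>

// C::value_type depends on C, so 'typename' marks it as a type
template <typename C>
typename C::value_type sum(const C& c) {
    typename C::value_type total{};  // value-initialize the accumulator
    for (const auto& x : c) total += x;
    return total;
}
```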
By C++17, we obtained a more elegant technique specifically for detecting whether a class has a certain member.
The role of std::void_t<...> is this: no matter what types you put inside it, as long as they are valid types, the result is void; if any expression is invalid, SFINAE is triggered. Gemini example time again, this time detecting whether a class has a reserve() function:
#include <type_traits>
#include <utility>
#include <vector>
#include <iostream>

// Primary template: assume there is no reserve
template <typename T, typename = void>
struct has_reserve : std::false_type {};

// Specialization: use SFINAE to probe.
// If t.reserve(1U) is a valid expression, std::void_t<...> becomes void and
// this specialization matches. If it is invalid, SFINAE removes the
// specialization and we fall back to the primary template.
template <typename T>
struct has_reserve<T, std::void_t<decltype(std::declval<T>().reserve(1U))>>
    : std::true_type {};

int main() {
    std::cout << has_reserve<std::vector<int>>::value << std::endl; // 1 (true)
    std::cout << has_reserve<int>::value << std::endl;              // 0 (false)
}
std::declval<T>() is a fascinating little thing:
template <typename T>
typename std::add_rvalue_reference<T>::type declval() noexcept;
// That is, its return type is T&& (collapsing to T& when T is an lvalue reference)
After C++11 introduced decltype, we often need to ask the compiler: “If I had two variables x and y, and I added them as x + y, what would the resulting type be?” If we have actual instances, this is easy. But in template metaprogramming, we usually have only the type T, not an instance. At that point, writing decltype(std::declval<T>().foo()) makes the compiler perform semantic analysis on the expression. Because this occurs in a type-parameter position, namely the immediate context, a deduction failure hits the SFINAE rule and removes that option from the candidate set. However, std::declval may only be used in unevaluated contexts.
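Here is that question, "what would x + y's type be?", answered without ever constructing an operand (the alias name sum_t is mine):

```cpp
#include <type_traits>
#include <utility>

// The decltype expression is never evaluated; declval conjures a T&&
// purely for the compiler's semantic analysis.
template <typename A, typename B>
using sum_t = decltype(std::declval<A>() + std::declval<B>());

static_assert(std::is_same_v<sum_t<int, double>, double>, "int + double -> double");
static_assert(std::is_same_v<sum_t<char, char>, int>, "chars promote to int");
```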
Fortunately, the later introduction of polymorphic memory resources (PMR) and Concepts ended many pages of obscure template scripture.
Before PMR, the allocator was part of the container type. std::vector<int, AllocA> and std::vector<int, AllocB> were two completely different types. This meant you could not pass them to the same ordinary function unless you also made that function a template; the result was severe code bloat and extremely inflexible interfaces. std::pmr uses virtual functions and type erasure to hide the concrete memory-allocation strategy at runtime. Now std::pmr::vector<int> is a single type. Whether the underlying resource is new_delete_resource or a hand-rolled monotonic_buffer_resource, the container type remains unchanged. In essence, this trades the cost of virtual function calls for smaller code size and simpler interfaces. It is a very typical engineering compromise.
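A small sketch of that single-type property (the function names total and demo are mine): one ordinary, non-template function serves vectors backed by completely different memory resources.

```cpp
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// One non-template function serves every memory resource, because the
// allocation strategy is runtime state, not part of the container type.
int total(const std::pmr::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

int demo() {
    std::byte buffer[1024];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);
    std::pmr::vector<int> on_pool({1, 2, 3}, &pool);  // stack-backed arena
    std::pmr::vector<int> on_heap = {4, 5, 6};        // default: new_delete_resource
    return total(on_pool) + total(on_heap);           // same type, same function
}
```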
In the past, using enable_if together with SFINAE produced code filled with angle brackets. Once an error occurred, the compiler would emit thousands of lines of “template instantiation failed” stack traces, making it almost impossible to see which condition was missing. Concepts, meaning compile-time constraints, let us describe type requirements in something close to natural language.
The dark age:
template <typename T,
          typename = typename std::enable_if<std::is_integral<T>::value>::type>
void foo(T t) { /* ... */ }
The bright age:
void foo(std::integral auto t) { /* ... */ }
Overall, PMR ended the type fragmentation caused by Allocator. It moved complexity from the compile-time type system into runtime object state, making code feel closer to traditional OOP. Concepts ended the obscure syntax of SFINAE. They turned implicit substitution failure into explicit constraint checking, reducing template programming from black magic into ordinary engineering. And with that, the almost dangerous sharpness of C++ finally gained a handle that human beings can actually grip.