C++ references, continued

I got some feedback on my last C++ post. The commenter argues that references are not pointers; they are just names for another object.

Sorry for reopening a topic after nearly 6 months. But I cannot stay silent.
I think you got it wrong. Completely.
Although a reference might behave like “some sort” of a pointer, it is *not* a pointer. Your statement: “A reference is effectively a pointer, but this is hidden by the language.” is completely wrong.
To quote the C++ standard: “A reference is an alternative name for an object.” It is just a new name for something that you’ve defined elsewhere. That’s the very reason why it cannot be null: you cannot have an alternative name for an object that you do not have yet.

–Willi Burkhardt

Great, in theory. Unfortunately, none of the compilers I have used treat references as anything other than pointers. References are, on some level, supposed to guarantee that they are non-null and that they refer to a valid object. No compiler I have ever used enforces either guarantee.

Take this example:

#include <iostream>

static int const a_const = 5;

int const& A() {
	return a_const;
}

static int const* b_ptr = 0;

int const& B() {
	return *b_ptr;
}

int main() {
	int const& a_ref = A();
	
	std::cout << "Called A()" << std::endl;
	std::cout << "a_ref: " << a_ref << std::endl;
	
	int const& b_ref = B();
	
	std::cout << "Called B()" << std::endl;
	std::cout << "b_ref: " << b_ref << std::endl;
	
	return 0;
}

If we are to believe that references are simply another name for an object, then converting *b_ptr to a reference should have caused a runtime error. After all, we dereferenced a null pointer, right? The compiler should emit code to prevent this, right?

In an ideal world, this would cause an error — but it does not. The segmentation fault does not come until b_ref is used; indeed, we see “Called B()” in the program output, indicating that B() successfully returned a reference, which was stored in b_ref. Obviously, at runtime there was a null pointer dereference. But we didn’t use a pointer, I hear you saying. We used a reference!

Then please explain this behavior to me. On a language level, sure, references are “names for objects.” But this does not change the fact that the implementation is done using memory addresses — which is fundamentally the same thing pointers do. This helps to explain why we see the behavior of this sample. As I mentioned in my last post, when you convert an expression to a reference type, it’s treated exactly as though you had converted it to a pointer type, with an implicit address-of operator (&). So we can rewrite this function:

int const& B() {
	return *b_ptr;
}

Like this:

int const* B() {
	return &*b_ptr;
}

And it becomes immediately clear why the segmentation fault did not occur here: taking the address of a dereference expression is the same thing as taking the original expression. The & and * cancel out during compilation, and we just return the pointer. Take a look at this example, which is identical to the one above, except that A() is gone and B() now returns a pointer, with dereferences added in the appropriate places. (Note that the local b_ptr in main() shadows the global of the same name; B() still reads the global.)

#include <iostream>

static int const* b_ptr = 0;

int const* B() {
	return &*b_ptr;
}

int main() {
	int const* b_ptr = B();
	
	std::cout << "Called B()" << std::endl;
	std::cout << "b_ref: " << *b_ptr << std::endl;
	
	return 0;
}

Identical behavior.

So you can throw the spec at me all you want, but every implementation I’ve tried uses pointer-with-automatic-dereference semantics — if you convert every reference to a pointer, add an address-of operator to every assignment to a reference, and a dereference operator to every use of a reference, you will see identical behavior.
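As a small illustration of that substitution, here is a sketch that uses only valid (non-null) objects, so no undefined behavior is involved. The function names are mine and purely illustrative. Both versions modify the caller's variable, and taking the address of the reference parameter yields the address of the caller's object, just as the pointer does:

```cpp
// Reference version: the caller's variable is modified through the alias.
void add_one_ref(int& x) {
	x += 1;
}

// Pointer version: address-of at the call site, dereference at each use.
void add_one_ptr(int* x) {
	*x += 1;
}

// True when the reference parameter denotes the object stored at address p.
bool same_object(int const& r, int const* p) {
	return &r == p;
}
```

Calling add_one_ref(a) and add_one_ptr(&a) on an int a has the same effect on a, and same_object(a, &a) holds for any int a.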

To preempt the “but the compiler can optimize local references” argument, the compiler can do exactly the same with pointers.

// With a reference
int A() {
    int a = 5;
    int& b = a;
    return b;
}

// With a pointer
int B() {
    int a = 5;
    int* b = &a;
    return *b;
}

I’ve heard the argument that the compiler can eliminate the reference in A(). Well, it can also eliminate the pointer in B(). If a local pointer is set to point at another local and the compiler can prove that it will never change, it can optimize it away just as easily as it can optimize away a local reference to a local.
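One way to see this symmetry without reading assembly is constant evaluation. This is only a sketch, assuming C++14 or later, and the names via_reference and via_pointer are hypothetical. Constant evaluation is not the same thing as an optimization pass, but it does show that the compiler can see through a local pointer just as it sees through a local reference:

```cpp
// With a reference (assumes C++14 constexpr rules)
constexpr int via_reference() {
    int a = 5;
    int& b = a;
    return b;
}

// With a pointer
constexpr int via_pointer() {
    int a = 5;
    int* b = &a;
    return *b;
}

// Both calls are evaluated at compile time; neither alias blocks that.
static_assert(via_reference() == 5, "reference version is a constant");
static_assert(via_pointer() == 5, "pointer version is a constant");
```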

So, this supports my original argument that references store memory addresses in the same way that pointers do, only with automatic dereferencing. They are effectively nothing more than syntactic sugar, allowing you to forget that you’re operating on an object somewhere else in memory.
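A rough way to observe this is to compare object sizes. The standard does not say whether a reference requires storage, so the following is only a sketch of what mainstream implementations tend to do, and the struct names are hypothetical. The wrapping structs are needed because sizeof applied to a reference type reports the size of the referred-to type, not of the reference itself:

```cpp
// A struct whose only member is a reference.
struct HoldsRef {
	int& r;
};

// A struct whose only member is a pointer.
struct HoldsPtr {
	int* p;
};

// Typically true on mainstream compilers, but implementation-specific
// rather than guaranteed by the language.
bool same_size() {
	return sizeof(HoldsRef) == sizeof(HoldsPtr);
}
```

If same_size() returns true on your compiler, the reference member occupies the same space a pointer would, which is consistent with it being stored as a memory address.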

8 Replies to “C++ references, continued”

  1. Shocking! I really would have expected the dereference error to come at the point *b_ptr is evaluated. Does the behaviour you demonstrate comply with the language spec? And if so, why does the spec allow it?

  2. @Ed: I think it illustrates two things: 1. References are implemented in the same way that pointers are implemented, as memory locations that hold memory addresses, and 2. there really isn’t anything that the compiler could be expected to emit, except possibly a null test (which it clearly does not). For example, if I had initialized b_ptr to 1, what could it have done? What constitutes a “valid int memory address?” (Replace “int” with any other C++ type and you run into the same problem.)

  3. That does not address the “references are basically syntactic sugar around pointers” argument, that only notes that storing a null pointer in a reference is illegal. I never said that it wasn’t.

    My point is that with every major compiler out there, you can interchange references with pointers (adding the appropriate dereferences) and observe identical behavior. The simplest explanation is that memory addresses are the easiest way to implement references, and therefore they behave like pointers for practical purposes.

  4. “that only notes that storing a null pointer in a reference is illegal. I never said that it wasn’t.”

    I think your example code shows that storing a null pointer in a reference is just fine, or that if it is illegal, it’s not a crime you will ever be prosecuted for.

    The FAQ that Ebrahim B. linked to resolves it by saying “The C++ standard does not require a diagnostic for this particular error”. That seems a bit of a cop-out to me. We are left with the situation that references “cannot” be null, but no compiler is bothering to enforce this properly, leaving the code to blow up later on when the reference is used.

  5. @Ed: Not really, since technically dereferencing NULL and storing the result in a reference causes undefined behavior. However, this undefined behavior is consistent across every C++ compiler I have used in my career.

    I think that the cop-out is there because there is nothing that the compiler can reasonably do. For example, if I tried to store *reinterpret_cast<int*>(1) into a reference, what should happen? Should the compiler emit an unused mov before returning the memory address so that the program will fail at runtime? This would work but may be less performant.

  6. My feeling as a C and C++ programmer is that the * and -> operators are danger signs in your code. When you see them it is your responsibility to make sure the pointer is valid at that point. Since dereferencing a null pointer is undefined behaviour (I think) the compiler is quite free to let the code merrily continue on its way past the *((int*)0) and do something weird later on; but that would be unhelpful to the programmer, so most don’t.

    In the same spirit of being helpful and taking into account the programmer’s expectations that the * or -> operation will be the place an error is flagged, I would expect the compiler to generate some checking code for int& r = *ptr. Even if it does make the code run a tiny bit slower. That check can be left out at higher optimization levels, just as in principle the nullness test can be optimized away in { *ptr = 5; if (ptr == 0) { … } }.

  7. When I was in college learning C++ for the first time, this is what confused me to no end. It took me a while before I really understood pointers vs. references. And in my opinion, I’d much rather use pointers everywhere.
