C++ Memory Model and Memory Order Semantics

Introduction

Since C++11, the language has had a memory model and memory-order semantics that make it possible to write concurrent code using only features of the standard library. This memory model is an amalgamation of structural and concurrent aspects.
The structural aspects deal with how data is laid out in memory, while the concurrent aspects specify what matters for concurrent code, especially when performing low-level atomic operations.

  • As per this new model, all data can be viewed as objects (even the primitive built-in types). An object is stored in one or more memory locations.
  • An object of a primitive built-in type, whatever its size, occupies exactly one memory location.
  • Adjacent bit-fields are part of the same memory location.

To get a sense of how troublesome things can get, let’s study a relatively simple piece of code that uses relaxed ordering to store and load two atomic variables in two functions run on separate threads.

#include <atomic>
#include <cassert>

std::atomic<bool> flag1{false}, flag2{false};

void funcA()
{
	flag2.store(true, std::memory_order_relaxed);//(A)Thread-1
	flag1.store(true, std::memory_order_relaxed);//(B)Thread-1
}
void funcB()
{
	while(!flag1.load(std::memory_order_relaxed));//(C)Thread-2

	bool res = flag2.load(std::memory_order_relaxed);//(D)Thread-2
	assert(res);
}

In the snippet above, funcB (running on thread-2) waits for flag1, which is set by funcA on thread-1. Common sense dictates that the assert in funcB can never fire: flag2 is stored before flag1 in funcA, so once flag1 is observed, flag2 must also be set. But, alas! that intuitive behaviour is not guaranteed by the memory model, and the assertion could theoretically fire on a multi-core system!
To understand why this happens, let’s first try and get familiar with some of the terms you come across when working with synchronisation and atomics.

Definitions

Inter-Thread Synchronization

This describes a synchronisation relationship between two or more threads. In the snippet above, funcB does not establish inter-thread synchronisation with funcA, because every operation uses relaxed ordering.

happens-before

This is as intuitive as it sounds in a single-threaded application, but less so across threads. It models which instructions are ordered before the current instruction. In funcA, (A) happens-before (B); in funcB, (C) happens-before (D). Ordering across threads is not guaranteed without explicit synchronisation.

Synchronizes-with

This is used to establish a synchronisation relationship between a pair of threads: a suitably tagged atomic store in one thread synchronises-with a suitably tagged atomic load in another thread when that load reads the value written by the store.

Dependency-Ordered before

Consider this piece of code:

#include <atomic>

std::atomic<int> result{0};
void func()
{
    int a = 10;      //(A)
    int b = a + 10;  //(B)
    result.store(b); //(C)
}

Here, within a single thread, (B) depends on (A) and (C) depends on (B); therefore (C) depends on (A). Within a thread the standard calls this “carries a dependency”; “dependency-ordered before” is the corresponding relationship across threads, established when a release store is read by a consume load. The concept extends naturally to multiple threads, and we will study it a little closer when talking about the various memory orderings.

Acquire

This is an atomic load operation tagged with memory_order_acquire or a stronger ordering.

Release

This is an atomic store operation tagged with memory_order_release or a stronger ordering.

Consume

This is an atomic load operation tagged with memory_order_consume.

Memory Ordering

The standard supplies us with the following options for ordering memory. These tell the compiler which optimisations are allowed and constrain how memory operations may be re-ordered by the compiler and the processor.

  • memory_order_seq_cst
  • memory_order_relaxed
  • memory_order_acquire
  • memory_order_release
  • memory_order_acq_rel
  • memory_order_consume

Sequentially consistent ordering

This is represented by memory_order_seq_cst and is the easiest ordering to reason about. Every thread has a sequentially consistent view of the data, and all modifications to a given atomic variable are observed in the same order by all threads.
In the snippet below, funcB on thread-2 waits for flag1 to be set by thread-1. Because we requested sequentially consistent ordering, the store to flag1 at (B) is guaranteed to synchronise-with the load at (C), and by the time control reaches (D) in thread-2, the store to flag2 is visible, so the assert can never trigger. Therefore funcB inter-thread synchronises with funcA and the following identities can be established:
(A) happens-before (B)
(B) synchronises-with (C), so (B) inter-thread happens-before (C)
(C) happens-before (D)
(A) inter-thread happens-before (D) [by transitivity]
Note that if a different thread were also modifying flag2, then depending on scheduling, the load at (D) might observe that later value. All the synchronisation guarantees is that no value older than the one stored at (A) will be read.

void funcA()
{
	flag2.store(true, std::memory_order_seq_cst);//(A)Thread-1
	flag1.store(true, std::memory_order_seq_cst);//(B)Thread-1
}
void funcB()
{
	while(!flag1.load(std::memory_order_seq_cst));//(C)Thread-2

	bool res = flag2.load(std::memory_order_seq_cst);//(D)Thread-2
	assert(res);
}

Relaxed Ordering

This is represented by memory_order_relaxed. Relaxed ordering is the least restrictive of all and allows maximum optimisation. The initial code snippet uses relaxed ordering, and the standard makes no guarantee that the assertion won’t trigger: relaxed ordering lets the compiler re-order instructions and lets the processor re-order memory accesses as it sees fit, and there is no inter-thread synchronisation happening at all! Individual atomic operations are still guaranteed to be free of data races, though.
Having said that, the initial snippet runs perfectly fine on x86 machines. x86 implements a comparatively strong memory model: among other guarantees, stores are not reordered with older stores and reads are not reordered with older reads, so the two relaxed stores in funcA become visible in program order. Here is what the developer’s manual says, and if you want to read more, you can access it here.

Intel SDM Memory ordering snippet
Acquire-Release ordering

This ordering allows us to achieve some synchronisation using memory_order_acquire, memory_order_release and memory_order_acq_rel. It still doesn’t guarantee a single total order of operations, but it gives us the ability to synchronise acquire (atomic load) and release (atomic store) operations. Atomic operations such as fetch_add are read-modify-write operations, and they can be acquire, release or both (memory_order_acq_rel).
In the snippet below, the store (B) is tagged as a release operation and the load (C) is tagged as an acquire operation. Therefore, the store (B) synchronises-with the load (C).
The following identities can be established:
(A) happens-before (B)
(B) synchronises-with (C), so (B) inter-thread happens-before (C)
(C) happens-before (D)
(A) inter-thread happens-before (D) [by transitivity]
This is similar to what we get with sequentially consistent ordering, but acquire-release is not as stringent and lets the compiler and processor perform any optimisations they are still allowed to.
It’s also important to keep in mind that a release orders only the writes sequenced before (B), and an acquire orders only the reads sequenced after (C); operations that do not participate in this happens-before chain are not synchronised and their ordering is not guaranteed.

void funcA()
{
	flag2.store(true, std::memory_order_relaxed);//(A)Thread-1
	flag1.store(true, std::memory_order_release);//(B)Thread-1
}
void funcB()
{
	while(!flag1.load(std::memory_order_acquire));//(C)Thread-2

	bool res = flag2.load(std::memory_order_relaxed);//(D)Thread-2
	assert(res);
}

Finally, memory_order_acq_rel gives a read-modify-write operation both acquire and release semantics: its load half can synchronise-with an earlier release store, and its store half can synchronise-with a later acquire load. This lets read-modify-write operations link chains of threads together.

Dependency Ordering

Dependency ordering allows us to establish a “dependency-ordered before” relationship for inter-thread synchronisation. This is achieved using memory_order_consume. It ensures that data which carries a dependency from the atomic load is synchronised, without paying for a full acquire.
The following identities can be established here:
(A) happens-before (B) happens-before (C)
(D) happens-before (E) happens-before (F)
(E) carries a dependency from the value loaded at (D)
(C) is dependency-ordered before (D)
Therefore, (B) inter-thread happens-before (E)

From these identities, it is clear that the value of ‘n’ will be synchronised across threads, provided funcB reads it through an expression that carries a dependency from the value returned by the consume load. Therefore, (E) will never fire.
On the other hand, there is no dependency from the load at (D) to the read of flag, so nothing orders (A) with respect to (F), and it’s possible that (F) could trigger!

#include <atomic>
#include <cassert>

std::atomic<int*> ptr{nullptr};
bool flag = false;
int n = 0;

void funcA()
{
    flag = true;                              //(A)Thread-1
    n = 10;                                   //(B)Thread-1
    ptr.store(&n, std::memory_order_release); //(C)Thread-1
}
void funcB()
{
    int* p = nullptr;
    while(!(p = ptr.load(std::memory_order_consume)));//(D)Thread-2
    assert(*p != 0);//(E)Thread-2: *p carries a dependency from (D)
    assert(flag);   //(F)Thread-2 /*This can trigger*/
}

Well, that’s it, folks! Do leave a comment if there are any corrections, suggestions or improvements.
