Implementing an heterogeneous dynamic array in C++

│ Blog

Introduction

heterogeneous: consisting of dissimilar or diverse constituents
dynamic array: a dynamic array is a random access, variable-size list data structure that allows elements to be added or removed. […] A simple dynamic array can be constructed by allocating an array of fixed-size, typically larger than the number of elements immediately required. The elements of the dynamic array are stored contiguously at the start of the underlying array, […]

Heterogeneous containers are data structures wherein elements can be of different types. struct and std::tuple are example of heterogeneous containers in C++. In contrast, arrays provided by the language or the standard library are homogeneous containers, their elements can only be of the same type. std::array and std::vector are examples of static array and dynamic array implementations, respectively.

In this post, I will outline the implementation of an heterogeneous dynamic array in C++, which satisfies the API below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
struct A { int16_t a0,a1,a2;};
struct B { int32_t b0;};
struct C { int64_t c0;};
HeterogeneousArray ha;
ha.insert(A{0,1,2},B{404});
ha.insert(C{-3});
auto& b = ha.get<B>();
auto& a = ha.get<A>();
auto& c = ha.get<C>();
b.b0+=42;
a.a0+=(a.a2+c.c0);
assert(ha.get<B>().b0 == 446);
assert(ha.get<A>().a0 == -1);

and resulting in the following memory layout:

Motivation

There are regularly questions or posts popping-up about the existence of an heterogenous container in C++.

Answers are often centered around the use of vector<any> or vector<variant>. However, they each have shortcomings:

With vector<any> solution, any introduces an indirection for each inserted object (vector→any→type) and inserted objects are not stored contiguously, barring special cases (custom allocator).
With vector<variant> solution, variant requires the user to provide a list of types known at compile-time. Moreover, memory is inherently wasted because the size of a variant is the size of the largest possible type it can hold, plus some extra for the type identifier.

In “A true heterogeneous container in C++”, the container relies on a static member variable to store/retrieve the inserted/requested objects, using the memory address of the container itself as a key in a map:

1
2
3
4
5
struct andyg_hc {
	template<class T>
	static std::unordered_map<const andyg_hc*, std::vector<T>> items;
    /* Etc */
};

The API is pleasant, but only instances of a same type are stored contiguously, which favors fast per-type linear iteration. As noted by its author, it has some drawbacks making it ill-suited beyond an experimental usage.

The layout provided by these solutions is

I was left bugged by these solutions, so I decided to explore the feasibility of having a class that has the semantics of an heterogeneous vector/a structure populated at runtime. Basically, a contiguous piece of memory (one indirection) where any object can be inserted one after another. I used this exercise as a good opportunity to experiment with C++17 meta-programming features and low-level memory manipulation. Beware, knowledge of variadic templates and fold-expressions is highly recommended in the following. Let’s now see that together !

Principle

For a bit, there were only bytes

We will have to manipulate and manage directly the memory in order to perform our goal. Since we want the least amount of indirection while remaining extensible, the best is to store everything inside an array of bytes on the heap. This member will hold the actual instances of our objects.

1
2
using allocator = /*for later*/;
std::vector<std::byte,allocator> data;

Know your style

With this single member, only the caller who inserted the object would know what is where. Type information is lost and our heterogenous array is far from being self-contained. Indeed, there would be no way to access its content solely with a reference to the container. Thus, we add a member that keeps track of what is where.

1
2
3
using typeid_t = std::uint32_t;
using offset_t = std::uint32_t;
std::unordered_map<typeid_t,offset_t> type_to_offset;

Values represented by typeid_t uniquely identify a type at runtime and those represented by offset_t give the memory offset where the instance of a type is located in data.

Since C++14, it is possible to have a unique value assigned to a type easily thanks to a variable template. The implementation holds in a few lines of code:

1
2
3
4
5
6
7
class TypeCounter {
    static inline typeid_t i = 0;
public:
    template<typename T>
    static inline const auto id = i++;
};
template<typename T> inline auto type_id() { return TypeCounter::id<T>; }

In order to get, you must give

We now have the minimum necessary members to implement insertion into the container. It consists of allocating needed memory first, then constructing the object within that space and finally doing some bookkeeping (record what was stored where).

1
2
3
4
5
6
7
template<typename... Types>
auto insert(Types&&... types) {
    const auto offsets = allocate<Types...>();           //allocate
    {size_t i = 0; (construct(offsets[i++],types), ...);}//construct
    {size_t i = 0; (typeid_to_offset.emplace(type_id<Types>(),offsets[i++]), ...);}//bookkeeping
    return offsets;
}

Allocation is the most involved task. It must compute the new size needed to hold the desired content, resize accordingly, while computing the position where each object should be constructed. The easy but wrong solution would be to resize data solely with the additional storage needed by each inserted type:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// ❌ Don't do this ❌
template<typename... Ts>
auto allocate() {
    constexpr auto N = sizeof...(Ts);
    constexpr auto sizes = array<size_t, N>{ sizeof(Ts)... };

    std::array<offset_t, N> output;
    output[0] = data.size();
    for(int i=1; i<N; ++i) {
        output[i] = output[i-1] + sizes[i-1]; 
    }
    constexpr size_t to_add = (sizeof(Ts)+...+0);
    data.resize(data.size() + to_add);
    return output;
}

Repeat after me: padding is a thing, padding is a thing, padding is a thing. It also works with alignment my torment 🎵 depending on your history with the topic. In brief, objects cannot reside wherever they want in memory, and their memory addresses are restricted to be dividable by some values (a power of 2). For example, std::uint16_t has a size and alignment of 2 bytes, then the least significant bit in its memory address will always be 0 (i.e. memory address is an even number).

In general, the alignment and padding within user-defined structures are automatically set by the compiler. It ensures that each structure members will be properly aligned in regular scenarios. However, we cannot rely here on the compiler to do it because objects are constructed in a piece of memory only appearing as an array of bytes. Hence, we have to compute the padding for each object ourselves and take them into account when resizing the container.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
template<typename... Types>
auto allocate() {
    using namespace std;
    constexpr size_t N = sizeof...(Types);
    constexpr array<size_t, N> alignments = { alignof(Types)... };
    constexpr array<size_t, N> sizes = { sizeof(Types)... };
    
    array<offset_t, N> output;
    
    const size_t size_before = data.size();
    uintptr_t ptr_end = uintptr_t(data.data() + size_before);
    size_t to_allocate = 0;
    for(int i=0; i<N; ++i) 
    {
        const size_t padding = ((~ptr_end + 1) & (alignments[i] - 1));
        output[i] = size_before + to_allocate + padding;
        to_allocate += padding + sizes[i];
        ptr_end += padding + sizes[i];
    }
    data.resize(size_before + to_allocate);
    return output;
}

You may raise a serious warning here: “ok, newly inserted objects are properly padded with respect to current end address (ptr_end), but resizing the vector may provoke a reallocation, thus destroying the memory alignment since we have no guarantee that the vector would put everything at a same memory ending!” *catch up breath* Fine observation, you are right !

Obtaining properly aligned memory manually could be achieved, accepting a waste of alignment value bytes. New facilities were added in C++17 to deal with alignment in dynamic memory allocation. For example, the alignment value can be explicitly provided during dynamic memory allocation via additional overload to new. Alas, this is not yet (ever?) portable due to how each OS deals with aligned memory allocation. To keep it simple, we make use of the Boost.Align library, which provides an allocator that can force our array of bytes to always be aligned on a specific value.

1
2
constexpr auto max_alignment = 64UL;
using allocator = boost::aligned_allocator<std::byte,max_alignment>;

Consequently, every type whose alignment is lower or equal than max_alignment is guaranteed to be properly aligned within the buffer, even after memory relocation. For security, we could add the following check to guarantee no type inserted in the container has a higher alignment than the arbitrary chosen one.

1
static_assert(((alignof(T) <= max_alignment) && ...));

Now that allocation is finally correctly done, we can construct our instance within the resized data. For this purpose, in-place constructor suffices. Phew !

1
2
3
4
5
template<typename Type>
void construct(offset_t n, Type&& t) {
    using U = std::remove_reference_t<Type>;
    new(&data[n]) U{ std::forward<Type>(t) };
}

You shall have the aptitude to get

The procedure to retrieve a reference from the container is conceptually straightforward. We lookup where the type is stored and cast the bytes starting from that address as a reference to the desired type, which we know is there. This implies the use of reinterpret_cast. Unfortunately, C++ is quite unforgiving when one attempts to mess the type system. If the reference is a structure/class which has some members, any end-user that attempts to access a member would trigger Undefined Behavior, which conveniently rhymes with Unexpected Bug.

From reinterpret_cast
Performing a class member access that designates a non-static data member or a non-static member function on a glvalue that does not actually designate an object of the appropriate type - such as one obtained through a reinterpret_cast - results in undefined behavior:

A way to get around this safely was introduced in C++17 with std::launder. One of the example given on cppreference fits exactly our situation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
struct Y { int z; };
int main()
{
  alignas(Y) std::byte s[sizeof(Y)];
  Y* q = new(&s) Y{2};
  const int f = reinterpret_cast<Y*>(&s)->z; // Class member access is undefined behavior:
                                             // reinterpret_cast<Y*>(&s) has value "pointer to s"
                                             // and does not point to a Y object 
  const int g = std::launder(reinterpret_cast<Y*>(&s))->z; // OK
}

Finally, the method for accessing the content of the container is:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
template<typename... T>
auto get() -> decltype(auto) {
    if constexpr(sizeof...(T)==1)
        return (get<T>(type_to_offset.at(type_id<T>())),...);
    else
        return std::tuple<T&...>{ get<T>(type_to_offset.at(type_id<T>()))... };	
}
template<typename T, typename U = std::remove_reference_t<T>>
U& get(offset_t n) {
    return *std::launder(reinterpret_cast<U*>(&data[n]));
}

Neither auto& or auto&& would be warning/error-free because the method either returns a reference (T&) or a value (std::tuple<T&...>). As a result, the more obscure decltype(auto) is necessary, which infers the proper return type.

No leeks in my cup of tea

For a basic implementation to be minimally correct, we must properly resolve the destruction of the container. If it were left with a default destructor, any object in the container that allocated on the heap would leak its memory. Once again, this is the result of our manual management of the memory. Since the compiler only sees an array of bytes and not the underlying types, it will not call the destructors of inserted objects. As a result, we must deal with them explicitly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
using destructor_t = void(*)(HeterogeneousArray&);
std::unordered_map<type_id_t, destructor_t> destructors;

~HeterogeneousArray() {
    for (auto& [tid, destructor] : destructors)
        destructor(*this);
}
template<typename T>
void record_destructor() {
    destructors.emplace(type_id<T>(), +[](HeterogeneousArray& c) { std::destroy_at(&c.get<T>()); });
}
// Update
auto insert(Types&&... types) {
    const auto offsets = allocate<Types...>();
    {size_t i = 0; (construct(offsets[i++],types), ...);}
    {size_t i = 0; (type_to_offset.emplace(type_id<Types>(),offsets[i++]), ...);}
    (record_destructor<Types>(), ...);//🆕
    return offsets;
}

EDIT: UB strikes back

Performing a simple resize of data during allocation, which could trigger allocation of a new buffer, is actually Undefined Behavior because no object is actually “exists” in the new memory location. On Windows, it works without flinching for any types and will act as if the object was in that new memory location, but not on Linux, where storing for example std::string can send the program to oblivion. One possibility chosen here is to restrict the usage to types that are not breaking, e.g. std::vectorof any type is ok. Otherwise, memory allocation must be modified so that move constructor is called for each already present type, which means it must also be stored somewhere:

1
2
3
4
//data.resize(size_after); // Nop
vector_bytes tmp(size_after);
move_into(tmp);//unspecified method explicitly moving existing objects from data into new buffer
data = std::exchange(tmp, {});

Conclusion

A minimal working implementation of an heterogeneous dynamic array has been described and it holds in less than a 100 lines of code ! 😮 See for yourself on Compiler Explorer or at the end of this page. The main restriction of the current implementation is that it only accepts a single instance of each type. Furthermore, several features could be added on top of it to make it more efficient and user-friendly:

Copy/move construction/assignment of the container
Insertion that does not require the type to be move/copy constructible, à la emplace
Reduce memory waste when inserting several types in a single call
Avoid storing empty types
Avoid storing trivial destructors
Provide additional methods for retrieving offsets type identifier, or checking presence

In a follow-up post, the performance of the implementation will be compared to some alternatives.

Acknowledgment

Thanks Rasti for the diligent review !

Code

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
// MIT License
//
// Copyright (c) 2020 Fabien Péan
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include<array>
#include<cassert>
#include<cstdint>
#include<algorithm>
#include<numeric>
#include<vector>
#include<iostream>
#include<unordered_map>
#include<boost/align/aligned_allocator.hpp>

using typeid_t = std::uint32_t;

class TypeCounter {
    static inline typeid_t i = 0;
    public:
    template<typename T>
    static inline const auto id = i++;
};
template<typename T> inline auto type_id() { return TypeCounter::id<T>; } 

struct HeterogeneousArray {
    template<typename T, size_t N>
    using aligned_allocator = boost::alignment::aligned_allocator<T, N>;
    constexpr static size_t max_alignment = 64; 
    std::vector<std::byte, aligned_allocator<std::byte, max_alignment>> data;
    
    using offset_t = std::uint32_t;
    std::unordered_map<typeid_t,offset_t> type_to_offset;

    using destructor_t = void(*)(HeterogeneousArray&);
    std::unordered_map<typeid_t, destructor_t> destructors;

    ~HeterogeneousArray() {
        for (auto& [tid, destructor] : destructors)
            destructor(*this);
    }

    template<typename T>
    void record_destructor() {
            destructors.emplace(type_id<T>(), +[](HeterogeneousArray& c) { std::destroy_at(&c.get<T>()); });
    }

    template<typename... T>
    auto insert(T&&... types) {
        const auto offsets = allocate<T...>();           //allocate
        {size_t i = 0; (construct(offsets[i++],types), ...);}//construct
        {size_t i = 0; (type_to_offset.emplace(type_id<T>(),offsets[i++]), ...);}//bookkeeping
        (record_destructor<T>(), ...);//bookkeeping
        return offsets;
    }

    template<typename... T>
    auto allocate() -> std::array<offset_t, sizeof...(T)> {
        static_assert(((alignof(T) <= max_alignment) && ...));
        using namespace std;

        constexpr size_t N = sizeof...(T);
        constexpr array<size_t, N> alignments = { alignof(T)... };
        constexpr array<size_t, N> sizes = { sizeof(T)... };

        array<offset_t, N> output;

        const size_t size_before = data.size();
        uintptr_t ptr_end = uintptr_t(data.data() + size_before);
        size_t to_allocate = 0;
        for(int i=0; i<N; ++i) 
        {
            size_t padding = ((~ptr_end + 1) & (alignments[i] - 1));
            output[i] = size_before + to_allocate + padding;
            to_allocate += padding + sizes[i];
            ptr_end += padding + sizes[i];
        }
        size_t size_after = size_before + to_allocate;
        data.resize(size_after);// this works only on Windows
        return output;
    }

    template<typename Type>
    void construct(offset_t n, Type&& t) {
        using U = std::remove_reference_t<Type>;
        new(&data[n]) U{ std::forward<Type>(t) };
    }

    template<typename... T>
    auto get() -> decltype(auto) {
        if constexpr(sizeof...(T)==1)
            return (get<T>(type_to_offset.at(type_id<T>())),...);
        else
            return std::tuple<T&...>{ get<T>(type_to_offset.at(type_id<T>()))... };
    }

    template<typename T, typename U = std::remove_reference_t<T>>
    U& get(offset_t n) {
        return *std::launder(reinterpret_cast<U*>(&data[n]));
    }

};

struct A {int a[2]; std::vector<double> b;};
int main() {
    HeterogeneousArray array;
    array.insert(int{42},std::vector<std::string>{"test"},std::tuple<int>{42});
    auto ids = array.insert(double{3.14}, A{{42,404},{1.67}});
    auto&& v = array.get<double>();
    static_assert(std::is_same_v<decltype(v),double&>);
    assert(v == 3.14);
    v = 2.;
    assert(array.get<int>() == 42);
    assert(array.get<std::vector<std::string>>()[0] == "test");
    auto&&[a,b] = array.get<A,double>();
    static_assert(std::is_same_v<decltype(a),A&>);
    static_assert(std::is_same_v<decltype(b),double&>);
    assert(a.a[0] == 42);
    assert(a.a[1] == 404);
    assert(a.b[0] == 1.67);
    assert(b == 2.);
    return 0;
}