Using C++17 to Create Composable, Recursive Data Types

This article presents a simple way to cleanly define nested, multi-branch type hierarchies using C++17. An endeavour which was much messier in older versions of C++. This can be altered and modified to generated pure schemas, new data types, and anything tree based. It leverages std::variant to cleanly express a cohesive approach to designing easily composed data structures.

The final result will allow us to easily define N-Arity tree instances with clear, nested object instance declaration. For example:

auto root =
    Node("addUserRequest",
        Sequence({
            Node("userId", 123),
            Node("name", "Charles"),
            Node("age", 424),
            Node("sessionInfo",
                Sequence({
                    Node("signOnId", "f1f133112"),
                    Node("bannerId", Null())}))}));

Declaring Our Building Blocks

In order to get started, we need to declare the fundamental types that will allows to construct this tree system. These will include our ‘simple’, scalar data types which hold concrete data on each node: int, string, null. As well as our ‘complex’, composite data types: an ordered sequence and a variant representing the alternative data types each node can hold.

// Forward Declaration of Tree Node
struct Node;

// Simple 
using Int = int;
using String = std::string;
using Null = std::monostate;

// Composite
using Sequence = std::vector<Node>;
using Data = std::variant<Int, String, Null, Sequence>;

With these type declarations out of the way, our data hierarchy is starting to take shape. Some explanations:

  • We forward declare our Node class, so that the Sequence type can be aware of it.
  • We create aliases for Int, String and Null.
  • We declare an alias for the Sequence type: a vector of Nodes.
  • Finally, we declare the Data class. Each Node element will use a data member to hold either an Int, String, Null, or a Sequence of other Nodes.

The Node Class

Now let’s define the Node data type:

struct Node {
    std::string m_name;
    Data m_data;
    explicit Node(std::string&& name, Data&& data)
        : m_name(std::move(name)), m_data(std::move(data)) {}
};

The above definition has two member variables:

  • m_name: A string used to name the node’s ‘field’.
  • m_data: The data the node is holding. Defined as a variant above.

It also has a single explicit constructor which takes the node name and the data it holds as R-value references. Which means data will moved from the incoming instances if possible. If you plan to use this with variables or make copies, more constructors will have to be defined.

Surprisingly, that’s it! Really. We can already build trees using the syntax demonstrated at the top of the article. Allowing us to develop nested hierarchies dynamically (but with type safety). Notice how the variant constructor accepts each type as needed to initialize the data member.

auto root =
    Node("data", Sequence({
        Node("users", Sequence({
            Node("david", 12322),
            Node("charles", 2322),
            Node("rebecca", 998)})),
        Node("citySize", Sequence({
            Node("nyc", 8),
            Node("toronto", 4)}))}));

Traversing The Structure

Of course, with no way to traverse the structure, it’s useless. Let’s define output stream operators so that we can ‘serialize’ the data structure to stdout.

First let’s do an overload for the Null type:

template<typename Stream>
Stream& operator<<(Stream& stream, const Null& null) {
    stream << "null";
    return stream;
}

Very simple. Next let’s overload it for the Data (variant) type:

template<typename Stream>
Stream& operator<<(Stream& stream, const Data& data) {
    std::visit([&](auto& val) {
        stream << val;
    }, data);
    return stream;
}

Here, we are using std::visit to access the variant as it’s current type. Using an auto template lambda keeps things concise. Inside, we simply apply the stream out operator to whatever is in the variant at the time (Int, Null, etc..).

Next, we can overload the stream operator for the Node type. Again, this definition is simple:

template<typename Stream>
Stream& operator<<(Stream& stream, const Node& node) {
    std::cout
        << "Node{ name='" << node.m_name
        << "' data=" << node.m_data << " }";
    return stream;
}

So far, so good! However, we are missing the crucial overload for Sequence (std::vector). Why is this so important? It is what enables the actual traversing to deeper levels. Look carefully:

template<typename Stream>
Stream& operator<<(Stream& stream, const Sequence& data) {
    std::cout << "Sequence[ ";
    for(const auto& d : data) {
        std::cout << d << ' ';
    }
    std::cout << ']';
    return stream;
}

For each child Node in the sequence we use the stream out operator. And this prints out their value. What if a child node is a sequence? Well then the same operator will be called recursively as needed. Viola! We can serialize our data:

auto root2 =
    Node("test",
        Sequence({
            Node("name", "Herbert"),
            Node("age", 55)}));

// prints out:
// Node{ name='test' data=Sequence[ Node{ name='name' data=Herbert } Node{ name='age' data=55 } ] }
std::cout << root2 << std::endl;

Programmatically Generating Structure Segments

We have a pretty usable and extensible system already. However, what if we want to transform a vector of custom structs into a canonical Sequence in our simple tree data system? All we need to do is write a utility function that iterates over the vector to produce a sequence:

template<typename T, typename Func>
Sequence vecToSeq(const std::vector<T>& vec, Func func) {
    Sequence result;
    for(const auto& v : vec) {
        result.push_back(func(v));
    }
    return result;
}

Now, we can use this function with a custom function argument to define the structure that will be appended for each data member within the vector:

// Define custom type
struct Custom {
    int id;
    std::string name;
};

// Create vector
std::vector<Custom> vec = {
    {12, "Johnny"}, { 344, "Filber"}, {999, "Jennifer"}
};

// Use lambda to process each item in vector and output reflecting Node data structure
auto root3 =
    Node("data", Sequence({
        Node("customThings", vecToSeq(vec, [](const Custom& c) {
            return Node("custom", Sequence({
                Node("id", c.id),
                Node("name", c.name)})); })),
        Node("requestId", 232324)}));

This idea can be expanded to create toTree() functions for different types. Perhaps as an exercise, one could generalize the toTree() function for all types to allow automatic conversion without direct function calls.

Conclusion

As you can see, Modern C++ really improves the ability to work with composable data types using utility types such as std::variant. The example presented in this post, provides a simple blueprint for more specific systems. It can extended and modified to support different schema definitions, serialization, and validation. Maybe you want just types with no names? Or a schema with no concrete data? Hopefully this serves a good starting point to direct your thinking. If the curly braces bother you, you play around with template parameter packs. And if you can get by with exclusively compile time generation, maybe you can experiment with using tuples to store the sequences. Have fun!

Link to Complete Code Example

Related post