Data structures

Axes

An axis is a mapping of labels to positions in a given dimension. It is the equivalent of the index object from pandas. xframe supports many types of labels; the most common are strings, chars, integers and dates. An axis is created from a list of labels, and a builder function is provided so that the type of the axis can be inferred. The following example illustrates the two main ways of creating an axis:

using saxis_type = xf::xaxis<xf::fstring, std::size_t>;

saxis_type s1({ "a", "b", "d", "e" });
auto s2 = xf::axis({ "a", "b", "d", "e" });
// s1 and s2 are similar axes

An axis can also be created from its size, or from start, stop and optional step values; the stop value is excluded:

auto s3 = xf::axis(5);        // == xf::axis({ 0, 1, 2, 3, 4 });
auto s4 = xf::axis(2, 7);     // == xf::axis({ 2, 3, 4, 5, 6 });
auto s5 = xf::axis(0, 10, 2); // == xf::axis({ 0, 2, 4, 6, 8 });
auto s6 = xf::axis('a', 'd'); // == xf::axis({ 'a', 'b', 'c' });
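
The comments above suggest a half-open convention: the start value is included and the stop value is not. As a rough illustration of that convention only, the following standalone sketch lists the labels such a range would contain. The helper range_labels is hypothetical and is not part of xframe; it uses nothing but the standard library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (NOT an xframe function): returns the labels of the
// half-open range [start, stop) visited with a positive step.
template <class T>
std::vector<T> range_labels(T start, T stop, T step = T(1))
{
    assert(step > T(0) && "step must be positive");
    std::vector<T> labels;
    for (T v = start; v < stop; v = static_cast<T>(v + step))
    {
        labels.push_back(v);
    }
    return labels;
}
```

With this sketch, range_labels(0, 10, 2) yields 0, 2, 4, 6, 8 and range_labels('a', 'd') yields the chars a, b, c, mirroring the comments in the xframe example.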

The axis API is similar to that of a constant std::map whose operator[] throws an exception when asked for a missing key:

std::size_t i0 = s1["a"];
try
{
    std::size_t i1 = s1["c"];
}
catch (const std::exception& e)
{
    // The exception is caught since "c" is not a label of s1
    std::cout << e.what() << std::endl;
}

xaxis also provides iterators and methods to compute the union and the intersection of axes. However, users rarely need to manipulate axes directly; the most common operation is to create them and then store them in a coordinate system.
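
To make the map-like behavior concrete, here is a minimal standard-library sketch of an axis with a throwing lookup, a union and an intersection. It is only an illustration of the idea, not xframe's xaxis: the names simple_axis, axis_union and axis_intersection are hypothetical, labels are restricted to std::string, and the ordering of the merged labels (left operand first) is an assumption that may differ from what xframe does.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative stand-in for an axis; NOT xframe's xaxis implementation.
class simple_axis
{
public:
    explicit simple_axis(std::vector<std::string> labels)
        : m_labels(std::move(labels))
    {
        for (std::size_t i = 0; i < m_labels.size(); ++i)
        {
            bool inserted = m_index.emplace(m_labels[i], i).second;
            assert(inserted && "labels must be unique");
            (void)inserted;
        }
    }

    // Read-only lookup: throws instead of inserting a missing key.
    std::size_t operator[](const std::string& label) const
    {
        auto it = m_index.find(label);
        if (it == m_index.end())
        {
            throw std::out_of_range("label not found: " + label);
        }
        return it->second;
    }

    bool contains(const std::string& label) const
    {
        return m_index.count(label) != 0;
    }

    const std::vector<std::string>& labels() const { return m_labels; }

private:
    std::vector<std::string> m_labels;                    // position -> label
    std::unordered_map<std::string, std::size_t> m_index; // label -> position
};

// Labels of lhs, followed by the labels of rhs that lhs lacks.
inline simple_axis axis_union(const simple_axis& lhs, const simple_axis& rhs)
{
    std::vector<std::string> out = lhs.labels();
    for (const auto& label : rhs.labels())
    {
        if (!lhs.contains(label)) out.push_back(label);
    }
    return simple_axis(std::move(out));
}

// Labels present in both axes, in the order of lhs.
inline simple_axis axis_intersection(const simple_axis& lhs, const simple_axis& rhs)
{
    std::vector<std::string> out;
    for (const auto& label : lhs.labels())
    {
        if (rhs.contains(label)) out.push_back(label);
    }
    return simple_axis(std::move(out));
}
```

Keeping both a label list and a label-to-position table is one simple way to support lookup by label and iteration in position order at the same time.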

Coordinates

Coordinates are mappings of dimension names to axes. xframe provides different methods to easily create them:

using coordinate_type = xf::xcoordinate<xf::fstring>;

coordinate_type c1({{"group", xf::axis({"a", "b", "d", "e"})},
                    {"city",  xf::axis({"London", "Paris", "Brussels"})}});
auto c2 = xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                          {"city",  xf::axis({"London", "Paris", "Brussels"})}});
// c1 and c2 are similar coordinates

Note

The builder function xf::coordinate converts the const char* arguments to fstring and returns an xcoordinate<fstring> object. You can modify this behavior by specifying the key type of the coordinate as the first template parameter of the coordinate function: auto c2 = xf::coordinate<std::string>({{"group", xf::axis({"a", "b", "d", "e"})}, ...});

xnamed_axis stores a pair made of a dimension name and an axis, which you can reuse in different coordinate objects. To create a coordinate object from a named axis, all the arguments must be named axes. Fortunately, an xnamed_axis can be created in place, as shown below:

// This object will be used in different coordinate objects
auto a1 = xf::named_axis("igroup", xf::axis({1, 2, 4, 5}));

auto c1 = xf::coordinate<xf::fstring>(a1, xf::named_axis("city", xf::axis({"London", "Paris", "Brussels"})));
auto c2 = xf::coordinate<xf::fstring>(a1, xf::named_axis("country", xf::axis({"USA", "Japan"})));

As you can see, coordinate objects can store axes with different label types. By default, these types are int, std::size_t, char and xf::fstring; you can specify a different type list:

using coordinate_type = xf::xcoordinate<xf::fstring, xtl::mpl::vector<int, std::string>>;

coordinate_type c({{"group", xf::axis({"a", "b", "d", "e"})},
                   {"city",  xf::axis({"London", "Paris", "Brussels"})}});

Dimension

A dimension object is the mapping of the dimension names to the dimension positions in the data tensor. Creating an xdimension is as simple as creating an xcoordinate or an xaxis:

using dimension_type = xf::xdimension<xf::fstring>;

dimension_type dim1({"city", "group"});
auto dim2 = xf::dimension({"city", "group"});
// dim1 and dim2 are similar dimensions

xdimension provides an API similar to xaxis and can therefore be considered a special axis. Together, a dimension object and a coordinate object form a coordinate system, which maps labels and dimension names to indexes in the data tensor.
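
The following standalone sketch shows one way such a coordinate system could turn a selection of labels into an offset in a row-major data tensor. It is a simplified illustration under stated assumptions, not xframe code: simple_coordinate_system, position and flat_index are hypothetical names, labels are plain std::string values, and row-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-ins; these are NOT xframe types.
using label_list = std::vector<std::string>;

struct simple_coordinate_system
{
    // Dimension name -> ordered labels along that dimension.
    std::map<std::string, label_list> coordinates;
    // Dimension names, in the order of the tensor's dimensions.
    label_list dimensions;

    // Position of `label` along the dimension `dim_name`.
    std::size_t position(const std::string& dim_name, const std::string& label) const
    {
        const label_list& labels = coordinates.at(dim_name); // throws if unknown
        for (std::size_t i = 0; i < labels.size(); ++i)
        {
            if (labels[i] == label) return i;
        }
        throw std::out_of_range("label not found: " + label);
    }

    // Row-major offset of the element selected by one label per dimension.
    std::size_t flat_index(const std::map<std::string, std::string>& selection) const
    {
        assert(selection.size() == dimensions.size());
        std::size_t offset = 0;
        for (const std::string& dim_name : dimensions)
        {
            std::size_t extent = coordinates.at(dim_name).size();
            offset = offset * extent + position(dim_name, selection.at(dim_name));
        }
        return offset;
    }
};
```

With dimensions ordered as city then group, three cities and four groups, selecting city "Paris" (position 1) and group "d" (position 2) resolves to offset 1 * 4 + 2 = 6. The selection is keyed by dimension name, so the caller does not need to know the order of the dimensions in the tensor.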

Note

Like xf::coordinate, the builder function xf::dimension converts the const char* arguments to fstring and returns an xdimension<fstring> object. You can modify this behavior by specifying the label type of the dimension as the first template parameter of the dimension function: auto d = xf::dimension<std::string>({"city", "group"});

Variables

A variable is a data tensor with a coordinate system, that is, an xcoordinate object and an xdimension object. It is the C++ equivalent of the xarray.DataArray Python class. xvariable provides many constructors:

using coordinate_type = xf::xcoordinate<xf::fstring>;
using dimension_type = xf::xdimension<xf::fstring>;
using variable_type = xf::xvariable<double, coordinate_type>;

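// Assumption: the variable type exposes its tensor type as a nested data_type
using data_type = variable_type::data_type;
// 3 x 4 tensor: 3 cities, 4 groups
data_type d = xt::eval(xt::random::rand<double>({3, 4}));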
auto c = xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                         {"city",  xf::axis({"London", "Paris", "Brussels"})}});
auto dim = xf::dimension({"city", "group"});

variable_type v1(d, c, dim);

// Coordinates and dimension can be built in place
variable_type v2(d, xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                                    {"city",  xf::axis({"London", "Paris", "Brussels"})}}),
                    xf::dimension({"city", "group"}));

The data parameter can be omitted; in that case, the variable creates an uninitialized data tensor:

variable_type v3(c, dim);

variable_type v4(xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                                 {"city",  xf::axis({"London", "Paris", "Brussels"})}}),
                 xf::dimension({"city", "group"}));

A variable can also be created from a map of axes and a list of dimension names:

variable_type::coordinate_map coord_map;
coord_map["group"] = xf::axis({"a", "b", "d", "e"});
coord_map["city"]  = xf::axis({"London", "Paris", "Brussels"});
dimension_type::label_list dim_list = {"city", "group"};

variable_type v5(d, coord_map, dim_list);
variable_type v6(coord_map, dim_list);

If the dimension object is omitted, the dimension mapping is inferred from the coordinate object. In the code below, the mapping differs from that of the previously defined variables: group is the name of the first dimension and city is the name of the second one:

variable_type v7(d, {{"group", xf::axis({"a", "b", "d", "e"})},
                     {"city",  xf::axis({"London", "Paris", "Brussels"})}});

// variable with same coordinate system but uninitialized data
variable_type v8({{"group", xf::axis({"a", "b", "d", "e"})},
                  {"city",  xf::axis({"London", "Paris", "Brussels"})}});

xframe also provides builder functions, so that the type of the variable can be inferred:

auto v10 = xf::variable(d, c, dim);
auto v11 = xf::variable(d, xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                                           {"city",  xf::axis({"London", "Paris", "Brussels"})}}),
                           xf::dimension({"city", "group"}));

auto v12 = xf::variable(c, dim);
auto v13 = xf::variable(xf::coordinate({{"group", xf::axis({"a", "b", "d", "e"})},
                                        {"city",  xf::axis({"London", "Paris", "Brussels"})}}),
                        xf::dimension({"city", "group"}));

Summary

xframe summary