Schema Support

Data model schemas allow us not only to generate documentation, but also to check automatically whether xarray.DataArray and xarray.Dataset objects conform to the xradio schemas (see e.g. xradio.measurement_set.schema.VisibilityXds).

Checking

class SchemaIssue(path: List[Tuple[str, str]], message: str, found: Any | None = None, expected: List[Any] | None = None)[source]

Representation of an issue found in a schema check

As schemas can be quite big, path can be used to precisely locate the source of the issue.

expected: List[Any] | None = None

List of expected values. Can be any type (type, dtype, value)

found: Any | None = None

What was found. Can be any type (type, dtype, value)

message: str

Explanation of the issue

path: List[Tuple[str, str]]

Path to offending data item, using pairs of (entity type, entity name). Entity types can be data_var, coord or attr.

Example: [('data_var', 'foo'), ('coord','bar'), ('attr', 'asd')] refers to obj.data_vars['foo'].coords['bar'].attrs['asd']
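As a rough illustration of how such a path maps onto xarray accessors, the sketch below renders a path as an accessor expression. The helper name `format_path` and the mapping table are assumptions made for this example and are not part of xradio.

```python
from typing import List, Tuple

# Assumed mapping from entity type to the xarray accessor it refers to
ACCESSORS = {"data_var": "data_vars", "coord": "coords", "attr": "attrs"}


def format_path(path: List[Tuple[str, str]], root: str = "obj") -> str:
    """Render a list of (entity type, entity name) pairs as an accessor expression."""
    expr = root
    for entity_type, entity_name in path:
        expr += f".{ACCESSORS[entity_type]}[{entity_name!r}]"
    return expr


print(format_path([("data_var", "foo"), ("coord", "bar"), ("attr", "asd")]))
```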

exception SchemaIssues(issues=None)[source]

List of issues found in a schema check

Can be thrown as an exception, so we can report on multiple schema issues in one go.

expect(elem: str | None = None, ix: str | None = None)[source]

Raises this object if issues were found

Parameters:
  • elem – If given, will be added to path

  • ix – If given, will be added to path

Raises:

SchemaIssues

issues: list[SchemaIssue]

List of issues found
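The pattern of collecting several issues and raising them together can be sketched with the standard library alone. The classes `Issue` and `Issues` below are simplified stand-ins written for this example; they are not xradio's implementation, and the real expect() additionally takes elem and ix arguments that extend the path.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Issue:
    """Simplified stand-in for a single schema issue."""
    path: List[Tuple[str, str]]
    message: str


class Issues(Exception):
    """Simplified stand-in: a list of issues that can be raised as one exception."""

    def __init__(self, issues: Optional[List[Issue]] = None):
        super().__init__()
        self.issues = list(issues or [])

    def add(self, issue: Issue) -> None:
        self.issues.append(issue)

    def expect(self) -> None:
        """Raise this object if any issues were collected."""
        if self.issues:
            raise self

    def __str__(self) -> str:
        return "\n".join(f"{i.path}: {i.message}" for i in self.issues)


found = Issues()
found.add(Issue([("data_var", "foo")], "wrong dtype"))
found.add(Issue([("attr", "asd")], "missing attribute"))

try:
    found.expect()
except Issues as exc:
    print(f"{len(exc.issues)} issues reported together")
```

Collecting first and raising once means a caller sees every problem in a single pass instead of fixing them one exception at a time.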

check_array(array: DataArray, schema: type | ArraySchema) SchemaIssues[source]

Check whether an xarray DataArray conforms to a schema

Parameters:
  • array – DataArray to check

  • schema – Schema to check against

Returns:

SchemaIssues found

check_attributes(attrs: Dict[str, Any], attrs_schema: List[AttrSchemaRef], attr_kind: str = 'attrs') SchemaIssues[source]

Check whether an attribute set conforms to a schema

Parameters:
  • attrs – Dictionary of attributes

  • attrs_schema – Expected schemas

Returns:

SchemaIssues found

check_data_vars(data_vars: Dict[str, DataArray], data_vars_schema: List[ArraySchemaRef], data_var_kind: str) SchemaIssues[source]

Check whether a data variable set conforms to a schema

As data variables are data arrays, this will recurse into checking the array schemas

Parameters:
  • data_vars – Dictionary(-like) of data variables

  • data_vars_schema – Expected schemas

  • data_var_kind – Either ‘coords’ or ‘data_vars’

Returns:

SchemaIssues found

check_dataset(dataset: Dataset, schema: type | DatasetSchema, allow_superflous_dims: Set[str] = frozenset({})) SchemaIssues[source]

Check whether an xarray Dataset conforms to a schema

Parameters:
  • dataset – Dataset to check

  • schema – Schema to check against

Returns:

SchemaIssues found

check_datatree(datatree: DataTree)[source]

Check datatree for schema conformance

This is the case if each contained Dataset conforms to a schema registered with Xradio. This works by looking for a type attribute in the Dataset, which must have a typing.Literal type annotation specifying the name of the dataset schema.

Parameters:

datatree – Data to check for schema conformance

check_dict(dct: dict, schema: type | DictSchema) SchemaIssues[source]

Check whether a dictionary conforms to a schema

Parameters:
  • dct – Dictionary to check

  • schema – Dictionary schema to check against

Returns:

SchemaIssues found

check_dimensions(dims: List[str], expected: List[List[str]], check_order: bool = True, allow_superflous: Set[str] = frozenset({})) SchemaIssues[source]

Check whether a dimension list conforms to a schema

Parameters:
  • dims – Dimension list to check

  • expected – Expected possibilities for dimension list

  • check_order – Whether to check order of dimensions

Returns:

SchemaIssues found
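One plausible reading of these parameters is sketched below: a dimension list is accepted if it equals at least one of the expected options, optionally ignoring order and any explicitly tolerated extra dimensions. The function `dims_match` and the dimension names are hypothetical, and the function returns a plain boolean; the real check_dimensions() returns SchemaIssues and may differ in detail.

```python
from typing import FrozenSet, List


def dims_match(
    dims: List[str],
    expected: List[List[str]],
    check_order: bool = True,
    allowed_extra: FrozenSet[str] = frozenset(),
) -> bool:
    """Return True if dims matches at least one of the expected options."""
    for option in expected:
        # Ignore tolerated extra dimensions unless the option itself names them
        relevant = [d for d in dims if d in option or d not in allowed_extra]
        if check_order and relevant == list(option):
            return True
        if not check_order and sorted(relevant) == sorted(option):
            return True
    return False


options = [["time", "baseline", "frequency"], ["time", "antenna", "frequency"]]
print(dims_match(["time", "baseline", "frequency"], options))
print(dims_match(["frequency", "time", "baseline"], options, check_order=False))
```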

check_dtype(dtype: numpy.dtype, expected: List[numpy.dtype]) SchemaIssues[source]

Check whether a numpy dtype conforms to a schema

Parameters:
  • dtype – Numeric type to check

  • expected – Expected possibilities for dtype

Returns:

SchemaIssues found

register_dataset_type(schema: DatasetSchema)[source]

Registers the given schema for usage with check_datatree()

This looks for a type attribute in the dataset schema, which must have a typing.Literal type annotation specifying the type name of the dataset

Parameters:

schema – Schema to register
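The lookup described here can be illustrated with the standard library: the type name is read from the typing.Literal annotation of a type field and used as a registry key. The registry dictionary, the function `register` and the class `ExampleSchema` are hypothetical and only mirror the idea; xradio's own registration works on DatasetSchema objects.

```python
from typing import Dict, Literal, get_args, get_origin, get_type_hints

# Hypothetical registry mapping type names to schema classes
REGISTRY: Dict[str, type] = {}


def register(schema_cls: type) -> None:
    """Register schema_cls under the name given by its Literal 'type' annotation."""
    hint = get_type_hints(schema_cls).get("type")
    if get_origin(hint) is not Literal:
        raise ValueError("'type' must be annotated with typing.Literal")
    # Expect exactly one literal value, which becomes the type name
    (type_name,) = get_args(hint)
    REGISTRY[type_name] = schema_cls


class ExampleSchema:
    type: Literal["example"]


register(ExampleSchema)
print(REGISTRY)
```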

schema_checked(fn, check_parameters: bool = True, check_return: bool = True)[source]

Function decorator to check parameters and return value for schema conformance

Parameters:
  • fn – Function to decorate

  • check_parameters – Whether to check parameters. Can also pass an iterable with parameters to check

  • check_return – Whether to check return value

Returns:

Decorated function
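The way check_parameters can be either a boolean or an iterable of parameter names can be sketched with a simplified decorator. `type_checked` below is a hypothetical analogue that uses plain isinstance checks against class annotations instead of schema checks; it is not how schema_checked is implemented.

```python
import functools
import inspect


def type_checked(fn, check_parameters=True, check_return=True):
    """Hypothetical analogue of schema_checked based on isinstance checks."""
    sig = inspect.signature(fn)
    # Only parameters annotated with a plain class can be checked here
    hints = {
        name: p.annotation
        for name, p in sig.parameters.items()
        if p.annotation is not inspect.Parameter.empty
        and isinstance(p.annotation, type)
    }
    if check_parameters is True:
        names = set(hints)
    elif check_parameters is False:
        names = set()
    else:
        names = set(check_parameters) & set(hints)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name in names:
            if name in bound.arguments and not isinstance(
                bound.arguments[name], hints[name]
            ):
                raise TypeError(f"parameter {name!r} is not {hints[name].__name__}")
        result = fn(*args, **kwargs)
        ret = sig.return_annotation
        if (
            check_return
            and ret is not inspect.Signature.empty
            and isinstance(ret, type)
            and not isinstance(result, ret)
        ):
            raise TypeError(f"return value is not {ret.__name__}")
        return result

    return wrapper


def scale(values: list, factor: float) -> list:
    return [v * factor for v in values]


# Only "values" is checked; "factor" is passed through unchecked
checked = type_checked(scale, check_parameters=["values"])
print(checked([1, 2], 2))
```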

Decorators

Class decorators to generate schemas from suitably annotated Python class definitions. This approach was essentially copied from https://pypi.org/project/xarray-dataclasses/, though our implementation differs in a number of critical ways:

  • We use custom decorators on the classes instead of base classes. This especially overrides the existing constructor, which makes it easier to directly construct instances and allows for extra data variables and attributes.

  • We support multiple options for types and dimensions

  • We convert the schema definition into our own meta-model, which facilitates generating documentation using Sphinx

dict_schema(cls)[source]

Decorator for classes representing dict schemas, along the lines of xarray_dataarray_schema() and xarray_dataset_schema().

The annotated class can contain fields with arbitrary annotations, similar to a dataclass. They can be used with check_dict() for checking dictionaries against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.Dataset objects.

xarray_dataarray_schema(cls)[source]

Decorator for classes representing xarray.DataArray schemas. The annotated class should exactly contain:

  • one field called “data” annotated with Data to indicate the array type

  • fields annotated with Coord to indicate mappings of dimensions to coordinates (coordinates directly associated with dimensions should have the same name as the dimension)

  • fields annotated with Attr to declare attributes

Decorated schema classes can be used with check_array() for checking xarray.DataArray objects against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.DataArray objects.

For example:

from xradio.schema import xarray_dataarray_schema
from xradio.schema.typing import Data, Coord, Attr
from typing import Optional, Literal
import dataclasses

Coo = Literal["coo"]

@xarray_dataarray_schema
class TestArray:
    data: Data[Coo, complex]
    coo: Coord[Coo, float]
    attr1: Attr[str]
    attr2: Attr[int] = 123
    attr3: Optional[Attr[int]] = None

This data class represents a one-dimensional xarray.DataArray with complex data, a float coordinate and three attributes. Instances of this class cannot actually be constructed; instead you will get an appropriate xarray.DataArray object:

>>> TestArray(data=[1,2,3], attr1="foo")
<xarray.DataArray (coo: 3)>
array([1.+0.j, 2.+0.j, 3.+0.j])
Coordinates:
  * coo      (coo) float64 0.0 1.0 2.0
Attributes:
    attr1:    foo
    attr2:    123

Note that:

  • The constructor uses the annotations to identify the role of every parameter

  • The data was automatically converted into a numpy.ndarray

  • As there was no coordinate given, it was automatically filled with an enumeration of the type specified in the annotation

  • Default attribute values were assigned. A value of None is interpreted as the attribute being missing.

  • For the returned DataArray object, data, coo, attr1 and attr2 can be accessed as if they were members. This works as long as the names don’t collide with DataArray members.

Positional parameters are also supported, and coords and attrs passed as keyword arguments can supply additional coordinates and attributes:

>>> TestArray([1,2,3], [3,4,5], 'bar', coords={'coo_new': ('coo', [3,2,1])}, attrs={'xattr': 'baz'})
<xarray.DataArray (coo: 3)>
array([1.+0.j, 2.+0.j, 3.+0.j])
Coordinates:
    coo_new  (coo) int64 3 2 1
  * coo      (coo) float64 3.0 4.0 5.0
Attributes:
    xattr:    baz
    attr1:    bar
    attr2:    123

xarray_dataset_schema(cls)[source]

Decorator for classes representing xarray.Dataset schemas. The annotated class should exactly contain:

  • fields annotated with Coord to indicate mappings of dimensions to coordinates (coordinates directly associated with dimensions should have the same name as the dimension)

  • fields annotated with Data to indicate data variables

  • fields annotated with Attr to declare attributes

Decorated schema classes can be used with check_dataset() for checking xarray.Dataset objects against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.Dataset objects.

Annotations

Typing support for xarray data classes

This has been extracted from the xarray-dataclasses package by astropenguin (see https://github.com/astropenguin/xarray-dataclasses/). We replicate it here because we ignore or redo everything except the type annotations, in particular adding xradio-specific support for multiple options in data variable / coordinate dimensionality and dtype.

Attr

Type hint for attribute fields (Attr[T]).

Example

@dataclass
class Image():
    data: Data[tuple[X, Y], float]
    long_name: Attr[str] = "luminance"
    units: Attr[str] = "cd / m^2"

Hint

The following field names are specially treated when plotting.

  • long_name or standard_name: Coordinate name.

  • units: Coordinate units.

Reference:

https://xarray.pydata.org/en/stable/user-guide/plotting.html

alias of T[T]

Coord

Type hint for coordinate fields (Coord[TDims, TDType]).

Example

@dataclass
class Image():
    data: Data[tuple[X, Y], float]
    mask: Coord[tuple[X, Y], bool]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0

Hint

A coordinate field whose name is the same as TDims (e.g. x: Coord[X, int]) can define a dimension.

alias of Union[Labeled[TDims], Collection[TDType], TDType][Union[Labeled[TDims], Collection[TDType], TDType]]

Coordof

Type hint for coordinate fields (Coordof[TDataClass]).

Unlike Coord, it specifies a dataclass that defines a DataArray class. This is useful when users want to add metadata to dimensions for plotting.

Example

@dataclass
class XAxis:
    data: Data[X, int]
    long_name: Attr[str] = "x axis"


@dataclass
class YAxis:
    data: Data[Y, int]
    long_name: Attr[str] = "y axis"


@dataclass
class Image():
    data: Data[tuple[X, Y], float]
    x: Coordof[XAxis] = 0
    y: Coordof[YAxis] = 0

alias of Union[TDataClass, Any][Union[TDataClass, Any]]

Data

Type hint for data fields (Data[TDims, TDType]).

Example

Exactly one data field is allowed in a DataArray class (the second and subsequent data fields are just ignored):

@dataclass
class Image():
    data: Data[tuple[X, Y], float]

Multiple data fields are allowed in a Dataset class:

@dataclass
class ColorImage():
    red: Data[tuple[X, Y], float]
    green: Data[tuple[X, Y], float]
    blue: Data[tuple[X, Y], float]

alias of Union[Labeled[TDims], Collection[TDType], TDType][Union[Labeled[TDims], Collection[TDType], TDType]]

class DataClass(*args: PInit.args, **kwargs: PInit.kwargs)[source]

Type hint for dataclass objects.

Dataof

Type hint for data fields (Dataof[TDataClass]).

Unlike Data, it specifies a dataclass that defines a DataArray class. This is useful when users want to reuse a dataclass in a Dataset class.

Example

@dataclass
class Image:
    data: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0


@dataclass
class ColorImage():
    red: Dataof[Image]
    green: Dataof[Image]
    blue: Dataof[Image]

alias of Union[TDataClass, Any][Union[TDataClass, Any]]

class Labeled[source]

Type hint for labeled objects.

Name

Type hint for name fields (Name[THashable]).

Example

@dataclass
class Image():
    data: Data[tuple[X, Y], float]
    name: Name[str] = "image"

alias of THashable[THashable]

class Role(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Annotations for typing dataclass fields.

ATTR = 'attr'

Annotation for attribute fields.

COORD = 'coord'

Annotation for coordinate fields.

DATA = 'data'

Annotation for data (variable) fields.

NAME = 'name'

Annotation for name fields.

OTHER = 'other'

Annotation for other fields.

classmethod annotates(tp: Any) bool[source]

Check if any role annotates a type hint.

deannotate(tp: Any) Any[source]

Recursively remove annotations in a type hint.

find_annotated(tp: Any) Iterable[Any][source]

Generate all annotated types in a type hint.

get_annotated(tp: Any) Any[source]

Extract the first role-annotated type.

get_annotations(tp: Any) Tuple[Any, ...][source]

Extract annotations of the first role-annotated type.

get_dataclass(tp: Any) Type[DataClass[Any]][source]

Extract a dataclass.

get_dims(tp: Any) List[Tuple[str, ...]][source]

Extract data dimensions (dims).

get_name(tp: Any, default: Hashable = None) Hashable[source]

Extract a name if found or return given default.

get_role(tp: Any, default: Role = Role.OTHER) Role[source]

Extract a role if found or return given default.
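Since this module derives from xarray-dataclasses, role information is presumably attached to type hints as typing.Annotated metadata. The sketch below illustrates that mechanism with the standard library only; `ExampleRole`, `ExampleAttr` and `find_role` are names invented for this example, not the module's actual definitions.

```python
from enum import Enum
from typing import Annotated, Any, TypeVar, get_args, get_origin

T = TypeVar("T")


class ExampleRole(Enum):
    ATTR = "attr"
    OTHER = "other"


# A generic alias that tags its type argument with a role
ExampleAttr = Annotated[T, ExampleRole.ATTR]


def find_role(tp: Any, default: ExampleRole = ExampleRole.OTHER) -> ExampleRole:
    """Return the first role found in the Annotated metadata of tp, else default."""
    if get_origin(tp) is Annotated:
        # get_args yields the wrapped type first, followed by the metadata
        for meta in get_args(tp)[1:]:
            if isinstance(meta, ExampleRole):
                return meta
    return default


print(find_role(ExampleAttr[str]))
print(find_role(int))
```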

get_types(tp: Any) List[dtype[Any]][source]

Extract data types from type annotation

E.g. Coord[…, Type1 | Type2 | …] or Data[…, Type1 | Type2 | …]

is_optional(type_ann)[source]

Check whether a type annotation indicates that the value is optional

Boils down to checking whether it’s a union type that includes None
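The union-with-None test can be sketched with typing introspection. The function name `looks_optional` is chosen for this example so that it is not mistaken for the module's own is_optional(), whose implementation may differ.

```python
import types
from typing import Any, Optional, Union, get_args, get_origin


def looks_optional(type_ann: Any) -> bool:
    """Return True if type_ann is a union type that includes None."""
    origin = get_origin(type_ann)
    # typing.Union covers Optional[X]; types.UnionType covers "X | None" (Python 3.10+)
    is_union = origin is Union or origin is getattr(types, "UnionType", Union)
    return is_union and type(None) in get_args(type_ann)


print(looks_optional(Optional[int]))
print(looks_optional(Union[int, str]))
```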

Data Model

Data classes used by Xradio routines to represent dataset schemas.

class ArraySchema(schema_name: str, dimensions: List[List[str]], dtypes: List[List[str]], coordinates: List[ArraySchemaRef], attributes: List[AttrSchemaRef], class_docstring: str | None, data_docstring: str | None)[source]

Schema for xarray data array

A data array maps a tuple of dimensions to (numpy) values. The schema allows for multiple options both on dimensions as well as types to be used.

attributes: List[AttrSchemaRef]

Attributes associated with data array

class_docstring: str | None

Documentation string of class

coordinates: List[ArraySchemaRef]

Coordinate data arrays giving values to dimensions

data_docstring: str | None

Documentation string of data in class

dimensions: List[List[str]]

List of possible dimensions

dtypes: List[List[str]]

List of possible dtype options, where each inner list contains (numpy) types as array interface protocol descriptors (e.g. “>f4”). Each inner list corresponds to a possible configuration of dtypes for the data array.

is_coord() bool[source]

Checks whether this is a valid coordinate data array

Such data arrays must not have coordinate references of their own, i.e. be defined in terms of (integer) dimensions only. This is of course because their very purpose is to map these integer dimensions to semantically meaningful values, such as frequencies.

required_dimensions() list[str][source]

Returns the list of dimensions that are always required
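Assuming that a dimension counts as "always required" when it appears in every dimension option, the computation can be sketched as an intersection. The function `always_required` and the dimension names are illustrative; the actual method may compute or order its result differently.

```python
from typing import List


def always_required(dimensions: List[List[str]]) -> List[str]:
    """Return the dimensions present in every option, ordered as in the first option."""
    if not dimensions:
        return []
    common = set(dimensions[0]).intersection(*dimensions[1:])
    return [d for d in dimensions[0] if d in common]


options = [["time", "baseline", "frequency"], ["time", "frequency"]]
print(always_required(options))
```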

schema_name: str

(Class) name of the schema

class ArraySchemaRef(schema_name: str, dimensions: List[List[str]], dtypes: List[List[str]], coordinates: List[ArraySchemaRef], attributes: List[AttrSchemaRef], class_docstring: str | None, data_docstring: str | None, name: str, optional: bool, default: Any | None = None, docstring: str | None = None)[source]

Schema for xarray data array as referenced from a dataset schema

Includes information about name and docstring associated with array schema where referenced

default: Any | None = None

If optional: what is the default value?

docstring: str | None = None

Documentation string of array reference

name: str

Name of array schema as given in dataset.

optional: bool

Is the data array optional?

class AttrSchemaRef(type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray'], dict_schema: DictSchema | None = None, array_schema: ArraySchema | None = None, literal: List[Any] | None = None, optional: bool = False, name: str = '', default: Any | None = None, docstring: str = '')[source]

Schema information about an attribute as referenced from an array or dataset schema.

This includes the name and docstring associated with the attribute in the array or dataset schema definition.

default: Any | None = None

If optional: what is the default value?

docstring: str = ''

Documentation string of attribute reference

name: str = ''

Name of attribute as given in data array / dataset.

class DatasetSchema(schema_name: str, dimensions: list[list[str]], coordinates: list[ArraySchemaRef], data_vars: list[ArraySchemaRef], attributes: list[AttrSchemaRef], class_docstring: str | None)[source]

Schema for an xarray dataset

attributes: list[AttrSchemaRef]

List of attributes

class_docstring: str | None

Documentation string of class

coordinates: list[ArraySchemaRef]

List of coordinate data arrays

data_vars: list[ArraySchemaRef]

List of data arrays

dimensions: list[list[str]]

List of possible dimensions (derived from data arrays)

schema_name: str

(Class) name of the schema

class DictSchema(schema_name: str, attributes: list[AttrSchemaRef], class_docstring: str | None)[source]

Schema for a simple dictionary

attributes: list[AttrSchemaRef]

List of attributes

class_docstring: str | None

Documentation string of class

schema_name: str

(Class) name of the schema

class ValueSchema(type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray'], dict_schema: DictSchema | None = None, array_schema: ArraySchema | None = None, literal: List[Any] | None = None, optional: bool = False)[source]

Schema information about a value in an attribute or dictionary.

array_schema: ArraySchema | None = None

Array schema, if it is an xarray DataArray

dict_schema: DictSchema | None = None

Dictionary schema, if it is a dict

literal: List[Any] | None = None

Allowed literal values, if specified.

optional: bool = False

Is the value optional?

type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray']

Type of value

  • bool: A boolean

  • str: A UTF-8 string

  • int: A 64-bit signed integer

  • float: A double-precision floating point number

  • list[str]: A list of strings

  • dict: Dictionary

  • dataarray: An xarray dataarray (encoded using to_dict)
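To make the combination of type, literal and optional concrete, the sketch below checks plain Python values against a type name. The table `TYPE_CHECKS` and the function `value_matches` are assumptions for illustration, the dataarray case is left out because it would require xarray, and xradio's own checks may be stricter or more lenient.

```python
from typing import Any, List, Optional

# Assumed Python-side checks for each type name ("dataarray" is omitted)
TYPE_CHECKS = {
    "bool": lambda v: isinstance(v, bool),
    "str": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly,
    # and restrict the value to the 64-bit signed range
    "int": lambda v: isinstance(v, int)
    and not isinstance(v, bool)
    and -(2**63) <= v < 2**63,
    "float": lambda v: isinstance(v, float),
    "list[str]": lambda v: isinstance(v, list)
    and all(isinstance(x, str) for x in v),
    "dict": lambda v: isinstance(v, dict),
}


def value_matches(
    value: Any,
    type_name: str,
    literal: Optional[List[Any]] = None,
    optional: bool = False,
) -> bool:
    """Return True if value is acceptable for the given type name."""
    if value is None:
        # A missing value is only acceptable for optional entries
        return optional
    if not TYPE_CHECKS[type_name](value):
        return False
    return literal is None or value in literal


print(value_matches("Hz", "str", literal=["Hz", "m"]))
print(value_matches(True, "int"))
```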

Import and Export

Functions to import and export xradio.schema.metamodel.DatasetSchema as a JSON representation. This can be used to externalise schema checks, or to generate documentation from schemas in JSON representation.

export_schema_json_file(schema: DatasetSchema, fname: str)[source]

Exports given schema as a JSON file

Parameters:
  • schema – Dataset schema. Dataclasses will be converted automatically.

  • fname – File name to write serialised schema to

import_schema_json_file(fname: str)[source]

Imports a schema from a JSON file

For JSON files generated by export_schema_json_file(), this will return a DatasetSchema.

Parameters:

fname – File name to load

Returns:

Deserialised object
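The export/import round trip can be illustrated in miniature with a stand-in dataclass and the standard json module. `ExampleDictSchema` only borrows the field names of DictSchema, and the JSON layout produced here is an assumption; the files written by export_schema_json_file() may be structured differently.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ExampleDictSchema:
    """Stand-in using the same field names as DictSchema."""
    schema_name: str
    attributes: List[dict]
    class_docstring: Optional[str]


schema = ExampleDictSchema(
    "Example", [{"name": "units", "type": "str"}], "An example schema"
)

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "schema.json")
    # Export: convert the dataclass to plain dicts and lists, then serialise
    with open(fname, "w", encoding="utf-8") as f:
        json.dump(asdict(schema), f, indent=2)
    # Import: load the JSON and rebuild the dataclass from its fields
    with open(fname, encoding="utf-8") as f:
        restored = ExampleDictSchema(**json.load(f))

print(restored == schema)
```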