Schema Support
Data model schemas not only allow us to generate documentation,
but also let us check automatically whether xarray.DataArray and
xarray.Dataset objects conform to the xradio schemas (see
e.g. xradio.measurement_set.schema.VisibilityXds).
Checking
- class SchemaIssue(path: List[Tuple[str, str]], message: str, found: Any | None = None, expected: List[Any] | None = None)
Representation of an issue found in a schema check.
As schemas can be quite big, path can be used to precisely locate the source of the issue.
- exception SchemaIssues(issues=None)
List of issues found in a schema check
Can be thrown as an exception, so we can report on multiple schema issues in one go.
- expect(elem: str | None = None, ix: str | None = None)
Raises this object if issues were found
- Parameters:
elem – If given, will be added to path
ix – If given, will be added to path
- Raises:
SchemaIssues
- issues: list[SchemaIssue]
List of issues found
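The collect-then-raise pattern described above can be sketched without xradio. The classes Issue and Issues below are hypothetical stand-ins, not the xradio implementation, and the reading of path as a list of (kind, name) pairs is an assumption made for illustration only. The elem and ix arguments of expect() are left out.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Issue:
    """Hypothetical stand-in for SchemaIssue: where the problem is, and what it is."""
    path: List[Tuple[str, str]]  # assumed to hold (kind, name) pairs
    message: str
    found: Optional[Any] = None
    expected: Optional[List[Any]] = None


class Issues(Exception):
    """Hypothetical stand-in for SchemaIssues: a raisable collection of issues."""

    def __init__(self, issues=None):
        super().__init__()
        self.issues: List[Issue] = list(issues or [])

    def add(self, issue: Issue) -> None:
        self.issues.append(issue)

    def expect(self) -> None:
        # Raise only if something was actually collected
        if self.issues:
            raise self

    def __str__(self) -> str:
        lines = []
        for issue in self.issues:
            where = ".".join(f"{kind}[{name!r}]" for kind, name in issue.path)
            lines.append(f"{where}: {issue.message}")
        return "\n".join(lines)


# Collect several problems first, then report them in one go
issues = Issues()
issues.add(Issue([("data_vars", "VISIBILITY")], "Wrong dtype",
                 found="int64", expected=["complex64"]))
issues.add(Issue([("attrs", "type")], "Missing attribute"))

try:
    issues.expect()
    report = None
except Issues as exc:
    report = str(exc)
```

Collecting issues before raising means a caller sees every problem in a large schema at once instead of fixing them one exception at a time.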
- check_array(array: DataArray, schema: type | ArraySchema) -> SchemaIssues
Check whether an xarray DataArray conforms to a schema
- Parameters:
array – DataArray to check
schema – Schema to check against
- Returns:
SchemaIssues found
- check_attributes(attrs: Dict[str, Any], attrs_schema: List[AttrSchemaRef], attr_kind: str = 'attrs') -> SchemaIssues
Check whether an attribute set conforms to a schema
- Parameters:
attrs – Dictionary of attributes
attrs_schema – Expected schemas
- Returns:
SchemaIssues found
- check_data_vars(data_vars: Dict[str, DataArray], data_vars_schema: List[ArraySchemaRef], data_var_kind: str) -> SchemaIssues
Check whether a data variable set conforms to a schema
As data variables are data arrays, this will recurse into checking the array schemas
- Parameters:
data_vars – Dictionary(-like) of data variables
data_vars_schema – Expected schemas
data_var_kind – Either 'coords' or 'data_vars'
- Returns:
SchemaIssues found
- check_dataset(dataset: Dataset, schema: type | DatasetSchema, allow_superflous_dims: Set[str] = frozenset({})) -> SchemaIssues
Check whether an xarray Dataset conforms to a schema
- Parameters:
dataset – Dataset to check
schema – Schema to check against
- Returns:
SchemaIssues found
- check_datatree(datatree: DataTree)
Check datatree for schema conformance
This is the case if each contained Dataset conforms to a schema registered with Xradio. This works by looking for a type attribute in the Dataset, which must have a typing.Literal type annotation specifying the name of the dataset schema.
- Parameters:
datatree – Data to check for schema conformance
- check_dict(dct: dict, schema: type | DictSchema) -> SchemaIssues
Check whether a dictionary conforms to a schema
- Parameters:
dct – Dictionary to check
schema – Dictionary schema to check against
- Returns:
SchemaIssues found
- check_dimensions(dims: [str], expected: [[str]], check_order: bool = True, allow_superflous: Set[str] = frozenset({})) -> SchemaIssues
Check whether a dimension list conforms to a schema
- Parameters:
dims – Dimension list to check
expected – Expected possibilities for dimension list
check_order – Whether to check order of dimensions
- Returns:
SchemaIssues found
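To make the "expected possibilities" idea concrete, here is a minimal sketch of matching a dimension list against several allowed options. The helper dims_match is hypothetical and returns a plain bool rather than SchemaIssues. The way it treats order and superfluous dimensions is an assumption about reasonable semantics, not a description of what xradio does internally.

```python
from typing import FrozenSet, List


def dims_match(dims: List[str],
               expected: List[List[str]],
               check_order: bool = True,
               allow_superfluous: FrozenSet[str] = frozenset()) -> bool:
    """Return True if `dims` matches at least one of the expected options."""
    for option in expected:
        # Drop tolerated extra dimensions, but never ones the option requires
        relevant = [d for d in dims if d in option or d not in allow_superfluous]
        if check_order:
            if relevant == list(option):
                return True
        elif sorted(relevant) == sorted(option):
            return True
    return False


options = [["time", "frequency"], ["time", "baseline", "frequency"]]

exact = dims_match(["time", "frequency"], options)
swapped = dims_match(["frequency", "time"], options)
swapped_unordered = dims_match(["frequency", "time"], options, check_order=False)
with_extra = dims_match(["time", "frequency", "extra"], options)
extra_allowed = dims_match(["time", "frequency", "extra"], options,
                           allow_superfluous=frozenset({"extra"}))
```

A schema that lists several options accepts an array as soon as one option fits, which is why the loop returns on the first match.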
- check_dtype(dtype: numpy.dtype, expected: [numpy.dtype]) -> SchemaIssues
Check whether a numpy dtype conforms to a schema
- Parameters:
dtype – Numeric type to check
expected – Expected possibilities for dtype
- Returns:
SchemaIssues found
- register_dataset_type(schema: DatasetSchema)
Registers the given schema for usage with check_datatree()
This looks for a type attribute in the dataset schema, which must have a typing.Literal type annotation specifying the type name of the dataset
- Parameters:
schema – Schema to register
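The registration mechanism, reading a name out of a typing.Literal annotation and later looking a schema up by a dataset's type attribute, can be sketched with the standard library alone. REGISTRY, register and lookup are hypothetical names, and the schema classes are plain annotated classes rather than real xradio schemas.

```python
from typing import Dict, Literal, Optional, get_args, get_type_hints

REGISTRY: Dict[str, type] = {}  # hypothetical registry, keyed by type name


def register(schema_cls: type) -> type:
    """Register a class under the name(s) given by its `type` annotation."""
    hints = get_type_hints(schema_cls)
    # get_args(Literal["a", "b"]) yields ("a", "b")
    for name in get_args(hints["type"]):
        REGISTRY[name] = schema_cls
    return schema_cls


def lookup(attrs: dict) -> Optional[type]:
    """Find the registered class matching a `type` attribute, if any."""
    return REGISTRY.get(attrs.get("type"))


@register
class VisibilitySketch:
    type: Literal["visibility"]


@register
class AntennaSketch:
    type: Literal["antenna"]


found = lookup({"type": "visibility"})
missing = lookup({"type": "unknown"})
```

Keeping the name in the annotation means the schema definition is the single place where the type name is declared.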
- schema_checked(fn, check_parameters: bool = True, check_return: bool = True)
Function decorator to check parameters and return value for schema conformance
- Parameters:
fn – Function to decorate
check_parameters – Whether to check parameters. Can also pass an iterable with parameters to check
check_return – Whether to check return value
- Returns:
Decorated function
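As a rough illustration of what such a decorator does, the sketch below checks arguments and the return value against the function's annotations. It is hypothetical: it uses plain isinstance checks instead of schema checks, and the name checked is not part of xradio. Only the parameter names check_parameters and check_return are taken from the signature above.

```python
import functools
import inspect
from typing import get_type_hints


def checked(fn, check_parameters=True, check_return=True):
    """Hypothetical decorator: verify arguments and result against annotations."""
    hints = get_type_hints(fn)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        if check_parameters:
            # True means "check everything", an iterable selects parameter names
            names = bound.arguments if check_parameters is True else check_parameters
            for name in names:
                expected = hints.get(name)
                value = bound.arguments[name]
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(f"parameter {name!r} is not a {expected.__name__}")
        result = fn(*args, **kwargs)
        expected = hints.get("return")
        if check_return and isinstance(expected, type) and not isinstance(result, expected):
            raise TypeError(f"return value is not a {expected.__name__}")
        return result

    return wrapper


@checked
def scale(values: list, factor: float) -> list:
    return [v * factor for v in values]


ok = scale([1, 2], 2.0)
```

Passing an iterable as check_parameters restricts the check to the named parameters, mirroring the option described in the parameter list above.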
Decorators
Class decorators to generate schemas from suitably annotated Python class definitions. This approach was essentially copied from https://pypi.org/project/xarray-dataclasses/, though our implementation differs in a number of critical ways:
We use custom decorators on the classes instead of base classes. In particular, this overrides the existing constructor, which makes it easier to directly construct instances and allows for extra data variables and attributes.
We support multiple options for types and dimensions.
We convert the schema definition into our own meta-model, which facilitates generating documentation using Sphinx.
- dict_schema(cls)
Decorator for classes representing dict schemas, along the lines of xarray_dataarray_schema() and xarray_dataset_schema().
The annotated class can contain fields with arbitrary annotations, similar to a dataclass. They can be used with check_dict() for checking dictionaries against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.Dataset objects.
- xarray_dataarray_schema(cls)
Decorator for classes representing xarray.DataArray schemas. The annotated class should exactly contain:
one field called "data" annotated with Data to indicate the array type
fields annotated with Coord to indicate mappings of dimensions to coordinates (coordinates directly associated with dimensions should have the same name as the dimension)
fields annotated with Attr to declare attributes
Decorated schema classes can be used with check_array() for checking xarray.DataArray objects against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.DataArray objects.
For example:
    from xradio.schema import xarray_dataarray_schema
    from xradio.schema.typing import Data, Coord, Attr
    from typing import Optional, Literal
    import dataclasses

    Coo = Literal["coo"]

    @xarray_dataarray_schema
    class TestArray:
        data: Data[Coo, complex]
        coo: Coord[Coo, float]
        attr1: Attr[str]
        attr2: Attr[int] = 123
        attr3: Optional[Attr[int]] = None
This data class represents a one-dimensional xarray.DataArray with complex data, a float coordinate and three attributes. Instances of this class cannot actually be constructed, instead you will get an appropriate xarray.DataArray object:

    >>> TestArray(data=[1,2,3], attr1="foo")
    <xarray.DataArray (coo: 3)>
    array([1.+0.j, 2.+0.j, 3.+0.j])
    Coordinates:
      * coo      (coo) float64 0.0 1.0 2.0
    Attributes:
        attr1:    foo
        attr2:    123
Note that:
The constructor uses the annotations to identify the role of every parameter.
The data was automatically converted into a numpy.ndarray.
As there was no coordinate given, it was automatically filled with an enumeration of the type specified in the annotation.
Default attribute values were assigned. A value of None is interpreted as the attribute being missing.
For the returned DataArray object, data, coo, attr1 and attr2 can be accessed as if they were members. This works as long as the names don't collide with DataArray members.
Positional parameters are also supported, and coords and attrs passed as keyword arguments can supply additional coordinates and attributes:

    >>> TestArray([1,2,3], [3,4,5], 'bar', coords={'coo_new': ('coo', [3,2,1])}, attrs={'xattr': 'baz'})
    <xarray.DataArray (coo: 3)>
    array([1.+0.j, 2.+0.j, 3.+0.j])
    Coordinates:
        coo_new  (coo) int64 3 2 1
      * coo      (coo) float64 3.0 4.0 5.0
    Attributes:
        xattr:    baz
        attr1:    bar
        attr2:    123
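The idea of a class decorator that replaces the constructor, so that calling the class yields a different kind of object, can be sketched in a few lines. This is not how xradio builds DataArray objects. The decorator record_schema is hypothetical and returns a plain dict, purely to show the mechanism of deriving a constructor signature from annotations and treating None as "missing".

```python
import inspect
from typing import get_type_hints


def record_schema(cls):
    """Hypothetical decorator: calling the class returns a dict, not an instance."""
    hints = get_type_hints(cls)
    # Build a signature from the annotations, using class attributes as defaults
    params = [
        inspect.Parameter(name, inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          default=getattr(cls, name, inspect.Parameter.empty))
        for name in hints
    ]
    signature = inspect.Signature(params)

    def __new__(klass, *args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        # A value of None is treated as "field not present"
        return {k: v for k, v in bound.arguments.items() if v is not None}

    # Returning a non-instance from __new__ means __init__ is never called
    cls.__new__ = staticmethod(__new__)
    return cls


@record_schema
class PointSketch:
    x: float
    y: float
    label: str = "origin"
    note: str = None


p = PointSketch(1.0, y=2.0)
```

Both positional and keyword arguments work because the generated signature does the binding, which is the same convenience the examples above show for TestArray.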
- xarray_dataset_schema(cls)
Decorator for classes representing xarray.Dataset schemas. The annotated class should exactly contain:
fields annotated with Coord to indicate mappings of dimensions to coordinates (coordinates directly associated with dimensions should have the same name as the dimension)
fields annotated with Data to indicate data variables
fields annotated with Attr to declare attributes
Decorated schema classes can be used with check_dataset() for checking xarray.Dataset objects against the schema. Furthermore, the class constructor will be overwritten to generate schema-conforming xarray.Dataset objects.
Annotations
Typing support for xarray data classes
This has been extracted from the xarray-dataclasses package by astropenguin (see https://github.com/astropenguin/xarray-dataclasses/). We replicate it here because we ignore or redo everything but the type annotations, in particular adding xradio-specific support for multiple options in data variable / coordinate dimensionality and dtype.
- Attr
Type hint for attribute fields (Attr[T]).
Example

    @dataclass
    class Image():
        data: Data[tuple[X, Y], float]
        long_name: Attr[str] = "luminance"
        units: Attr[str] = "cd / m^2"

Hint
The following field names are specially treated when plotting.
long_name or standard_name: Coordinate name.
units: Coordinate units.
alias of T[T]
- Coord
Type hint for coordinate fields (Coord[TDims, TDType]).
Example

    @dataclass
    class Image():
        data: Data[tuple[X, Y], float]
        mask: Coord[tuple[X, Y], bool]
        x: Coord[X, int] = 0
        y: Coord[Y, int] = 0

Hint
A coordinate field whose name is the same as TDims (e.g. x: Coord[X, int]) can define a dimension.
alias of Union[Labeled[TDims],Collection[TDType],TDType][Union[Labeled[TDims],Collection[TDType],TDType]]
- Coordof
Type hint for coordinate fields (Coordof[TDataClass]).
Unlike Coord, it specifies a dataclass that defines a DataArray class. This is useful when users want to add metadata to dimensions for plotting.
Example

    @dataclass
    class XAxis:
        data: Data[X, int]
        long_name: Attr[str] = "x axis"

    @dataclass
    class YAxis:
        data: Data[Y, int]
        long_name: Attr[str] = "y axis"

    @dataclass
    class Image():
        data: Data[tuple[X, Y], float]
        x: Coordof[XAxis] = 0
        y: Coordof[YAxis] = 0
- Data
Type hint for data fields (Data[TDims, TDType]).
Example
Exactly one data field is allowed in a DataArray class (the second and subsequent data fields are just ignored):

    @dataclass
    class Image():
        data: Data[tuple[X, Y], float]

Multiple data fields are allowed in a Dataset class:

    @dataclass
    class ColorImage():
        red: Data[tuple[X, Y], float]
        green: Data[tuple[X, Y], float]
        blue: Data[tuple[X, Y], float]

alias of Union[Labeled[TDims],Collection[TDType],TDType][Union[Labeled[TDims],Collection[TDType],TDType]]
- class DataClass(*args: PInit.args, **kwargs: PInit.kwargs)
Type hint for dataclass objects.
- Dataof
Type hint for data fields (Dataof[TDataClass]).
Unlike Data, it specifies a dataclass that defines a DataArray class. This is useful when users want to reuse a dataclass in a Dataset class.
Example

    @dataclass
    class Image:
        data: Data[tuple[X, Y], float]
        x: Coord[X, int] = 0
        y: Coord[Y, int] = 0

    @dataclass
    class ColorImage():
        red: Dataof[Image]
        green: Dataof[Image]
        blue: Dataof[Image]
- Name
Type hint for name fields (Name[THashable]).
Example

    @dataclass
    class Image():
        data: Data[tuple[X, Y], float]
        name: Name[str] = "image"

alias of THashable[THashable]
- class Role(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Annotations for typing dataclass fields.
- ATTR = 'attr'
Annotation for attribute fields.
- COORD = 'coord'
Annotation for coordinate fields.
- DATA = 'data'
Annotation for data (variable) fields.
- NAME = 'name'
Annotation for name fields.
- OTHER = 'other'
Annotation for other fields.
- get_annotations(tp: Any) -> Tuple[Any, ...]
Extract annotations of the first role-annotated type.
- get_name(tp: Any, default: Hashable = None) -> Hashable
Extract a name if found or return given default.
- get_role(tp: Any, default: Role = Role.OTHER) -> Role
Extract a role if found or return given default.
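One way to picture role extraction is with typing.Annotated metadata. The sketch below assumes that a role is stored as Annotated metadata on the field type. RoleSketch and role_of are hypothetical names that mimic, but do not reproduce, Role and get_role().

```python
from enum import Enum
from typing import Annotated, Any, get_args, get_origin, get_type_hints


class RoleSketch(Enum):
    """Hypothetical counterpart of Role, with a subset of its members."""
    ATTR = "attr"
    COORD = "coord"
    DATA = "data"
    OTHER = "other"


def role_of(tp: Any, default: RoleSketch = RoleSketch.OTHER) -> RoleSketch:
    """Return the first RoleSketch found in an Annotated type, else the default."""
    if get_origin(tp) is Annotated:
        # get_args gives (underlying_type, metadata...), so skip the first item
        for meta in get_args(tp)[1:]:
            if isinstance(meta, RoleSketch):
                return meta
    return default


class ImageSketch:
    data: Annotated[list, RoleSketch.DATA]
    units: Annotated[str, RoleSketch.ATTR]
    comment: str


# include_extras=True keeps the Annotated metadata instead of stripping it
hints = get_type_hints(ImageSketch, include_extras=True)
roles = {name: role_of(tp) for name, tp in hints.items()}
```

Fields without any role metadata fall back to the default, which matches the "return given default" behaviour described above.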
Data Model
Data classes used by Xradio routines to represent dataset schemas.
- class ArraySchema(schema_name: str, dimensions: List[List[str]], dtypes: List[List[str]], coordinates: List[ArraySchemaRef], attributes: List[AttrSchemaRef], class_docstring: str | None, data_docstring: str | None)
Schema for xarray data array
A data array maps a tuple of dimensions to (numpy) values. The schema allows for multiple options both on dimensions as well as types to be used.
- attributes: List[AttrSchemaRef]
Attributes associated with data array
- coordinates: List[ArraySchemaRef]
Coordinate data arrays giving values to dimensions
- dtypes: List[List[str]]
List of possible dtype options, where each inner list contains (numpy) types as array interface protocol descriptors (e.g. “>f4”). Each inner list corresponds to a possible configuration of dtypes for the data array.
- is_coord() -> bool
Checks whether this is a valid coordinate data array
Such data arrays must not have coordinate references of their own, i.e. be defined in terms of (integer) dimensions only. This is of course because their very purpose is to map these integer dimensions to semantically meaningful values, such as frequencies.
- class ArraySchemaRef(schema_name: str, dimensions: List[List[str]], dtypes: List[List[str]], coordinates: List[ArraySchemaRef], attributes: List[AttrSchemaRef], class_docstring: str | None, data_docstring: str | None, name: str, optional: bool, default: Any | None = None, docstring: str | None = None)
Schema for xarray data array as referenced from a dataset schema
Includes information about name and docstring associated with array schema where referenced
- class AttrSchemaRef(type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray'], dict_schema: DictSchema | None = None, array_schema: ArraySchema | None = None, literal: List[Any] | None = None, optional: bool = False, name: str = '', default: Any | None = None, docstring: str = '')
Schema information about an attribute as referenced from an array or dataset schema.
This includes the name and docstring associated with the attribute in the array or dataset schema definition.
- class DatasetSchema(schema_name: str, dimensions: list[list[str]], coordinates: list[ArraySchemaRef], data_vars: list[ArraySchemaRef], attributes: list[AttrSchemaRef], class_docstring: str | None)
Schema for an xarray dataset
- attributes: list[AttrSchemaRef]
List of attributes
- coordinates: list[ArraySchemaRef]
List of coordinate data arrays
- data_vars: list[ArraySchemaRef]
List of data arrays
- class DictSchema(schema_name: str, attributes: list[AttrSchemaRef], class_docstring: str | None)
Schema for a simple dictionary
- attributes: list[AttrSchemaRef]
List of attributes
- class ValueSchema(type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray'], dict_schema: DictSchema | None = None, array_schema: ArraySchema | None = None, literal: List[Any] | None = None, optional: bool = False)
Schema information about a value in an attribute or dictionary.
- array_schema: ArraySchema | None = None
Array schema, if it is an xarray DataArray
- dict_schema: DictSchema | None = None
Dictionary schema, if it is a dict
- type: Literal['bool', 'str', 'int', 'float', 'list[str]', 'dict', 'dataarray']
Type of value
bool: A boolean
str: A UTF-8 string
int: A 64-bit signed integer
float: A double-precision floating point number
list[str]: A list of strings
dict: Dictionary
dataarray: An xarray dataarray (encoded using to_dict)
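A small sketch shows how a value could be matched against these type names. The table TYPE_CHECKS and the helper value_matches are hypothetical. The 'dataarray' case is omitted to keep the example free of xarray, and the handling of optional and literal is an assumption based on the field names of ValueSchema.

```python
from typing import Any, List, Optional

# Hypothetical mapping from schema type names to Python-level checks
TYPE_CHECKS = {
    "bool": lambda v: isinstance(v, bool),
    "str": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, float),
    "list[str]": lambda v: isinstance(v, list) and all(isinstance(x, str) for x in v),
    "dict": lambda v: isinstance(v, dict),
}


def value_matches(value: Any, type_name: str, optional: bool = False,
                  literal: Optional[List[Any]] = None) -> bool:
    """Check a value against a type name, an optional flag and allowed literals."""
    if value is None:
        # A missing value is only acceptable for optional entries
        return optional
    if literal is not None and value not in literal:
        return False
    return TYPE_CHECKS[type_name](value)


int_ok = value_matches(3, "int")
bool_as_int = value_matches(True, "int")
strings_ok = value_matches(["a", "b"], "list[str]")
mixed_list = value_matches(["a", 1], "list[str]")
missing_optional = value_matches(None, "str", optional=True)
wrong_literal = value_matches("x", "str", literal=["y"])
```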
Import and Export
Functions to import and export xradio.schema.metamodel.DatasetSchema
as a JSON representation. This can be used to externalise schema checks, or
to generate documentation from schemas in JSON representation.
- export_schema_json_file(schema: DatasetSchema, fname: str)
Exports given schema as a JSON file
- Parameters:
schema – Dataset schema. Dataclasses will be converted automatically.
fname – File name to write serialised schema to
- import_schema_json_file(fname: str)
Imports a schema from a JSON file
For JSON files generated by export_schema_json_file(), this will return a DatasetSchema.
- Parameters:
fname – File name to load
- Returns:
Deserialised object
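The export/import round trip can be illustrated with plain dataclasses and the json module. Everything in this sketch is hypothetical: AttrSketch and DatasetSchemaSketch are reduced stand-ins for the meta-model classes, and the JSON layout is simply what dataclasses.asdict produces, which may differ from the format xradio writes.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AttrSketch:
    name: str
    type: str
    optional: bool = False


@dataclass
class DatasetSchemaSketch:
    schema_name: str
    dimensions: List[List[str]]
    attributes: List[AttrSketch] = field(default_factory=list)
    class_docstring: Optional[str] = None


def export_json(schema: DatasetSchemaSketch, fname: str) -> None:
    """Write the schema as JSON (nested dataclasses become nested dicts)."""
    with open(fname, "w", encoding="utf8") as f:
        json.dump(asdict(schema), f, indent=2)


def import_json(fname: str) -> DatasetSchemaSketch:
    """Read the JSON back and rebuild the nested dataclasses."""
    with open(fname, encoding="utf8") as f:
        raw = json.load(f)
    raw["attributes"] = [AttrSketch(**a) for a in raw["attributes"]]
    return DatasetSchemaSketch(**raw)


schema = DatasetSchemaSketch(
    "demo", [["time", "frequency"]], [AttrSketch("type", "str")], "Demo schema"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "schema.json")
    export_json(schema, path)
    restored = import_json(path)
```

Because dataclasses compare by value, checking restored == schema is enough to confirm that nothing was lost in the round trip.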