Data Validation

Data validation consists of examining a data structure and verifying that it complies with a given schema.

Why data validation?

Data validation ensures that application data is properly structured, and therefore ready for use by applications, by checking it against the constraints and rules defined in a schema. Product data is usually validated using EXPRESS schemas or XML schemas (XSD).

The language in which the schema is defined determines which kinds of constraints can be verified (e.g. ensuring that a field contains a string and not a number, or that an attribute points to an inherited member of a class under specific conditions).
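As a minimal illustration of the first kind of constraint, the Python sketch below checks field types against a schema. The schema format, the `validate` function and the field names are hypothetical; they only mimic the idea and do not correspond to EXPRESS or XSD syntax.

```python
# Hypothetical schema: maps each required field name to the type it must have.
SCHEMA = {"partNumber": str, "quantity": int}


def validate(record, schema):
    """Return a list of human-readable violations; an empty list means valid."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return errors


valid_record = {"partNumber": "A-100", "quantity": 3}
invalid_record = {"partNumber": 100}  # number instead of string, and no quantity
```

Calling `validate(invalid_record, SCHEMA)` returns one message per broken constraint, which is the same reporting idea that the SPIN approach described later applies to RDF data.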

Read more about data validation on Wikipedia

How to do data validation

Ontologies are not really a means to validate data; at least, not directly. They can, however, be used to generate constraints and rules in a language that can be understood by the applications that will actually perform the validation operation.

[Figure: OWL to SPIN conversion. SPIN rules can be generated directly from ontologies.]

We recommend using SPIN to define these constraints and rules because they are easy to generate from an ontology. A SPIN API can then run these rules against a dataset and, as output, generate additional triples that describe the consistency of that dataset.

For example:

s3kl:ContractedProductVariant
      rdf:type owl:Class , s3kl:ClassS3000L ;
      rdfs:subClassOf [ 
          rdf:type owl:Restriction ;
          owl:onProperty s3kl:contractedProductVariantRelated ; 
          owl:cardinality 1 ] ;
      rdfs:subClassOf [ 
          rdf:type owl:Restriction ;
          owl:onProperty s3kl:contractedProductVariantRelating ; 
          owl:cardinality 1 ] .

This ontology axiom could be translated into the following SPIN rule:

# SPIN reports a violation for each instance (?this) for which the ASK query
# returns true, so the query looks for instances that break the cardinality.
s3kl:ContractedProductVariant spin:constraint [
  a sp:Ask ;
  sp:text """
    ASK WHERE {
      {
        SELECT ?this (COUNT(DISTINCT ?contract) AS ?nbContracts) (COUNT(DISTINCT ?productVariant) AS ?nbProductVariants) WHERE {
          ?this s3kl:contractedProductVariantRelating ?contract .
          ?this s3kl:contractedProductVariantRelated ?productVariant
        } GROUP BY ?this
      }
      FILTER (?nbContracts != 1 || ?nbProductVariants != 1)
    }
  """ ;
  rdfs:comment "Cardinality constraint for ContractedProductVariant violation"
] .
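The translation from cardinality axioms to a constraint is mechanical, which is why it lends itself to generation. The Python sketch below builds the Turtle text of such a constraint from a class name and its exact cardinalities. It assumes the SPIN convention that an ASK constraint returns true for violating instances; the function name `cardinality_constraint` and the variable names `?v0`, `?n0`, ... are hypothetical, and this is plain string templating rather than a call to any SPIN API.

```python
def cardinality_constraint(class_name, exact_cardinalities):
    """Build Turtle text for a SPIN ASK constraint from exact cardinalities.

    exact_cardinalities maps a property's prefixed name to its owl:cardinality.
    """
    selects, patterns, checks = [], [], []
    for index, (prop, cardinality) in enumerate(exact_cardinalities.items()):
        value, count = f"?v{index}", f"?n{index}"
        # Count distinct values so that joined rows are not counted twice.
        selects.append(f"(COUNT(DISTINCT {value}) AS {count})")
        patterns.append(f"?this {prop} {value} .")
        checks.append(f"{count} != {cardinality}")
    query = (
        "ASK WHERE { { SELECT ?this " + " ".join(selects)
        + " WHERE { " + " ".join(patterns) + " } GROUP BY ?this } "
        + "FILTER (" + " || ".join(checks) + ") }"
    )
    return f'{class_name} spin:constraint [ a sp:Ask ; sp:text """{query}""" ] .'


constraint_text = cardinality_constraint(
    "s3kl:ContractedProductVariant",
    {
        "s3kl:contractedProductVariantRelating": 1,
        "s3kl:contractedProductVariantRelated": 1,
    },
)
```

A real generator would read the `owl:Restriction` nodes from the ontology instead of taking a dictionary, but the produced text has the same shape as a hand-written constraint.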

Finally, this rule can be executed by a SPIN-enabled RDF application to generate triples that report on the consistency of the data. That's right, we add even more data to the data: if we are unlucky, new triples are added to the dataset, and they help point out any inconsistency in the data.
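To make the idea of "reporting with more triples" concrete, the following Python sketch checks exact cardinalities over an in-memory list of triples and returns one report triple per violation. It is a plain-Python stand-in, not a SPIN engine: the function `cardinality_violations`, the `ex:` instances and the reporting property `ex:violatesCardinalityOn` are all hypothetical. Unlike a query that only joins existing values, this sketch also flags instances that have no value at all for a property.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def cardinality_violations(triples, class_name, exact_cardinalities):
    """Return report triples for instances whose property counts are wrong."""
    values = defaultdict(set)  # (subject, property) -> distinct objects
    instances = set()
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj == class_name:
            instances.add(subject)
        values[(subject, predicate)].add(obj)

    report = []
    for instance in sorted(instances):
        for prop, expected in exact_cardinalities.items():
            if len(values[(instance, prop)]) != expected:
                # ex:violatesCardinalityOn is a hypothetical reporting property.
                report.append((instance, "ex:violatesCardinalityOn", prop))
    return report


data = [
    ("ex:cpv1", RDF_TYPE, "s3kl:ContractedProductVariant"),
    ("ex:cpv1", "s3kl:contractedProductVariantRelating", "ex:contract1"),
    ("ex:cpv1", "s3kl:contractedProductVariantRelated", "ex:variant1"),
    ("ex:cpv2", RDF_TYPE, "s3kl:ContractedProductVariant"),
    ("ex:cpv2", "s3kl:contractedProductVariantRelating", "ex:contract1"),
]
rules = {
    "s3kl:contractedProductVariantRelating": 1,
    "s3kl:contractedProductVariantRelated": 1,
}
report = cardinality_violations(data, "s3kl:ContractedProductVariant", rules)
```

Here `ex:cpv1` satisfies both cardinalities, while `ex:cpv2` lacks a related product variant, so the report contains a single triple about `ex:cpv2`. Appending `report` to `data` would mirror the behaviour described above, where the validation results live in the same dataset as the data they describe.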