Rules for Schema Subsets

On This Page
Introduction
Concepts
How to determine when a specific component is required
Namespace prefix declaration
Import of a schema
Type definition
Global element definition
Global attribute definition
Occurrence of an element in a type
Occurrence of an attribute in a type definition
Occurrence of an attribute in the SuperTypeMetadata
Enumeration
What you may omit
Other changes to schemas
Application of cardinality constraints
What you should not do

This document is a draft, and is under continual revision.
The latest version may be obtained online.

Introduction

Many users have found that certain XML Schema validation tools operate slowly when validating against the justice data model schemas. The justice data model schemas are, however, reference schemas, and need not be imported whole. This document lays out the rules under which users may construct schemas that are subsets of the full justice data model schemas. Such schemas may be used by applications and tools that do not accept the entire justice data model.

The forthcoming schema subset tool will solve this problem for many users. It will allow users to develop their desired subset interactively, and will generate a full set of conformant schemas for them. An XSL-based toolset is also being developed for creating subset schemas.

Neither tool is required for generating subset schemas; they may be created with any good text editor. Subset schemas may be created subtractively, through careful removal of non-required components in accordance with the rules, or additively, by adding required components. The removal method is quite tedious. The additive method, when done by hand, is prone to errors caused by pasting components in the wrong order. This is why the XSL toolset is being developed.

In generating a subset schema, there is one primary, overriding rule:

Instances that validate to the subset schema will validate to the full schema.

All of the guidelines which follow keep this primary rule as the goal. We generate schemas that are subsets of the original schema. Substitution of the full schema in place of the subset is always an option. Any instance that validates to a subset schema will validate to the full justice data model schema.

The information in this document is quite technical and requires an understanding of both the justice data model and XML Schema. Readers unfamiliar with either may wish to study those areas before proceeding.

Concepts

Schema size may be reduced by removing any component that is not required. This is most easily done by starting with the initial requirements and following the algorithm.

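The initial requirements of an application are the components used by instances and the components referenced by outside schemas.

A component is required if it is an initial requirement, or if a component that is already required requires it.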

The algorithm for constructing a minimal schema subset to satisfy requirements:

  1. Start with all schema components that are required by instances and outside schemas.
  2. Look at each unexamined component, adding all components that it requires. See the component descriptions below for what each kind of component requires.
  3. If there are no more unexamined components, and you have properly followed the rules for components, the subset is complete.
  4. Otherwise, return to step 2, continuing the process until every component has been examined and its requirements resolved.
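
As a worked trace of the steps above, the sketch below starts from a single requirement, borrowed from an example later in this document: an instance of EnforcementOfficialType uses PersonName. The namespace URI is a placeholder, and the names PersonNameType and SuperType (as the base of PersonType and PersonNameType) are assumptions made for illustration; check them against the reference schema. Components are listed in the order the algorithm discovers them only to make the trace readable. A real subset should keep the order used in the reference schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; the namespace URI is a placeholder. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/subset-sketch"
            targetNamespace="http://example.org/subset-sketch"
            elementFormDefault="qualified">

  <!-- Step 1: required directly by the instance. -->
  <xsd:complexType name="EnforcementOfficialType">
    <xsd:complexContent>
      <xsd:extension base="PersonType"/>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Step 2, first pass: the base type of EnforcementOfficialType,
       holding the occurrence of PersonName that the instance inherits. -->
  <xsd:complexType name="PersonType">
    <xsd:complexContent>
      <xsd:extension base="SuperType">
        <xsd:sequence>
          <xsd:element ref="PersonName" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Step 2, second pass: the base type of PersonType (assumed name).
       Metadata attributes are left out because no instance uses them. -->
  <xsd:complexType name="SuperType"/>

  <!-- Step 2, second pass: the global definition behind ref="PersonName". -->
  <xsd:element name="PersonName" type="PersonNameType"/>

  <!-- Step 2, third pass: the type of PersonName (assumed name and base).
       Its content is empty because no instance uses its children. -->
  <xsd:complexType name="PersonNameType">
    <xsd:complexContent>
      <xsd:extension base="SuperType"/>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Step 3: nothing is left unexamined, so the subset is complete. -->
</xsd:schema>
```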

How to determine when a specific component is required

A component is any of the following. Each component description lists the ways in which the component may be required, and the other components which it may require.

Namespace prefix declaration

A namespace prefix declaration is required when the declared prefix is used anywhere within the schema file.

For example, if an entity from the 'http://xyz.com/ns' namespace is used within the schema, then a namespace declaration must be included for that namespace.

Import of a schema

Each schema defines the contents of a single namespace (the targetNamespace of the schema). If Schema B refers to an entity of Schema A (an entity in the namespace defined by Schema A), then Schema B must import Schema A. If Schema B does not refer to an entity of Schema A, then Schema B need not import Schema A.

An import is required when any component from the namespace which it imports is used within the schema file.
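
The prefix and import rules tend to apply together, as the following sketch of a Schema B suggests. The namespace 'http://xyz.com/ns' comes from the example above; the prefix a, the file name a.xsd, the Schema B namespace, and the names LocalRecord and RecordType are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Schema B (hypothetical): refers to one component of Schema A. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:a="http://xyz.com/ns"
            xmlns="http://example.org/schema-b"
            targetNamespace="http://example.org/schema-b">

  <!-- Required: a component from http://xyz.com/ns is used below. -->
  <xsd:import namespace="http://xyz.com/ns" schemaLocation="a.xsd"/>

  <!-- The a: prefix is used here, so the xmlns:a declaration is required too. -->
  <xsd:element name="LocalRecord" type="a:RecordType"/>

</xsd:schema>
```

If LocalRecord were removed from this sketch, nothing from 'http://xyz.com/ns' would remain in use, and both the xsd:import and the xmlns:a declaration could be omitted.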

Type definition

A type definition is required when:

  • The type is used by the consumer of the schema (either in an instance or an outside schema)
  • It is used as the type of an element or attribute
  • It is the base type of another type definition

    This requirement should not be ignored. When defining a type, it is important that all base types of that type be included in the schema. Even if they are empty (i.e. have no attributes or elements), they should be included. A base type may be empty if there are no requirements for elements or attributes of that type to appear in instances or outside schemas. Even in such a case, the type should still be included, preserving the type hierarchy.

A type definition may require other components:

  • The base type of the type.

    This type may be simple or complex. Also required are the base type of that type, and the base type of that type, etc.

  • The SuperTypeMetadata attribute group
  • Occurrences of attributes in the SuperTypeMetadata
  • Occurrences of attributes in the type
  • Occurrences of elements in the type
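
The point about empty base types can be illustrated with a small sketch. It assumes that instances use only one element declared directly on EnforcementOfficialType, so PersonType ends up with no content of its own yet stays in the schema. The namespace URI is a placeholder, and the names SuperType (as the base of PersonType), EnforcementOfficialBadgeID, and BadgeIDType are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; the namespace URI is a placeholder. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/subset-sketch"
            targetNamespace="http://example.org/subset-sketch"
            elementFormDefault="qualified">

  <!-- Kept because PersonType extends it (assumed name). -->
  <xsd:complexType name="SuperType"/>

  <!-- Empty in this subset, but kept to preserve the type hierarchy. -->
  <xsd:complexType name="PersonType">
    <xsd:complexContent>
      <xsd:extension base="SuperType"/>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- The type that instances actually use. -->
  <xsd:complexType name="EnforcementOfficialType">
    <xsd:complexContent>
      <xsd:extension base="PersonType">
        <xsd:sequence>
          <xsd:element ref="EnforcementOfficialBadgeID"
                       minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Required by the occurrence above (hypothetical name and type). -->
  <xsd:element name="EnforcementOfficialBadgeID" type="BadgeIDType"/>
  <xsd:complexType name="BadgeIDType">
    <xsd:complexContent>
      <xsd:extension base="SuperType"/>
    </xsd:complexContent>
  </xsd:complexType>

</xsd:schema>
```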

Global element definition

Every element that represents a data model entity has a single definition in the schema. A global element definition is required when:

  • It is referenced in a type definition, as an occurrence of an element in a type
  • It is referenced by a non-data model schema, such as a schema for local extensions.

A global element definition requires:

  • The type definition for its data type.

Global attribute definition

Every attribute in the schema that represents a data model property is given a single definition. Each use of the attribute references that definition (using ref=).

A global attribute definition is required when:

  • It is referenced in a type definition, as an occurrence of an attribute in a type
  • It is referenced in the SuperTypeMetadata attribute group

A global attribute definition requires:

  • The type definition for its data type

Occurrence of an element in a type

An occurrence of an element in a type is required when:

  • An instance of a type uses the element.

    For example, if an instance of PersonType contains PersonName, then the element PersonName is required to occur in the type PersonType.

  • An instance of a subclass of the type uses the element.

    For example, if an instance of EnforcementOfficialType uses PersonName, then the element PersonName is required to occur in the type PersonType, since EnforcementOfficialType inherits PersonName from PersonType.

An occurrence of an element in a type definition requires:

  • The global definition of the element

Occurrence of an attribute in a type definition

An occurrence of an attribute in a type is required when:

  • An instance of a type uses the attribute
  • An instance of a subclass of the type uses the attribute

It requires:

  • The global definition of the attribute

Occurrence of an attribute in the SuperTypeMetadata

An occurrence of an attribute in the SuperTypeMetadata is required when:

  • An instance of the SuperType uses the attribute
  • An instance of any subtype of SuperType uses the attribute
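
A minimal sketch of how the attribute rules fit together, assuming instances use exactly one metadata attribute. The attribute name sourceIDText, its xsd:string type, the way SuperType references the attribute group, and the namespace URI are all assumptions made for illustration; the reference schema determines the real names and types.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; the namespace URI is a placeholder. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/subset-sketch"
            targetNamespace="http://example.org/subset-sketch">

  <!-- Global attribute definition: required because the
       attribute group below references it (assumed name and type). -->
  <xsd:attribute name="sourceIDText" type="xsd:string"/>

  <!-- Subset of the attribute group: only the occurrence that
       instances use is kept; unused occurrences are omitted. -->
  <xsd:attributeGroup name="SuperTypeMetadata">
    <xsd:attribute ref="sourceIDText"/>
  </xsd:attributeGroup>

  <!-- SuperType carries the metadata for itself and all of its subtypes. -->
  <xsd:complexType name="SuperType">
    <xsd:attributeGroup ref="SuperTypeMetadata"/>
  </xsd:complexType>

</xsd:schema>
```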

Enumeration

An enumeration is required when:

  • An instance may include the enumeration as a data value.
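
For instance, suppose a code list in the reference schema allows the values A, B, and C, and the application's instances only ever carry A or B. A subset might then look like the sketch below. The type name, the xsd:token base, the values, and the namespace URI are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only; names and values are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.org/subset-sketch"
            targetNamespace="http://example.org/subset-sketch">

  <xsd:simpleType name="ExampleCodeSimpleType">
    <xsd:restriction base="xsd:token">
      <!-- Kept: instances may include these values. -->
      <xsd:enumeration value="A"/>
      <xsd:enumeration value="B"/>
      <!-- Omitted: the enumeration for value C, which no instance uses. -->
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Every value accepted by this subset is also accepted by the full list, so instances that validate to the subset still validate to the full schema.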

What you may omit

What we've specified up to this point is the set of rules for when objects are required. Here we summarize specifically what may be omitted from the subset schema.

  1. Any entities above that are not required, including:
    • Namespace prefix declaration: Omit unnecessary xmlns: attributes.
    • Import of a schema: Omit unnecessary xsd:import elements.
    • Type definition: Omit unnecessary xsd:complexType and xsd:simpleType elements that are "top-level" (direct children of the xsd:schema element).
    • Global element definition: Omit unnecessary xsd:element elements that are children of the xsd:schema element.
    • Global attribute definition: Omit unnecessary xsd:attribute elements that are children of the xsd:schema element.
    • Occurrence of an element in a type: Omit unnecessary xsd:element elements that are contained in a type definition.
    • Occurrence of an attribute in a type: Omit unnecessary xsd:attribute elements that are contained in a type definition.
    • Occurrence of an attribute in the SuperTypeMetadata: Omit unnecessary xsd:attribute elements that appear in the SuperTypeMetadata attributeGroup.
    • Enumeration: Omit xsd:enumeration elements, within a simple type, for enumerations that are not relevant to the applications or instances.
  2. Documentation: Omit xsd:annotation or xsd:documentation elements.

Other changes to schemas

Application of cardinality constraints

Users may wish to place cardinality restrictions on occurrences of elements in types. By default, most entities are entered as [0..unbounded] (minOccurs="0", maxOccurs="unbounded"). Cardinality restrictions will not invalidate a subset schema. To restrict cardinality, you may increase the minOccurs value, decrease the maxOccurs value, or both, as long as minOccurs does not exceed maxOccurs. For example:

<xsd:element ref='PersonFullName' minOccurs='0' maxOccurs='unbounded' />

may be restricted to any of the following:

<xsd:element ref='PersonFullName' minOccurs='0' maxOccurs='999' />
<xsd:element ref='PersonFullName' minOccurs='0' maxOccurs='1' />
<xsd:element ref='PersonFullName' minOccurs='1' maxOccurs='1' />
<xsd:element ref='PersonFullName' minOccurs='1' maxOccurs='unbounded' />
<xsd:element ref='PersonFullName' minOccurs='999' maxOccurs='unbounded' />
<xsd:element ref='PersonFullName' minOccurs='1' maxOccurs='2' />

What you should not do

There are a large number of transformations which could be performed on the schema that are not recommended. It is important that the generated schema truly is a subset of the full justice data model schema. Here is a list of changes that should not be performed:

  1. Do not omit any components that are required.
  2. Do not rearrange or change the order of elements appearing in schemas.
  3. Do not change the base type of any type.
  4. Do not change the type of any element or attribute.
  5. Do not make global components local.
  6. Do not change the namespace of elements, attributes, types, etc.
  7. Do not rename components.
  8. Do not flatten structures.
  9. Do not change any namespace URI.
  10. Do not change the relative file path location of files (i.e. .../3.0/jxdm.xsd).
  11. Do not do anything not specified by this document.