
Creating JSON Schema

JSON Schema is a tool used to validate data. In Synapse, JSON Schemas can be used to validate the metadata applied to an entity, such as a project, file, folder, table, or view, including the annotations applied to it. To learn more about JSON Schemas, check out JSON-Schema.org.

Synapse supports a subset of features from json-schema-draft-07. To see the list of features currently supported, see the JSON Schema object definition from Synapse's REST API Documentation.
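To make the idea of validating metadata concrete, here is a minimal sketch of a draft-07 style schema and a tiny check against it. The schema, its keys (`patientId`, `sex`), and the `check_annotations` helper are hypothetical and written only for illustration. The helper looks at `required` and `enum` only; it is not a full JSON Schema validator and not how Synapse validates entities.

```python
# Hypothetical schema for illustration; not generated from the tutorial's data model.
patient_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "patientId": {"type": "string"},
        "sex": {"enum": ["Female", "Male", "Other"]},
    },
    "required": ["patientId"],
}


def check_annotations(annotations, schema):
    """Return a list of problems, checking only `required` and `enum`.

    A deliberately small hand-rolled check, not a complete validator.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in annotations:
            problems.append(f"missing required key: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key in annotations and "enum" in rules:
            if annotations[key] not in rules["enum"]:
                problems.append(f"{key}: {annotations[key]!r} is not an allowed value")
    return problems


print(check_annotations({"patientId": "P-001", "sex": "Female"}, patient_schema))
print(check_annotations({"sex": "Unknown"}, patient_schema))
```

The first call reports no problems. The second reports a missing required key and a value outside the allowed list, which is the kind of feedback a bound schema gives for an entity's annotations.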

In this tutorial, you will learn how to create, register, and bind JSON Schemas using an existing data model.

Tutorial Purpose

You will learn the complete JSON Schema workflow:

1. Generate JSON Schemas from your data model
2. Register schemas to a Synapse organization
3. Bind schemas to Synapse entities for metadata validation

This tutorial uses the Python client as a library. To use the CLI tool, see the command line documentation.

Prerequisites

1. Initial setup

from synapseclient import Synapse
from synapseclient.extensions.curator import (
    bind_jsonschema,
    generate_jsonschema,
    register_jsonschema,
)

# Path or URL to your data model (CSV or JSONLD format)
# Example: "path/to/my_data_model.csv" or "https://raw.githubusercontent.com/example.csv"
DATA_MODEL_SOURCE = "tests/unit/synapseclient/extensions/schema_files/example.model.csv"
# List of component names/data types to create schemas for, or None for all components/data types
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "temp"

syn = Synapse()
syn.login()

To create a JSON Schema, you need a data model and the data types you want to create schemas for. The data model must be in CSV or JSON-LD format, and it may be a local path or a URL. See the data model documentation for details.

The data types must exist in your data model. Pass a list of data types, or None to create a schema for every data type in the data model.
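A quick way to catch a misspelled data type before generating anything is to compare the requested names against the names you know are in your model. The sketch below assumes you already have that list of names from your own model; `missing_data_types` is a hypothetical helper, not part of the Synapse client.

```python
def missing_data_types(requested, available):
    """Return requested names that are absent from `available`, in order.

    `requested=None` means "all data types", so nothing can be missing.
    """
    if requested is None:
        return []
    known = set(available)
    return [name for name in requested if name not in known]


# Names assumed to come from your own data model.
available_types = ["Patient", "Biospecimen"]

print(missing_data_types(["Patient", "Biospecimin"], available_types))
print(missing_data_types(None, available_types))
```

An empty result means every requested data type is present in the list you supplied.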

2. Create a JSON Schema


schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=DATA_TYPE,
    synapse_client=syn,
)

print(schemas[0])

This prints the first JSON Schema generated for the data type you selected. It will look like this schema. Because the output parameter is set to the "temp" directory, the file is created as "temp/Patient.json".
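Once the file exists, it can be inspected like any other JSON document. The sketch below uses only the standard library; `summarize_schema_file` is a hypothetical helper, and the stand-in file contents (including the `PatientID` and `Sex` property names) are invented so the example runs without Synapse. In practice you would point it at "temp/Patient.json".

```python
import json
import tempfile
from pathlib import Path


def summarize_schema_file(path):
    """Load a JSON Schema file and list its top-level keys and property names."""
    with open(path, encoding="utf-8") as handle:
        schema = json.load(handle)
    return {
        "top_level_keys": sorted(schema),
        "properties": sorted(schema.get("properties", {})),
    }


# Stand-in for "temp/Patient.json"; the contents are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = Path(tmp) / "Patient.json"
    stand_in.write_text(
        json.dumps(
            {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "properties": {"PatientID": {}, "Sex": {}},
                "required": ["PatientID"],
            }
        ),
        encoding="utf-8",
    )
    summary = summarize_schema_file(stand_in)

print(summary)
```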

3. Create multiple JSON Schemas


# Create JSON Schemas for multiple data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=["Patient", "Biospecimen"],
    synapse_client=syn,
)

The data_types parameter is a list, so it can name several data types at once. One JSON Schema is created for each.

4. Create every JSON Schema


# Create JSON Schemas for all data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    synapse_client=syn,
)

If you don't set the data_types parameter, a JSON Schema will be created for every data type in the data model.

5. Create a JSON Schema at a specific path


# Specify path for JSON Schema
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    output="test.json",
    synapse_client=syn,
)

If you have only one data type and set the output parameter to a file path (ending in .json), the JSON Schema file will be written to that path.

6. Create a JSON Schema in the current working directory


# Create JSON Schema in cwd
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    synapse_client=syn,
)

If you don't set the output parameter, the JSON Schema file will be created in the current working directory.
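The three output behaviours described in this tutorial (a directory, a single .json path, or nothing) can be restated as a small function. This is a sketch of the documented rules, not the library's implementation. `expected_schema_paths` is a hypothetical helper, the "<DataType>.json" file name is assumed from the "temp/Patient.json" example, and raising an error for a .json path with several data types is this sketch's own choice.

```python
from pathlib import Path


def expected_schema_paths(data_types, output=None):
    """Predict where schema files land, following the tutorial's description."""
    if output is not None and str(output).endswith(".json"):
        # A single file path is only described for a single data type.
        if len(data_types) != 1:
            raise ValueError("a .json output path expects exactly one data type")
        return [Path(output)]
    # A directory was given, or nothing, which means the current working directory.
    base = Path(output) if output is not None else Path.cwd()
    # Assumption: files are named "<DataType>.json", as in "temp/Patient.json".
    return [base / f"{name}.json" for name in data_types]


print(expected_schema_paths(["Patient"], "temp"))
print(expected_schema_paths(["Patient"], "test.json"))
print(expected_schema_paths(["Patient", "Biospecimen"]))
```

Comparing these predictions with the file_paths value returned by generate_jsonschema is a simple way to confirm the files ended up where you expected.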

7. Register a JSON Schema to Synapse

Once you've created a JSON Schema file, you can register it to a Synapse organization.

# Register a JSON Schema to Synapse
json_schema = register_jsonschema(
    schema_path="temp/Patient.json",  # Path to the generated JSON Schema file
    organization_name="my.organization",  # Your Synapse organization name
    schema_name="patient.schema",  # Name for the schema
    schema_version="0.0.1",  # Optional version number
    synapse_client=syn,
)
print(f"Registered schema URI: {json_schema.uri}")

The register_jsonschema function:

- Takes a path to your generated JSON Schema file
- Registers it with the specified organization in Synapse
- Returns the registered schema, whose uri attribute holds the schema URI
- Accepts an optional version (e.g., "0.0.1"); if you omit it, a version is generated automatically
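Before calling register_jsonschema, it can help to confirm locally that the file exists, parses as JSON, and that the version string has the expected shape. The sketch below is a hypothetical pre-flight check using only the standard library. The "major.minor.patch" version pattern is an assumption based on the "0.0.1" example, not a statement of what Synapse accepts.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumption: versions follow the "major.minor.patch" shape of the tutorial's "0.0.1".
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+")


def preflight_registration(schema_path, schema_version=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    path = Path(schema_path)
    if not path.is_file():
        problems.append(f"schema file not found: {path}")
    else:
        try:
            content = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as error:
            problems.append(f"not valid JSON: {error.msg}")
        else:
            if not isinstance(content, dict):
                problems.append("top-level JSON value is not an object")
    if schema_version is not None and not VERSION_PATTERN.fullmatch(schema_version):
        problems.append(f"unexpected version format: {schema_version}")
    return problems


# Stand-in files so the sketch runs without Synapse.
with tempfile.TemporaryDirectory() as tmp:
    good_file = Path(tmp) / "Patient.json"
    good_file.write_text('{"type": "object"}', encoding="utf-8")
    ok_result = preflight_registration(good_file, "0.0.1")
    bad_result = preflight_registration(Path(tmp) / "missing.json", "v1")

print(ok_result)
print(bad_result)
```

A non-empty result points at a local problem to fix before contacting Synapse.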

8. Bind a JSON Schema to a Synapse Entity

After registering a schema, you can bind it to Synapse entities (files, folders, etc.) for metadata validation.

result = bind_jsonschema(
    entity_id="syn12345678",  # Replace with your entity ID (file, folder, etc.)
    json_schema_uri=json_schema.uri,  # URI from the registered schema
    enable_derived_annotations=True,  # Enable auto-population of metadata
    synapse_client=syn,
)
print(f"Successfully bound schema to entity: {result}")

The bind_jsonschema function:

- Takes a Synapse entity ID (e.g., "syn12345678")
- Binds the registered schema URI to that entity
- Optionally enables derived annotations to auto-populate metadata
- Returns binding details
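Because the placeholder "syn12345678" has to be replaced by hand, a small sanity check on the entity ID can catch copy-paste mistakes before binding. This sketch assumes plain entity IDs look like "syn" followed by digits, as in the example above. `is_synapse_id` is a hypothetical helper, not part of the Synapse client.

```python
import re

# Assumption: a plain entity ID is "syn" followed by digits, as in "syn12345678".
SYNAPSE_ID = re.compile(r"syn\d+")


def is_synapse_id(value):
    """Return True when `value` is a string shaped like a Synapse entity ID."""
    return isinstance(value, str) and SYNAPSE_ID.fullmatch(value) is not None


for candidate in ["syn12345678", "12345678", "syn12345678 "]:
    print(repr(candidate), is_synapse_id(candidate))
```

The check only looks at the shape of the string. Whether the entity exists and whether you have permission to bind a schema to it is still determined by Synapse when bind_jsonschema runs.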

Source Code for this Tutorial

from synapseclient import Synapse
from synapseclient.extensions.curator import (
    bind_jsonschema,
    generate_jsonschema,
    register_jsonschema,
)

# Path or URL to your data model (CSV or JSONLD format)
# Example: "path/to/my_data_model.csv" or "https://raw.githubusercontent.com/example.csv"
DATA_MODEL_SOURCE = "tests/unit/synapseclient/extensions/schema_files/example.model.csv"
# List of component names/data types to create schemas for, or None for all components/data types
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "temp"

syn = Synapse()
syn.login()

schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=DATA_TYPE,
    synapse_client=syn,
)

print(schemas[0])


# Create JSON Schemas for multiple data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    data_types=["Patient", "Biospecimen"],
    synapse_client=syn,
)

# Create JSON Schemas for all data types
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output=OUTPUT_DIRECTORY,
    synapse_client=syn,
)

# Specify path for JSON Schema
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    output="test.json",
    synapse_client=syn,
)

# Create JSON Schema in cwd
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    data_types=DATA_TYPE,
    synapse_client=syn,
)

# Register a JSON Schema to Synapse
json_schema = register_jsonschema(
    schema_path="temp/Patient.json",  # Path to the generated JSON Schema file
    organization_name="my.organization",  # Your Synapse organization name
    schema_name="patient.schema",  # Name for the schema
    schema_version="0.0.1",  # Optional version number
    synapse_client=syn,
)
print(f"Registered schema URI: {json_schema.uri}")

# Bind a JSON Schema to a Synapse entity
result = bind_jsonschema(
    entity_id="syn12345678",  # Replace with your entity ID (file, folder, etc.)
    json_schema_uri=json_schema.uri,  # URI from the registered schema
    enable_derived_annotations=True,  # Enable auto-population of metadata
    synapse_client=syn,
)
print(f"Successfully bound schema to entity: {result}")

Reference