Data Products

A Data Product is the only way agents are allowed to read or write data in TraceMem.

It is not a database connection, not a table, and not an API client.

A Data Product is a governed, purpose-bound interface to data that is safe to use inside decision envelopes and safe to remember forever.

Why Data Products Exist

Agents fail in enterprises not because they lack data, but because:

  • Data access rules are implicit
  • Privacy constraints are scattered
  • Schemas change silently
  • No one knows which version of data rules applied at decision time

Data Products solve this by turning "data access" into a named, versioned, auditable contract.

Agents never touch raw data.
They interact with Data Products.

What a Data Product Is (and Is Not)

A Data Product is:

  • A logical access boundary
  • A semantic contract over one or more data sources
  • A policy attachment point
  • A purpose-bound interface
  • A hashable, versioned artifact

A Data Product is not:

  • A physical database
  • A schema registry
  • An ETL pipeline
  • A copy of the data
  • A data warehouse table

Where Data Products Sit in the Flow

text
Decision Envelope
    ↓
Data Product read/write
    ↓
Policies evaluated
    ↓
Approvals (if needed)
    ↓
Outcome committed
    ↓
Decision Trace recorded

Every read or write event in a decision trace references a Data Product.
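
As a rough illustration of that rule, the sketch below models a trace whose data access events must each name a Data Product. The class names, field names, and the `products_used` helper are hypothetical and chosen for this example; they are not TraceMem's actual trace format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataAccessEvent:
    """One read or write recorded in a decision trace (hypothetical shape)."""
    operation: str        # "read", "insert", "update", or "delete"
    product: str          # name of the Data Product that was used
    product_version: str  # version that applied at decision time
    purpose: str          # purpose declared by the agent


@dataclass
class DecisionTrace:
    decision_id: str
    events: List[DataAccessEvent] = field(default_factory=list)

    def record(self, event: DataAccessEvent) -> None:
        # An event without a Data Product reference is rejected outright.
        if not event.product:
            raise ValueError("every data access event must reference a Data Product")
        self.events.append(event)

    def products_used(self) -> List[str]:
        """List each product@version touched by this decision, in first-use order."""
        seen: List[str] = []
        for event in self.events:
            key = f"{event.product}@{event.product_version}"
            if key not in seen:
                seen.append(key)
        return seen
```

With events recorded this way, the question of which data rules applied to a decision can be answered from the trace alone.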

Core Responsibilities

A Data Product defines:

  1. What data is exposed - Exposed schema (subset of source schema)
  2. For what purposes - Allowed purposes (e.g., "order_processing", "support")
  3. Which operation is allowed - Exactly one of: read, insert, update, or delete
  4. Under which restrictions - Data residency, result modes, field-level restrictions
  5. With which policies - Attached policies that apply to all access
  6. In what shape - Schema definition
  7. Under what version - Immutable version with hash

Important: Each data product supports exactly one operation. If you need multiple operations (e.g., both read and insert), create separate data products.

It answers:

"What was the agent allowed to see or change in this decision?"

Data Product Lifecycle

1. Creation (Draft)

Data Products are created by administrators via:

  • The Admin Dashboard
  • The Admin API

Agents cannot create or modify Data Products.

Status: draft - Not used by agents yet

2. Publishing

Once published:

  • A Data Product becomes immutable
  • It receives a version identifier and hash
  • It can be used by agents
  • New decisions automatically use the latest published version

Status: published - Available for agent use

3. Deprecation

When a Data Product is replaced:

  • Old versions are deprecated (not deleted)
  • Historical traces remain valid
  • New decisions use the latest published version

Status: deprecated - Not used for new decisions, but historical traces remain valid
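
The three statuses can be read as a small state machine. The sketch below is a minimal model of it, assuming the only legal transitions are draft to published and published to deprecated; `ProductVersion` and its methods are hypothetical names, not part of the Admin API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Allowed status transitions, following the lifecycle described above.
TRANSITIONS = {
    "draft": {"published"},
    "published": {"deprecated"},
    "deprecated": set(),
}


@dataclass
class ProductVersion:
    name: str
    definition: Dict[str, Any] = field(default_factory=dict)
    status: str = "draft"

    def edit(self, **changes: Any) -> None:
        # Only drafts are editable; published and deprecated versions are frozen.
        if self.status != "draft":
            raise PermissionError(f"{self.status} versions are immutable")
        self.definition.update(changes)

    def transition(self, new_status: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status} to {new_status}")
        self.status = new_status

    def usable_for_new_decisions(self) -> bool:
        # Drafts are not yet usable and deprecated versions are kept for history only.
        return self.status == "published"
```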

Key Components

Sources

Data Products reference one or more Connectors as data sources:

json
{
  "sources": [
    {
      "connector_id": "postgres-06505000-f1a19e",
      "type": "database",
      "system": "postgres",
      "resource": "public.customers"
    }
  ]
}

Exposed Schema

Only a subset of the source schema is exposed:

json
{
  "exposed_schema": [
    {
      "name": "customer_id",
      "type": "string",
      "classification": "identifier"
    },
    {
      "name": "email",
      "type": "string",
      "classification": "pii"
    },
    {
      "name": "tier",
      "type": "string",
      "classification": "business"
    }
  ]
}
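
To make the effect of an exposed schema concrete, here is a minimal sketch of projecting a source row onto the exposed fields. The `project_row` and `fields_with_classification` helpers are hypothetical; how TraceMem actually applies the schema is not described here.

```python
from typing import Any, Dict, List

# The exposed schema from the example above.
EXPOSED_SCHEMA: List[Dict[str, str]] = [
    {"name": "customer_id", "type": "string", "classification": "identifier"},
    {"name": "email", "type": "string", "classification": "pii"},
    {"name": "tier", "type": "string", "classification": "business"},
]


def project_row(row: Dict[str, Any], exposed_schema: List[Dict[str, str]]) -> Dict[str, Any]:
    """Keep only the columns named in the exposed schema; drop everything else."""
    exposed = [f["name"] for f in exposed_schema]
    return {name: row[name] for name in exposed if name in row}


def fields_with_classification(exposed_schema: List[Dict[str, str]], classification: str) -> List[str]:
    """Return the exposed field names carrying a given classification."""
    return [f["name"] for f in exposed_schema if f["classification"] == classification]
```

A source row that also carries, say, a `password_hash` column would come back from `project_row` without it, because that column is not part of the exposed schema.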

Allowed Purposes

Every access must specify a purpose:

json
{
  "allowed_purposes": [
    "order_processing",
    "support_triage",
    "renewal_context"
  ]
}
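
A purpose check of this kind could look like the sketch below. The `check_purpose` function and the `PurposeNotAllowed` exception are hypothetical names for illustration; the assumption is that a missing or unlisted purpose causes the access to be rejected.

```python
from typing import FrozenSet, Optional

# The allowed purposes from the example above.
ALLOWED_PURPOSES: FrozenSet[str] = frozenset(
    {"order_processing", "support_triage", "renewal_context"}
)


class PurposeNotAllowed(Exception):
    """Raised when an access declares no purpose or one the product does not allow."""


def check_purpose(purpose: Optional[str], allowed: FrozenSet[str] = ALLOWED_PURPOSES) -> str:
    if not purpose:
        raise PurposeNotAllowed("a purpose is required for every access")
    if purpose not in allowed:
        raise PurposeNotAllowed(f"purpose not allowed for this product: {purpose}")
    return purpose
```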

Restrictions

Data Products can apply restrictions:

json
{
  "restrictions": {
    "data_residency": "eu",
    "result_mode_default": "summary",
    "allow_raw_values": false,
    "insert_config": {
      "return_created": true
    }
  }
}

Insert Method Configuration:

The insert_config object allows fine-grained control over insert operations:

  • return_created (boolean, optional): When true, the created object(s) are returned after insert operations. This is useful when you need database-generated IDs, timestamps, or other computed values immediately after insertion. Defaults to false.

  • allow_custom_primary_key (object, optional): Controls whether callers may supply their own value for a primary key that has an auto-generated default (sequence, auto_increment, UUID). The object maps item IDs to boolean values. Example: {"item_id_123": true} allows custom values for that primary key.

  • column_config (object, optional): Per-column configuration controlling:

    • required: Whether the column must be provided in insert requests
    • allowed: Whether the column can be included in insert requests
    • default_behavior: How to handle default values:
      • "user_provided": User must provide the value
      • "db_default": Use the database default (omit column from INSERT)
      • "tracemem_default": Use a fixed default value (specify in tracemem_default_value)
      • "null": Set to NULL (only for nullable columns)

Example:

json
{
  "insert_config": {
    "return_created": true,
    "allow_custom_primary_key": {
      "id": false
    },
    "column_config": {
      "id": {
        "required": false,
        "allowed": false,
        "default_behavior": "db_default"
      },
      "email": {
        "required": true,
        "allowed": true
      },
      "status": {
        "required": false,
        "allowed": true,
        "default_behavior": "tracemem_default",
        "tracemem_default_value": "active"
      }
    }
  }
}
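
The column rules above can be sketched as a function that turns a submitted record into the values that would actually be inserted. This is an illustration only: `resolve_insert` and `InsertRejected` are hypothetical names, and two behaviors are assumptions rather than documented facts (columns missing from column_config are rejected, and a column with no default_behavior falls back to the database default).

```python
from typing import Any, Dict


class InsertRejected(Exception):
    """Raised when a record violates the product's column_config."""


def resolve_insert(record: Dict[str, Any], column_config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    # Reject any submitted column that is unknown or not marked as allowed.
    for column in record:
        cfg = column_config.get(column)
        if cfg is None or not cfg.get("allowed", False):
            raise InsertRejected(f"column not allowed: {column}")

    resolved = dict(record)
    for column, cfg in column_config.items():
        if column in record:
            continue
        if cfg.get("required", False):
            raise InsertRejected(f"missing required column: {column}")
        behavior = cfg.get("default_behavior", "db_default")  # assumed fallback
        if behavior == "tracemem_default":
            resolved[column] = cfg["tracemem_default_value"]
        elif behavior == "null":
            resolved[column] = None
        elif behavior == "user_provided":
            raise InsertRejected(f"value must be provided for column: {column}")
        # "db_default": leave the column out so the database fills it in.
    return resolved
```

Applied to the configuration in the example above, a record containing only `email` resolves to `email` plus `status` set to "active", while `id` is omitted so the database can generate it.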

Allowed Operations

Every Data Product declares which operation is allowed. Each data product supports exactly one operation.

Available operations:

  • read - Read records (returns at most one record by default; the optional allow_multiple parameter permits multiple records)
  • insert - Create new records (supports optional return_created configuration to return created objects)
  • update - Update records (single or many based on required columns)
  • delete - Delete records (single or many based on required columns)

Important: The legacy write operation has been deprecated. Use insert, update, or delete instead.

Example - Read-only product:

json
{
  "allowed_operations": {
    "read": true,
    "insert": false,
    "update": false,
    "delete": false
  }
}

Example - Insert-only product:

json
{
  "allowed_operations": {
    "read": false,
    "insert": true,
    "update": false,
    "delete": false
  }
}

At runtime:

  • Every data access must specify an operation
  • TraceMem validates the operation against the product's allowed_operations
  • Disallowed operations fail closed with an error
  • Exactly one operation must be enabled per data product

If you need multiple operations: Create separate data products. For example:

  • customer_data_read - For reading customer information
  • customer_data_insert - For creating new customers
  • customer_data_update - For updating existing customers

This separation provides better governance, clearer audit trails, and more granular policy control.
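
The runtime rules above can be sketched as two small checks: one confirming that a product enables exactly one operation, and one rejecting any request for a different operation. The function and exception names are hypothetical, and treating unknown keys (such as the legacy write) as a configuration error is an assumption.

```python
from typing import Dict

OPERATIONS = ("read", "insert", "update", "delete")


class OperationNotAllowed(Exception):
    """Raised when a request asks for an operation the product does not enable."""


def enabled_operation(allowed_operations: Dict[str, bool]) -> str:
    """Return the single enabled operation, or raise if the config is invalid."""
    unknown = sorted(set(allowed_operations) - set(OPERATIONS))
    if unknown:
        raise ValueError(f"unknown operations: {unknown}")
    enabled = [op for op in OPERATIONS if allowed_operations.get(op) is True]
    if len(enabled) != 1:
        raise ValueError(f"exactly one operation must be enabled, found {len(enabled)}")
    return enabled[0]


def check_operation(requested: str, allowed_operations: Dict[str, bool]) -> None:
    """Fail closed: anything other than the single enabled operation is rejected."""
    if requested != enabled_operation(allowed_operations):
        raise OperationNotAllowed(f"operation not allowed for this product: {requested}")
```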

Attached Policies

Policies can be attached to Data Products:

json
{
  "attached_policies": [
    {
      "policy_id": "pii_access_v1",
      "required": true
    }
  ]
}

Purpose-Bound Access

Every read or write operation must specify a purpose:

python
# Agent reads data with an explicit purpose.
# `agent` stands for an agent-side client object and is illustrative only.
data = agent.read(
    product="customer_data",
    purpose="order_processing",  # Must be in allowed_purposes
    query={"customer_id": "123"}
)

Why this matters:

  • Regulatory compliance (regulations such as GDPR and CCPA tie data use to a declared purpose)
  • Audit trail shows why data was accessed
  • Data minimization (only access what you need)

Versioning

Data Products are versioned and immutable once published:

  1. Draft - Can be edited freely
  2. Published - Immutable, receives version number and hash
  3. New Version - Editing creates a new version, old version remains
  4. Deprecated - Old versions can be deprecated, but remain for historical traces

Benefits:

  • Historical traces remain valid
  • Policy changes don't break audit trails
  • Clear evolution of data access rules
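
One common way to give an immutable definition a stable hash is to serialize it canonically and digest the result. The sketch below does this with sorted-key JSON and SHA-256. Both choices are assumptions for illustration, since this page does not say how TraceMem computes its hashes, and `definition_hash` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict


def definition_hash(definition: Dict[str, Any]) -> str:
    """Hash a product definition so that equal content always yields the same digest."""
    # Sorted keys and fixed separators make the serialization independent of
    # the key order in which the definition happened to be written.
    canonical = json.dumps(
        definition, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, any change to a definition (for example, adding an allowed purpose) produces a different hash, which is what lets a historical trace pin the exact rules that applied.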

How Agents Use Data Products

Agents interact with Data Products through:

  1. Agent MCP - decision_read and decision_write tools
  2. SDKs (coming soon) - Language-specific SDKs

Example:

python
# Create decision
decision = agent.create_decision(
    intent="customer.order.create",
    automation_mode="propose"
)

# Read via Data Product (product must have read operation enabled)
customer = agent.read(
    decision_id=decision.id,
    product="customer_data",  # This product only allows read operations
    purpose="order_processing",
    query={"customer_id": "123"}
)

# Insert via Data Product (product must have insert operation enabled)
result = agent.write(
    decision_id=decision.id,
    product="orders",  # This product only allows insert operations
    purpose="order_creation",
    mutation={
        "operation": "insert",
        "records": [{"customer_id": "123", "total": 299.99}]
    }
)

# If the data product has return_created enabled, the created record is returned
if result.get("created_records"):
    created_order = result["created_records"][0]
    order_id = created_order["id"]  # Use the database-generated ID

Note: Each data product supports only one operation. The customer_data product in this example only allows reads, while the orders product only allows inserts. To both read and insert the same data, create a separate product for each operation.

Best Practices

  1. Minimal exposed schema - Only expose fields agents need
  2. Specific purposes - Use specific purposes, not generic ones
  3. Version carefully - Test drafts before publishing
  4. Attach policies - Use policies for access control
  5. Document purposes - Make purposes clear and specific

Relationship to Other Concepts

  • Connectors - Data Products reference Connectors as sources
  • Policies - Data Products can have attached policies
  • Decision Envelopes - All data access happens within Decision Envelopes
  • Decision Traces - Every read/write event references a Data Product

TraceMem is trace-native infrastructure for AI agents