The third vertex of the buyer-seller-evaluator triangle. Judge deliverable quality with LLMs, test suites, or any custom logic.
Every commerce transaction has two obvious parties: the buyer who pays and the seller who delivers. The evaluator is the optional third party that judges whether the deliverable actually meets the contract terms.
This is the Hermetic Polarity principle: three forces in equilibrium. The buyer wants quality, the seller wants payment, and the evaluator keeps both honest.
Every verdict states whether the deliverable is approved or rejected, along with a quality score (1–5) and reasoning. The SDK provides a factory that returns a fully functional CommerceAgent pre-configured to handle evaluation requests.
```typescript
import { createEvaluatorAgent, generateKeyPair } from '@dan-protocol/sdk'

const evaluator = createEvaluatorAgent({
  domain: 'evaluator.example.com',
  keyPair: generateKeyPair(),
})

await evaluator.listen({ port: 3003 })
```

That is a working evaluator. It uses the default heuristic (explained below), has a default fee of 1 USD, and serves the standard protocol endpoints.
```typescript
interface EvaluatorAgentConfig {
  domain: string
  name?: string           // Default: "Reference Evaluator Agent"
  keyPair: AgentKeyPair
  didResolver?: DIDResolver
  evaluationFee?: number  // Default: 1
  currency?: string       // Default: "USD"
  evaluateFn?: EvaluateFn // Default: simple heuristic
}
```

| Field | Required | Default | Description |
|---|---|---|---|
| `domain` | Yes | — | Domain for the DID (`did:web:domain`) |
| `keyPair` | Yes | — | Ed25519 keypair for signing verdicts |
| `name` | No | `"Reference Evaluator Agent"` | Human-readable name shown in discovery |
| `evaluationFee` | No | `1` | Fee charged per evaluation |
| `currency` | No | `"USD"` | Currency for the evaluation fee |
| `evaluateFn` | No | Default heuristic | Custom evaluation logic |
The evaluation function receives three things and must return a verdict:
```typescript
type EvaluateFn = (params: {
  originalInput: Record<string, unknown>
  contractTerms: { serviceId: string; price: number; currency: string }
  deliverable: Record<string, unknown>
}) => Promise<{
  verdict: 'approved' | 'rejected'
  score: number // 1-5, clamped automatically
  reasoning: string
}>
```

| Input field | Type | Description |
|---|---|---|
| `originalInput` | `Record<string, unknown>` | The original input the buyer sent to the seller |
| `contractTerms` | `{ serviceId, price, currency }` | What was agreed in the contract |
| `deliverable` | `Record<string, unknown>` | What the seller actually delivered |

| Output field | Type | Description |
|---|---|---|
| `verdict` | `'approved' \| 'rejected'` | Whether the deliverable meets the contract |
| `score` | `number` | Quality score 1–5 (clamped and rounded by the SDK) |
| `reasoning` | `string` | Human-readable explanation of the verdict |
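As an illustration, here is a minimal function matching this signature. The `summary` field and the length-based scoring are illustrative assumptions, not part of the SDK — your deliverable shape depends on the service being evaluated:

```typescript
type Verdict = {
  verdict: 'approved' | 'rejected'
  score: number
  reasoning: string
}

// A minimal custom evaluateFn (sketch). Checks a hypothetical `summary`
// field on the deliverable and scores it by length.
async function evaluateSummary(params: {
  originalInput: Record<string, unknown>
  contractTerms: { serviceId: string; price: number; currency: string }
  deliverable: Record<string, unknown>
}): Promise<Verdict> {
  const summary = params.deliverable.summary
  if (typeof summary !== 'string' || summary.trim().length === 0) {
    return { verdict: 'rejected', score: 1, reasoning: 'Missing summary field.' }
  }
  // Longer summaries earn a higher (still heuristic) score
  const score = summary.length > 200 ? 4 : 3
  return {
    verdict: 'approved',
    score,
    reasoning: `Summary of ${summary.length} chars accepted.`,
  }
}
```

Because the SDK clamps and rounds the score for you, the function only needs to return a roughly sensible number.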
If you do not provide a custom `evaluateFn`, the evaluator uses a simple built-in heuristic:

- An empty or otherwise invalid deliverable is `rejected` with score 1 (an empty one gets the reasoning "Deliverable is empty")
- Anything else is `approved`, with a score based on content size

This heuristic is intentionally naive. For production evaluators, plug in a real evaluation function.
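For intuition, here is a rough sketch of a size-based heuristic in the same spirit. The thresholds and scoring formula are illustrative, not the SDK's actual implementation:

```typescript
// Sketch of a naive size-based heuristic (illustrative, not the SDK's exact logic).
function naiveHeuristic(deliverable: Record<string, unknown>): {
  verdict: 'approved' | 'rejected'
  score: number
  reasoning: string
} {
  if (Object.keys(deliverable).length === 0) {
    return { verdict: 'rejected', score: 1, reasoning: 'Deliverable is empty' }
  }
  // Larger payloads earn a higher score, clamped to the 1-5 range
  const size = JSON.stringify(deliverable).length
  const score = Math.min(5, Math.max(1, Math.round(size / 200) + 2))
  return { verdict: 'approved', score, reasoning: `Deliverable of ${size} bytes accepted.` }
}
```

Size is a poor proxy for quality, which is exactly why the docs recommend replacing this with real evaluation logic.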
The most common evaluator pattern: use an LLM to judge quality.
```typescript
import { createEvaluatorAgent, generateKeyPair } from '@dan-protocol/sdk'
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()

const evaluator = createEvaluatorAgent({
  domain: 'eval.example.com',
  name: 'Claude Quality Evaluator',
  keyPair: generateKeyPair(),
  evaluationFee: 2,
  currency: 'USD',
  evaluateFn: async ({ originalInput, contractTerms, deliverable }) => {
    const message = await anthropic.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 1024,
      messages: [{
        role: 'user',
        content: `You are a quality evaluator for an agent commerce protocol.

The buyer requested service "${contractTerms.serviceId}" and paid ${contractTerms.price} ${contractTerms.currency}.

Original input:
${JSON.stringify(originalInput, null, 2)}

Deliverable received:
${JSON.stringify(deliverable, null, 2)}

Evaluate the quality. Respond with JSON only:
{
  "verdict": "approved" or "rejected",
  "score": 1-5,
  "reasoning": "your explanation"
}`,
      }],
    })
    const text = message.content[0].type === 'text' ? message.content[0].text : ''
    return JSON.parse(text)
  },
})

await evaluator.listen({ port: 3003 })
```

You can use any LLM. The `evaluateFn` is just an async function — what happens inside is your business.
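Models do not always return clean JSON — they sometimes wrap it in markdown fences or prose — so it can pay to parse the verdict defensively instead of calling `JSON.parse` directly. A sketch (this helper is my own, not part of the SDK):

```typescript
// Defensive parsing for LLM verdict output: extract the first {...} block,
// coerce the fields, and clamp the score to 1-5. Falls back to a signed-off
// rejection when no usable JSON is found.
function parseVerdict(text: string): {
  verdict: 'approved' | 'rejected'
  score: number
  reasoning: string
} {
  const match = text.match(/\{[\s\S]*\}/)
  if (!match) {
    return { verdict: 'rejected', score: 1, reasoning: 'Evaluator LLM returned no JSON.' }
  }
  try {
    const parsed = JSON.parse(match[0])
    return {
      verdict: parsed.verdict === 'approved' ? 'approved' : 'rejected',
      score: Math.min(5, Math.max(1, Math.round(Number(parsed.score) || 1))),
      reasoning: String(parsed.reasoning ?? ''),
    }
  } catch {
    return { verdict: 'rejected', score: 1, reasoning: 'Evaluator LLM returned invalid JSON.' }
  }
}
```

In the example above you would return `parseVerdict(text)` instead of `JSON.parse(text)`; the SDK clamps the score anyway, but this also guards against malformed model output.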
For code-related services, you can run actual tests against the deliverable:
```typescript
import { createEvaluatorAgent, generateKeyPair } from '@dan-protocol/sdk'
import { exec } from 'node:child_process'
import { promisify } from 'node:util'
import { writeFile, rm, mkdir } from 'node:fs/promises'

const execAsync = promisify(exec)

const evaluator = createEvaluatorAgent({
  domain: 'code-eval.example.com',
  name: 'Code Test Evaluator',
  keyPair: generateKeyPair(),
  evaluationFee: 5,
  evaluateFn: async ({ originalInput, contractTerms, deliverable }) => {
    const code = deliverable.code as string
    const tests = deliverable.tests as string
    if (!code || !tests) {
      return { verdict: 'rejected', score: 1, reasoning: 'Missing code or tests.' }
    }
    const tmpDir = `/tmp/eval-${Date.now()}`
    try {
      await mkdir(tmpDir, { recursive: true })
      await writeFile(`${tmpDir}/solution.ts`, code)
      await writeFile(`${tmpDir}/solution.test.ts`, tests)
      // vitest exits non-zero when tests fail, which makes execAsync reject;
      // the JSON report is still available on the error's stdout
      let stdout: string
      try {
        ;({ stdout } = await execAsync(
          `cd ${tmpDir} && npx vitest run --reporter=json`
        ))
      } catch (err) {
        stdout = (err as { stdout?: string }).stdout ?? ''
      }
      const results = JSON.parse(stdout)
      const passed = results.numPassedTests
      const total = results.numTotalTests
      return {
        verdict: passed === total ? 'approved' : 'rejected',
        score: Math.max(1, Math.round((passed / total) * 5)),
        reasoning: `${passed}/${total} tests passed.`,
      }
    } finally {
      await rm(tmpDir, { recursive: true, force: true })
    }
  },
})

await evaluator.listen({ port: 3004 })
```

Every evaluation verdict is cryptographically signed. The SDK handles this automatically — you do not need to sign anything in your `evaluateFn`.
Here is what happens internally after your function returns:
```typescript
{
  contractId,
  verdict,          // "approved" or "rejected"
  score,            // 1-5
  reasoning,        // from your evaluateFn
  deliverableHash,  // SHA-256 of the deliverable
  evaluatorDid,     // your evaluator's DID
  evaluatedAt       // ISO 8601 timestamp
}
```

A `proof` (128-char hex Ed25519 signature over these fields) is included in the response. All signed fields are returned in the response so the proof is independently verifiable. Anyone with the evaluator's public key (from their DID document) can verify the verdict was not tampered with.
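To illustrate the verification mechanics, here is a sketch using Node's built-in Ed25519 support. It assumes the proof signs the UTF-8 JSON of the signed fields — the SDK defines the actual canonicalization, so treat the payload construction as an assumption, not the wire format:

```typescript
import { generateKeyPairSync, sign, verify, createHash, KeyObject } from 'node:crypto'

// Verify a verdict proof against the signed fields (sketch; assumes the
// payload is the plain JSON serialization of the signed fields).
function verifyVerdict(
  signedFields: Record<string, unknown>,
  proofHex: string,
  publicKey: KeyObject,
): boolean {
  const payload = Buffer.from(JSON.stringify(signedFields), 'utf8')
  return verify(null, payload, publicKey, Buffer.from(proofHex, 'hex'))
}

// Demo with a locally generated keypair. In practice the public key comes
// from the evaluator's DID document, and the proof from its response.
const { publicKey, privateKey } = generateKeyPairSync('ed25519')
const fields = {
  contractId: 'contract-1',
  verdict: 'approved',
  score: 4,
  deliverableHash: createHash('sha256').update('{}').digest('hex'),
}
// Ed25519 signatures are 64 bytes, hence the 128-char hex proof
const proofHex = sign(null, Buffer.from(JSON.stringify(fields)), privateKey).toString('hex')
console.log(verifyVerdict(fields, proofHex, publicKey)) // true
```

Tampering with any signed field (say, flipping the verdict or score) changes the payload bytes, so verification fails.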
If the evaluateFn throws an error, the SDK catches it and returns a signed rejection with score 1 and the error message as reasoning. The proof still covers all fields, so even error verdicts are verifiable.
When an evaluator rejects a deliverable, the following happens:
- The evaluator returns `verdict: 'rejected'` with score, reasoning, and signed proof
- A `settle` message is sent to the escrow agent with `evaluationVerdict: 'rejected'` and the evaluator's `evaluationProof`

The evaluator's own reputation is at stake. If it rejects unfairly, both buyer and seller can give it low ratings, and its trust score drops. The market migrates to honest evaluators over time (Praxeology: Competition as Discovery).
```typescript
import { createEvaluatorAgent, generateKeyPair } from '@dan-protocol/sdk'

// A translation quality evaluator
const evaluator = createEvaluatorAgent({
  domain: 'translation-eval.example.com',
  name: 'Translation Quality Evaluator',
  keyPair: generateKeyPair(),
  evaluationFee: 3,
  currency: 'USD',
  evaluateFn: async ({ originalInput, contractTerms, deliverable }) => {
    const sourceText = originalInput.text as string
    const targetLang = originalInput.targetLang as string
    const translated = deliverable.translated as string

    // Basic sanity checks
    if (!translated || translated.trim().length === 0) {
      return { verdict: 'rejected', score: 1, reasoning: 'Empty translation.' }
    }
    if (translated === sourceText) {
      return {
        verdict: 'rejected',
        score: 1,
        reasoning: 'Translation is identical to source text.',
      }
    }

    // Length ratio check (translations are typically 0.5x-2x source length)
    const ratio = translated.length / sourceText.length
    if (ratio < 0.2 || ratio > 5) {
      return {
        verdict: 'rejected',
        score: 2,
        reasoning: `Suspicious length ratio: ${ratio.toFixed(2)}x. Expected 0.5x-2x.`,
      }
    }

    // Passed basic checks — approve with score based on detail
    const score = translated.length > sourceText.length * 0.8 ? 4 : 3
    return {
      verdict: 'approved',
      score,
      reasoning: `Translation to ${targetLang} accepted. Length ratio ${ratio.toFixed(2)}x is within normal range.`,
    }
  },
})

// The evaluator is a regular CommerceAgent — listen on a port
await evaluator.listen({ port: 3003 })
console.log('Evaluator live at', evaluator.commerceEndpoint)
console.log('DID:', evaluator.did)

// Graceful shutdown
process.on('SIGINT', async () => {
  await evaluator.close()
  process.exit(0)
})
```