
PII Shield Docs

Full reference for both surfaces. The MCP plugin runs inside Claude Desktop; the CLI runs anywhere Node 22 does. Both use the same engine, the same 33 entity types, and the same on-disk session format. The command reference (§04) covers each surface separately; everything else applies to both.

v2.1.0 (live) · MCP server · CLI · Node.js 22+ · npm install -g · MIT
01 · quick install

Pick a surface, run three lines.

Both surfaces share ~/.pii_shield/: install one or both. Sessions, mappings, audit logs, and the GLiNER model are interchangeable.

MCP plugin (Claude Desktop)

# 1. Download pii-shield-v2.1.0-<platform>.mcpb from GitHub releases
#    https://github.com/gregmos/PII-Shield/releases
# 2. In Claude Desktop:
#    Settings → Extensions → Advanced Settings → Install extension
#    Pick the .mcpb file
# 3. First call opens an in-chat panel to download the GLiNER model
#    (~634 MB, fetched by your browser)

CLI (any terminal)

# 1. Install the npm package globally
npm install -g pii-shield
pii-shield --version
# 2. Health check (model will report missing; fixed in step 3)
pii-shield doctor
# 3. Download the GLiNER model (~634 MB, one-off)
pii-shield install-model
First-run engine deps
On first anonymize or scan, the engine installs onnxruntime-node, @xenova/transformers, and gliner at pinned versions into ~/.pii_shield/deps/installs/<hash>/. Roughly 300 MB, 1–2 min, deterministic. Subsequent runs are instant. Everything (model, deps, mappings, audit logs) survives npm uninstall -g pii-shield.

Parity at install time

| Aspect | MCP plugin | CLI |
|---|---|---|
| Node version | Bundled (macOS) / host (Win/Linux) | Host (≥22.0.0) |
| Distribution | .mcpb in GitHub releases | npm install -g pii-shield |
| Model location | ~/.pii_shield/models/gliner-pii-base-v1.0/ | same |
| Data directory | ~/.pii_shield/ (shared) | same |
| Audit log | ~/.pii_shield/audit/mcp_audit.log | same |
| Session format | Identical: exports from one open in the other | same |
02 · configuration

Environment variables, defaults, and where data lives.

All settings are environment variables — there is no config file. Set them in your shell, in .env, or per-command (KEY=value pii-shield …). Both MCP and CLI read the same vars from the same places.

Detection sensitivity

| Variable | Default | Range | Effect |
|---|---|---|---|
| PII_MIN_SCORE | 0.30 | 0.0–1.0 | Minimum confidence for pattern recognizers (email, SSN, IBAN, …). Lower = more recall, more false positives. |
| PII_NER_THRESHOLD | 0.30 | 0.0–1.0 | Minimum confidence for NER detections (PERSON, ORG, LOCATION, NRP). 0.20 catches obscure names; 0.50 catches only obvious ones. |
# legal contracts (default tuning)
PII_MIN_SCORE=0.30 PII_NER_THRESHOLD=0.30 pii-shield anonymize contract.pdf
# medical records (higher recall: miss nothing)
PII_MIN_SCORE=0.20 PII_NER_THRESHOLD=0.20 pii-shield anonymize chart.pdf
# news clippings (higher precision: only obvious entities)
PII_NER_THRESHOLD=0.55 pii-shield anonymize article.txt
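Thresholds filter detections before they reach the output, so saved scan results can presumably only be made stricter after the fact, never looser. The sketch below re-applies stricter cut-offs to saved `pii-shield scan --json` entities so you can compare settings without re-running the model. `filter_entities` is a hypothetical helper; it assumes the NER family is exactly the four types listed in §10 and that a score equal to the threshold passes, neither of which is confirmed engine behaviour.

```python
# Hypothetical helper: re-apply stricter thresholds to saved scan --json output.
# Assumptions: NER types are the four listed in §10; a score equal to the
# threshold passes. Neither is confirmed PII Shield behaviour.
NER_TYPES = {"PERSON", "ORGANIZATION", "LOCATION", "NRP"}


def filter_entities(entities, min_score=0.30, ner_threshold=0.30):
    """Keep entities whose score clears the cut-off for their detector family."""
    kept = []
    for ent in entities:
        # NER detections use PII_NER_THRESHOLD; pattern hits use PII_MIN_SCORE.
        cutoff = ner_threshold if ent["type"] in NER_TYPES else min_score
        if ent["score"] >= cutoff:
            kept.append(ent)
    return kept


sample = [
    {"text": "John Smith", "type": "PERSON", "score": 0.93},
    {"text": "Foo Ltd", "type": "ORGANIZATION", "score": 0.45},
    {"text": "a@b.com", "type": "EMAIL_ADDRESS", "score": 0.40},
]
print([e["text"] for e in filter_entities(sample, ner_threshold=0.55)])  # ['John Smith', 'a@b.com']
```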

Behaviour

| Variable | Default | Description |
|---|---|---|
| PII_SKIP_REVIEW | false | Set true to never open the HITL panel; useful for CI / scripting. |
| PII_MAPPING_TTL_DAYS | 7 | Sessions older than N days are deleted on the next run. Bump for long matters: PII_MAPPING_TTL_DAYS=90. |
| PII_WORK_DIR | (unset) | Default working directory for relative paths. |
| PII_DEBUG | (unset) | Set true (or pass --debug) for stack traces and the full audit trail. |
| PII_AUDIT_STDERR | CLI: false · MCP: true | Mirror server logs to stderr. The CLI defaults to false so output stays clean; --debug flips it to true. |
| PII_QUIET | (unset) | Set by --quiet. Disables progress bars and summary writes. |
| NO_COLOR | (unset) | Honoured per no-color.org. Disables ANSI colour in terminal output. |
| FORCE_COLOR | (unset) | Force ANSI colour even in non-TTY contexts. |
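Because the TTL sweep deletes mappings for good, it can help to preview what the next run would remove. A minimal sketch, assuming "older than N days" is measured from the modification time of each mappings/<session_id>.json; the real engine's age rule may differ, and `expiring_sessions` is a hypothetical helper, not part of PII Shield.

```python
# Hypothetical helper: preview which sessions a PII_MAPPING_TTL_DAYS sweep would remove.
# Assumption: age is measured from the mtime of mappings/<session_id>.json.
import time
from pathlib import Path

DAY = 86_400  # seconds


def expiring_sessions(mappings_dir, ttl_days=7, now=None):
    """Return session ids whose mapping file is older than ttl_days."""
    now = time.time() if now is None else now
    cutoff = now - ttl_days * DAY
    expired = []
    for path in sorted(Path(mappings_dir).glob("*.json")):
        if path.name.startswith("review_"):
            continue  # HITL state file; report the mapping itself only
        if path.stat().st_mtime < cutoff:
            expired.append(path.stem)
    return expired
```

Call it with the same TTL you run with, e.g. `expiring_sessions(Path.home() / ".pii_shield" / "mappings", ttl_days=7)`, and export anything you still need as a .pii-session archive first.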

Paths (rare overrides)

| Variable | Default | Description |
|---|---|---|
| PII_SHIELD_DATA_DIR | ~/.pii_shield | Root for all PII Shield state. Override to relocate everything (deps, model, audit, mappings); useful for shared/networked installs. |
| PII_SHIELD_MAPPINGS_DIR | <DATA_DIR>/mappings | Override only the mappings dir, e.g. point it at a network share so a team has shared sessions. |
| PII_SHIELD_MODELS_DIR | <DATA_DIR>/models | Override only the model directory. |
| PII_SHIELD_MODEL_DOWNLOADS_DIR | (auto) | Where install-model looks for an existing gliner-pii-base-v1.0.zip (Downloads / OneDrive / Desktop / Documents are scanned by default). |
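Scripts that need to find PII Shield's files can resolve the same precedence the table describes: a per-directory override wins, otherwise the path hangs off the data dir. This is a sketch; `resolve_dirs` is a hypothetical helper written as a pure function over an env mapping so it is easy to test, and only the variable names and defaults come from the table.

```python
# Hypothetical helper mirroring the Paths table; not part of PII Shield.
import os
from pathlib import Path


def resolve_dirs(env=None, home=None):
    """Return the data, mappings, models, audit and deps directories."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # PII_SHIELD_DATA_DIR relocates everything; the default is ~/.pii_shield
    data = Path(env.get("PII_SHIELD_DATA_DIR") or home / ".pii_shield")
    return {
        "data": data,
        # Per-directory overrides win over <DATA_DIR>/<name>
        "mappings": Path(env.get("PII_SHIELD_MAPPINGS_DIR") or data / "mappings"),
        "models": Path(env.get("PII_SHIELD_MODELS_DIR") or data / "models"),
        # audit/ and deps/ have no dedicated override in the table
        "audit": data / "audit",
        "deps": data / "deps",
    }
```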

Data directory layout

~/.pii_shield/                        ← PII_SHIELD_DATA_DIR
├── models/
│   └── gliner-pii-base-v1.0/         ← 634 MB, survives uninstall
│       ├── model.onnx
│       ├── tokenizer.json
│       ├── tokenizer_config.json
│       ├── special_tokens_map.json
│       └── gliner_config.json
├── deps/
│   └── installs/<hash>/              ← onnxruntime-node + transformers + gliner (pinned)
├── mappings/                         ← PII_SHIELD_MAPPINGS_DIR (can be relocated)
│   ├── <session_id>.json             ← placeholder → real PII (the one secret)
│   └── review_<session_id>.json      ← HITL state (entities + overrides)
├── audit/
│   ├── mcp_audit.log                 ← every CLI command + arguments + result
│   ├── ner_init.log                  ← NER bootstrap trace
│   ├── ner_debug.log                 ← per-call NER detail
│   └── server.log                    ← lifecycle + errors
└── cache/
    └── gliner-pii-base-v1.0.zip      ← deleted after install

The mapping files are the one place real PII lives outside your source documents. On POSIX they are written with 0o700 permissions. Delete a mapping and that session can no longer be deanonymized.
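Since the mappings are the one secret, a periodic permission audit is cheap insurance, for example after copying the directory onto a network share. A minimal POSIX-only sketch; `loose_mapping_files` is a hypothetical helper that treats any group or other permission bit as a failure, so 0o700 or 0o600 passes.

```python
# Hypothetical helper (POSIX only): flag mapping files other local users could access.
import stat
from pathlib import Path


def loose_mapping_files(mappings_dir):
    """Return (file name, octal mode) for files with any group/other bit set."""
    loose = []
    for path in sorted(Path(mappings_dir).glob("*.json")):
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & 0o077:  # group or other may read, write or execute
            loose.append((path.name, oct(mode)))
    return loose
```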

03 · detection coverage

National IDs, tax numbers, passports — and the obvious stuff too.

17 EU and UK jurisdiction patterns on top of generic entity detection: the fields a "find all emails" regex would never catch.

EU + UK patterns
  • UK (5): NIN · NHS · Passport · CRN · Driving licence
  • DE (2): Tax ID · Social security
  • FR (2): NIR · CNI
  • IT (2): Fiscal code · VAT
  • ES (2): DNI · NIE
  • CY (2): TIC · National ID
  • EU (2): VAT · Passport
Detection · NER
  • GLiNER zero-shot
  • person
  • org
  • location
  • NRP (nationality / religion / politics)
Detection · pattern
  • all 17 jurisdiction patterns
  • email · phone · URL · IP
  • IBAN · credit card · crypto wallet
  • US SSN · US passport · US driver licence
  • medical licence · ID doc

Full enumerated list of 33 types in §10 Entity types.

04 · command reference

One engine. Two surfaces.

MCP exposes 17 tools to Claude Desktop / Claude Code via stdio; the CLI exposes 11 commands to your shell. The same engine sits underneath, so every entity, every session format, and every audit log line is identical.

17 MCP tools, four groups

Each tool is a single MCP function callable from Claude Desktop. Inputs are validated with Zod; errors return an MCP-protocol-shaped { isError: true, content: [...] } result rather than throwing.

Anonymize (4)
  • anonymize_file (core)
  • anonymize_text
  • anonymize_docx
  • anonymize_next_chunk
Review (5)
  • start_review
  • apply_review_overrides
  • get_full_anonymized_text
  • deanonymize_text
  • deanonymize_docx
Session (3)
  • export_session
  • import_session
  • get_mapping
Utility (5)
  • scan_text (preview)
  • list_entities
  • find_file
  • resolve_path
  • install_model_from_download

Round-trip example

// 1. anonymize a file: the original stays on disk; only placeholders cross the wire
anonymize_file({ file_path: "~/contracts/nda.docx" })
{ status: "success", session_id: "m9q8x4-7b5a91", entity_count: 14,
  output_path: ".../nda.anon.txt", docx_output_path: ".../nda.anon.docx",
  by_type: { PERSON: 4, ORG: 3, MONEY: 2, ... } }
// 2. Claude reads output_path and drafts a memo using placeholders only
// 3. restore PII into the draft, locally
deanonymize_text({ text: draft, session_id: "m9q8x4-7b5a91" })
{ status: "success", deanonymized_text: "...John Smith signed the NDA..." }

11 CLI commands, four groups

All commands share two global flags: -q, --quiet (suppresses progress + summary), --debug (full audit trail + stack traces). Session ids accept unique prefixes (git-style): pii-shield review 2026-04 works if exactly one session matches.
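The git-style prefix rule is easy to reproduce in scripts that keep their own list of ids (for example from `pii-shield sessions list --json`). This is a sketch of the rule as described, not the CLI's code; `resolve_session` is a hypothetical helper that accepts a prefix only when exactly one id starts with it.

```python
# Hypothetical helper: git-style unique-prefix matching for session ids.
def resolve_session(prefix, session_ids):
    """Return the single id starting with prefix; raise if none or several match."""
    matches = [sid for sid in session_ids if sid.startswith(prefix)]
    if not matches:
        raise LookupError(f"no session matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous: {', '.join(sorted(matches))}")
    return matches[0]
```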

Detail by command

anonymize <files…> [-o out] [-s session] [--no-review] [--lang en] [--prefix p] [-y] [--json]

Anonymize one or more files. All files in one invocation share one session and one mapping pool — identical entities across files get the same placeholder. Supported inputs: .pdf, .docx, .txt, .md, .csv, .log, .html.

Options
| Flag | Default | Description |
|---|---|---|
| -o, --out <dir> | <input-dir>/pii_shield_<sid>/ | Single output directory for all files in this run. |
| -s, --session <id> | (new session) | Extend an existing session: pool + placeholders are reused. |
| --no-review | (off: review opens) | Skip HITL; write outputs and exit. |
| --lang <code> | en | Language hint for NER. |
| --prefix <p> | (empty) | Prepend to placeholder tags, e.g. --prefix DOC1_ → <DOC1_PERSON_1>. |
| -y, --yes | | Auto-confirm prompts (model download, double-anon warning). |
| --json | | Emit structured JSON to stdout. Implies --no-review. |
Examples
pii-shield anonymize contract.pdf --no-review
pii-shield anonymize matter/*.pdf matter/*.docx --out anonymized/
pii-shield anonymize new.pdf --session 2026-04-29
pii-shield anonymize NDA.docx --json --yes
Output formats
| Input | Output |
|---|---|
| .docx | <name>_anonymized.docx (formatting preserved) and <name>_anonymized.txt (extracted text) |
| .pdf | <name>_anonymized.txt (PDF write-back not supported) |
| .txt / .md / .csv | <name>_anonymized.<ext> |
Exit codes

0 = OK · 1 = error · 2 = wrong arguments.

See also: MCP anonymize_file · §05 Workflows

deanonymize <file> [-s session] [-o out]

Restore real PII from placeholders. Works on any file containing placeholders that match a known session — even files an LLM authored externally, as long as the placeholders survived intact.
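Conceptually, restoring is a token lookup. The sketch below does it in a single regex pass that leaves unknown tokens intact and reports them, so a mangled or foreign placeholder becomes visible instead of silently surviving. It assumes the mapping is already loaded as a dict like {"<PERSON_1>": "John Smith"}; the on-disk layout of mappings/<session_id>.json is not documented here, so loading is left out, and `deanonymize` is a hypothetical helper, not the CLI's implementation.

```python
# Hypothetical helper: restore placeholders from an in-memory mapping.
# Assumption: mapping = {"<PERSON_1>": "John Smith", "<ORG_1a>": "...", ...}
import re

# Whole tokens: optional --prefix characters, TYPE_N, optional variant letter.
PLACEHOLDER = re.compile(r"<[A-Z0-9_]+_\d+[a-z]?>")


def deanonymize(text, mapping):
    """Replace known placeholders; return (restored_text, unknown_tokens)."""
    unknown = []

    def swap(match):
        token = match.group(0)
        if token in mapping:
            return mapping[token]
        unknown.append(token)  # not in this session's mapping
        return token  # leave it untouched

    return PLACEHOLDER.sub(swap, text), unknown
```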

Options
| Flag | Default | Description |
|---|---|---|
| -s, --session <id> | embedded → latest | Session id or unique prefix. If omitted, falls back to the id embedded in .docx metadata, then to the latest session on this machine. |
| -o, --out <path> | <name>_restored.<ext> | Explicit output path. |
Session resolution priority
  1. Explicit --session <id> wins.
  2. For .docx only: read pii_shield.session_id from docProps/custom.xml (auto-embedded by anonymize).
  3. Latest session on this machine.
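The same priority order can be mirrored in a script, for example to log which session a batch will use before calling deanonymize. A sketch under stated assumptions: it reads pii_shield.session_id as a standard OOXML custom property in docProps/custom.xml, and the vt:lpwstr value type it expects is an assumption. `embedded_session_id` and `pick_session` are hypothetical helpers; the caller supplies "latest".

```python
# Hypothetical helpers mirroring deanonymize's session-resolution order.
# Assumption: the id is stored as an OOXML custom property (e.g. vt:lpwstr).
import xml.etree.ElementTree as ET
import zipfile

CP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def embedded_session_id(docx_path):
    """Return pii_shield.session_id from docProps/custom.xml, or None."""
    try:
        with zipfile.ZipFile(docx_path) as zf:
            xml = zf.read("docProps/custom.xml")
    except (KeyError, zipfile.BadZipFile):
        return None  # no custom properties part, or not a .docx
    root = ET.fromstring(xml)
    for prop in root.findall(f"{{{CP_NS}}}property"):
        if prop.get("name") == "pii_shield.session_id":
            # The value sits in a single vt:* child element
            return next((child.text for child in prop), None)
    return None


def pick_session(path, explicit=None, latest=None):
    if explicit:  # 1. explicit --session wins
        return explicit
    if str(path).lower().endswith(".docx"):
        sid = embedded_session_id(path)  # 2. embedded id, .docx only
        if sid:
            return sid
    return latest  # 3. latest session on this machine
```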
Examples
pii-shield deanonymize contract_anonymized.docx
pii-shield deanonymize analysis.docx --session 2026-04-29_120000_ab12
pii-shield deanonymize summary.txt --session 2026-04 --out final.txt

See also: MCP deanonymize_text / deanonymize_docx

scan <file> [--json] [--lang en] [--wait-ner 30] [-y]

Detect PII without writing anything. Useful for previewing what would be anonymized, or for piping JSON into custom workflows.

Options
| Flag | Default | Description |
|---|---|---|
| --json | | Emit machine-readable JSON to stdout. |
| --lang <code> | en | Language hint. |
| --wait-ner <s> | 30 | Max seconds to wait for NER on cold start. |
| -y, --yes | | Auto-confirm model-download prompt. |
JSON shape
{
  "file": "contract.pdf",
  "ext": ".pdf",
  "bytes": 145823,
  "char_count": 8421,
  "entities": [
    {"text": "John Smith", "type": "PERSON", "start": 12, "end": 22, "score": 0.93}
  ]
}
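For a quick per-type tally of that JSON (more than `jq '.entities | length'` gives you), a few lines of Python suffice. This sketch relies only on the shape shown above; `summarise` is a hypothetical helper and its 0.5 "low confidence" cut-off is an arbitrary choice for flagging hits worth a human look.

```python
# Hypothetical helper: summarise scan --json output by entity type.
from collections import Counter


def summarise(scan, low=0.5):
    """Count entities per type and how many score below `low` (arbitrary cut-off)."""
    counts = Counter(e["type"] for e in scan["entities"])
    weak = sum(1 for e in scan["entities"] if e["score"] < low)
    return {"file": scan["file"], "by_type": dict(counts.most_common()), "low_confidence": weak}
```

Typical use: `summarise(json.loads(subprocess.run(["pii-shield", "scan", "contract.pdf", "--json"], capture_output=True, text=True).stdout))`.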
Examples
pii-shield scan contract.pdf
pii-shield scan contract.pdf --json | jq '.entities | length'
pii-shield scan chart.pdf --json > entities.json

See also: MCP scan_text

verify <file> -s session [--json] [--lang en] [-y]

Re-detect PII on an anonymized file and fail if any non-placeholder entity is found. Use as a final compliance gate before sending output to an external LLM.

Options
| Flag | Default | Description |
|---|---|---|
| -s, --session <id> | required | Session id or unique prefix. |
| --json | | Emit machine-readable JSON to stdout. |
| --lang <code> | en | Language hint. |
| -y, --yes | | Auto-confirm model-download prompt. |
How the check works
  1. Re-runs the engine over the anonymized text.
  2. For each detected entity, checks if its text is a placeholder from the session's mapping (<PERSON_1>, etc.). Placeholders are skipped.
  3. Anything else → flagged as a possible leak (offset + 60-char context printed).
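Steps 2 and 3 boil down to a set-membership test plus a context slice. The sketch below shows that logic on its own; `entities` stands in for the engine's re-detection output (dicts with text/type/start/end) and `placeholders` for the tokens in the session mapping, both supplied by the caller. `find_leaks` is hypothetical, and the 30-characters-either-side window is an assumption about how the 60-char context is taken.

```python
# Hypothetical helper: the verify leak check as a pure function.
def find_leaks(text, entities, placeholders, context=60):
    leaks = []
    half = context // 2  # assumption: context split evenly around the hit
    for ent in entities:
        if ent["text"] in placeholders:
            continue  # step 2: session placeholders are expected, skip them
        # step 3: anything else is a possible leak; record offset + context
        lo = max(0, ent["start"] - half)
        hi = min(len(text), ent["end"] + half)
        leaks.append({"offset": ent["start"], "type": ent["type"], "context": text[lo:hi]})
    return leaks
```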
Examples
pii-shield verify contract_anonymized.txt --session 2026-04-29
pii-shield verify summary.docx --session 2026-04 --json
Exit codes

0 = clean · 1 = possible leaks (or model/session error) · 2 = usage.

See also: §05 Workflows → Compliance gate

review <session-id> [-y]

Open the HITL review UI in your default browser for an existing session. Useful when you ran anonymize --no-review and want to review later, or after importing a session from another machine. The CLI prints a localhost URL carrying a 32-hex bearer token; only your machine and browser can reach it.

Options
| Flag | Default | Description |
|---|---|---|
| -y, --yes | | Auto-confirm any prompts. |
Example
pii-shield review 2026-04-29_120000_ab12
# → Review URL: http://127.0.0.1:6789/?token=4b4c77cf6d12ba7d6c8d47a1c91775cc
# → opens browser

After approval, any documents whose review carries non-empty overrides are re-anonymized in place against a fresh shared placeholder state — multi-doc consistency stays intact even after corrections.

See also: MCP start_review · §07 HITL review

doctor [--json]

Health check — verifies Node version, write permissions, model presence, deps cache, and NER status. Use as a CI gate before running anonymization in unattended pipelines.

Options
| Flag | Default | Description |
|---|---|---|
| --json | | Emit JSON to stdout. |
Examples
pii-shield doctor
pii-shield doctor --json

Exit code 0 if all checks pass, 1 if any fail.

install-model [--force] [-y]

Download and install the GLiNER ONNX model (~634 MB) into ~/.pii_shield/models/. If a valid gliner-pii-base-v1.0/model.onnx already exists (>100 MB), the command exits 0 without action unless --force is passed.
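The skip rule makes a handy pre-flight check in scripts that should not trigger a 634 MB download. A minimal sketch of the rule as stated; `model_present` is a hypothetical helper, --force is not modelled, and reading "100 MB" as 10**8 bytes (rather than 100 MiB) is an assumption.

```python
# Hypothetical helper: the install-model "already present" rule as a check.
from pathlib import Path

MIN_BYTES = 100 * 1000 * 1000  # assumption: "100 MB" read as decimal megabytes


def model_present(models_dir):
    """True if gliner-pii-base-v1.0/model.onnx exists and is larger than ~100 MB."""
    onnx = Path(models_dir) / "gliner-pii-base-v1.0" / "model.onnx"
    return onnx.is_file() and onnx.stat().st_size > MIN_BYTES
```

A truncated download (a model.onnx of a few KB) fails the size test, which is why the rule checks size and not just existence.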

Options
| Flag | Default | Description |
|---|---|---|
| --force | | Reinstall over a present model. |
| -y, --yes | | Skip the download confirmation prompt. |
Examples
pii-shield install-model            # interactive
pii-shield install-model --yes      # non-interactive (CI)
pii-shield install-model --force    # reinstall over a present model
sessions list [--json]

Table of local sessions, newest first.

Columns
| Column | Meaning |
|---|---|
| session_id | Format YYYY-MM-DD_HHMMSS_xxxx |
| modified | ISO-8601 timestamp of last write |
| docs | Number of documents anonymized under this session |
| entities | Total placeholders in the mapping pool |
Examples
pii-shield sessions list
pii-shield sessions list --json
sessions show <session-id> [--json]

Detail view: every doc in the session with doc_id, source path, source SHA-256, anonymized timestamp, plus review status.
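The recorded source SHA-256 lets you detect sources edited after anonymization, which is when a re-run is worth it. A sketch, assuming each document entry from `sessions show --json` carries its path and hash under the keys "source_path" and "source_sha256"; both key names are assumptions, so adjust them to the real output. `sha256_file` and `changed_docs` are hypothetical helpers.

```python
# Hypothetical helpers: detect sources that changed since they were anonymized.
# Assumption: doc entries look like {"source_path": ..., "source_sha256": ...}.
import hashlib


def sha256_file(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):  # stream; don't load whole file
            h.update(block)
    return h.hexdigest()


def changed_docs(documents):
    """Yield source paths whose current hash differs from the recorded one."""
    for doc in documents:
        if sha256_file(doc["source_path"]) != doc["source_sha256"]:
            yield doc["source_path"]
```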

Examples
pii-shield sessions show 2026-04-29_120000_ab12
pii-shield sessions show 2026-04-29 --json    # short prefix
sessions find <path> [--json]

Find which session(s) include a given source file. Linear scan over all sessions in ~/.pii_shield/mappings/.

Examples
pii-shield sessions find ~/Documents/contract.pdf
# → Found 1 session(s) including /home/user/Documents/contract.pdf:
# →   2026-04-29_120000_ab12  doc_id=lkhc91b-3e2f4a  2026-04-29T12:00:14.123Z
pii-shield sessions find /tmp/missing.txt --json
# → []   (exit 1)

Exit 0 if at least one session matches, exit 1 if none.

sessions export <session-id> -o out [-p passphrase]

Export an encrypted archive (.pii-session) for hand-off to a colleague. AES-256-GCM with scrypt key derivation (N=16384, r=8, p=1). Wrong passphrase fails loudly — never silent corruption.

Options
| Flag | Default | Description |
|---|---|---|
| -o, --out <path> | required | Output file path. |
| -p, --passphrase <p> | (prompts) | Passphrase. If omitted, the CLI prompts (input is masked). |
Archive contents
  • manifest.json — version + integrity hash
  • mapping.json — placeholder → real-PII mapping + per-doc metadata
  • review.json — HITL review state (if any)
Example
pii-shield sessions export 2026-04-29_120000_ab12 \
  --passphrase "correct horse battery staple" \
  --out contract-matter.pii-session

See also: MCP export_session · §08 Cross-machine handoff

sessions import <archive> [-p passphrase] [--overwrite]

Decrypt and persist a session archive locally.

Options
| Flag | Default | Description |
|---|---|---|
| -p, --passphrase <p> | (prompts) | Passphrase. Prompts (masked) if omitted. |
| --overwrite | | Replace an existing session of the same id. |
Examples
pii-shield sessions import contract-matter.pii-session \
  --passphrase "correct horse battery staple"
pii-shield sessions import a.pii-session --overwrite

See also: MCP import_session

Parity at a glance

| MCP tool | CLI command |
|---|---|
| anonymize_file | pii-shield anonymize |
| anonymize_text | pii-shield anonymize - |
| deanonymize_text / _docx | pii-shield deanonymize |
| scan_text | pii-shield scan |
| start_review | pii-shield review |
| export_session | pii-shield sessions export |
| import_session | pii-shield sessions import |
| list_entities | pii-shield sessions list / show |
05 · workflows

Five flows the engine is built for.

Each works on either surface — sessions exported from one open in the other, with the same mapping format on disk. Pick the one that matches your tooling.

1. Anonymize → external LLM → deanonymize

The headline use case. Send placeholders to any LLM, restore real data on your machine. Works with ChatGPT, Gemini, Claude, local Llama / Mistral, internal corporate gateways — anything that accepts text.

# 1. Anonymize. Note the session id.
pii-shield anonymize NDA.docx --no-review
# → Session: 2026-04-29_120000_ab12
# 2. Take NDA_anonymized.docx to ChatGPT / Gemini / Claude / DeepSeek.
#    Ask: "Summarise this NDA. Keep tokens like <PERSON_1> verbatim."
#    Save the response as summary.docx.
# 3. Restore PII in the LLM output.
pii-shield deanonymize summary.docx --session 2026-04-29_120000_ab12
# → summary_restored.docx with real names back
Prompt template that survives token round-trips
You are reviewing an anonymized contract. Tokens of the form <PERSON_1>, <ORG_3>, <EMAIL_ADDRESS_2> are anonymous placeholders. Keep them VERBATIM in your output — do not rephrase, translate, hyphenate, or split them. Treat them as opaque proper nouns.
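Even with this prompt, it is worth checking the reply before deanonymizing. The sketch compares the placeholders you sent with those that came back: "missing" tokens may be a legitimate omission (a summary can skip a party) or a mangled form, while "invented" tokens never existed in the input and are a red flag. `placeholder_report` is a hypothetical helper, and its token pattern is an assumption matching the <TYPE_N> / <TYPE_Na> forms used in this guide.

```python
# Hypothetical helper: compare placeholders sent to an LLM with those returned.
import re

TOKEN = re.compile(r"<[A-Z0-9_]+_\d+[a-z]?>")  # assumed placeholder shape


def placeholder_report(sent, received):
    sent_tokens = set(TOKEN.findall(sent))
    got_tokens = set(TOKEN.findall(received))
    return {
        "missing": sorted(sent_tokens - got_tokens),   # omitted, or mangled by the LLM
        "invented": sorted(got_tokens - sent_tokens),  # not in the input at all
    }
```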

2. Multi-file matter (one session, many docs)

In a single anonymize invocation, all files share one session and one mapping pool. The same entity gets the same placeholder in every file, so cross-doc references stay coherent.

pii-shield anonymize matter/contract.pdf matter/sow.pdf matter/nda.pdf matter/*.docx
# → ONE session_id covers all files
# → "Acme Corp" in any file → <ORG_1> in every file
# → "John Smith" → <PERSON_1> consistently
# → bulk-mode review panel opens with one tab per file

Variant suffixes (a, b, …) come from family-based dedup: the longest form wins as canonical, and shorter or variant spellings get suffixed family numbers. Acme Corp → <ORG_1>, Acme Corp. → <ORG_1>, Acme Corporation → <ORG_1a>, Acme → <ORG_1b>. Every variant maps back verbatim on deanonymize.
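The cross-file consistency comes from one lookup table shared by every file in the run. The sketch below illustrates that pooling idea with one counter per entity type; it deliberately ignores the family/variant suffix logic described above, so it is not PII Shield's dedup algorithm. `PlaceholderPool` is a hypothetical class, and its `prefix` argument only imitates the --prefix output format.

```python
# Hypothetical illustration of a shared placeholder pool (no variant suffixes).
from collections import defaultdict


class PlaceholderPool:
    def __init__(self, prefix=""):
        self.prefix = prefix
        self.by_value = {}                # (type, text) -> placeholder
        self.counters = defaultdict(int)  # type -> last number issued

    def placeholder(self, etype, text):
        """Same (type, text) always yields the same token, in any file."""
        key = (etype, text)
        if key not in self.by_value:
            self.counters[etype] += 1
            self.by_value[key] = f"<{self.prefix}{etype}_{self.counters[etype]}>"
        return self.by_value[key]

    def mapping(self):
        """Placeholder -> original text, i.e. what deanonymize needs."""
        return {ph: text for (_, text), ph in self.by_value.items()}
```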

3. Compliance gate in CI

Use verify as a fail-loud step in your pipeline. Re-runs the engine on the anonymized output; if any non-placeholder PII is detected, exit 1 and the pipeline stops.

# bash, GitHub Actions / GitLab CI / Jenkins
pii-shield anonymize "$INPUT" --no-review --json > result.json
SID=$(jq -r .session_id result.json)
OUT=$(jq -r '.results[0].output_path' result.json)
pii-shield verify "$OUT" --session "$SID"
# exit 0 → safe to send to external LLM
# exit 1 → leak detected; pipeline stops

4. Cross-machine handoff (encrypted)

A colleague needs to deanonymize the LLM output, but the anonymization happened on your machine. Send them an encrypted archive — PII never crosses the wire in plain form. Pass the passphrase out-of-band (Signal, phone, separate email) — never in the same channel as the file.

# Your machine
pii-shield sessions export 2026-04-29_120000_ab12 \
  --passphrase "<some long passphrase>" \
  --out matter-1234.pii-session
# Their machine
pii-shield sessions import matter-1234.pii-session \
  --passphrase "<the same passphrase>"
pii-shield deanonymize summary.docx --session 2026-04-29_120000_ab12

5. Re-anonymize with HITL corrections

The HITL review lets you remove false positives and add missed entities. After approval, the engine re-anonymizes affected files in place against a fresh shared placeholder state — multi-doc consistency is preserved even after corrections.

pii-shield anonymize contracts/*.pdf
# → browser opens; click some entities to remove,
#   select text to add as missed PII, click Approve
# → CLI prints "Re-anonymized 3 doc(s) with corrections" + final paths
# Reopen review later (e.g., browser closed, or after sessions import)
pii-shield review 2026-04-29_120000_ab12
06 · audit & compliance

Three logs, all on disk. No telemetry.

Everything PII Shield does is recorded locally. There is no telemetry endpoint to disable. If your DPO asks "what did the tool do with that file at 14:02 UTC," the answer is on your filesystem.

What leaves your machine

PII Shield itself sends nothing off your machine, whichever surface you use. The engine reads files locally, runs NER + pattern matching on the CPU only, writes anonymized outputs locally, and (for CLI HITL review) opens a server bound to 127.0.0.1 only.

The only outbound networking happens during install-model — downloads from github.com/gregmos/PII-Shield/releases/. Everything else is local.

Audit log format

2026-04-29T12:00:01.234Z >>> CALL anonymize_text_cli({"file":"/x/contract.pdf","char_count":8421,"session_id":""})
2026-04-29T12:00:14.123Z <<< RESP anonymize_text_cli -> {"anonymized":"... <ORG_1> ..."} ... [127 chars total]
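For a DPO report it helps to turn these lines into records. The sketch below is based only on the two sample lines: ISO timestamp, a >>> / <<< direction marker, CALL or RESP, the tool name, then the payload. `parse_audit` is a hypothetical helper; lines that do not fit that shape are skipped rather than guessed at, since the full log grammar is not documented here.

```python
# Hypothetical parser for audit lines shaped like the samples above.
import re

LINE = re.compile(
    r"^(?P<ts>\S+) (?P<dir>>>>|<<<) (?P<kind>CALL|RESP) (?P<tool>[\w.-]+)(?P<rest>.*)$"
)


def parse_audit(lines):
    records = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m:  # skip anything that does not match the known shape
            records.append(m.groupdict())
    return records
```

Usage: `parse_audit(open(Path.home() / ".pii_shield/audit/mcp_audit.log", encoding="utf-8"))` yields one dict per CALL/RESP line.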

Payloads in the log are truncated at 200 characters. Full PII is stored only in mappings/<session>.json (mode 0o700). To demonstrate that no PII left the machine, read the audit log: every external operation is recorded, and there are no hidden side channels.

Trust boundary

| Component | Trust level |
|---|---|
| Source documents | Trusted |
| ~/.pii_shield/mappings/<sid>.json | Highly sensitive: contains real PII |
| <file>_anonymized.<ext> outputs | Safe to send to LLMs / colleagues |
| .pii-session export archives | Encrypted; safe IF the passphrase is strong |
| ~/.pii_shield/audit/*.log | Mostly safe: file paths only, no real PII |
| ~/.pii_shield/models/, deps/ | Public: same content as the npm package + GitHub release |

Audit log rotation

The log grows append-only, and there is no built-in rotation. Wipe it manually for compliance, or schedule rotation via your OS (logrotate on Linux, Task Scheduler on Windows).

# keep the directory; new lines start fresh
rm ~/.pii_shield/audit/*.log

Encrypted handoff details

Session archives use AES-256-GCM with a key derived via scrypt from the passphrase (parameters: N=16384, r=8, p=1). Archive format is versioned (PII1 magic, version byte 0x01) so future PII Shield versions stay backward-compatible. Wrong passphrase = loud decryption failure (no silent corruption).
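The key-derivation half of this design can be shown with the standard library alone: scrypt with the documented parameters (N=16384, r=8, p=1) yields a 32-byte AES-256 key. The sketch below also frames a header starting with the PII1 magic and version byte 0x01; everything after those five bytes (16-byte salt, 12-byte nonce) is a hypothetical layout, not the real archive format. The AES-GCM step is omitted because Python's stdlib has no AES; a library such as `cryptography` would supply it.

```python
# Sketch: scrypt key derivation + a HYPOTHETICAL header layout after PII1/0x01.
import hashlib
import os
import struct

MAGIC, VERSION = b"PII1", 0x01
SALT_LEN, NONCE_LEN = 16, 12  # hypothetical sizes, not the real format


def derive_key(passphrase, salt):
    # Documented parameters; dklen=32 gives an AES-256 key.
    return hashlib.scrypt(passphrase.encode("utf-8"), salt=salt, n=16384, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)


def pack_header(salt, nonce):
    return MAGIC + struct.pack("B", VERSION) + salt + nonce


def unpack_header(blob):
    if blob[:4] != MAGIC:
        raise ValueError("not a PII Shield session archive")
    if blob[4] != VERSION:
        raise ValueError(f"unsupported archive version {blob[4]}")
    salt = blob[5:5 + SALT_LEN]
    nonce = blob[5 + SALT_LEN:5 + SALT_LEN + NONCE_LEN]
    return salt, nonce, blob[5 + SALT_LEN + NONCE_LEN:]  # rest: ciphertext + GCM tag
```

Checking the magic and version before attempting decryption is what lets a reader reject foreign files cleanly and dispatch on future format versions.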

07 · hitl review

Add what was missed. Remove false positives.

A pre-flight panel that surfaces every detected entity before placeholders ship to any LLM. MCP renders it inside Claude Desktop as an iframe; CLI renders it in your default browser on a localhost port.

What you can do in the panel

MCP vs CLI rendering

| Aspect | MCP | CLI |
|---|---|---|
| Where it renders | Inside Claude Desktop (MCP Apps iframe) | Default browser, localhost |
| Triggered by | start_review tool call | End of anonymize, or review <sid> |
| Network binding | n/a (in-process) | 127.0.0.1:6789 (auto 6789–6799 if port busy) |
| Auth | n/a | 32-hex bearer token in URL only, never on disk; origin-checked on every POST |
| Idle timeout | n/a | 30 minutes (browser heartbeats every 30 s) |
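The CLI column describes a common pattern for local review servers: bind loopback only, walk a small port range, and gate requests with a random token. The sketch below shows that pattern; it is not PII Shield's code, and `bind_first_free`, `new_token`, and `token_ok` are hypothetical helpers. The origin check and idle timeout are left out.

```python
# Sketch of a loopback-only port fallback plus a 32-hex bearer token.
import hmac
import secrets
import socket


def bind_first_free(host="127.0.0.1", ports=range(6789, 6800)):
    """Bind the first free port in `ports` (6789-6799 by default)."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port busy: try the next one
            continue
        return sock, sock.getsockname()[1]  # caller calls listen() on it
    raise RuntimeError("no free port in range")


def new_token():
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex chars


def token_ok(presented, expected):
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```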

Bulk mode (multi-doc)

When the session has ≥2 documents, the panel shows tabs at the top — one per document. Each doc keeps independent removed / added state. Approve once per tab; when all tabs are approved, the success overlay appears and the CLI continues. Adding an entity in one tab propagates word-boundary matches across all other tabs (case-insensitive). Removing is per-tab.
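Propagation of an added entity amounts to a whole-word, case-insensitive search over every tab's text. The sketch below uses (?<!\w) and (?!\w) instead of \b so that terms ending in punctuation, such as "Acme Corp.", still match; whether the panel handles that edge case the same way is an assumption. `propagate` is a hypothetical helper.

```python
# Hypothetical helper: find whole-word, case-insensitive matches across documents.
import re


def propagate(term, docs):
    """Return {doc_name: [(start, end), ...]} for every whole-word hit of term."""
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return {name: [m.span() for m in pattern.finditer(text)] for name, text in docs.items()}
```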

Keyboard shortcuts

KeyAction
Tab / Shift+TabNext / previous entity
DRemove the focused entity
RRestore the focused entity (undo remove)
AApprove the document
/Focus the entity filter
EscClear focus / close dropdowns

When NOT to use HITL

08 · cross-machine handoff

Move a session safely between machines.

A colleague needs to deanonymize the LLM output, but the anonymization happened on your machine. The .pii-session archive packs the mapping + review state, encrypted with a passphrase. PII never transits in plain form. Both surfaces produce and consume the same archive format.

Step-by-step

# 1. Source machine: export
pii-shield sessions export 2026-04-29_120000_ab12 \
  --passphrase "<long passphrase>" \
  --out matter-1234.pii-session
# → matter-1234.pii-session (a few KB to a few MB)
# 2. Transfer the file via any channel.
#    Pass the passphrase out-of-band (Signal, phone,
#    separate email), NEVER in the same channel as the file.
# 3. Destination machine: import
pii-shield sessions import matter-1234.pii-session \
  --passphrase "<same passphrase>"
# 4. Use it as if it were created locally
pii-shield deanonymize summary.docx --session 2026-04-29_120000_ab12
pii-shield review 2026-04-29_120000_ab12   # reopen HITL

Cross-surface handoff

Sessions exported by the CLI open in MCP, and vice versa. The mapping format is version-locked at the file level, not the surface. A typical mixed flow:

What the archive contains

Crypto: AES-256-GCM, key derived via scrypt (N=16384, r=8, p=1). A wrong passphrase fails loudly with the error "wrong passphrase or corrupted archive"; there is no silent corruption.

09 · python / scripting

Every command speaks JSON.

There is no separate Python SDK — the CLI is the integration surface. Every read-side command supports --json; exit codes are stable; subprocess spawn is the integration pattern. For high-throughput pipelines, the same backend exposes MCP via stdio.

JSON outputs by command

| Command | --json available | Top-level keys |
|---|---|---|
| anonymize | yes | session_id, entity_count, pool_size, ner_ready, results[] |
| deanonymize | no | (path printed on stdout) |
| scan | yes | file, bytes, char_count, entities[] |
| verify | yes | ok, leaks_found, leaks[] |
| sessions list | yes | array of {session_id, modified, docs, entities} |
| sessions show | yes | session_id, entity_count, documents[], review |
| sessions find | yes | array of {session_id, doc_id, source_path, anonymized_at} |
| doctor | yes | version, node_version, checks[], ok |

--json on anonymize implies --no-review (no browser inside a non-interactive script). Exit codes: 0 = success, 1 = error / leak detected / 0 hits (for find), 2 = wrong arguments.

Minimal Python wrapper

import subprocess, json, pathlib
class PiiShield:
    def __init__(self, binary: str = "pii-shield"):
        self.bin = binary
    def _run(self, *args, check_zero: bool = True) -> str:
        r = subprocess.run([self.bin, *args], capture_output=True, text=True)
        if check_zero and r.returncode != 0:
            raise RuntimeError(f"pii-shield {' '.join(args)} → {r.returncode}: {r.stderr}")
        return r.stdout
    def doctor(self) -> dict:
        return json.loads(self._run("doctor", "--json", check_zero=False))
    def scan(self, path: str) -> dict:
        return json.loads(self._run("scan", path, "--json"))
    def anonymize(self, *paths: str, session: str | None = None, out: str | None = None) -> dict:
        args = ["anonymize", *paths, "--json", "--yes"]
        if session:
            args += ["--session", session]
        if out:
            args += ["--out", out]
        return json.loads(self._run(*args))
    def deanonymize(self, path: str, session: str, out: str | None = None) -> str:
        args = ["deanonymize", path, "--session", session]
        if out:
            args += ["--out", out]
        self._run(*args)
        return out or self._infer_restored_path(path)
    def verify(self, path: str, session: str) -> dict:
        return json.loads(self._run("verify", path, "--session", session, "--json", check_zero=False))
    def session_for_file(self, path: str) -> str | None:
        hits = json.loads(self._run("sessions", "find", path, "--json", check_zero=False))
        return hits[0]["session_id"] if hits else None
    @staticmethod
    def _infer_restored_path(input_path: str) -> str:
        p = pathlib.Path(input_path)
        return str(p.with_name(p.stem + "_restored" + p.suffix))

Common patterns

Pipeline with verification gate:

shield = PiiShield()  # wrapper class above
result = shield.anonymize(*input_files, out="staging/")
for r in result["results"]:
    v = shield.verify(r["output_path"], session=result["session_id"])
    if not v["ok"]:
        raise RuntimeError(f"Leak in {r['output_path']}: {v['leaks']}")
# Safe to send result["results"] to external LLM

Idempotent re-runs (skip already-anonymized files):

sid = shield.session_for_file("contract.pdf")
if sid is None:
    result = shield.anonymize("contract.pdf")
    sid = result["session_id"]
else:
    print(f"Already anonymized in session {sid}")

Cost control before sending to a paid LLM:

scan = shield.scan(file)
if any(e["type"] == "US_SSN" for e in scan["entities"]):
    raise SystemExit("File contains SSNs; redact manually first.")
result = shield.anonymize(file)

Performance notes

10 · entity types

33 types, grouped by source.

Authoritative list lives in nodejs-v2/src/engine/entity-types.ts (SUPPORTED_ENTITIES). NER detections come from GLiNER zero-shot; the rest are pattern recognizers.

NER-based (4)

PERSON · ORGANIZATION · LOCATION · NRP

Generic patterns (9)

EMAIL_ADDRESS · PHONE_NUMBER · URL · IP_ADDRESS · ID_DOC · CREDIT_CARD · IBAN_CODE · CRYPTO · MEDICAL_LICENSE

United States (3)

US_SSN · US_PASSPORT · US_DRIVER_LICENSE

United Kingdom (5)

UK_NHS · UK_NIN · UK_PASSPORT · UK_CRN · UK_DRIVING_LICENCE

EU-wide (2)

EU_VAT · EU_PASSPORT

Country-specific (10)

DE_TAX_ID · DE_SOCIAL_SECURITY · FR_NIR · FR_CNI · IT_FISCAL_CODE · IT_VAT · ES_DNI · ES_NIE · CY_TIC · CY_ID_CARD

To see which types a given file triggers, list them from the JSON: pii-shield scan small.txt --json | jq '.entities[].type' | sort -u (only types actually present in the sample file appear, so use a file containing each type you want to confirm).

11 · stack & versioning

What's underneath.

Built with
Node.js 22+ · MCP (Model Context Protocol) · commander.js (CLI) · GLiNER · onnxruntime-node · @xenova/transformers · DOCX (pure JS) · pdf-parse · AES-GCM / scrypt · Zod (validation) · single-port HITL server (CLI) · npm package: pii-shield · MIT
Tested on
Ubuntu · Windows · macOS · Node 22 · Node 24 · 8 test suites · MCP protocol smoke
Versioning

Current release: v2.1.0 (production). v1 was Python + Presidio + spaCy and remains downloadable in gregmos/PII-Shield for legacy installs. v2 is a complete Node.js rewrite with the same detection coverage and no Python.

The CLI ships from the same monorepo as the MCP server (nodejs-v2/cli/). Both surfaces share the engine in src/engine/, so version numbers stay aligned.

12 · limits & caveats

Things to know up front.

Performance envelope

| Batch | Cold start | Warm |
|---|---|---|
| 1 short doc (5 KB text) | 2–3 min (deps + model on first ever run) | 1–2 s |
| 5 docs, avg 10 KB | 4–5 min | ~50 s |
| 30 docs, avg 10 KB | 6–8 min | ~5 min |
| 1 large doc (200 KB text) | 3 min | ~1 min |

Disable HITL (--no-review) for unattended batches. Tune PII_NER_THRESHOLD=0.4 for a 20–30% speedup if you can tolerate slightly lower recall.

13 · troubleshooting

Common errors and their fixes.

For deeper diagnostics, set PII_DEBUG=true on any command and tail ~/.pii_shield/audit/server.log live.

pii-shield: command not found

npm's global bin directory isn't on PATH. Run npm prefix -g to find the global prefix; on Linux/macOS, add <prefix>/bin to PATH.

On Windows the binaries sit directly in the prefix, typically %AppData%\npm\.

[!] NER model loading... takes 1–2 min

First run installs deps. Look at ~/.pii_shield/audit/ner_init.log for progress. Subsequent runs are instant (the deps directory is cached and reused).

model.onnx missing in doctor output

Run pii-shield install-model. If you already downloaded the zip elsewhere, set PII_SHIELD_MODEL_DOWNLOADS_DIR=/path/to/dir and re-run — the installer will pick it up without re-downloading.

Unsupported model IR version: 9

Stale onnxruntime-node cached. Delete ~/.pii_shield/deps/ — the next anonymize call reinstalls the pinned 1.22.0 triplet.

Cannot find module '../build/Release/sharp-…'

Old build without the sharp shim. Update: npm install -g pii-shield@latest.

HTTP server fails — port in use

The HITL server tries 6789–6799. Close other PII Shield instances; check lsof -i :6789 (Linux/macOS) or netstat -ano | findstr 6789 (Windows).

Browser doesn't open during review

Copy the printed URL manually. On Linux/macOS, set BROWSER=firefox (or whichever) to override the OS default.

deanonymize silently leaves placeholders in the output

The LLM mangled them — common variants: lowercase (<person_1>), added spaces (<PERSON 1>), translated type (<PERSONA_1>).

Tighten the prompt: "Keep tokens VERBATIM. Do not modify case or spacing inside <…>." Or sed-replace the mangled forms before deanonymize:

sed -E 's/<person_([0-9]+)>/<PERSON_\1>/g' summary.txt > fixed.txt
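The sed line handles one mangling. A slightly broader repair, sketched in Python, upper-cases the type and normalises spacing for the case and spacing variants above (<person_1>, <PERSON 1>, < PERSON_1 >) while keeping variant suffixes such as <ORG_1a> lowercase. It never changes the type name itself, so a translated type like <PERSONA_1> still needs a manual fix. `normalise_placeholders` is a hypothetical helper; pass `known_types` to avoid touching unrelated angle-bracket text.

```python
# Hypothetical helper: repair case/spacing mangling of placeholders.
import re

MANGLED = re.compile(r"<\s*([A-Za-z][A-Za-z0-9_]*?)[\s_]+(\d+)([A-Za-z]?)\s*>")


def normalise_placeholders(text, known_types=None):
    def fix(m):
        etype, num, variant = m.group(1).upper(), m.group(2), m.group(3).lower()
        if known_types is not None and etype not in known_types:
            return m.group(0)  # unknown type: do not guess
        return f"<{etype}_{num}{variant}>"

    return MANGLED.sub(fix, text)
```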

session 'X' not found on import

Wrong session id. List with pii-shield sessions list.

Wrong passphrase on import

Decryption fails loudly with "wrong passphrase or corrupted archive"; there is no silent corruption. Recheck the passphrase first; if it is definitely correct, the archive may have been damaged in transit, so transfer it again.

Mapping disappeared after a week

TTL cleanup. Bump PII_MAPPING_TTL_DAYS=90 for long matters. Already-deleted mappings can only be recovered if you have a .pii-session export.

First-run downloads keep restarting

Network timeouts on slow connections. Run install-model directly — single-shot download with progress bar. Or use install-model.ps1 / .sh in the repo for a curl-based fallback.

Anonymize misses obvious names

NER threshold too high. Try PII_NER_THRESHOLD=0.20. If still missed, the model genuinely doesn't see them — open an issue with a redacted example.

Anonymize too eager (false positives)

Raise both thresholds to trade recall for precision: PII_MIN_SCORE=0.45 PII_NER_THRESHOLD=0.45. Or use the HITL panel to remove specific false positives; corrections are session-local.

Live logs

PII_DEBUG=true pii-shield <command>          # prints stack traces on errors
tail -f ~/.pii_shield/audit/server.log       # live tool calls + lifecycle events
tail -f ~/.pii_shield/audit/ner_init.log     # NER bootstrap detail