Security and governance for systems that act

Canada's specialist consultancy for agentic AI security, governance, and production architecture.

We help organizations assess, govern, and architect LLM and agentic systems before they are trusted with tools, data, workflows, or decisions.

Request an assessment

01— Core Engagements

01 Assessment

Attack surface review

Agentic systems security assessments

Threat models, adversarial testing, privilege reviews, prompt-injection paths, tool misuse scenarios, memory risk, RAG exposure, and incident-ready findings for systems that can take action.

02 Governance

Operating control design

AI governance operating models

Inventory, risk classification, approval workflows, policy, vendor review, board reporting, human oversight, and accountability structures that fit real teams instead of shelfware.

03 Guardrails

Runtime boundary design

Privacy and guardrail architecture

Data boundaries, PII handling, context controls, retention rules, eval gates, refusal behavior, monitoring, escalation paths, and human review for high-consequence AI workflows.

04 Systems

Production architecture

Harness and AI systems architecture

Evaluation harnesses, simulated environments, agent orchestration, identity, permissions, observability, scalable RAG, and production patterns for LLM systems that need to survive contact with users.

02— Why It Matters

Agentic AI is not a chatbot upgrade.

Generative AI produces outputs for people to review. Agentic AI acts across tools, data, and workflows. That shift changes the security model: access control, observability, reversibility, governance, and architecture become first-order concerns.

Risk class | Failure mode | What changes with agents
Privilege | Over-broad agent authority | Agents often inherit broad credentials, service accounts, or human-like access. A single compromised workflow can move across systems faster than manual review can follow.
Tool abuse | Prompt-to-action escalation | Prompt injection becomes more serious when the model can call tools, modify records, send messages, execute code, or trigger downstream automations.
Data boundary | Context, memory, and retrieval leakage | RAG, memory, logs, traces, and context windows can expose sensitive information unless privacy and retention controls are designed into the architecture.
Accountability | Unclear evidence and reversibility | When agents plan, retry, delegate, and act autonomously, teams need evidence of what happened, why it happened, who approved it, and how it can be reversed.
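To make the privilege, tool-abuse, and accountability rows concrete, here is a minimal Python sketch of a tool-call gate. It is an illustration under stated assumptions, not a description of any particular product or of how an engagement is delivered: the names `ToolGate`, `AuditRecord`, `allowed_tools`, and `needs_approval`, and the example agent and tools, are all hypothetical. The idea is that an agent can only call tools on its own allow-list, state-changing tools require a named approver, and every attempt is logged whether or not it runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class AuditRecord:
    """Evidence of one attempted tool call: who, what, outcome, and approver."""
    agent: str
    tool: str
    args: dict
    allowed: bool
    reason: str
    approved_by: Optional[str]
    timestamp: str


@dataclass
class ToolGate:
    # Hypothetical policy data: the tools each agent may call at all.
    allowed_tools: dict
    # Hypothetical policy data: state-changing tools that need a named approver.
    needs_approval: set
    audit_log: list = field(default_factory=list)

    def call(self, agent: str, tool: str, args: dict,
             fn: Callable[..., Any], approved_by: Optional[str] = None) -> Any:
        """Run fn(**args) only if policy permits; log every attempt either way."""
        if tool not in self.allowed_tools.get(agent, set()):
            allowed, reason = False, "tool not in agent allow-list"
        elif tool in self.needs_approval and approved_by is None:
            allowed, reason = False, "human approval required"
        else:
            allowed, reason = True, "permitted by policy"
        self.audit_log.append(AuditRecord(
            agent, tool, dict(args), allowed, reason, approved_by,
            datetime.now(timezone.utc).isoformat(),
        ))
        if not allowed:
            raise PermissionError(f"{agent} -> {tool}: {reason}")
        return fn(**args)


gate = ToolGate(
    allowed_tools={"support-agent": {"lookup_order", "issue_refund"}},
    needs_approval={"issue_refund"},
)
# Read-only call: permitted and logged.
gate.call("support-agent", "lookup_order", {"order_id": "A1"},
          lambda order_id: {"order_id": order_id, "status": "shipped"})
```

In this sketch a call to `issue_refund` without `approved_by` raises `PermissionError` and still leaves a record in `audit_log`. Reversal is deliberately out of scope here; a fuller design would also record how each permitted action can be undone.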

03— Deliverables

Governance has to survive production.

Policies alone do not secure agentic systems. We connect board expectations, privacy obligations, security controls, and engineering reality so the operating model can be audited, tested, and improved as the system changes.

Assessment packet

What you leave with is evidence, not ceremony.

Finding

Agentic risk register

A prioritized register of AI-specific risks across agents, tools, permissions, prompts, retrieval, memory, data flows, vendors, and operational controls.

Evidence

Control gap analysis

A practical comparison between the controls your deployment needs and the controls actually present in engineering, security, privacy, and governance workflows.

Briefing

Executive and board briefing

Clear risk language for decision-makers: what the system can do, what can go wrong, what is being controlled, and what should not be deployed yet.

Roadmap

Implementation roadmap

Concrete architecture, governance, eval, monitoring, policy, and remediation steps that your teams can execute without translating a generic framework.
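As a rough illustration of what a prioritized risk register can look like as data, the Python sketch below models entries and sorts them. Everything in it is an assumption made for the example: the `RiskEntry` fields, the 1 to 5 likelihood and impact scales, the likelihood-times-impact score, and the three sample entries are hypothetical and do not represent the scoring scheme or findings of any actual engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEntry:
    risk_id: str
    surface: str       # e.g. agent, tool, permission, retrieval, memory, vendor
    description: str
    likelihood: int    # assumed scale: 1 (rare) to 5 (expected)
    impact: int        # assumed scale: 1 (minor) to 5 (severe)
    owner: str

    def __post_init__(self) -> None:
        # Reject values outside the assumed 1..5 scales.
        for name in ("likelihood", "impact"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Assumed scoring rule for this sketch: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(entries: list) -> list:
    """Highest score first; ties are broken by risk_id for a stable order."""
    return sorted(entries, key=lambda e: (-e.score, e.risk_id))


# Illustrative entries only.
register = [
    RiskEntry("R-002", "retrieval", "RAG index exposes restricted documents", 3, 4, "privacy"),
    RiskEntry("R-001", "permission", "Agent runs under a shared admin service account", 4, 5, "security"),
    RiskEntry("R-003", "tool", "Indirect prompt injection can trigger outbound email", 5, 4, "engineering"),
]
ranked = prioritize(register)
```

A simple multiplicative score is easy to explain to a board, but it hides detail such as reversibility and detection time; a real register would likely carry those as separate fields.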

04— Method

“Agentic systems should earn autonomy through evidence, controls, and reversible operations.”

We work at the boundary between security assessment, governance design, and production engineering. The goal is not to slow AI adoption. The goal is to make autonomy explicit, measured, permissioned, and accountable.

Our engagements produce board-readable risk language and implementation details your technical teams can actually ship.

Phase | Action | Work | Output
Discovery | Map | Trace the agents, tools, data stores, permissions, prompts, vendors, logs, and decisions that make up the system. | System inventory
Adversarial | Break | Run adversarial scenarios against realistic workflows, including indirect prompt injection, privilege abuse, data exposure, and unsafe tool use. | Failure paths
Control design | Govern | Define ownership, approval paths, policy, evidence, review thresholds, and escalation rules that match the system's actual risk. | Operating model
Architecture | Build | Refactor the architecture, guardrails, eval harnesses, monitoring, and release gates so controls become part of how the system ships. | Remediation plan

05— Selected Work

Systems we have hardened, tested, and scaled.

We do not publish client names, system details, or exploitable findings. Technical depth can be shared privately when the conversation warrants it and confidentiality is in place.

Reference | Domain | Built | Technical scope
ALPINE | Evaluation infrastructure | State-of-the-art agentic evaluation system | LLM-based judging, human-in-the-loop verification, reviewer calibration, disagreement handling, and scalable quality control for agent behavior under realistic task conditions.
LOCKSTEP | Regulated systems | Harness hardening for regulated industries | Evaluation and deployment harnesses hardened for auditability, repeatability, evidence capture, release gates, access control, and controlled production rollout.
BLACKBOX | Model behavior | Fine-tuning and jailbreak audits | Fine-tuning workflows paired with jailbreak, refusal, policy-boundary, and regression audits to measure whether model behavior improves under adversarial pressure.
SIGNAL | Guardrails | Regulated guardrails and crisis response paths | Guardrail systems for high-consequence interactions, including crisis detection, escalation logic, human review paths, privacy boundaries, and failure-mode monitoring.
KEYRING | MCP security | MCP audit and red-team pentest | Red-team-level engagement across MCP servers, tool manifests, delegated permissions, token exposure, prompt-to-tool escalation, and cross-system abuse paths.
ATLAS | Governance infrastructure | Enterprise AI inventory and risk classification | Organization-wide mapping of LLM and agentic systems by data access, tool authority, model and vendor dependency, user exposure, business criticality, and required controls.
COMMONS | AI development policy | Policy system for AI-assisted builders | Practical rules for teams using coding agents and rapid prototyping workflows, including approved tool use, repository access, review gates, secrets handling, provenance, and production handoff.

06— How We Work

We are not trying to be the largest AI consultancy in the room.

ARL is built for narrow, high-consequence work where engineering judgment matters more than headcount.

Small by design

We are a small, senior team. The people you speak with are the people doing the assessment, design, and implementation work.

Built by engineers

We are principal and staff engineers as well as senior product leaders. Recommendations are written for teams that have to ship, operate, and defend real systems.

Close to the frontier

Agentic systems, evals, harnesses, MCP, guardrails, and AI-assisted development are moving quickly. We stay near the edge because the risks are changing there first.

Selective fit

Not every business needs us. We reserve time for organizations where autonomy, access, privacy, security, or governance materially changes the risk profile.

07— Fit & Process

The fastest path is knowing whether we should work together.

We are useful when AI systems carry real authority, exposure, or operational consequence. We are not useful for every AI project.

Good fit

Agentic systems in production or close to production
LLM systems with tool access, memory, RAG, MCP, or workflow authority
Regulated data, high-consequence users, or board-visible AI risk
Teams that need security, governance, and architecture to meet in the same room

Not a fit

Basic chatbot builds
Generic AI training or prompt workshops
Marketing demos with no production risk
Low-stakes experiments that do not need senior assessment

Initial context

You send the system context, deployment stage, risk concern, and timeline.

Fit check

We review whether ARL is the right team and say clearly if we are not.

Scope

If there is a fit, we define the assessment surface, access needed, deliverables, timeline, and confidentiality terms.

Engagement

We run the work, brief decision-makers, and leave your team with findings, controls, and implementation steps.

08— Contact

Request an assessment

Tell us what you are deploying, buying, governing, or trying to secure. We will review the context and respond with next steps.

ARL is best suited for teams working with LLM systems that have real access to data, tools, workflows, or decisions.

Every inquiry gets a response. We only take on work where we're the right fit, and we'll say so clearly when we're not.
