Knowledge

What the agent can use, and how to prove it is working.

Overview

Knowledge is how an agent gets access to the information it should use when it works.

That includes:

  • connected systems such as Google or GitHub
  • imported document collections
  • synced knowledge feeds
  • normalized items the platform can retrieve later

It helps to think of knowledge as a pipeline, not a blob:

  • an integration connects to a provider
  • a source defines which knowledge feed to use
  • an ingestion syncs that feed
  • items are the normalized records the agent can actually draw from

That matters because you can check each step: what is connected, what was synced, and what the agent can actually reach.
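Those three checks can be sketched as one small shell function, using only CLI commands covered later on this page. This is a minimal sketch, not an official workflow: the function name knowledge_pipeline_check is hypothetical, and it assumes you already know the source ID you care about.

```shell
# Hypothetical helper: answer "what is connected, what was synced,
# and what can the agent reach" for one source.
knowledge_pipeline_check() {
  local source_id="$1"
  if [ -z "$source_id" ]; then
    echo "usage: knowledge_pipeline_check <source_id>" >&2
    return 2
  fi
  echo "== what is connected =="
  archastro list contextintegrations
  echo "== what was synced =="
  archastro list contextingestions --source "$source_id"
  echo "== what the agent can reach =="
  archastro list contextitems --source "$source_id"
}

# Usage (placeholder ID):
#   knowledge_pipeline_check <source_id>
```

Each heading in the output maps to one stage of the pipeline, so a gap is easy to localize.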

The knowledge pipeline

Knowledge becomes usable in stages: connect a system, define the source, sync it, then inspect the resulting items.

Figure: a provider integration feeds a source, which feeds an ingestion, which produces the normalized knowledge items an agent can use.

A concrete example

Imagine Company A runs the underlying deployment platform for Company B.

Company A wants its Platform Support Agent to help Company B diagnose a failing integration, but only with approved material:

  • rollout runbooks
  • known retry issues
  • connector troubleshooting notes
  • past validated migration steps

The right setup is not "give the agent every document." It is:

  1. connect the approved provider or document collection
  2. define the exact source that should be searchable
  3. sync it
  4. inspect what the platform actually ingested
  5. let the agent use only that approved body of knowledge

That is how teams make knowledge useful while keeping the agent inside a boundary they have reviewed and can explain.


The main pieces

Piece        What it means
Integration  The authenticated connection to a provider or workspace
Source       The specific feed, collection, or scope of knowledge to sync
Ingestion    The sync job that imports or refreshes knowledge
Item         One normalized knowledge record the platform can retrieve later
Credential   Secret material used for knowledge or browser access when needed

One distinction matters most:

  • an integration says "we can connect to this system"
  • a source says "this is the specific knowledge stream we want from that system"

That is what keeps the setup explainable to developers and security reviewers.


Start in the portal

The developer portal is the safest place to do the first connection work:

  1. connect the outside system
  2. review scopes and ownership
  3. confirm which workspace, repository, inbox, or document collection should be used

Use the portal first when the setup involves secrets, OAuth, or shared operational review.

Then use the CLI to inspect and operate what was created.


Inspect integrations from the CLI

List the connected knowledge integrations:

archastro list contextintegrations
archastro describe contextintegration <integration_id>

This tells you:

  • which provider is connected
  • which workspace it points at
  • who owns it
  • whether the connection is still healthy

Use this when a developer asks, "Which knowledge connection is this agent actually using?"


Inspect and manage sources

Sources are the part developers usually need to reason about most.

They tell the platform which specific feed should become usable knowledge.

archastro list contextsources
archastro list contextsources --installation <installation_id>
archastro describe contextsource <source_id>

If you need to create or tune a source from the CLI:

archastro create contextsource \
  --type github_activity \
  --team-id <team_id> \
  --payload '{"repository":"company-a/platform-rollouts"}'

A source type is provider-specific. github_activity is one concrete GitHub-backed source type. In practice, teams usually create the first source through the portal, then use describe contextsource and list contextsources to inspect the exact shape before scripting more of them.

A source is where the knowledge boundary becomes concrete. It is not just "GitHub is connected." It is "this exact repository or feed is part of the approved context."
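Once the first source looks right, scripting more of them might look like the sketch below. It assumes every repository uses the same payload shape as the single-repository example above. The function name create_repo_sources is hypothetical, and the sketch does not escape special characters in repository names.

```shell
# Hypothetical helper: create one github_activity source per approved repository.
# Assumption: the payload shape matches the single-repository example on this page.
create_repo_sources() {
  local team_id="$1"
  shift
  local repo payload
  for repo in "$@"; do
    # Build the JSON payload for this repository.
    payload=$(printf '{"repository":"%s"}' "$repo")
    echo "creating source for $repo"
    archastro create contextsource \
      --type github_activity \
      --team-id "$team_id" \
      --payload "$payload" || return 1
  done
}

# Usage (placeholder team ID):
#   create_repo_sources <team_id> company-a/platform-rollouts
```

Passing the repositories as an explicit argument list keeps the approved boundary visible in the script itself. Run describe contextsource on each result to confirm the shape before relying on it.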


Check ingestion health

Ingestion is where many real knowledge problems show up.

If the agent is not seeing the knowledge you expected, check the ingestion state before assuming the model is wrong.

archastro list contextingestions
archastro list contextingestions --status failed
archastro list contextingestions --source <source_id>
archastro describe contextingestion <ingestion_id>

This is the debugging loop:

  1. inspect the source
  2. inspect recent ingestions
  3. confirm whether the sync succeeded
  4. only then debug the agent behavior itself

That sequence saves a lot of wasted prompt debugging.
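The first three steps of that loop can be wrapped in one function so they always run in the same order. This is a sketch under two assumptions: the function name check_source_sync is hypothetical, and the CLI is assumed to exit non-zero when a command fails.

```shell
# Hypothetical helper: run the debugging loop for one source, in order.
# Assumption: archastro exits non-zero when a command fails.
check_source_sync() {
  local source_id="$1"
  if [ -z "$source_id" ]; then
    echo "usage: check_source_sync <source_id>" >&2
    return 2
  fi
  echo "== 1. source =="
  archastro describe contextsource "$source_id" || return 1
  echo "== 2. recent ingestions for this source =="
  archastro list contextingestions --source "$source_id" || return 1
  echo "== 3. failed ingestions (all sources) =="
  archastro list contextingestions --status failed || return 1
  echo "Review the output above before debugging the agent itself."
}

# Usage (placeholder ID):
#   check_source_sync <source_id>
```

If an ingestion from step 2 looks suspicious, follow up with describe contextingestion on its ID.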


Inspect the resulting items

Items are the normalized records the platform actually has available after ingestion.

archastro list contextitems --source <source_id>
archastro describe contextitem <item_id>

If an agent keeps missing a fact, this is where you verify whether that fact exists in the synced knowledge at all.

That is a better debugging step than guessing about prompts.
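A quick way to run that check is to search the item listing for a term. The sketch below is hedged: the function name find_in_items is hypothetical, and it assumes list contextitems prints human-readable text (such as titles) that grep can match. The real output format may differ.

```shell
# Hypothetical helper: search the synced items of one source for a term.
# Assumption: the list output contains plain text that grep can match.
find_in_items() {
  local source_id="$1" term="$2"
  # Case-insensitive, fixed-string match against the listing.
  if archastro list contextitems --source "$source_id" | grep -i -F -e "$term"; then
    return 0
  fi
  echo "no listed item mentions '$term'; check the source scope and ingestion state" >&2
  return 1
}

# Usage (placeholder ID):
#   find_in_items <source_id> "retry"
```

When a line matches, run describe contextitem on that item to confirm what the platform actually stored.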


About credentials

Some knowledge flows need credentials in addition to an integration.

The CLI supports credential inspection and management:

archastro list contextcredentials
archastro describe contextcredential <credential_id>

These commands return credential metadata such as domain, owner, and last access time. They do not print raw secret values back to the terminal.

Behind that metadata, the credential fields themselves are stored encrypted at rest. Treat the CLI here as a review and maintenance surface, not as a place to paste raw secrets.

For first setup, prefer the portal when secrets are involved. It keeps sensitive values out of shell history and gives a clearer review surface.

The CLI is better for:

  • inspection
  • auditing
  • controlled follow-up updates

Knowledge in cross-company work

Knowledge boundaries matter most in Agent Network scenarios, where agents from different companies work in the same thread.

The rule is simple:

  • each company keeps its private knowledge private
  • collaboration happens in the shared thread
  • the shared thread does not imply shared private context

In practice, treat that as an explicit configuration responsibility as well as a platform boundary:

  • only attach the sources an agent truly needs
  • review those sources before the agent joins shared work
  • do not assume a shared thread should widen an agent's retrieval scope

That is what lets Company B ask Company A's agent for help without automatically widening access to Company A's full internal corpus.

Use Agent Network when the knowledge boundary needs to hold across company lines.


Good knowledge posture

Good knowledge setups usually follow five rules:

  1. connect only the systems that help the agent do its actual job
  2. keep each source narrow and intentional
  3. inspect ingestion health before debugging model behavior
  4. review items and ownership when results look wrong
  5. avoid mixing company-private knowledge into shared collaboration spaces

This is one of the clearest places where ArchAstro can feel either trustworthy or sloppy. The difference is whether developers can explain the knowledge path in plain language.


Where to go next

  1. Read Agents for the full runtime model.
  2. Read Installations for the broader attachment lifecycle.
  3. Read Tools if the agent also needs to act, not just read.
  4. Read Agent Network for cross-company knowledge boundaries.