# Sources

## Overview

The Sources tab has two sections: **Auto retraining frequency** at the top and the **Sources to train** list below.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2Fb32idZEjpNAg1fodKmdf%2Fimage.png?alt=media&#x26;token=650ff5de-f57c-4083-836f-a79690ca975d" alt=""><figcaption></figcaption></figure>

**Dashboard path:** [Agent > Knowledge > Sources](https://i.usejimo.com/agent/knowledge/sources)

***

## Auto retraining frequency

Controls how often Jimo automatically re-indexes all your content sources. This keeps the Agent's answers in sync with changes to your documentation.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2F6GhtjHRyEABVm5us1aJQ%2Fimage.png?alt=media&#x26;token=5ace1a27-adf4-4eb5-832c-f5a9c5521bb6" alt=""><figcaption></figcaption></figure>

Choose between three options:

* **Never**: Sources are only retrained when you manually trigger a refresh. Best for stable content that rarely changes.
* **Daily**: Jimo re-indexes all sources every day. Recommended during active product development or when your docs change frequently.
* **Weekly**: A good balance for most teams. Sources are re-indexed once a week.
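Here is a minimal Python sketch of how the three options could decide whether a source is due for automatic retraining. It assumes the interval is counted from the source's last successful training (the **Updated** column). Jimo does not document when its retrain job actually runs, and `is_retrain_due` and the frequency keys are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping of the three dashboard options to retrain intervals.
INTERVALS: dict[str, Optional[timedelta]] = {
    "never": None,                 # only manual refreshes
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def is_retrain_due(last_trained: datetime, frequency: str, now: datetime) -> bool:
    """Return True if an automatic retrain should run.

    Assumption: the interval is measured from the last successful training.
    """
    interval = INTERVALS[frequency]
    if interval is None:
        return False  # "Never": wait for a manual refresh
    return now - last_trained >= interval


last = datetime(2024, 5, 1, 9, 0)
now = datetime(2024, 5, 3, 9, 0)
for option in ("never", "daily", "weekly"):
    print(option, is_retrain_due(last, option, now))
```

Two days after the last training, only the **Daily** option would trigger a retrain in this sketch.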

You can always trigger a one-off retrain for any individual source using the refresh icon (⟳) in the source list.

***

## Sources to train

The main table lists every source you have added. Use the search bar, **Type** filter, and **Status** filter to find specific sources quickly.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2F3XE8cpjkI3Ao0XrTQWCm%2Fimage.png?alt=media&#x26;token=726a6f18-9de4-4947-8520-4d42af8636e2" alt=""><figcaption></figcaption></figure>

| Column      | Description                                                               |
| ----------- | ------------------------------------------------------------------------- |
| **Content** | The name you gave the source (or the URL/filename if no name was set).    |
| **Type**    | The source type: `Site Crawl`, `Individual URLs`, `Text`, or `File`.      |
| **Status**  | Current training state (see [Status reference](#status-reference) below). |
| **Updated** | When the source was last successfully trained.                            |

**Row actions:**

* **⟳ Refresh**: Retrain this source immediately without waiting for the next auto-retrain cycle.
* **🗑 Delete**: Permanently remove the source from your knowledge base.

Click any row to open the **Content Details** drawer, where you can inspect the indexed data chunks and metadata.

***

### Adding a source

Click **+ Add Content** (at the top right of the **Sources to train** section) to open a dropdown with three source types: **URL**, **Text**, and **File**.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2F8WB39El9r6H27HTvZuxy%2Fimage.png?alt=media&#x26;token=996aae44-276f-4121-9a2a-26f7135457eb" alt=""><figcaption></figcaption></figure>

{% tabs %}
{% tab title="URL" %}

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FlIlpCGqGTBbw9oUvQA4n%2Fimage.png?alt=media&#x26;token=559f98ed-8da6-4fcc-801b-7472ae0d7f2d" alt="" width="337"><figcaption></figcaption></figure>

URL sources let you feed entire websites or specific pages to the Agent. Choose between two modes: **Site Crawl** and **Individual URLs**.

#### Site Crawl

Crawl all pages starting from a root URL. Jimo follows internal links and extracts text from every reachable page.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FWt5RAOKvpbYVdRiC7R6q%2Fimage.png?alt=media&#x26;token=caf07ae8-346e-478d-b3cc-91e5b32edb03" alt=""><figcaption></figcaption></figure>

**Fields:**

* **Name** *(optional)*: A label for this source (e.g. "Help Center", "Product Docs").
* **Type**: Toggle between **Site Crawl** and **Individual URLs**.
* **Start URL**: The root URL where the crawl begins (e.g. `https://help.yourapp.com`).
* **Retrieve page**: Controls which pages are included in the crawl.
  * **All pages from this starting URL** *(default)*: Follows every internal link from the start URL.
  * **Only filtered URLs**: Restricts the crawl to pages matching specific rules.
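You can think of the crawl as a breadth-first traversal of links. The Python sketch below illustrates that idea and is not Jimo's crawler. It assumes "internal" means "same host as the Start URL" and treats URLs that differ only by a `#fragment` as the same page. It also takes a hypothetical `fetch` callable, so you can plug in any HTTP client; here it reads from a dictionary of fake pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 500) -> list[str]:
    """Breadth-first crawl that only follows links on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))  # hypothetical fetch: returns the page's HTML
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#x count once.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


pages = {
    "https://help.example.com/": '<a href="/docs/start">Start</a> <a href="https://blog.example.com/">Blog</a>',
    "https://help.example.com/docs/start": '<a href="/">Home</a> <a href="/docs/faq#top">FAQ</a>',
    "https://help.example.com/docs/faq": "",
}
print(crawl("https://help.example.com/", lambda u: pages.get(u, "")))
```

The blog link lives on a different host, so this sketch never visits it. Filtered URLs give you the same kind of control over pages that *are* on your host.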

**Filtering with rules**

When you select **Only filtered URLs**, a **Rules** section appears where you can define path-based filters.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FhB1N8OElJCG7MSO8ouHk%2Fimage.png?alt=media&#x26;token=343ef671-1615-4e62-9834-32fb3cb29d1c" alt=""><figcaption></figcaption></figure>

Each rule has two parts:

* **Condition** (dropdown): `Starts with`, `Contains`, `Equals`, etc.
* **Path pattern**: The part of the URL to match, such as a path segment or a query string (e.g. `/docs/`, `?category=help`).
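Jimo does not document exactly how rules are evaluated. As a minimal sketch, assume each rule is checked against the URL's path plus its query string, and that a page is kept when it matches at least one rule. The `Rule` class and the condition identifiers below are hypothetical, and only the three named conditions are modeled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Rule:
    condition: str  # hypothetical identifiers: "starts_with", "contains", "equals"
    pattern: str


def path_and_query(url: str) -> str:
    """Return the path plus '?query' (if any), which the rules are assumed to match against."""
    parts = urlparse(url)
    return parts.path + (f"?{parts.query}" if parts.query else "")


def matches(rule: Rule, url: str) -> bool:
    target = path_and_query(url)
    if rule.condition == "starts_with":
        return target.startswith(rule.pattern)
    if rule.condition == "contains":
        return rule.pattern in target
    if rule.condition == "equals":
        return target == rule.pattern
    raise ValueError(f"unknown condition: {rule.condition}")


def keep(url: str, rules: list[Rule]) -> bool:
    # Assumption: a page is kept if it matches at least one rule (OR semantics).
    return any(matches(rule, url) for rule in rules)


rules = [Rule("starts_with", "/docs/"), Rule("contains", "?category=help")]
for url in [
    "https://help.example.com/docs/setup",
    "https://help.example.com/articles?category=help",
    "https://help.example.com/blog/launch",
]:
    print(url, keep(url, rules))
```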

You can build rules in two ways:

* **+ Add rule manually**: Add conditions one by one.
* **✨ Generate rules with AI**: Let Jimo AI analyze the start URL and suggest relevant filtering rules automatically.

> **Tip:** Use path filters to focus the crawl on your help center or product documentation and exclude marketing pages, blog posts, or legal content that could dilute the Agent's answers.

Click **Train** to start indexing. The source appears in the list with a *Training* status, which changes to *Trained* once processing is complete.

#### Individual URLs

Add specific pages manually when you only need a few pages indexed, or when the pages you need are spread across different domains.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FrjJ37Ub6UtTh3QeMBywA%2Fimage.png?alt=media&#x26;token=27b59d39-f1e4-4f6e-a17a-a891c3792205" alt=""><figcaption></figcaption></figure>

**Fields:**

* **Name** *(optional)*: A label for this source.
* **Type**: Toggle to **Individual URLs**.
* **List of URLs**: Paste one URL per line. Each URL is fetched and indexed independently.
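A pasted list must be split into individual URLs before anything is fetched. The Python sketch below shows one way to do that. Ignoring blank lines, collapsing duplicates, and rejecting lines without an `http`/`https` scheme are assumptions about sensible input handling, not documented Jimo behavior. `parse_url_list` is a hypothetical helper.

```python
from urllib.parse import urlparse


def parse_url_list(text: str) -> tuple[list[str], list[str]]:
    """Split a pasted block (one URL per line) into valid URLs and rejected lines."""
    valid: list[str] = []
    rejected: list[str] = []
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parts = urlparse(candidate)
        if parts.scheme in ("http", "https") and parts.netloc:
            if candidate not in valid:  # collapse duplicates, keep first-seen order
                valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


pasted = """
https://help.example.com/pricing
https://docs.other-domain.com/api/auth

help.example.com/no-scheme
https://help.example.com/pricing
"""
urls, bad = parse_url_list(pasted)
print(urls)
print(bad)
```

The URLs in a list like this can come from different domains, which is exactly the case a single Site Crawl cannot cover.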

Click **Train** to start indexing.
{% endtab %}

{% tab title="Text" %}

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FbasiVjn4hd4l7ikSSBVd%2Fimage.png?alt=media&#x26;token=c4ef5c79-8fa0-4de6-993f-ee131af9b3f0" alt="" width="331"><figcaption></figcaption></figure>

Paste any plain-text or Markdown content directly. Useful for internal knowledge that is not published on a website, like release notes, internal procedures, product specifications, or code documentation.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FngK62zcmkWiibn1tC2wy%2Fimage.png?alt=media&#x26;token=042bb784-2e33-49e5-8352-6b10b3d76053" alt=""><figcaption></figcaption></figure>

**Fields:**

* **Name** *(optional)*: A label for this text source.
* **Text**: The content to index. Supports plain text and Markdown formatting.

Click **Train** to index the content.

> **Tip:** Keep each text snippet focused on a single topic. Multiple smaller, topic-specific sources perform better than one massive text dump.

{% endtab %}

{% tab title="File" %}

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FHIplXKQVgPI9vpXRwsaZ%2Fimage.png?alt=media&#x26;token=48ab4643-fe03-47ff-aade-57e58689d734" alt="" width="334"><figcaption></figcaption></figure>

Upload documents and files for the Agent to learn from. Ideal for product manuals, PDF guides, data exports, or any structured content stored as files.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2F2dhO6MajuEkwnQmRGqq9%2Fimage.png?alt=media&#x26;token=2c65ccdc-4da4-4c98-b42c-7b5a173f8ade" alt=""><figcaption></figcaption></figure>

**Fields:**

* **Name** *(optional)*: A label for this file source.
* **File**: Click **Choose File** to upload. Supported formats: `.pdf`, `.doc`, `.docx`, `.txt`, `.csv`, `.md`, `.xls`, `.xlsx`, `.json`, `.html`, `.xml`.

Click **Train** to start processing. Large files are automatically chunked so the model can process them efficiently.
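Jimo does not document its chunking strategy. The Python sketch below shows one common approach, fixed-size word windows with a small overlap so that sentences at a boundary appear in both neighboring chunks. `chunk_words` and its default sizes are illustrative assumptions, not Jimo's settings.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_words words, each overlapping the previous one."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks: list[str] = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, max_words=4, overlap=1):
    print(chunk)
```

Each chunk here plays the role of one entry in the **Data Chunks** list of the Content Details drawer, though the real split points may follow pages or sections instead of word counts.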
{% endtab %}
{% endtabs %}

***

### Status reference

| Status         | Meaning                                                                                      |
| -------------- | -------------------------------------------------------------------------------------------- |
| **Trained** ✅  | Source is fully indexed and the Agent uses it to generate answers.                           |
| **Training** ⏳ | Source is being processed. The Agent cannot use it yet.                                      |
| **Failed** ⚠️  | Something went wrong during indexing. Hover over the status icon to see the error message, then retry. |

***

### Content Details drawer

Click any source row to open its detail panel and inspect what was indexed.

<figure><img src="https://2794860263-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FKzAcDWQbK1gKbpra7bkb%2Fuploads%2FztlGpcxPvdnNJyHlxNnv%2Fimage.png?alt=media&#x26;token=8350b2bb-5292-4a19-a422-c86ab4ec6a77" alt=""><figcaption></figcaption></figure>

The drawer shows:

* **Header**: Source name, type, starting URL or filename, status, date added, and who added it.
* **Data Chunks**: For crawls, a list of every crawled path with its extracted text. For files, each section or page of the document. Expand any chunk to preview the indexed text.

{% hint style="warning" %}
If a crawl returns unexpected pages, review the data chunks and consider adding path filter rules to narrow the scope.
{% endhint %}

***

## Best practices

{% stepper %}
{% step %}

#### **Start broad, then refine.**

Begin with a full site crawl of your help center, then review data chunks and add path filters to exclude irrelevant pages.
{% endstep %}

{% step %}

#### **Name your sources clearly.**

Labels like "Help Center - Product Docs" or "API Reference v2" make the source list easier to manage as it grows.
{% endstep %}

{% step %}

#### **Use Daily retraining during active development.**

Switch to Weekly once your documentation stabilizes.
{% endstep %}

{% step %}

#### **Keep text sources focused.**

One topic per text source produces better results than a single massive text block.
{% endstep %}

{% step %}

#### **Avoid noisy content.**

Exclude marketing pages, blog posts, customer testimonials, and legal disclaimers unless users frequently ask about them.
{% endstep %}

{% step %}

#### **Monitor statuses.**

Check for Failed sources regularly, especially after changing your website structure.
{% endstep %}
{% endstepper %}
