Ingesting Content

Every path to adding knowledge to your vault — files, URLs, web crawling, email, quick notes, and direct text.

How Ingestion Works

No matter how content enters Wikori, it follows the same pipeline:

1. Content lands in INGEST/

Whether you drop a file, paste a URL, or write a quick note, it always ends up as a .md file in your vault's INGEST/ folder.

2. Office files are extracted first

PDF, DOCX, and XLSX files are converted to Markdown automatically before being queued. The original file is moved to a files/ subdirectory for safekeeping.

3. AI enriches the content

Wikori sends the Markdown text to your AI endpoint. The model returns a structured YAML block with title, summary, entities, tags, source type, and confidence score.

4. Enriched file lands in your vault

The YAML frontmatter is prepended to the file, and it's moved from INGEST/ to your vault root. It's now searchable, navigable, and accessible to AI agents via MCP.
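
Steps 3 and 4 can be pictured with a short Python sketch. It is an illustration only, not Wikori's actual implementation: the function names (`to_frontmatter`, `finalize`), the metadata keys, and the sample note are assumptions, and the AI call is replaced by a hard-coded dictionary.

```python
import json
import shutil
import tempfile
from pathlib import Path


def to_frontmatter(meta: dict) -> str:
    """Render metadata as a YAML frontmatter block.

    Values are JSON-encoded, which is also valid YAML flow syntax,
    so strings containing colons or quotes stay well-formed.
    """
    lines = ["---"]
    for key, value in meta.items():
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n"


def finalize(note: Path, vault_root: Path, meta: dict) -> Path:
    """Prepend frontmatter to a note in INGEST/ and move it to the vault root."""
    body = note.read_text(encoding="utf-8")
    note.write_text(to_frontmatter(meta) + body, encoding="utf-8")
    target = vault_root / note.name
    shutil.move(str(note), str(target))
    return target


# Demo in a throwaway directory; `meta` stands in for the model's reply.
with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    ingest = vault / "INGEST"
    ingest.mkdir()
    note = ingest / "kickoff.md"
    note.write_text("Notes from the kickoff call.\n", encoding="utf-8")
    meta = {"title": "Kickoff call", "tags": ["meeting"], "confidence": 0.9}
    print(finalize(note, vault, meta).read_text(encoding="utf-8"))
```

JSON-encoding each value is a shortcut chosen to avoid a YAML dependency; a real implementation would more likely use a YAML library.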

From Files

The simplest way to ingest content is to drop a file directly into the INGEST/ folder. Wikori's file watcher picks it up within seconds.

Two ways to do it

Via file manager: Open your vault directory in Finder (macOS), Explorer (Windows), or your Linux file manager. Drag any supported file into the INGEST/ subfolder.

Via the app: Go to the Ingest page and follow the instructions in the "From Files" section — it shows the exact path to your current vault's INGEST folder.

Supported formats

Type | Extensions | Processing
Documents | PDF, DOCX, XLSX | Text extracted to Markdown, original preserved in files/
Office (basic) | PPTX, ODT, ODS, ODP, TXT, CSV | Processed as-is or with basic extraction
Notes | MD | Processed directly, no conversion needed
Images | PNG, JPG, JPEG, WEBP, TIFF, BMP | AI vision model analyzes content, generates description

Only .md files (and those that get converted to .md) are processed by the AI pipeline. Other file types are moved to the vault but won't receive YAML enrichment unless they are converted first.
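
The format list can be expressed as a small lookup by extension. The sketch below is only a way to read the supported formats at a glance; the route names (`extract`, `basic`, `direct`, `vision`, `unsupported`) and the `classify` function are made up for this example and are not Wikori identifiers.

```python
from pathlib import Path

# Extension groups copied from the supported formats list.
EXTRACTED = {".pdf", ".docx", ".xlsx"}
BASIC = {".pptx", ".odt", ".ods", ".odp", ".txt", ".csv"}
IMAGES = {".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp"}


def classify(filename: str) -> str:
    """Return a hypothetical route name for a file dropped into INGEST/."""
    ext = Path(filename).suffix.lower()
    if ext == ".md":
        return "direct"       # processed directly, no conversion
    if ext in EXTRACTED:
        return "extract"      # text extracted to Markdown, original kept in files/
    if ext in BASIC:
        return "basic"        # processed as-is or with basic extraction
    if ext in IMAGES:
        return "vision"       # described by an AI vision model
    return "unsupported"


for name in ["report.PDF", "notes.md", "diagram.png", "archive.zip"]:
    print(name, "->", classify(name))
```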

From URLs & YouTube

Wikori can scrape any public web page and extract YouTube transcripts. Go to Ingest → From URLs.

Adding URLs

Paste one or more URLs into the input field and click Add to Queue. URLs are added to a queue — you can add many before starting the pipeline.

YouTube URLs

Paste any YouTube video URL (e.g. https://youtube.com/watch?v=...). Wikori automatically detects it, fetches the full transcript, and saves it as a Markdown file — no audio processing required.
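
How Wikori detects YouTube links is not documented here. As a rough sketch of what such detection could look like, the Python function below pulls the video ID out of a URL using the standard library. The host list and the `youtu.be` short-link handling are assumptions, and `youtube_video_id` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts that serve /watch?v=... pages.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for a YouTube URL, or None for any other URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    return None


print(youtube_video_id("https://youtube.com/watch?v=abc123"))
print(youtube_video_id("https://docs.example.com/watch?v=abc123"))
```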

Running the pipeline

1. Add URLs to the queue

The queue shows a count of pending URLs.

2. Click Start Pipeline

Wikori scrapes pages sequentially with polite delays to avoid triggering anti-bot measures.

3. Monitor progress

Processed URLs disappear from the queue. Each becomes a .md file in INGEST/ and goes through AI enrichment.

4. Stop or clear

Click Stop Pipeline at any time. Use Clear Queue to remove pending URLs without processing them.

Before starting a large batch, click Test Connection to confirm that your AI endpoint can be reached.
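
The queue behavior in this section (add, start, stop, clear, one page at a time with a pause in between) can be modeled in a few lines of Python. This is a toy model for intuition, not Wikori's code: `UrlQueue`, its method names, and the injectable `scrape` and `sleep` callables are all assumptions.

```python
import time
from collections import deque
from typing import Callable, Deque, List


class UrlQueue:
    """Hypothetical in-memory model of the URL queue."""

    def __init__(self, scrape: Callable[[str], None], delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.pending: Deque[str] = deque()
        self._scrape = scrape
        self._delay = delay
        self._sleep = sleep
        self._stopped = False

    def add(self, *urls: str) -> None:
        """Add to Queue: URLs wait here until the pipeline starts."""
        self.pending.extend(urls)

    def start(self) -> List[str]:
        """Start Pipeline: scrape one URL at a time until empty or stopped."""
        self._stopped = False
        done: List[str] = []
        while self.pending and not self._stopped:
            url = self.pending.popleft()   # processed URLs leave the queue
            self._scrape(url)
            done.append(url)
            if self.pending and not self._stopped:
                self._sleep(self._delay)   # polite pause between pages
        return done

    def stop(self) -> None:
        """Stop Pipeline: finish the current URL, keep the rest pending."""
        self._stopped = True

    def clear(self) -> None:
        """Clear Queue: drop pending URLs without processing them."""
        self.pending.clear()


queue = UrlQueue(scrape=lambda url: print("scraping", url), delay=0.0)
queue.add("https://docs.example.com/a", "https://docs.example.com/b")
print(len(queue.pending), "pending")
queue.start()
```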

Web Crawler

The Web Crawler takes a single seed URL and automatically discovers every linked page, then lets you preview, curate, and send the ones you want into the scraping pipeline. Go to Ingest → Web Crawler.

Perfect for ingesting entire documentation sites, knowledge bases, blog archives, or competitor resource pages without pasting URLs one by one.

Step-by-step

1. Create a Crawler Profile

Click New Profile. Each profile stores a complete crawl configuration: seed URL, mode, depth, scope, and URL cap. You can save up to 50 profiles per vault.

2. Configure the crawl

Set your seed URL, choose a Crawl Mode, configure depth and scope (see tables below), and set a Max URLs cap to prevent runaway discovery.

3. Click Save & Preview

Discovery starts immediately. A live counter shows pages visited, links found, and links discarded in real time. Hit the red Stop button at any point; you'll still get the partial results.

4. Curate the URL list

Every discovered URL appears in a checklist. Check or uncheck individual URLs, or use batch-select to take everything or nothing. Remove login pages, sign-up flows, or irrelevant sections before proceeding.

5. Add to Pipeline

Click Add to Pipeline. Selected URLs join the URL scraping queue. Each page is scraped, converted to Markdown, and AI-enriched like any other web page.

Crawl Modes

Mode | Behavior
Single URL | Scrapes only the seed URL, with no link discovery. Equivalent to pasting a URL into the URL pipeline directly.
Auto-discover | Follows links breadth-first from the seed page to the configured depth. This is the full crawler mode.
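
Breadth-first discovery with a depth limit and a URL cap can be sketched as follows. The sketch makes several assumptions: `discover` and `get_links` are hypothetical names, link extraction is replaced by a dictionary lookup so the example runs offline, and the seed is counted toward the cap (the docs do not say whether it is).

```python
from collections import deque
from typing import Callable, Iterable, List


def discover(seed: str, get_links: Callable[[str], Iterable[str]],
             depth: int, max_urls: int) -> List[str]:
    """Breadth-first link discovery; returns URLs in discovery order."""
    found = [seed]
    seen = {seed}
    frontier = deque([(seed, 0)])          # (url, hops from the seed)
    while frontier and len(found) < max_urls:
        url, hops = frontier.popleft()
        if hops >= depth:
            continue                       # do not follow links past the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            found.append(link)
            frontier.append((link, hops + 1))
            if len(found) >= max_urls:
                break                      # URL cap reached
    return found


# A tiny made-up site: each page maps to the pages it links to.
site = {"a": ["b", "c"], "b": ["d"], "c": ["a", "e"], "d": ["f"]}


def links(url: str) -> List[str]:
    return site.get(url, [])


print(discover("a", links, depth=1, max_urls=100))
print(discover("a", links, depth=2, max_urls=4))
```

With `depth=0` the function returns only the seed, which corresponds to Single URL mode.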

Profile Settings

Setting | Description | Notes
Seed URL | The starting page | e.g. https://docs.example.com/
Depth | How many link-hops to follow (1–5) | Depth > 3 shows a warning because it can discover thousands of URLs
Domain Scope | Which domains links are followed on | See domain scopes table below
Max URLs | Discovery stops after this many URLs (1–1000) | Always set a cap when exploring unfamiliar sites
Ignore Query Strings | Treat URLs that differ only by query params as the same page | Reduces duplicates on paginated sites
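
The limits in the profile settings can be captured in a small data class. This is a sketch for illustration: `CrawlProfile`, its field names, its default values, and the `check` method are assumptions, while the ranges (depth 1–5, Max URLs 1–1000, warning above depth 3) come from the settings list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlProfile:
    """Hypothetical container for one crawler profile."""
    seed_url: str
    depth: int = 2                      # assumed default
    scope: str = "same_domain"          # assumed default
    max_urls: int = 100                 # assumed default
    ignore_query_strings: bool = False

    def check(self) -> List[str]:
        """Raise ValueError for out-of-range values; return any warnings."""
        if not 1 <= self.depth <= 5:
            raise ValueError("depth must be between 1 and 5")
        if not 1 <= self.max_urls <= 1000:
            raise ValueError("max_urls must be between 1 and 1000")
        warnings: List[str] = []
        if self.depth > 3:
            warnings.append("depth > 3 can discover thousands of URLs")
        return warnings


profile = CrawlProfile("https://docs.example.com/", depth=4, max_urls=500)
print(profile.check())
```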

Domain Scopes

Scope | Behavior
Same domain | Only follow links on the same host as the seed URL. Most common choice.
Same path prefix | Same host, and URLs must start with the seed's path; useful for scoping to a subsection (e.g. /docs/ only).
Whitelist | Follow links only on domains you explicitly specify.
No restriction | Follow any link anywhere. Requires a confirmation checkbox. Use with a low depth and URL cap.
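
The four scopes amount to a predicate on each discovered link. A minimal Python sketch, assuming made-up scope identifiers (`same_domain`, `same_path_prefix`, `whitelist`, `no_restriction`) and a hypothetical `in_scope` function, might look like this:

```python
from typing import AbstractSet
from urllib.parse import urlparse


def in_scope(url: str, seed: str, scope: str,
             whitelist: AbstractSet[str] = frozenset()) -> bool:
    """Decide whether a discovered link should be followed."""
    link, start = urlparse(url), urlparse(seed)
    if scope == "no_restriction":
        return True
    if scope == "whitelist":
        return link.hostname in whitelist
    same_host = link.hostname == start.hostname
    if scope == "same_domain":
        return same_host
    if scope == "same_path_prefix":
        # Same host, and the path must start with the seed's path.
        return same_host and link.path.startswith(start.path)
    raise ValueError(f"unknown scope: {scope}")


seed = "https://docs.example.com/docs/"
print(in_scope("https://docs.example.com/docs/api", seed, "same_path_prefix"))
print(in_scope("https://docs.example.com/blog/post", seed, "same_path_prefix"))
```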

Smart filtering & behavior

The crawler automatically skips file types that shouldn't be scraped as web pages (images, CSS, JS, archives), ignores non-HTTP schemes, and deduplicates fragment-only links (#section). It follows redirects and uses the final resolved URL for deduplication, so when http://example.com redirects to https://example.com the two won't create duplicates.

The crawler waits a randomized 200 ms to 1 s between page requests to be polite to target servers.
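
The filtering rules above can be approximated with the standard library. The sketch below is an assumption-laden illustration: the extension list is a guess at what "images, CSS, JS, archives" covers, `normalize` and `polite_delay` are hypothetical names, and redirect resolution is left out because it needs a network request.

```python
import random
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse, urlunparse

# Assumed examples of non-page file types; the real list is not documented here.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg",
                      ".css", ".js", ".zip", ".tar", ".gz"}


def normalize(url: str, ignore_query: bool = False) -> Optional[str]:
    """Return a deduplication key for a link, or None if it should be skipped."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return None                                   # mailto:, javascript:, ...
    if PurePosixPath(parts.path).suffix.lower() in SKIPPED_EXTENSIONS:
        return None                                   # not a web page
    query = "" if ignore_query else parts.query       # Ignore Query Strings
    # The empty last field drops the #fragment.
    return urlunparse((parts.scheme, parts.netloc.lower(), parts.path,
                       parts.params, query, ""))


def polite_delay() -> float:
    """Pick a random pause between 200 ms and 1 s, in seconds."""
    return random.uniform(0.2, 1.0)


print(normalize("https://docs.example.com/guide#install"))
print(normalize("https://docs.example.com/list?page=2", ignore_query=True))
print(normalize("mailto:team@example.com"))
```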

Profile management

Profiles can be created, edited, duplicated, and deleted. Duplicating a profile is useful when you want to try a deeper crawl of the same site without overwriting your working configuration. The last 200 crawl runs are stored per vault, so you can review past discovery results even after closing the app.

Best practice: Start with depth 1–2 and Same domain scope. Review the URL list before adding to the pipeline. Use the Self-Care routine to re-run profiles on a schedule so your vault stays current when sites update.

From Email (IMAP)

Wikori can monitor an email inbox and automatically convert incoming messages into knowledge entries. Configure this in Settings → Email Ingestion.

Important: Wikori downloads and deletes matching emails from the mailbox. We strongly recommend using a dedicated mailbox or email alias, not your main inbox.

Configuration

Field | Description
IMAP Host | Your mail server address (e.g. imap.gmail.com)
Port | Usually 993 for SSL
Username | Your email address
Password | App password or IMAP password (encrypted on save)
Trusted Senders | Comma-separated list of email addresses. Only emails from these addresses are processed.
Vault Routing | Emails are routed to the vault whose email tag matches a tag in the subject line. Untagged emails go to the active vault.
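
The Trusted Senders and Vault Routing rules can be illustrated with two small functions. This is a sketch under assumptions: the `#tag` syntax in the subject line is borrowed from the Quick Notes examples and may not match the real email tag format, and `is_trusted`, `route_vault`, and the vault names are hypothetical.

```python
import re
from email.utils import parseaddr
from typing import Dict


def is_trusted(from_header: str, trusted_senders: str) -> bool:
    """Check a From header against a comma-separated list of addresses."""
    trusted = {a.strip().lower() for a in trusted_senders.split(",") if a.strip()}
    address = parseaddr(from_header)[1].lower()   # "Ada <ada@x.com>" -> "ada@x.com"
    return address in trusted


def route_vault(subject: str, vault_tags: Dict[str, str], active_vault: str) -> str:
    """Return the vault for the first known tag in the subject, else the active vault."""
    for tag in re.findall(r"#[\w-]+", subject):    # assumed "#tag" syntax
        vault = vault_tags.get(tag.lower())
        if vault:
            return vault
    return active_vault


tags = {"#startup": "Startup", "#research": "Research"}
print(is_trusted("Ada <ada@example.com>", "ada@example.com, bob@example.com"))
print(route_vault("Fwd: pricing notes #startup", tags, active_vault="Personal"))
```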

Running the pipeline

After saving your settings, click Start Pipeline to begin monitoring. Wikori checks the inbox periodically and processes any new emails from trusted senders. You can also click Check Now for an immediate poll.

Quick Notes Overlay

Quick Notes is a floating window that appears over any app — Figma, VS Code, a browser, a Zoom call — so you can capture thoughts without breaking your flow.

Opening the overlay

OS | Hotkey | Notes
macOS | ⌥ + ⌥ (both Option keys) | Requires Accessibility permission
Windows / Linux | Alt + Alt (both Alt keys) |

Writing and routing notes

1. Type your note

The overlay opens ready to type. Notes auto-save every 5 seconds and when you close the window.

2. Select a vault tag

The bottom bar shows a tag for each vault (e.g. #startup, #research). Click one to route this note to that vault. The tag turns bold and uppercase when selected.

3. Stack multiple notes

Navigate between notes with / (macOS) or CtrlAlt / (Windows/Linux). Each note can be routed to a different vault.

4. Send all notes at once

Click the small dot button at the bottom-right of the tag bar. Wikori routes each tagged note to its vault's INGEST folder. The button flashes green to confirm. Untagged notes remain in temp storage.

The overlay is draggable, resizable, and always stays on top of other windows. Its position and size are remembered across app restarts. Press Esc to dismiss it — it saves your note before closing.

From Direct Text

For quick notes that you want to type directly in the app without using the overlay, go to Ingest → From Text.

Enter a filename (e.g. meeting-notes-2026-05-18) and type or paste your content. Click Save to INGEST and the file is written directly to your vault's INGEST folder and queued for AI processing.
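
Writing text straight into INGEST/ is simple enough to sketch in Python. The example is hypothetical: `save_to_ingest` is not a Wikori function, and the filename sanitizing rule (keep letters, digits, dots, dashes, and underscores) is an assumption added so the sketch cannot write outside the INGEST folder.

```python
import re
import tempfile
from pathlib import Path


def save_to_ingest(vault_root: Path, filename: str, content: str) -> Path:
    """Write content to <vault>/INGEST/<filename>.md and return the path."""
    # Assumed rule: replace anything outside [A-Za-z0-9_.-] with a dash.
    stem = re.sub(r"[^\w.-]+", "-", filename.strip()).strip("-.")
    if stem.lower().endswith(".md"):
        stem = stem[:-3]                  # avoid "name.md.md"
    if not stem:
        raise ValueError("filename is empty after sanitizing")
    ingest = vault_root / "INGEST"
    ingest.mkdir(parents=True, exist_ok=True)
    target = ingest / f"{stem}.md"
    target.write_text(content, encoding="utf-8")
    return target


# Demo in a throwaway directory standing in for a vault.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_to_ingest(Path(tmp), "meeting-notes-2026-05-18", "Agenda and decisions.\n")
    print(saved.name, saved.parent.name)
```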

Monitoring Ingest Progress

Switch to the Status page at any time to see what's happening:

Section | What it shows
Vault Status | Active vault path, file watcher state, and total indexed count
Queue → Unindexed | Files in vault root not yet in the knowledge index. Click Process Unindexed to queue them.
Queue → Ingest Files | Files currently waiting in INGEST/ plus active processing count
Queue → Failed Files | Files that failed AI processing; retry individually or in bulk