Ingesting Content

Every path to adding knowledge to your vault — files, URLs, web crawling, email, quick notes, and direct text.

How Ingestion Works

No matter how content enters Wikori, it follows the same pipeline:

1. Content lands in INGEST/

Whether you drop a file, paste a URL, or write a quick note, the content always ends up as a .md file in your vault's INGEST/ folder.

2. Office files are extracted first

PDF, DOCX, and XLSX files are converted to Markdown automatically before being queued. The original file is moved to a files/ subdirectory for safekeeping.

3. AI enriches the content

Wikori sends the Markdown text to your AI endpoint. The model returns a structured YAML block with title, summary, entities, tags, source type, and confidence score.

4. Enriched file lands in your vault

The YAML frontmatter is prepended to the file, and it's moved from INGEST/ to your vault root. It's now searchable, navigable, and accessible to AI agents via MCP.
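As a rough illustration of step 4, the sketch below prepends a YAML block to a file in INGEST/ and moves it to the vault root. It is not Wikori's implementation: the function names, the key names (such as source_type and confidence), and the choice to JSON-encode each value (which is also valid YAML for simple strings, numbers, and lists) are assumptions made for the example.

```python
import json
import shutil
from pathlib import Path


def format_frontmatter(meta: dict) -> str:
    """Render a flat dict as a YAML frontmatter block.

    Values are JSON-encoded, which keeps strings containing colons or
    quotes valid without needing a YAML library.
    """
    lines = ["---"]
    for key, value in meta.items():
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n"


def finalize(ingest_file: Path, vault_root: Path, meta: dict) -> Path:
    """Prepend frontmatter, then move the file from INGEST/ to the vault root."""
    body = ingest_file.read_text(encoding="utf-8")
    ingest_file.write_text(format_frontmatter(meta) + body, encoding="utf-8")
    target = vault_root / ingest_file.name
    shutil.move(str(ingest_file), str(target))
    return target
```

A call such as `finalize(vault / "INGEST" / "note.md", vault, {"title": "Q3 plan", "tags": ["planning"]})` would leave `note.md` in the vault root with the metadata block at the top.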

From Files

The simplest way to ingest content is to drop a file directly into the INGEST/ folder. Wikori's file watcher picks it up within seconds.

Two ways to do it

Via file manager: Open your vault directory in Finder (macOS), Explorer (Windows), or your Linux file manager. Drag any supported file into the INGEST/ subfolder.

Via the app: Go to the Ingest page and follow the instructions in the "From Files" section — it shows the exact path to your current vault's INGEST folder.

Supported formats

| Type | Extensions | Processing |
| --- | --- | --- |
| Documents | PDF, DOCX, XLSX | Text extracted to Markdown; original preserved in files/ |
| Office (basic) | PPTX, ODT, ODS, ODP, TXT, CSV | Processed as-is or with basic extraction |
| Notes | MD | Processed directly; no conversion needed |
| Images | PNG, JPG, JPEG, WEBP, TIFF, BMP | AI vision model analyzes content and generates a description |

Only .md files (and those that get converted to .md) are processed by the AI pipeline. Other file types are moved to the vault but won't receive YAML enrichment unless they are converted first.
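To make the format groups concrete, here is a small sketch that maps a filename to one of the processing categories listed above. The extension groups come from the supported-formats table; the function name and the category labels are invented for this example and do not reflect Wikori's internal names.

```python
from pathlib import Path

# Extension groups taken from the supported-formats table.
EXTRACTED = {".pdf", ".docx", ".xlsx"}
BASIC = {".pptx", ".odt", ".ods", ".odp", ".txt", ".csv"}
NOTES = {".md"}
IMAGES = {".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp"}


def classify(filename: str) -> str:
    """Return a hypothetical category label for a dropped file."""
    ext = Path(filename).suffix.lower()
    if ext in EXTRACTED:
        return "extract-to-markdown"
    if ext in BASIC:
        return "basic"
    if ext in NOTES:
        return "direct"
    if ext in IMAGES:
        return "vision"
    return "unsupported"
```

The lookup is case-insensitive, so `Report.PDF` and `report.pdf` land in the same group.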

From URLs & YouTube

Wikori can scrape any public web page and extract YouTube transcripts. Go to Ingest → From URLs.

Adding URLs

Paste one or more URLs into the input field and click Add to Queue. URLs are added to a queue — you can add many before starting the pipeline.

YouTube URLs

Paste any YouTube video URL (e.g. https://youtube.com/watch?v=...). Wikori automatically detects it, fetches the full transcript, and saves it as a Markdown file — no audio processing required.
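One plausible way to detect a YouTube link is to inspect the hostname and pull the video ID out of the URL. The sketch below is an assumption about how such detection could work, not a description of Wikori's code; the host list and the function name are illustrative, and transcript fetching is out of scope here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames treated as YouTube in this sketch (an assumed, non-exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL looks like a YouTube video, else None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parts.path.lstrip("/") or None
    if parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    return None
```

Any URL for which this returns None would be treated as an ordinary web page and scraped instead.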

Running the pipeline

1. Add URLs to the queue

The queue shows a count of pending URLs.

2. Click Start Pipeline

Wikori scrapes pages sequentially with polite delays to avoid triggering anti-bot measures.

3. Monitor progress

Processed URLs disappear from the queue. Each becomes a .md file in INGEST/ and goes through AI enrichment.

4. Stop or clear

Click Stop Pipeline at any time. Use Clear Queue to remove pending URLs without processing them.

Test the connection first with Test Connection to ensure your AI endpoint can be reached before starting a large batch.
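The queue behaviour described in this section (add, start, stop, clear, sequential processing with a pause between pages) can be sketched as a small class. Everything here is hypothetical: the class name, the injected `fetch` callable, and the two-second default delay are assumptions, since the actual delay Wikori uses is not documented on this page.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Iterable


class UrlQueue:
    """Sequential URL queue with a fixed pause between fetches (illustrative)."""

    def __init__(self, fetch: Callable[[str], str], delay_seconds: float = 2.0):
        self._fetch = fetch            # turns a URL into Markdown text
        self._delay = delay_seconds    # assumed "polite delay" between pages
        self._pending: Deque[str] = deque()
        self._stop_requested = False

    def add(self, urls: Iterable[str]) -> None:
        for url in urls:
            url = url.strip()
            if url and url not in self._pending:
                self._pending.append(url)

    def pending_count(self) -> int:
        return len(self._pending)

    def clear(self) -> None:
        """Drop pending URLs without processing them."""
        self._pending.clear()

    def stop(self) -> None:
        """Ask the running pipeline to halt after the current URL."""
        self._stop_requested = True

    def run(self) -> Dict[str, str]:
        self._stop_requested = False
        results: Dict[str, str] = {}
        while self._pending and not self._stop_requested:
            url = self._pending.popleft()
            results[url] = self._fetch(url)
            if self._pending and not self._stop_requested:
                time.sleep(self._delay)
        return results
```

Injecting `fetch` keeps the sketch free of network code; a real scraper would go there.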

Web Crawler

The Web Crawler takes a single starting URL and automatically discovers every linked page, then lets you review, curate, and ingest the ones you want. Go to Ingest → Web Crawler.

This is ideal for ingesting entire documentation sites, knowledge bases, blog archives, or competitor resource sections in one operation — instead of pasting URLs one by one.

How it works

1. Set a seed URL

Enter the starting page. The crawler follows every link it finds on that page, then the links on those pages, and so on, in breadth-first order.

2. Configure the crawl

Set the depth (how many levels of links to follow), domain scope (stay on the same domain or allow external links), maximum number of URLs to discover, and optional filters to include or exclude URL patterns.

3. Start discovery

Click Start Crawl. Wikori shows live progress: pages visited, links found, and links discarded by your filters. You can cancel at any time and still review partial results.

4. Review and curate

The discovered URLs appear in a checklist. Select or deselect individual URLs, or batch-select all. This is your chance to remove irrelevant pages (login screens, terms of service, etc.) before processing.

5. Send to pipeline

Click Add to URL Pipeline. The selected URLs are fed into the same URL scraping pipeline used by single-URL ingestion — each page gets scraped, converted to Markdown, and AI-enriched.
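The discovery phase is a breadth-first search bounded by depth, domain scope, a URL cap, and exclude filters. The following is a minimal sketch of that idea under stated assumptions: `get_links` is a hypothetical callable that returns the links on a page, filters are plain substring matches, and the seed itself counts toward the URL cap. Wikori's real matching rules may differ.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence
from urllib.parse import urlparse


def discover(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_urls: int = 100,
    same_domain: bool = True,
    exclude: Sequence[str] = (),
) -> List[str]:
    """Breadth-first link discovery starting from a seed URL."""
    seed_host = urlparse(seed).hostname
    seen = {seed}
    found = [seed]
    queue = deque([(seed, 0)])
    while queue and len(found) < max_urls:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            if same_domain and urlparse(link).hostname != seed_host:
                continue  # outside the domain scope
            if any(pattern in link for pattern in exclude):
                continue  # discarded by a filter
            found.append(link)
            if len(found) >= max_urls:
                break
            queue.append((link, depth + 1))
    return found
```

The returned list corresponds to the checklist shown in the review step; nothing is scraped or ingested at this stage.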

Crawler profiles

If you crawl the same sites regularly (e.g., a vendor's documentation that updates monthly), save a crawler profile — a reusable configuration with the seed URL, depth, scope, and filters pre-set. Select a profile next time and click Start instead of reconfiguring.

| Setting | Description | Example |
| --- | --- | --- |
| Seed URL | The starting page | https://docs.example.com |
| Depth | How many link levels to follow | 2 (seed → linked pages → their linked pages) |
| Domain scope | Stay on same domain or allow external | Same domain only |
| Max URLs | Stop after this many discovered URLs | 100 |
| Filters | Include or exclude URL patterns | Exclude /login, /signup, .pdf |

Pro tip: Start with a low depth (1–2) and a tight domain scope to get a feel for the site's structure. You can always run a deeper crawl later. The curation step means you never accidentally ingest hundreds of irrelevant pages.
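A crawler profile is essentially a saved bundle of the settings in the table above. As an illustration only, it could be modelled as a small record that round-trips through JSON; the class, field names, and storage format below are assumptions and say nothing about how Wikori actually stores profiles.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CrawlerProfile:
    """Hypothetical container for a reusable crawl configuration."""

    name: str
    seed_url: str
    depth: int = 2
    same_domain_only: bool = True
    max_urls: int = 100
    exclude_patterns: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "CrawlerProfile":
        return cls(**json.loads(text))


# Example values mirror the settings table.
vendor_docs = CrawlerProfile(
    name="vendor-docs",
    seed_url="https://docs.example.com",
    exclude_patterns=["/login", "/signup", ".pdf"],
)
```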

From Email (IMAP)

Wikori can monitor an email inbox and automatically convert incoming messages into knowledge entries. Configure this in Settings → Email Ingestion.

Important: Wikori downloads and deletes matching emails from the mailbox. We strongly recommend using a dedicated mailbox or email alias, not your main inbox.

Configuration

| Field | Description |
| --- | --- |
| IMAP Host | Your mail server address (e.g. imap.gmail.com) |
| Port | Usually 993 for SSL |
| Username | Your email address |
| Password | App password or IMAP password; encrypted on save |
| Trusted Senders | Comma-separated list of email addresses. Only emails from these addresses are processed. |
| Vault Routing | Emails are routed to the vault whose email tag matches a tag in the subject line. Untagged emails go to the active vault. |

Running the pipeline

After saving your settings, click Start Pipeline to begin monitoring. Wikori checks the inbox periodically and processes any new emails from trusted senders. You can also click Check Now for an immediate poll.
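The trusted-sender check and the subject-line routing can be pictured with the short sketch below. It assumes that tags appear in the subject as `#tag` and that sender matching is case-insensitive; neither detail is confirmed by this page, and the function names are invented for the example.

```python
import re
from email.utils import parseaddr
from typing import Dict, Set


def parse_trusted(raw: str) -> Set[str]:
    """Split the comma-separated Trusted Senders field into a normalized set."""
    return {addr.strip().lower() for addr in raw.split(",") if addr.strip()}


def is_trusted(from_header: str, trusted: Set[str]) -> bool:
    """Check the address part of a From header against the trusted set."""
    _, address = parseaddr(from_header)
    return address.lower() in trusted


def route(subject: str, vault_tags: Dict[str, str], active_vault: str) -> str:
    """Pick a vault from the first matching tag in the subject (assumed #tag syntax)."""
    for tag in re.findall(r"#([\w-]+)", subject):
        vault = vault_tags.get(tag.lower())
        if vault:
            return vault
    return active_vault  # untagged emails fall back to the active vault
```

With `vault_tags = {"research": "Research"}`, a subject of `Fwd: pricing page #research` would route to the Research vault, while an untagged subject would go to the active vault.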

Quick Notes Overlay

Quick Notes is a floating window that appears over any app — Figma, VS Code, a browser, a Zoom call — so you can capture thoughts without breaking your flow.

Opening the overlay

| OS | Hotkey | Notes |
| --- | --- | --- |
| macOS | ⌥ + ⌥ (both Option keys) | Requires Accessibility permission |
| Windows / Linux | Alt + Alt (both Alt keys) | |

Writing and routing notes

1. Type your note

The overlay opens ready to type. Notes auto-save every 5 seconds and when you close the window.

2. Select a vault tag

The bottom bar shows a tag for each vault (e.g. #startup, #research). Click one to route this note to that vault. The tag turns bold and uppercase when selected.

3. Stack multiple notes

Navigate between notes with / (macOS) or CtrlAlt / (Windows/Linux). Each note can be routed to a different vault.

4. Send all notes at once

Click the small dot button at the bottom-right of the tag bar. Wikori routes each tagged note to its vault's INGEST folder. The button flashes green to confirm. Untagged notes remain in temp storage.

The overlay is draggable, resizable, and always stays on top of other windows. Its position and size are remembered across app restarts. Press Esc to dismiss it — it saves your note before closing.
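The "send all" behaviour, where tagged notes go to their vault's INGEST folder and untagged notes stay behind, might look like the sketch below. The `Note` record, the `send_all` function, and the `quick-note-N.md` naming scheme are hypothetical; the real overlay's file names and temp storage are not documented here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Note:
    text: str
    tag: Optional[str] = None  # e.g. "#startup"; None means untagged


def send_all(notes: List[Note], vault_dirs: Dict[str, Path]) -> List[Note]:
    """Write each tagged note to its vault's INGEST folder.

    Returns the notes that were not routed (untagged or unknown tag),
    which corresponds to notes remaining in temp storage.
    """
    remaining: List[Note] = []
    for index, note in enumerate(notes, start=1):
        vault = vault_dirs.get(note.tag) if note.tag else None
        if vault is None:
            remaining.append(note)
            continue
        ingest = vault / "INGEST"
        ingest.mkdir(parents=True, exist_ok=True)
        # Illustrative file name; a real implementation would avoid collisions.
        (ingest / f"quick-note-{index}.md").write_text(note.text, encoding="utf-8")
    return remaining
```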

From Direct Text

For quick notes that you want to type directly in the app without using the overlay, go to Ingest → From Text.

Enter a filename (e.g. meeting-notes-2026-05-18) and type or paste your content. Click Save to INGEST and the file is written directly to your vault's INGEST folder and queued for AI processing.
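Saving direct text amounts to writing a Markdown file into INGEST/ under the name you entered. The sketch below shows one way to do that safely; the sanitizing rule (replace anything other than letters, digits, dots, underscores, and hyphens) is an assumption for the example, not Wikori's documented behaviour.

```python
import re
from pathlib import Path


def save_to_ingest(vault_root: Path, filename: str, content: str) -> Path:
    """Write content to <vault>/INGEST/<filename>.md and return the path."""
    # Collapse characters that are unsafe in file names into hyphens.
    stem = re.sub(r"[^\w.-]+", "-", filename.strip()).strip("-.")
    if stem.lower().endswith(".md"):
        stem = stem[:-3]
    if not stem:
        raise ValueError("filename is empty after sanitizing")
    ingest = vault_root / "INGEST"
    ingest.mkdir(parents=True, exist_ok=True)
    target = ingest / f"{stem}.md"
    target.write_text(content, encoding="utf-8")
    return target
```

Once the file exists in INGEST/, the file watcher described earlier would pick it up like any dropped file.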

Monitoring Ingest Progress

Switch to the Status page at any time to see what's happening:

| Section | What it shows |
| --- | --- |
| Vault Status | Active vault path, file watcher state, and total indexed count |
| Queue → Unindexed | Files in the vault root that are not yet in the knowledge index. Click Process Unindexed to queue them. |
| Queue → Ingest Files | Files currently waiting in INGEST/, plus the active processing count |
| Queue → Failed Files | Files that failed AI processing. Retry individually or in bulk. |
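The Unindexed and Ingest Files counts can be approximated from outside the app by comparing what is on disk with a list of indexed file names. This is only a sketch: the function names are hypothetical, the index is represented as a plain set of names, and only top-level .md files are considered.

```python
from pathlib import Path
from typing import List, Set


def unindexed_files(vault_root: Path, indexed_names: Set[str]) -> List[str]:
    """Markdown files in the vault root that are missing from the index."""
    return sorted(
        p.name
        for p in vault_root.glob("*.md")  # non-recursive, so INGEST/ is skipped
        if p.is_file() and p.name not in indexed_names
    )


def ingest_waiting(vault_root: Path) -> List[str]:
    """Files currently sitting in the vault's INGEST/ folder."""
    ingest = vault_root / "INGEST"
    if not ingest.is_dir():
        return []
    return sorted(p.name for p in ingest.iterdir() if p.is_file())
```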