Ingesting Content
Every path to adding knowledge to your vault — files, URLs, web crawling, email, quick notes, and direct text.
How Ingestion Works
No matter how content enters Wikori, it follows the same pipeline:
Whether you drop a file, paste a URL, or write a quick note, the content always ends up as a .md file in your vault's INGEST/ folder.
PDF, DOCX, and XLSX files are converted to Markdown automatically before being queued. The original file is moved to a files/ subdirectory for safekeeping.
Wikori sends the Markdown text to your AI endpoint. The model returns a structured YAML block with title, summary, entities, tags, source type, and confidence score.
The YAML frontmatter is prepended to the file, which is then moved from INGEST/ to your vault root. It's now searchable, navigable, and accessible to AI agents via MCP.
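The final step can be pictured with a minimal sketch. This is not Wikori's actual implementation: the function names and the frontmatter keys (`title`, `tags`, `confidence`) are assumptions chosen for illustration, and scalar values are written as JSON strings, which are also valid YAML.

```python
import json
from pathlib import Path


def render_frontmatter(meta: dict) -> str:
    """Render a flat dict (scalars and lists of scalars) as a YAML frontmatter block."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list) and value:
            lines.append(f"{key}:")
            lines.extend(f"  - {json.dumps(item, ensure_ascii=False)}" for item in value)
        else:
            # JSON scalars (and the empty list "[]") are valid YAML as written.
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def enrich_and_move(ingest_file: Path, vault_root: Path, meta: dict) -> Path:
    """Prepend frontmatter to a file from INGEST/ and move it to the vault root."""
    body = ingest_file.read_text(encoding="utf-8")
    target = vault_root / ingest_file.name
    target.write_text(render_frontmatter(meta) + body, encoding="utf-8")
    ingest_file.unlink()  # the file no longer lives in INGEST/
    return target
```

In this sketch, `meta` stands in for the structured block returned by the model; the real field names may differ.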
From Files
The simplest way to ingest content is to drop a file directly into the INGEST/ folder. Wikori's file watcher picks it up within seconds.
Two ways to do it
Via file manager: Open your vault directory in Finder (macOS), Explorer (Windows), or your Linux file manager. Drag any supported file into the INGEST/ subfolder.
Via the app: Go to the Ingest page and follow the instructions in the "From Files" section — it shows the exact path to your current vault's INGEST folder.
Supported formats
| Type | Extensions | Processing |
|---|---|---|
| Documents | PDF, DOCX, XLSX | Text extracted to Markdown, original preserved in files/ |
| Office (basic) | PPTX, ODT, ODS, ODP, TXT, CSV | Processed as-is or with basic extraction |
| Notes | MD | Processed directly — no conversion needed |
| Images | PNG, JPG, JPEG, WEBP, TIFF, BMP | AI vision model analyzes content, generates description |
Only .md files (and those that get converted to .md) are processed by the AI pipeline. Other file types are moved to the vault but won't receive YAML enrichment unless they are converted first.
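As a rough illustration of the table above, a file can be mapped to its processing category by extension. The category labels and the `classify` function are hypothetical; only the extension lists come from the table.

```python
from pathlib import Path

# Extension groups taken from the supported-formats table.
DOCUMENTS = {".pdf", ".docx", ".xlsx"}
OFFICE_BASIC = {".pptx", ".odt", ".ods", ".odp", ".txt", ".csv"}
NOTES = {".md"}
IMAGES = {".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp"}


def classify(filename: str) -> str:
    """Return the processing category for a file, based only on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in DOCUMENTS:
        return "documents"
    if ext in OFFICE_BASIC:
        return "office_basic"
    if ext in NOTES:
        return "notes"
    if ext in IMAGES:
        return "images"
    return "unsupported"
```

For example, `classify("Report.PDF")` returns `"documents"` because the comparison is case-insensitive.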
From URLs & YouTube
Wikori can scrape any public web page and extract YouTube transcripts. Go to Ingest → From URLs.
Adding URLs
Paste one or more URLs into the input field and click Add to Queue. URLs are added to a queue — you can add many before starting the pipeline.
YouTube URLs
Paste any YouTube video URL (e.g. https://youtube.com/watch?v=...). Wikori automatically detects it, fetches the full transcript, and saves it as a Markdown file — no audio processing required.
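How Wikori detects YouTube links is not documented here. One plausible approach, shown purely as a sketch, is to check the host and pull the video ID from the `v` query parameter. The host list and the `youtube_video_id` helper are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube in this sketch (an assumption, not Wikori's list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID if the URL looks like a YouTube video, else None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host in YOUTUBE_HOSTS and parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    if host == "youtu.be":
        # Short links carry the ID in the path.
        return parts.path.lstrip("/") or None
    return None
```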
Running the pipeline
The queue shows a count of pending URLs.
Wikori scrapes pages sequentially with polite delays to avoid triggering anti-bot measures.
Processed URLs disappear from the queue. Each becomes a .md file in INGEST/ and goes through AI enrichment.
Click Stop Pipeline at any time. Use Clear Queue to remove pending URLs without processing them.
Before starting a large batch, click Test Connection to confirm that your AI endpoint can be reached.
Web Crawler
The Web Crawler takes a single seed URL and automatically discovers every linked page — then lets you preview, curate, and send the ones you want into the scraping pipeline. Go to Ingest → Web Crawler.
Perfect for ingesting entire documentation sites, knowledge bases, blog archives, or competitor resource pages without pasting URLs one by one.
Step-by-step
Click New Profile. Each profile stores a complete crawl configuration — seed URL, mode, depth, scope, and URL cap. You can save up to 50 profiles per vault.
Set your seed URL, choose a Crawl Mode, configure depth and scope (see tables below), and set a Max URLs cap to prevent runaway discovery.
Discovery starts immediately. A live counter shows pages visited, links found, and links discarded in real time. Hit the red Stop button at any point — you'll still get the partial results.
Every discovered URL appears in a checklist. Check or uncheck individual URLs, or use batch-select to take everything or nothing. Remove login pages, sign-up flows, or irrelevant sections before proceeding.
Click Add to Pipeline. Selected URLs join the URL scraping queue — each page is scraped, converted to Markdown, and AI-enriched like any other web page.
Crawl Modes
| Mode | Behavior |
|---|---|
| Single URL | Scrapes only the seed URL — no link discovery. Equivalent to pasting a URL into the URL pipeline directly. |
| Auto-discover | Follows links breadth-first from the seed page to the configured depth. This is the full crawler mode. |
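Breadth-first discovery with a depth limit and a URL cap can be sketched as follows. This is an illustration of the general technique, not Wikori's crawler: `get_links` is a hypothetical callback that returns the links found on a page, and counting the seed toward the cap is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List


def discover(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int,
    max_urls: int,
) -> List[str]:
    """Discover URLs breadth-first, up to max_depth link-hops and max_urls results."""
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])
    while queue and len(order) < max_urls:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the configured depth
        for link in get_links(url):
            if link in seen:
                continue
            seen.add(link)
            order.append(link)
            queue.append((link, depth + 1))
            if len(order) >= max_urls:
                break
    return order
```

With an in-memory link graph such as `{"a": ["b", "c"], "b": ["d"]}` and `max_depth=1`, the result is `["a", "b", "c"]`: pages one hop from the seed are found, but their own links are not followed.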
Profile Settings
| Setting | Description | Notes |
|---|---|---|
| Seed URL | The starting page | e.g. https://docs.example.com/ |
| Depth | How many link-hops to follow (1–5) | Depth > 3 shows a warning — can discover thousands of URLs |
| Domain Scope | Which domains links are followed on | See domain scopes table below |
| Max URLs | Discovery stops after this many URLs (1–1000) | Always set a cap when exploring unfamiliar sites |
| Ignore Query Strings | Treat URLs that differ only by query params as the same page | Reduces duplicates on paginated sites |
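The limits in the table can be expressed as a small validated configuration object. The class name, field names, and default values below are assumptions for illustration; only the ranges (depth 1 to 5, Max URLs 1 to 1000) and the depth warning threshold come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrawlProfile:
    seed_url: str
    depth: int = 1                      # assumed default
    domain_scope: str = "same_domain"   # assumed identifier
    max_urls: int = 100                 # assumed default
    ignore_query_strings: bool = False

    def validate(self) -> List[str]:
        """Raise ValueError for out-of-range settings; return any warnings."""
        if not 1 <= self.depth <= 5:
            raise ValueError("depth must be between 1 and 5")
        if not 1 <= self.max_urls <= 1000:
            raise ValueError("max_urls must be between 1 and 1000")
        warnings = []
        if self.depth > 3:
            warnings.append("depth > 3 can discover thousands of URLs")
        return warnings
```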
Domain Scopes
| Scope | Behavior |
|---|---|
| Same domain | Only follow links on the same host as the seed URL. Most common choice. |
| Same path prefix | Same host, and URLs must start with the seed's path — useful for scoping to a subsection (e.g. /docs/ only). |
| Whitelist | Follow links only on domains you explicitly specify. |
| No restriction | Follow any link anywhere. Requires a confirmation checkbox — use with a low depth and URL cap. |
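The four scopes amount to a simple predicate over the candidate URL and the seed URL. The sketch below is one possible reading of the table; the scope identifiers and the `in_scope` function are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlparse


def in_scope(url: str, seed: str, scope: str, whitelist: Iterable[str] = ()) -> bool:
    """Decide whether a discovered URL may be followed under the given scope."""
    u, s = urlparse(url), urlparse(seed)
    if scope == "same_domain":
        return u.hostname == s.hostname
    if scope == "same_path_prefix":
        # Same host, and the path must start with the seed's path.
        return u.hostname == s.hostname and u.path.startswith(s.path)
    if scope == "whitelist":
        return u.hostname in set(whitelist)
    if scope == "no_restriction":
        return True
    raise ValueError(f"unknown scope: {scope}")
```

With the seed `https://docs.example.com/docs/`, the "same path prefix" scope accepts `https://docs.example.com/docs/setup` and rejects `https://docs.example.com/blog/`.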
Smart filtering & behavior
The crawler automatically skips file types that shouldn't be scraped as web pages (images, CSS, JS, archives), ignores non-HTTP schemes, and deduplicates fragment-only links (#section). It follows redirects and uses the final resolved URL for deduplication, so if http://example.com redirects to https://example.com, the two are treated as a single page.
Requests include a randomized 200ms–1s delay between pages to be polite to target servers.
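The filtering rules and the delay can be approximated in a few lines. This is a sketch under assumptions: the exact extension list is illustrative, the function names are hypothetical, and redirect resolution is left out because it requires a network request.

```python
import random
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse, urlunparse

# Illustrative list; the real set of skipped extensions is not documented here.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".zip", ".tar", ".gz"}


def normalize(url: str, ignore_query: bool = False) -> Optional[str]:
    """Return a deduplication key for a URL, or None if it should be skipped."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript:, tel:, and similar
    if PurePosixPath(parts.path).suffix.lower() in SKIPPED_EXTENSIONS:
        return None  # not a web page
    query = "" if ignore_query else parts.query
    # Dropping the fragment makes /docs and /docs#section the same page.
    return urlunparse((parts.scheme, parts.netloc.lower(), parts.path, parts.params, query, ""))


def polite_delay_seconds() -> float:
    """Pick a random pause between 200 ms and 1 s, to be slept between requests."""
    return random.uniform(0.2, 1.0)
```

Passing `ignore_query=True` mirrors the Ignore Query Strings setting: `https://example.com/list?page=2` and `https://example.com/list?page=3` both normalize to `https://example.com/list`.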
Profile management
Profiles can be created, edited, duplicated, and deleted. Duplicating a profile is useful when you want to try a deeper crawl of the same site without overwriting your working configuration. The last 200 crawl runs are stored per vault, so you can review past discovery results even after closing the app.
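A bounded history like "the last 200 crawl runs" is commonly kept in a fixed-size buffer that drops the oldest entry when full. The sketch below shows that idea with an in-memory deque; how Wikori actually stores run history is not described here, and the record shape is an assumption.

```python
from collections import deque
from typing import Deque, Dict, List

MAX_RUNS = 200  # retention limit stated in the text above


def new_history() -> Deque[Dict]:
    """Create an empty run history that keeps only the most recent MAX_RUNS entries."""
    return deque(maxlen=MAX_RUNS)


def record_run(history: Deque[Dict], profile_name: str, urls: List[str]) -> None:
    """Append a run; once the history is full, the oldest run is discarded."""
    history.append({"profile": profile_name, "url_count": len(urls), "urls": urls})
```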
Best practice: Start with depth 1–2 and Same domain scope. Review the URL list before adding to the pipeline. Use the Self-Care routine to re-run profiles on a schedule so your vault stays current when sites update.
From Email (IMAP)
Wikori can monitor an email inbox and automatically convert incoming messages into knowledge entries. Configure this in Settings → Email Ingestion.
Important: Wikori downloads and deletes matching emails from the mailbox. We strongly recommend using a dedicated mailbox or email alias, not your main inbox.
Configuration
| Field | Description |
|---|---|
| IMAP Host | Your mail server address (e.g. imap.gmail.com) |
| Port | Usually 993 for SSL |
| Username | Your email address |
| Password | App password or IMAP password — encrypted on save |
| Trusted Senders | Comma-separated list of email addresses. Only emails from these addresses are processed. |
| Vault Routing | Emails are routed to the vault whose email tag matches a tag in the subject line. Untagged emails go to the active vault. |
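The Trusted Senders and Vault Routing rules can be sketched as two small functions. Both are illustrative: the function names are hypothetical, and the `#tag` syntax for subject-line tags is an assumption, since the exact tag format is not specified in the table.

```python
import re
from email.utils import parseaddr
from typing import Dict


def is_trusted(from_header: str, trusted_senders: str) -> bool:
    """Check the sender address against a comma-separated list of trusted addresses."""
    allowed = {a.strip().lower() for a in trusted_senders.split(",") if a.strip()}
    address = parseaddr(from_header)[1].lower()
    return address in allowed


def route_vault(subject: str, vault_tags: Dict[str, str], active_vault: str) -> str:
    """Return the vault whose tag appears in the subject, else the active vault."""
    # Assumes tags are written as "#name" in the subject line.
    for tag in re.findall(r"#([\w-]+)", subject):
        if tag.lower() in vault_tags:
            return vault_tags[tag.lower()]
    return active_vault
```

For example, with `vault_tags = {"startup": "Startup Vault"}`, a message with the subject `Fwd: pitch deck #startup` is routed to `"Startup Vault"`, while an untagged message falls back to the active vault.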
Running the pipeline
After saving your settings, click Start Pipeline to begin monitoring. Wikori checks the inbox periodically and processes any new emails from trusted senders. You can also click Check Now for an immediate poll.
Quick Notes Overlay
Quick Notes is a floating window that appears over any app — Figma, VS Code, a browser, a Zoom call — so you can capture thoughts without breaking your flow.
Opening the overlay
| OS | Hotkey | Notes |
|---|---|---|
| macOS | ⌥ + ⌥ (both Option keys) | Requires Accessibility permission |
| Windows / Linux | Alt + Alt (both Alt keys) | — |
Writing and routing notes
The overlay opens ready to type. Notes auto-save every 5 seconds and when you close the window.
The bottom bar shows a tag for each vault (e.g. #startup, #research). Click one to route this note to that vault. The tag turns bold and uppercase when selected.
Navigate between notes with ⌃ + ⌥ + → / ← (macOS) or Ctrl + Alt + → / ← (Windows/Linux). Each note can be routed to a different vault.
Click the small dot button at the bottom-right of the tag bar. Wikori routes each tagged note to its vault's INGEST folder. The button flashes green to confirm. Untagged notes remain in temp storage.
The overlay is draggable, resizable, and always stays on top of other windows. Its position and size are remembered across app restarts. Press Esc to dismiss it — it saves your note before closing.
From Direct Text
For quick notes that you want to type directly in the app without using the overlay, go to Ingest → From Text.
Enter a filename (e.g. meeting-notes-2026-05-18) and type or paste your content. Click Save to INGEST and the file is written directly to your vault's INGEST folder and queued for AI processing.
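Because every ingestion path ends with a .md file in INGEST/, the same result can be approximated from a script. This is a minimal sketch, assuming the vault path is known and that Wikori's file watcher picks up files written this way; `save_to_ingest` is a hypothetical helper, not part of Wikori.

```python
from pathlib import Path


def save_to_ingest(vault: Path, filename: str, content: str) -> Path:
    """Write text as a Markdown file into the vault's INGEST/ folder."""
    name = Path(filename).name  # drop any directory components
    if not name:
        raise ValueError("filename must not be empty")
    if not name.lower().endswith(".md"):
        name += ".md"
    ingest = vault / "INGEST"
    ingest.mkdir(parents=True, exist_ok=True)
    target = ingest / name
    target.write_text(content, encoding="utf-8")
    return target
```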
Monitoring Ingest Progress
Switch to the Status page at any time to see what's happening:
| Section | What it shows |
|---|---|
| Vault Status | Active vault path, file watcher state, and total indexed count |
| Queue → Unindexed | Files in vault root not yet in the knowledge index. Click Process Unindexed to queue them. |
| Queue → Ingest Files | Files currently waiting in INGEST/ plus active processing count |
| Queue → Failed Files | Files that failed AI processing — retry individually or in bulk |
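The "Unindexed" count can be thought of as a set difference between the Markdown files in the vault root and the entries in the knowledge index. The sketch below illustrates that idea only; representing the index as a set of filenames is an assumption.

```python
from pathlib import Path
from typing import List, Set


def unindexed_files(vault_root: Path, indexed: Set[str]) -> List[str]:
    """List .md files in the vault root whose names are not in the index."""
    # glob("*.md") is non-recursive, so files still waiting in INGEST/ are excluded.
    return sorted(p.name for p in vault_root.glob("*.md") if p.name not in indexed)
```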