RAG source discovery
Find docs, blogs, changelogs, and help center pages.
/sitemap discovers public pages from sitemaps, robots.txt hints, known pages, and, when enabled, a link crawl. Use it before RAG ingestion, bulk capture, competitive research, and public site audits.
curl -X POST https://api.bytekit.com/v1/sitemap \
  -H "Authorization: Bearer $BYTEKIT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.bytekit.com",
    "strategy": "standard",
    "max_urls": 5000,
    "webhook_url": "https://example.com/hook"
  }'
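For callers who prefer a script over curl, the same request can be sketched in Python with only the standard library. The endpoint, headers, and JSON fields are copied from the curl example; the function names `build_sitemap_request` and `send` are hypothetical, and the response schema is not described here, so the body is returned as raw text rather than parsed into assumed fields.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SITEMAP_ENDPOINT = "https://api.bytekit.com/v1/sitemap"


def build_sitemap_request(api_key, url, strategy="standard",
                          max_urls=5000, webhook_url=None):
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper: only the fields shown in the curl example are used.
    """
    payload = {"url": url, "strategy": strategy, "max_urls": max_urls}
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        SITEMAP_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as text.

    The response format is an unknown here, so nothing is parsed.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A typical call would read the key from the `BYTEKIT_API_KEY` environment variable, pass it to `build_sitemap_request` along with the site URL, and hand the result to `send`. Keeping construction separate from sending makes the payload easy to inspect or log before any network traffic happens.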
Inventory public pages before capture, QA, or migration work.
See what pages exist before you decide what to monitor.
| Strategy | Use it when | Tradeoff |
|---|---|---|
| quick | You need known URLs. | Less discovery depth. |
| standard | You need coverage. | Good default. |
| deep | You need more depth. | More bandwidth and time. |
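The table can be read as a small decision rule. The sketch below encodes it in Python; the function name `pick_strategy` and its two flags are hypothetical, and only the three strategy names come from the table. When both flags are set, the sketch assumes depth matters more than speed.

```python
# Strategy names and tradeoffs as listed in the table above.
STRATEGIES = {
    "quick": "Less discovery depth.",
    "standard": "Good default.",
    "deep": "More bandwidth and time.",
}


def pick_strategy(known_urls_only=False, need_more_depth=False):
    """Return a strategy name for the request's "strategy" field.

    Assumption: if both flags are set, depth wins over speed.
    """
    if need_more_depth:
        return "deep"
    if known_urls_only:
        return "quick"
    return "standard"


if __name__ == "__main__":
    choice = pick_strategy(need_more_depth=True)
    print(choice, "->", STRATEGIES[choice])
```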
Download discovered URLs into your ingestion job, warehouse, or review flow.
Results include URL count, source counts, byte usage, warnings, timestamps, and status.
Repeated domain requests can be served faster and cheaper when freshness allows.
Feed results into /bulk or pick pages for /monitors.
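Before handing discovered URLs to a capture or monitoring step, it often helps to deduplicate them, group them by site section, and split them into batches. The sketch below assumes the discovered URLs are already available as a plain list of strings; the response format, the path prefixes in `SECTION_PREFIXES`, and the batch size are all assumptions, and no /bulk or /monitors request is made.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes; adjust them to the site being ingested.
SECTION_PREFIXES = {
    "docs": ("/docs",),
    "blog": ("/blog",),
    "changelog": ("/changelog",),
    "help": ("/help", "/support"),
}


def classify(url):
    """Return the section a URL belongs to, or "other" if none match."""
    path = urlparse(url).path.lower()
    for section, prefixes in SECTION_PREFIXES.items():
        # Match the prefix itself or anything under it, not lookalikes
        # such as /docsearch.
        if any(path == p or path.startswith(p + "/") for p in prefixes):
            return section
    return "other"


def group_by_section(urls):
    """Drop exact duplicates and group URLs by section, keeping order."""
    groups = {}
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        groups.setdefault(classify(url), []).append(url)
    return groups


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch from `batches(groups["docs"], 100)` could then become the body of one capture job, while a short hand-picked list from `groups["changelog"]` might be a better fit for monitoring.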
Sitemap crawl is not Common Crawl. It is not a promise to discover every hidden route on the internet. It is a practical way to collect public URLs for a site you want to ingest or monitor.