Does self-hosting keep my searches on my own infrastructure?

It keeps the server process and any API keys on your infrastructure. A search or scraping server still sends queries out to the web, and the arXiv server reaches arXiv's API for papers, so self-hosting controls the process, not whether the requests travel.

Self-hosted arXiv MCP alternatives

The arXiv MCP server installs locally and runs over stdio, so the process stays on your own machine. The servers below run locally too. If owning the process is the requirement, any of them qualifies.

One caveat: a search or scraping server still sends requests out to the web. Self-hosting controls where the process and any keys live, not whether the queries and fetched pages travel. The arXiv server itself reaches arXiv's API for paper data the same way.

The 8 best self-hosted alternatives

FirecrawlOfficial6,500
Runs locally and turns any website into clean, LLM-ready data through scrape, crawl, map, search, and extract. The self-hosted way to pull full pages from across the web, beyond arXiv's papers.
Set up Firecrawl →
ExaOfficial4,511
Exa's official server runs locally and does neural web search with clean full-page content built for LLMs. A self-hosted search-first option for finding sources arXiv does not cover.
Set up Exa →
Bright DataOfficial2,426
Getting past blocks, CAPTCHAs, and geo-restrictions is the point: Bright Data's official server installs locally and does web search and scraping where simpler tools stall. The local pick for sources that resist scraping.
Set up Bright Data →
TavilyOfficial2,100
Real-time web search with page extraction, crawling, and mapping all run on your own box through Tavily's official server. A balanced self-hosted research feed beyond preprints.
Set up Tavily →
ApifyOfficial1,300
Running locally is an option for the official Apify server, which exposes 6,000+ Actors plus run, dataset, and store tools for scraping and automating the web. A fit for structured extraction at scale, kept on your machine.
Set up Apify →
DuckDuckGoCommunity1,199
Key-free and local, this maintained server gives an agent DuckDuckGo web search plus clean page-content fetching. The simplest self-hosted way to widen a search past arXiv.
Set up DuckDuckGo →
Brave SearchOfficial1,123
Web, news, image, video, and local results come through one API in Brave's official search server, run locally. Varied result types from a process you control.
Set up Brave Search →
KagiOfficial402
Kagi's official server runs locally and gives agents ad-free, privacy-respecting web search plus clean full-page extraction. A fit when privacy and result quality matter for research.
Set up Kagi →

How to choose

All of these install locally, keeping the process and any keys on your infrastructure. For pulling web pages into research, Firecrawl and Tavily fit; Exa is the neural search pick; Bright Data and Apify scrape at scale; DuckDuckGo, Brave Search, and Kagi are lighter or privacy-first search feeds. None index arXiv's papers, so keep arXiv for preprints and add one of these for the open web.

FAQ

Can the arXiv MCP server be self-hosted?: Yes. The arXiv server installs locally and runs over stdio, so the process stays on your own machine. Every alternative on this page can run locally too.
Does self-hosting keep my searches on my own infrastructure?: It keeps the server process and any API keys on your infrastructure. A search or scraping server still sends queries out to the web, and the arXiv server reaches arXiv's API for papers, so self-hosting controls the process, not whether the requests travel.

← Back to the arXiv MCP server