Web Spy: The Personal Web Crawler I Never Released

Offline browsing before offline browsing. Desktop app, never shipped.

According to TechCrunch, Mozilla bought Pocket in 2017 for an undisclosed sum, Mozilla's first-ever acquisition. Pocket solved a problem I'd already solved in 1995. I built a personal web crawler called Web Spy that cached websites locally and let you browse offline. This was before HTTrack. Before Pocket. Before Instapaper. Before browser reading modes. I never released it.

TL;DR

Ship imperfect products when the timing is right. Market timing matters more than feature completeness. The idea you're sitting on might already be irrelevant.

Add it to the pile of things I built, used daily, and never shipped. The pattern is familiar now. But Web Spy is interesting because the problem it solved - reading web content when you're not connected - keeps getting solved by different products for different eras.

The Dial-Up Context

To understand why Web Spy mattered, you need to remember what internet access looked like in 1995. I lived through it at MSNBC and then while running Core Logic Software, running up phone bills the whole time:

Dial-up was per-minute. Many ISPs charged by connection time. AOL's hourly rates meant every minute online cost money. You wanted to minimize time connected.

Phone lines were shared. If you were online, nobody could call your house. If someone picked up the phone, you got disconnected. Extended browsing sessions tied up family communication.

Connections were slow. 14.4 or 28.8 kbps. Loading a single page with images could take minutes. Browsing was an exercise in patience.
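To put those speeds in perspective, here is a back-of-the-envelope calculation. It is only a rough estimate: the 150 KB page size is a hypothetical example, and the formula ignores protocol overhead, modem compression, latency, and server delays.

```python
def transfer_seconds(size_bytes: int, line_rate_bps: int) -> float:
    """Rough download time: payload bits divided by the modem's line rate.

    Ignores protocol overhead, compression, latency and server response time,
    so treat the result as an estimate, not a measurement.
    """
    return size_bytes * 8 / line_rate_bps


page_bytes = 150 * 1024  # hypothetical page plus its images
for rate in (14_400, 28_800):
    print(f"{rate / 1000:.1f} kbps: {transfer_seconds(page_bytes, rate):.0f} s")
```

That works out to about 85 seconds at 14.4 kbps and about 43 seconds at 28.8 kbps for a single page. A handful of pages like that and you're into minutes.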

Connections were unreliable. Dropped connections were normal. You'd be in the middle of reading something and lose your connection. When you reconnected, you'd have to navigate back to where you were.

The rational response: don't browse live. Download what you want to read, disconnect, read offline. Reconnect when you need more content.

What Web Spy Did

Web Spy was a Windows desktop application with a simple workflow:

Start with a URL. Give it a starting page - maybe a news site, a reference site, or a site you wanted to read deeply.

Set crawl parameters. How deep should it go? (1 level = just that page. 2 levels = that page plus everything it links to. 3 levels = those pages plus their links.) What file types to include? Should it follow links to other domains?
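A depth-limited crawl like this is naturally a breadth-first search. The Python sketch below is not Web Spy's code, which was a Windows desktop app. It only illustrates the parameters described above. The `fetch` callable, the `crawl` and `LinkParser` names, and the default extension list are hypothetical. Depth is counted the way the text counts levels, so `max_depth=1` fetches only the starting page.

```python
import posixpath
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple
from urllib.parse import urldefrag, urljoin, urlparse

# Extensions treated as HTML pages (the only files parsed for further links).
PAGE_EXTENSIONS: Tuple[str, ...] = ("", ".htm", ".html")


class LinkParser(HTMLParser):
    """Collects <a href> and <img src> targets from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extension(url: str) -> str:
    """Lower-case extension of the URL path ('' for '/' or '/docs/')."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def crawl(start_url: str,
          fetch: Callable[[str], str],
          max_depth: int = 2,
          allowed: Tuple[str, ...] = PAGE_EXTENSIONS + (".gif", ".jpg"),
          follow_offsite: bool = False) -> Dict[str, str]:
    """Breadth-first crawl; max_depth=1 fetches only start_url."""
    start_host = urlparse(start_url).netloc
    pages: Dict[str, str] = {}
    queue = deque([(start_url, 1)])  # (url, level); the start page is level 1
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        pages[url] = fetch(url)
        # Only parse HTML pages, and only if one more level is allowed.
        if depth >= max_depth or extension(url) not in PAGE_EXTENSIONS:
            continue
        parser = LinkParser()
        parser.feed(pages[url])
        parser.close()
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # resolve, drop #fragment
            parts = urlparse(link)
            if parts.scheme not in ("http", "https") or link in seen:
                continue
            if not follow_offsite and parts.netloc != start_host:
                continue
            if extension(link) not in allowed:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return pages
```

A real crawler would also need politeness delays, robots.txt handling, and error handling for failed fetches. They're left out here to keep the depth and filtering logic visible.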

Let it crawl. Web Spy would methodically fetch every page, following links according to your parameters. It showed progress - URLs being fetched, bytes downloaded, estimated time remaining.

Browse offline. Once complete, you had a local copy of the site. You could disconnect and browse at full speed. Links worked (they pointed to local copies). Images loaded instantly. No per-minute charges.
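Making links work offline means rewriting each cached page so that links to other cached pages become relative file paths. The sketch below is one way to do that, assuming a cache layout of `host/path` with `index.html` for directory URLs. The layout, the `local_path` and `rewrite_links` names, and the regex approach are all assumptions. The regex only handles double-quoted `href`/`src` attributes and ignores query strings, which is enough for illustration but naive for real HTML.

```python
import posixpath
import re
from typing import Set
from urllib.parse import urldefrag, urljoin, urlparse

# Naive pattern: only double-quoted href/src attributes are rewritten.
ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def local_path(url: str) -> str:
    """Map a URL to a cache-relative path, e.g. http://h/docs/ -> h/docs/index.html."""
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.netloc + path


def rewrite_links(page_url: str, html: str, cached: Set[str]) -> str:
    """Point links at local copies when the target was crawled; leave others alone."""
    page_dir = posixpath.dirname(local_path(page_url))

    def replace(match: re.Match) -> str:
        attr, target = match.group(1), match.group(2)
        absolute, fragment = urldefrag(urljoin(page_url, target))
        if absolute not in cached:
            return match.group(0)  # not cached: keep the live URL
        relative = posixpath.relpath(local_path(absolute), page_dir)
        if fragment:
            relative += "#" + fragment
        return f'{attr}="{relative}"'

    return ATTR_RE.sub(replace, html)
```

Links to pages that weren't crawled keep their original URLs, so they still work once you reconnect.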

Update incrementally. Later, you could tell Web Spy to re-crawl, and it would only fetch pages that had changed since your last crawl. Efficient updates to keep your local copy fresh.
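How Web Spy decided that a page had changed isn't described here. One standard mechanism, sketched below as an assumption, is an HTTP conditional GET. You send back the `Last-Modified` date (and, from HTTP/1.1 on, the `ETag`) saved from the previous fetch. A server that supports it answers `304 Not Modified` with no body. The `CachedPage` and `refresh` names are hypothetical. The `opener` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedPage:
    body: bytes
    last_modified: Optional[str] = None  # Last-Modified header from the previous fetch
    etag: Optional[str] = None           # ETag header from the previous fetch


def refresh(url: str, cached: Optional[CachedPage],
            opener: Callable = urllib.request.urlopen) -> CachedPage:
    """Re-download url only if the server reports it changed since the cached copy."""
    headers = {}
    if cached and cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    if cached and cached.etag:
        headers["If-None-Match"] = cached.etag
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request) as response:
            return CachedPage(body=response.read(),
                              last_modified=response.headers.get("Last-Modified"),
                              etag=response.headers.get("ETag"))
    except urllib.error.HTTPError as err:
        if err.code == 304 and cached is not None:
            return cached  # Not Modified: keep the local copy, no body transferred
        raise
```

Servers that don't support conditional requests simply return the full page, so the worst case is the same as a fresh crawl.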

It was simple, it worked, and I used it constantly. Whenever I found a site worth reading deeply - documentation, tutorials, reference material - I'd crawl it and read at my leisure.

The Desktop App Experience

Web Spy was a native Windows app. This mattered:

No browser required. You didn't need to have your browser open. Web Spy had its own rendering engine (basic, but functional) for viewing cached content.

System tray integration. It could minimize to the system tray and crawl in the background. Start a crawl, go do something else, come back when it's done.

Scheduled crawls. Set it to crawl your favorite sites at 3am when phone rates were cheapest. Wake up to fresh content.
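Scheduling the overnight crawl mostly comes down to computing when the next 3am falls. Here is a minimal sketch, assuming a fixed crawl hour in local time. `next_run` is a hypothetical helper, not Web Spy's code, and it ignores daylight-saving transitions.

```python
from datetime import datetime, time, timedelta


def next_run(now: datetime, at: time = time(3, 0)) -> datetime:
    """Next occurrence of the daily crawl time strictly after `now` (local time)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # already past today's slot: use tomorrow's
    return candidate
```

A background thread could sleep until that moment, run the queued crawls, and then compute the next run again.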

Disk management. Web Spy tracked how much disk space each cached site used. You could set quotas, delete old content, prioritize what to keep.
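Which content Web Spy removed first when a quota was exceeded isn't specified. One plausible policy, sketched here as an assumption, is to evict the least recently crawled sites first, skip anything the user marked to keep, and stop once usage is back under the quota. `CachedSite` and `enforce_quota` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedSite:
    name: str
    bytes_used: int
    last_crawled: float   # e.g. a Unix timestamp of the last crawl
    pinned: bool = False  # user asked to keep this site


def enforce_quota(sites: List[CachedSite], quota_bytes: int) -> List[CachedSite]:
    """Return the sites to delete, oldest crawl first, skipping pinned ones."""
    total = sum(site.bytes_used for site in sites)
    to_delete: List[CachedSite] = []
    for site in sorted(sites, key=lambda s: s.last_crawled):
        if total <= quota_bytes:
            break
        if site.pinned:
            continue
        to_delete.append(site)
        total -= site.bytes_used
    return to_delete
```

If the pinned sites alone exceed the quota, the function returns every unpinned site and usage stays over the limit. A UI would need to warn the user in that case.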

This was native app thinking applied to web content. The web was a data source; the application was how you interacted with it. The browser wasn't the center of the experience - your local cache was.

What Came Later

The problem Web Spy solved didn't go away. It evolved:

HTTrack (1998). A free website copier that did essentially what Web Spy did. HTTrack became the standard tool for offline website archiving. It's still maintained today.

Offline browsing in browsers (late 1990s). Internet Explorer and others added "Work Offline" modes. Primitive - they only cached what you'd already visited - but the same idea.

Instapaper (2008). Save articles to read later. Stripped down to just the content. Synced across devices. Different implementation, same core insight: you want to read when it's convenient, not when you're connected. Instapaper remains one of the oldest read-it-later apps still in active use today.

Pocket (2007, originally Read It Later). Same concept as Instapaper, different execution. Save now, read later. The "read later" category was born. When Mozilla acquired Pocket in 2017, it validated the read-it-later category as fundamental to web browsing.

Browser reading modes (2010s). Safari Reader, Firefox Reader View. Strip away the noise, focus on content. Not offline, but the same impulse - make web content more readable.

Progressive Web Apps (2010s). Web apps that work offline, with service workers caching content. The web platform finally supported what Web Spy had done with desktop software.

The specific problem (dial-up costs, slow connections) changed. The underlying need (consume content on your terms, not the network's terms) persisted.

Why I Didn't Ship

Web Spy was another project for the pile of things I built and never released. The reasons are familiar:

It worked for me. I had a tool that solved my problem. The motivation to productize it was low when my own need was already met.

The market seemed small. Who else was annoyed enough by dial-up browsing to pay for a solution? At the time, I wasn't sure there was a market. In retrospect, HTTrack's popularity suggests there was.

Polish seemed hard. Web Spy worked for me because I understood its quirks. Making it work for others meant documentation, error handling, support. That felt like more work than I wanted to do.

The window closed. By the late 90s, broadband was spreading. The dial-up constraints that made Web Spy valuable were disappearing. The market for "offline web browsing" seemed to be evaporating.

I was wrong about that last part. The market didn't evaporate - it transformed. People still wanted to save content for later, read without distraction, and consume on their own schedule. Pocket raised venture funding and then sold to Mozilla for an undisclosed sum. Instapaper was acquired by Betaworks and later by Pinterest. The need persisted even after always-on connectivity arrived.

The Lesson About Timing

In my experience building tools I never shipped, Web Spy is a case study in technology timing:

Too early looks like too late. I thought the need for offline browsing was about to disappear with broadband. Instead, the need evolved. The constraint changed from "can't connect" to "don't want to connect" (battery life, attention, reading experience).

Implementation changes, problems persist. Nobody uses Web Spy-style website crawlers for casual reading anymore. But "save this to read later" is a product category. The technical approach died; the human need it addressed survived.

User behavior insights transfer. The insight that people want to consume content on their own terms, not on the network's terms, was correct. I just couldn't see how that insight would manifest after the dial-up era ended.

What I'd Do Differently

If I'd shipped Web Spy, I probably wouldn't have caught the Instapaper/Pocket wave. The product was tied to crawling whole websites, not saving individual articles, and the shift from "cache websites" to "save articles" was a product evolution I doubt I would have made.

But shipping it would have taught me things. About users, about markets, about product evolution. Even failed products teach more than unshipped ones. The code sitting on my hard drive taught me nothing except that I'd solved a problem for myself.

The pattern continues. After 45 years in tech, I've built dozens of tools that solved real problems - like my challenge-response spam filter that predated commercial solutions by years. Most never shipped. I learned the hard way that some of those problems got solved by others who did ship. The lesson isn't that I should have shipped everything - it's that the barrier to shipping was in my head, not in the market.

The Bottom Line

Web Spy was a good tool that solved a real problem. The problem evolved faster than I expected, but it didn't disappear. Twenty-five years later, people still want to save content for later, read without distraction, consume on their terms.

The specific implementation - crawling websites to local disk - is obsolete. The underlying insight - users want control over when and how they consume content - built a product category worth hundreds of millions of dollars.

I had the insight. I had a working implementation. I didn't ship it. Someone else shipped a different implementation of the same insight and built a real business. That's how it goes sometimes.
