Project · December 27, 2025

Change Monitor - Building a Website Change Detection System with Puppeteer and Telegram

How I built a full-stack website monitoring application that detects changes, captures screenshots, and sends instant notifications using Puppeteer, Express, and Telegram.
Puppeteer · Express · Telegram · Website Monitoring · Change Detection · Next.js · TypeScript · Automation
200+ Websites Monitored · < 2 min Average Detection Time

The Problem


We needed a way to monitor multiple websites and get notified instantly when their content changed. Whether it was tracking price updates on e-commerce sites, watching for job postings, or monitoring news articles, manually checking 50+ websites was tedious and error-prone, and we sometimes forgot which pages we needed to track.


There are some great advanced solutions already on the market, but they come with subscription costs. We already had the hardware to run monitoring software for our purposes, so all we needed was a custom solution that could track websites easily.


The Solution


I built Change Monitor, a full-stack web application that automatically monitors websites for changes and sends instant notifications when updates are detected. The application is built with Next.js and Node.js and packaged with Docker for easy deployment. It supports email and Telegram notifications; rather than spamming our inboxes, we wanted to receive alerts in Telegram channels.


Key Features Implemented


1. Website Checking with Puppeteer


The core of the system uses Puppeteer to launch a headless Chrome browser and capture website content. The implementation includes several key optimizations:


  • Content Extraction: The system navigates to each monitored URL and extracts only the meaningful content by removing scripts, styles, and noscript tags. It intelligently selects the main content area using common selectors like main, #main, .main, .content, or article elements.
  • Screenshot Capture: A full-page screenshot is captured in PNG format for visual verification of any detected changes.
  • Content Hashing: The extracted content is converted to a SHA-256 hash for efficient comparison. This allows the system to quickly determine if content has changed without storing full page content.
  • Graceful Error Handling: The system handles timeouts and network errors gracefully, with appropriate response time tracking and status code reporting.
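The hashing step can be sketched in a few lines with Node's built-in crypto module. This is a minimal illustration, not the project's actual code; the function names are mine, and the whitespace normalization is an assumption about how reflowed markup is kept from registering as a change.

```typescript
import { createHash } from "node:crypto";

// Hash the extracted text so only a short digest needs to be stored
// and compared between checks.
function hashContent(text: string): string {
  // Normalize whitespace so reflowed markup doesn't count as a change
  const normalized = text.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized).digest("hex");
}

// A change is detected when the new digest differs from the stored one
// (a null previous hash means this is the first check).
function hasChanged(previousHash: string | null, text: string): boolean {
  return previousHash !== hashContent(text);
}
```

Storing a 64-character hex digest per monitor instead of full page content keeps the database small regardless of page size.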

2. Scheduled Monitoring with Cron


Using node-cron, I implemented a flexible scheduler that runs on a regular interval (every 60 minutes by default) and checks all active monitors.


The system implements several important optimizations:


  • Batch Processing: Monitors are processed in batches (concurrency limit of 3) to avoid overwhelming the system with too many simultaneous Puppeteer instances.
  • Individual Intervals: Each monitor can have its own check interval (from 1 minute to 24 hours). The scheduler respects these individual intervals, only running checks when they're due.

This approach ensures efficient resource utilization while preventing the system from becoming overloaded.
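The two optimizations above reduce to two small pieces of logic: deciding whether a monitor's individual interval has elapsed, and splitting the due monitors into fixed-size batches. The sketch below assumes an illustrative `Monitor` shape; the field and function names are mine, not the project's.

```typescript
interface Monitor {
  id: number;
  intervalMinutes: number;      // per-monitor interval, 1 min to 24 h
  lastCheckedAt: number | null; // epoch ms of last check; null = never run
}

// A monitor is due when it has never run or its own interval has elapsed.
function isDue(monitor: Monitor, now: number): boolean {
  if (monitor.lastCheckedAt === null) return true;
  return now - monitor.lastCheckedAt >= monitor.intervalMinutes * 60_000;
}

// Split due monitors into batches of `size` so at most `size`
// Puppeteer instances run at the same time.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// The scheduler tick would then look roughly like:
//   for (const batch of chunk(monitors.filter(m => isDue(m, Date.now())), 3)) {
//     await Promise.all(batch.map(checkMonitor));
//   }
```

Awaiting each batch before starting the next is what enforces the concurrency limit of 3.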


3. Telegram & Email Notifications with Screenshots


When a change is detected, the system sends a rich HTML message to Telegram or email along with the screenshot. The message is formatted as HTML to support bold text, links, and other formatting. This rich notification provides immediate context and visual verification of the change, making it easy to understand what happened without needing to visit the website.
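On the Telegram side, this maps onto the Bot API's `sendPhoto` method, which accepts a photo plus an HTML-formatted caption when `parse_mode` is set to `HTML`. The sketch below only builds the request; the bot token, chat id, and function name are placeholders, and the caption wording is illustrative.

```typescript
// Build the Telegram Bot API sendPhoto request for a change alert.
// The screenshot itself would be attached as the multipart `photo`
// field (e.g. via FormData) when the request is actually POSTed.
function buildChangeAlert(
  botToken: string,
  chatId: string,
  monitorName: string,
  url: string
): { endpoint: string; fields: Record<string, string> } {
  return {
    endpoint: `https://api.telegram.org/bot${botToken}/sendPhoto`,
    fields: {
      chat_id: chatId,
      parse_mode: "HTML", // enables <b>, <a href>, etc. in the caption
      caption:
        `<b>Change detected</b> on ${monitorName}\n` +
        `<a href="${url}">View page</a>`,
    },
  };
}
```

Because the caption rides along with the photo, the alert and its visual proof arrive as a single Telegram message.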


Challenges Faced


1. Puppeteer Browser Management


Problem: Chrome processes weren't closing properly, leading to memory leaks and zombie processes that consumed system resources.


Solution:


  • Close browser after each check instead of keeping it alive across multiple checks
  • Force kill any remaining Chrome processes using shell commands
  • Added comprehensive cleanup on graceful shutdown
  • Used single-process mode for better control over browser lifecycle

The cleanup process runs multiple kill commands targeting various Chrome-related processes (chrome_crashpad, chrome, chromium) to ensure complete cleanup. Errors from these commands are silently ignored since the processes might not exist.
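A minimal sketch of that cleanup pass, assuming `pkill` is available on the host: each kill command runs in a try/catch, since `pkill` exits non-zero when no process matched. The function name and the injectable `run` parameter (useful for testing) are my own additions, not the project's API.

```typescript
import { execSync } from "node:child_process";

// Force-kill any leftover Chrome-family processes after a check.
// Failures (e.g. no matching process) are deliberately swallowed.
function killStrayChromeProcesses(
  run: (cmd: string) => void = (cmd) => execSync(cmd, { stdio: "ignore" })
): void {
  for (const name of ["chrome_crashpad", "chrome", "chromium"]) {
    try {
      run(`pkill -f ${name}`);
    } catch {
      // pkill exits non-zero when nothing matched; that's expected
    }
  }
}
```

Running this after every check, and again on graceful shutdown, keeps zombie Chrome processes from accumulating.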


2. Change Detection Accuracy


Problem: Simple HTML comparison was too noisy - detected changes in timestamps, counters, and irrelevant dynamic content that didn't represent meaningful content changes.


The solution was to extract only the main content using smart selectors (main, article, .content), then strip scripts, styles, and noscript tags before comparison. Next, use textContent instead of innerHTML for a cleaner comparison. Finally, generate a SHA-256 hash of the result for efficient comparison.


It's not a foolproof solution, since dynamic text inside the main content can still trigger unwanted alerts, but it dramatically reduced false positives by focusing only on the content that matters and ignoring elements like timestamps, counters, and analytics scripts.


3. Retry Logic with Exponential Backoff


Network requests sometimes failed for various reasons, so I implemented robust retry logic with exponential backoff. The system determines when to retry based on the type of error:


Retryable Errors Include:


  • Target closed errors (browser crashed)
  • Protocol errors (communication issues)
  • Network errors (ERR_CONNECTION, ERR_TIMEOUT, etc.)
  • Timeout errors
  • Connection refused errors

This exponential backoff strategy gives the server time to recover before retrying, increasing the likelihood of success while avoiding overwhelming the system with rapid retries.
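The classification and backoff logic can be sketched as follows. The error-message patterns come from the list above; the specific delay formula (doubling from 1 s, capped at 30 s) and the function names are assumptions for illustration, not the project's exact values.

```typescript
// Substrings that mark an error as transient and worth retrying.
const RETRYABLE_PATTERNS = [
  "Target closed",     // browser crashed
  "Protocol error",    // DevTools communication issue
  "ERR_CONNECTION",    // network errors
  "ERR_TIMEOUT",
  "timeout",
  "ECONNREFUSED",      // connection refused
];

function isRetryable(err: Error): boolean {
  return RETRYABLE_PATTERNS.some((p) =>
    err.message.toLowerCase().includes(p.toLowerCase())
  );
}

// Exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s.
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}

// Retry `fn` on transient failures, waiting longer after each attempt;
// non-retryable errors propagate immediately.
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err as Error)) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Failing fast on non-retryable errors (like a 404 in the message) matters as much as the backoff itself: retrying a permanent failure just wastes a browser slot.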


4. Concurrent Monitor Checks


Problem: Running too many Puppeteer instances simultaneously overwhelmed the server, causing performance degradation and crashes.


Solution:


  • Implemented concurrency limit (3 monitors at a time)
  • Process monitors in chunks/batches
  • Proper cleanup between checks
  • Resource cleanup with proper error handling

The chunking approach ensures that even if you have 1000+ monitors, only 3 will be checked at any given time. Once those 3 complete, the next batch of 3 begins. This prevents resource exhaustion while still allowing efficient parallel processing.


Deployments


The whole project is containerized with Docker and can be deployed with a single command on any server that has Docker installed. This made building a CI/CD pipeline straightforward, and it should make migration much easier if ever needed in the future.


Future Improvements


Overall the project serves its purpose and has saved a lot of hours of manual website monitoring, but there is always something to improve.


Looking at the project roadmap, there are plans to:


  • Add support for more notification channels (Slack, Discord, etc.)
  • Implement machine learning or an LLM API for smarter change detection, such as ignoring minor fluctuations or summarizing changes in natural language