What is SavePage.io used for?

SavePage.io is used for programmatically capturing and archiving web page content. Its primary applications include legal evidence collection, regulatory compliance, content change monitoring, and building historical datasets of websites.

What output formats does SavePage.io support?

SavePage.io can save captured web pages in multiple formats, including full HTML, rendered PDF documents, and WARC (Web ARChive) files.

Does SavePage.io offer a free tier?

Yes, SavePage.io provides a free tier that includes 50 web page captures per month, suitable for evaluation or light usage.

Can I automate web page captures with SavePage.io?

Yes, SavePage.io provides an API that allows for automated and scheduled web page captures, enabling integration into existing applications and workflows.

How does SavePage.io handle dynamic content?

SavePage.io is designed to capture dynamic web content, including JavaScript-rendered elements and media, to ensure an accurate and complete snapshot of the page.

What is the difference between HTML and WARC output?

HTML output typically provides the rendered HTML of a single page. WARC (Web ARChive) is a standardized format designed for storing web content in an archival context, often including more comprehensive data like linked resources and metadata, making it suitable for long-term preservation.
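
To make the difference concrete, the sketch below wraps an already-captured HTML payload in a single WARC/1.0 `resource` record using only the Python standard library. It illustrates the record layout defined by the WARC format (named header fields such as the record ID, capture date, and target URI, followed by the payload) and is not a description of how SavePage.io builds its WARC files. The function name `build_warc_resource_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "text/html") -> bytes:
    """Build one WARC/1.0 'resource' record around an already-captured payload."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines are CRLF-separated; a blank line precedes the payload,
    # and two CRLFs terminate the record.
    return "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n" + payload + b"\r\n\r\n"


if __name__ == "__main__":
    html = b"<html><body><h1>Example</h1></body></html>"
    record = build_warc_resource_record("https://www.example.com/", html)
    print(record.decode("utf-8"))
```

A plain HTML export would contain only the payload bytes; the WARC record carries the capture metadata alongside them, which is what makes the format useful for long-term preservation.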

SavePage.io — API for Website Content Archiving & Monitoring

Overview

SavePage.io offers an API-driven solution for capturing and preserving web content. The platform specializes in creating verifiable, immutable snapshots of live websites, converting them into standard formats such as HTML, PDF, and WARC (Web ARChive) files. This functionality supports use cases requiring reliable historical records of web pages, ranging from legal documentation to content integrity monitoring.

Developers and technical buyers use SavePage.io to automate the collection of web data so that the state of a web page at a specific point in time is recorded accurately. This matters for organizations in regulated sectors such as finance or healthcare, where records of publicly accessible information may have to be retained. The API provides endpoints to initiate captures and retrieve the archived content, with optional asynchronous processing via webhooks for captures that take longer to complete.

The service is designed for scenarios where manual screenshotting or browser-based archiving is insufficient due to scale, frequency, or the need for programmatic control. For instance, legal teams may use it to gather evidence of online defamation or intellectual property infringement, while market researchers might archive competitor websites to track changes over time. SavePage.io emphasizes the integrity of captured data, a key aspect for legal and compliance applications where the authenticity of archived content must be defensible. Unlike general-purpose web crawlers, SavePage.io focuses specifically on creating and storing a precise, timestamped record of a single web page or domain, including dynamic content and media where applicable.
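
One common way to make an archived file's authenticity easier to defend is to record a cryptographic hash of it at download time. The sketch below is a client-side illustration using the Python standard library; it does not describe any integrity mechanism built into SavePage.io, and the function names and record fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 8192) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_integrity_record(path: str, source_url: str) -> dict:
    """Bundle the hash with the source URL and the time the hash was taken."""
    return {
        "source_url": source_url,
        "file": path,
        "sha256": sha256_of_file(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing the resulting record separately from the archive lets a later reviewer recompute the hash and confirm the file has not changed since it was downloaded.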

It integrates into existing workflows through its API, allowing for custom triggers and automated archiving schedules. Because the API is accessed over HTTP, it can be called from most programming languages, and the documentation provides code examples to ease integration. This contrasts with manual browser-based archiving tools, which may lack the scalability and automation needed for enterprise-level content preservation.

Key features

Website Content Archiving: Programmatically capture web pages as static files, including HTML, PDF, and WARC, preserving their state at a specific moment in time.
API-Driven Capture: Utilize a RESTful API to initiate web page captures, retrieve results, and manage archiving tasks (SavePage.io API reference).
Multiple Output Formats: Supports exporting captured pages as full HTML, rendered PDFs, or WARC files, catering to different archival needs.
Asynchronous Processing with Callbacks: Enables efficient handling of capture requests, with webhook support to notify applications upon completion of an archiving job.
Content Integrity: Designed to capture web content accurately, including dynamic elements and media, suitable for legal and compliance scenarios.
Scheduled Captures: Allows for automation of recurring captures to monitor changes on web pages over time.
Developer-Friendly Documentation: Provides code examples in Python, Node.js, Go, PHP, and cURL to aid integration (SavePage.io documentation).
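
The listing does not say whether recurring captures are configured on the service side or driven by the client. As a fallback, a recurring schedule can be sketched on the client with a plain loop. In the sketch below, `run_scheduled_captures` is a hypothetical helper, and `capture` stands for whatever function submits a capture request; the `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, List, Tuple


def run_scheduled_captures(
    urls: Iterable[str],
    capture: Callable[[str], object],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[int, str, object]]:
    """Call capture(url) for every URL once per round, pausing between rounds."""
    url_list = list(urls)
    results = []
    for round_index in range(rounds):
        for url in url_list:
            try:
                results.append((round_index, url, capture(url)))
            except Exception as exc:  # keep the schedule alive if one capture fails
                results.append((round_index, url, exc))
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return results


if __name__ == "__main__":
    def fake_capture(url: str) -> str:
        # Stand-in for a real API call.
        return f"job-for-{url}"

    for entry in run_scheduled_captures(
        ["https://www.example.com"], fake_capture, interval_seconds=0.1, rounds=2
    ):
        print(entry)
```

For long-running schedules, a cron job or task queue that invokes a single round is usually more robust than a process that sleeps between rounds.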

Pricing

SavePage.io offers a free tier for basic usage, with paid plans scaling based on the number of captures per month. Custom enterprise pricing is available for higher volume requirements.

Plan	Captures/Month	Price/Month	Key Features
Free	50	Free	Basic archiving, API access
Pro	500	$9	Increased captures, priority support
Business	2,500	$39	Higher capture limits, advanced features
Enterprise	Custom	Custom	High volume, dedicated support, custom integrations

Pricing data as of May 2026. For the most current pricing details, refer to the SavePage.io pricing page.
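
To compare plans for a given monthly volume, the figures in the table can be turned into a small calculation. The sketch below assumes each plan is a hard monthly quota with no overage pricing, which the listing does not confirm; Enterprise is left out because its price is custom.

```python
from typing import Optional

# (plan name, captures per month, price per month in USD) from the pricing table.
PLANS = [
    ("Free", 50, 0),
    ("Pro", 500, 9),
    ("Business", 2500, 39),
]


def cheapest_plan(captures_needed: int) -> Optional[str]:
    """Return the lowest-priced listed plan covering the volume, or None if none does."""
    eligible = [(price, name) for name, quota, price in PLANS if quota >= captures_needed]
    return min(eligible)[1] if eligible else None


def cost_per_capture(quota: int, price: float) -> float:
    """Effective price per capture when the full monthly quota is used."""
    return price / quota


if __name__ == "__main__":
    for name, quota, price in PLANS:
        print(f"{name}: ${cost_per_capture(quota, price):.4f} per capture at full use")
    print(cheapest_plan(1200))  # Business
```

At full use, Pro works out to $0.0180 per capture and Business to $0.0156 per capture; volumes above 2,500 per month fall into Enterprise territory.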

Common integrations

SavePage.io's API facilitates integration with various systems:

Content Management Systems (CMS): Archive specific pages or posts automatically when published or updated.
Legal Tech Platforms: Integrate for automated collection of online evidence for litigation or compliance.
Marketing & SEO Tools: Monitor competitor websites or track changes to marketing landing pages.
Data Archiving & Storage Solutions: Connect with cloud storage providers (e.g., AWS S3, Google Cloud Storage) to store WARC or PDF archives.
Business Process Automation (BPA) Tools: Use platforms like Tray.io or Zapier to build automated workflows for web page capture; where no direct connector exists, custom API calls may be required (Tray.io for automation).
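
For the monitoring use case, a downstream tool typically compares two captures of the same page taken at different times. A minimal sketch of that comparison, using Python's standard `difflib` on two HTML captures that have already been downloaded, might look like this; `summarize_changes` is a hypothetical helper and not part of any SavePage.io client.

```python
import difflib
from typing import List


def summarize_changes(old_html: str, new_html: str,
                      old_label: str = "previous",
                      new_label: str = "current") -> List[str]:
    """Return unified-diff lines between two captures; an empty list means no change."""
    return list(
        difflib.unified_diff(
            old_html.splitlines(),
            new_html.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = "<h1>Plans</h1>\n<p>Price: $9</p>"
    after = "<h1>Plans</h1>\n<p>Price: $12</p>"
    for line in summarize_changes(before, after):
        print(line)
```

A line-based diff is noisy on minified or heavily dynamic HTML, so real monitoring pipelines often normalize the markup or extract the visible text before comparing.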

Alternatives

Archive.today: A free, on-demand web archiving service that saves snapshots of web pages and generates a short URL.
Stillio: Provides automated website screenshots and archiving, focusing on visual capture and monitoring.
WebPreserver: Offers legal-grade web page archiving for evidence collection, including social media and dynamic content.

Getting started

To get started with SavePage.io, you typically make a POST request to their capture endpoint with the URL you wish to archive. Below is an example using Python to initiate a capture and retrieve the results.


import requests
import time

API_KEY = "YOUR_SAVE_PAGE_API_KEY"
API_BASE_URL = "https://api.savepage.io/v1"

def capture_page(url):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "output_format": "pdf"  # or "html", "warc"
    }
    
    try:
        # A timeout keeps the call from hanging indefinitely if the server does not respond
        response = requests.post(f"{API_BASE_URL}/capture", headers=headers, json=payload, timeout=30)
        response.raise_for_status() # Raise an HTTPError for bad responses (4xx or 5xx)
        
        capture_job = response.json()
        job_id = capture_job.get("job_id")
        print(f"Capture job initiated. Job ID: {job_id}")
        return job_id
    except requests.exceptions.RequestException as e:
        print(f"Error initiating capture: {e}")
        return None

def get_capture_status(job_id):
    headers = {
        "Authorization": f"Bearer {API_KEY}"
    }
    try:
        response = requests.get(f"{API_BASE_URL}/capture/{job_id}", headers=headers, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error getting capture status: {e}")
        return None

def download_capture(job_id, output_path):
    status_data = get_capture_status(job_id)
    if not status_data:
        return False

    if status_data.get("status") == "completed":
        download_url = status_data.get("download_url")
        if download_url:
            try:
                print(f"Downloading from: {download_url}")
                response = requests.get(download_url, stream=True, timeout=60)
                response.raise_for_status()

                with open(output_path, 'wb') as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                print(f"Capture downloaded to {output_path}")
                return True
            except requests.exceptions.RequestException as e:
                print(f"Error downloading capture: {e}")
                return False
        else:
            print("Download URL not available.")
            return False
    else:
        print(f"Capture job {job_id} is not yet completed. Current status: {status_data.get('status')}")
        return False

if __name__ == "__main__":
    target_url = "https://www.example.com"
    output_file_name = "example_archive.pdf"

    job_id = capture_page(target_url)
    if job_id:
        # Poll for status (in a real application, consider webhooks for async notifications)
        status = "pending"
        max_attempts = 30  # Give up after about 5 minutes instead of polling forever
        for _ in range(max_attempts):
            time.sleep(10) # Wait 10 seconds before polling again
            status_data = get_capture_status(job_id)
            if not status_data:
                print("Could not retrieve job status.")
                break
            status = status_data.get("status")
            print(f"Job {job_id} status: {status}")
            if status in ["completed", "failed", "error"]:
                break

        if status == "completed":
            download_capture(job_id, output_file_name)
        else:
            print(f"Capture for job {job_id} did not complete. Last status: {status}")

This Python script initiates a capture request for a specified URL, polls the API for the job's status, and then downloads the resulting PDF file once the capture is complete. In production environments, utilizing webhooks for asynchronous notifications is recommended over polling to manage event-driven workflows more efficiently.
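
As a rough illustration of the webhook approach, the sketch below receives a completion notification with Python's standard `http.server`. The payload fields (`job_id`, `status`, `download_url`) simply mirror those used in the polling example and are assumptions; the actual webhook payload and any signature verification scheme should be taken from the SavePage.io documentation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_capture_event(body: bytes) -> dict:
    """Extract the fields this sketch assumes a completion webhook would carry."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    return {
        "job_id": event.get("job_id"),
        "status": event.get("status"),
        "download_url": event.get("download_url"),
    }


class CaptureWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_capture_event(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and bad UTF-8
            self.send_response(400)
            self.end_headers()
            return
        print(f"Job {event['job_id']} finished with status {event['status']}")
        # Acknowledge quickly; hand the download off to a background worker in real use.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("", port), CaptureWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver on port 8000. In practice the endpoint would sit behind HTTPS and validate that requests genuinely originate from the capture service before acting on them.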
