Tools tagged: open-source
All SEO tools tagged with open-source
Showing 302 tools
Hugging Face Transformers
Open-source library to build and test NLP models for understanding conversational voice queries
Hugging Face
Open-source platform for NLP and machine learning models
Hugging Face Models
Platform to run open-source LLMs to audit how different models categorize specific domains
LangChain
Framework for developing LLM applications with integrated source citation tracking
LangChain-SEO-Check
Open-source integration for monitoring RAG retrieval frequency of specific domains
yt-dlp
Open-source command-line tool for downloading videos and extracting metadata for analysis
Hugo
Fastest open-source static site generator for massive SEO-ready architectures
Hoppscotch
Open-source API development ecosystem used by SEOs to test and debug headless CMS endpoints and schema APIs
Strapi
Open-source headless CMS that provides full control over eCommerce SEO metadata via API
Strapi SEO Plugin
Open-source plugin for Strapi CMS to manage meta tags, sitemaps, and social sharing previews
Kuma
Self-hosted uptime monitoring tool like Uptime Robot but with a clean UI
Llama 3 (Meta)
Highly capable open-source large language model for private AI content infrastructure
Sherlock
Powerful CLI tool to find usernames across over 400 social networks for footprinting
Hugging Face SEO Tools
Community-driven machine learning models for title generation and content classification
Mastodon
Decentralized social profiles that provide highly crawlable search signals
Ghost
Open-source publishing platform focused on speed and native SEO features
Meilisearch
Lightning-fast open-source search engine API to improve site-search SEO and UX
Astro
The web framework for content-driven websites with built-in SEO optimizations and API routes
ClickHouse
Fast open-source column-oriented database for real-time log analytics
LlamaIndex
Framework for connecting private data to LLMs for custom GEO applications
Metabase
Open-source BI tool that makes SEO data exploration easy for teams
Directus SEO
Open-data platform that turns any SQL database into a headless CMS with SEO meta management
K6
Open-source load testing tool for monitoring site performance under stress
Medusa
The open-source Shopify alternative focused on developer experience and high-speed SEO performance
React Flow
Open-source library for building node-based editors and behavior visualizers
PostHog Replay
Self-hostable session recording for developers who want full data control
Umami
Self-hosted privacy-focused open-source analytics with a simple interface
WebArchiver
Open-source tool to create local archives of websites for change comparison
Saleor
Headless open-source eCommerce platform with high-performance GraphQL API and SEO-optimized storefronts
PostHog
Open-source product analytics suite that provides heatmaps and session recording for SEO conversion analysis
Coolify
An open-source self-hostable alternative to Heroku/Netlify/Vercel for app deployment
DSPy
Framework for programmatically optimizing language model prompts and weights for reliable content outputs
Gemma (Google)
Open-source family of lightweight, state-of-the-art models for AI developers
GoAccess
Real-time web log analyzer and interactive viewer for terminal or browser
Plausible Analytics
Privacy-focused, open-source alternative to Google Analytics with lightweight script
Web-Check
All-in-one open-source OSINT and technical SEO tool for analyzing infrastructure and headers
Changedetection.io
Self-hosted open-source website change detection and notification service
SingleFile
Chrome extension that saves an entire page into a single HTML file for archiving
Airbyte
Open-source data integration engine for syncing SEO data to warehouses
SearXNG
Privacy-respecting metasearch engine and alternative to Google
Hugging Face Open LLM Leaderboard
Compare and analyze models used in AEO to understand which LLMs prioritize specific data sources
Quicklink
Open-source library that speeds up subsequent page loads by prefetching in-viewport links during idle time
Amass
In-depth DNS enumeration and network mapping for asset discovery
LanguageTool API
Open-source proofreading service for more than 30 languages with an accessible developer API
OpenReplay
Open-source session replay suite with privacy focus and self-hosting options
OpenSearch
Community-driven open-source search and analytics suite for log data
Subfinder
Fast passive subdomain enumeration tool for domain discovery
Http-Toolkit
Open-source tool for intercepting, debugging, and rewriting HTTP traffic for routing validation
Storm
Open-source knowledge curation engine that generates SEO-friendly wiki pages
GrowthBook
Open-source feature flagging and A/B testing platform that works with your existing data warehouse
Statamic (Free Core)
Flat-first CMS that is inherently SEO-friendly and fast for smaller projects
RudderStack
Customer data platform (CDP) for routing SEO event data to BI warehouses
Verba
Open-source RAG application to explore SEO datasets using LLMs locally
Shopware
Modern open-source eCommerce platform from Germany focusing on storytelling and SEO experiences
Hush
A simple no-nonsense blocker for those annoying cookie consent notices
Meltano
Open-source DataOps platform for building SEO data pipelines
Mind-Search
Open-source AI search agent that mimics human research behavior for SEO
Microsoft Clarity SDK
Lightweight SDK for integrating Clarity session replays into native mobile apps
Podcast Index
Open-source database preserving decentralized podcasting for independent SEO research
DefiLlama
Leading DeFi aggregator for tracking search volume by TVL and protocol
IndexNow
Protocol for instant notification to search engines when website content is created or updated
SERPAPI (Go)
High-performance Go library for interacting with major search engine scraping APIs
IndexNow Python Script
CLI tool to automate bulk URL submissions to IndexNow API
AutoGPT
Experimental open-source application showcasing the capabilities of the GPT-4 language model for SEO automation
ELK Stack (Elasticsearch)
The standard for searching, analyzing, and visualizing log data in real-time at scale
Grafana
Observability and BI platform used for real-time SEO monitoring dashboards
Gatsby
Open-source frontend framework for creating blazing fast SEO-friendly eCommerce storefronts
Scrapy
Powerful open-source web crawling framework for Python developers to build custom SEO scrapers
Huginn
Open-source system for building agents that monitor and act on web changes
AMP Browser Extension
Instantly validates Accelerated Mobile Pages and provides real-time debugging for web developers
Lighthouse
Open-source automated tool for improving the quality of web pages including SEO and performance
Bitwarden (Self-Hosted)
Open-source password management solution that can be hosted on your own infrastructure
Matomo Analytics
Leading open-source web analytics platform that gives you full control over your SEO data
Matomo Heatmaps
Privacy-focused heatmap plugin for the leading open-source analytics platform
Dark Reader
Enables dark mode for every website to reduce eye strain during long SEO audit sessions
Matomo
Privacy-focused open-source web analytics platform that provides 100% data ownership
Rasa
Open-source machine learning framework for building text and voice-based conversational AI assistants
Haystack by Deepset
Open-source NLP framework for building search systems with verifiable citations
Netlify CMS (Decap)
Open-source Git-based CMS for static site generators with simple SEO markdown config
Logstash
Server-side data processing pipeline that ingests data from multiple sources
Rendertron
A headless Chrome rendering solution designed to render & serialise web pages on the fly
Spree Commerce
Modular open-source headless eCommerce platform with multi-store support and SEO flexibility
Filebeat
Lightweight shipper for forwarding and centralizing log data from servers
OWASP ZAP
The world's most widely used web app scanner for finding vulnerabilities
Tabula
Tool for liberating data tables trapped inside PDF files into CSV for SEO keyword research
LanguageTool
Open-source grammar, style, and spell checker for high-quality SEO content
Magento Open Source
Enterprise-grade open-source commerce platform offering deep technical SEO and catalog customization
Wappalyzer
Identify web technologies on websites, including CMS, frameworks, and payment processors
OpenCart
Free open-source eCommerce platform with a massive marketplace for SEO extensions
Thumbor
Open-source smart imaging service which allows on-demand cropping, resizing and flipping of images
Yourls
Your Own URL Shortener - open-source PHP scripts for custom domain routing
WooCommerce
The most popular open-source eCommerce plugin for WordPress with extensive SEO extension ecosystem
ImageMagick
The industry-standard open-source software suite for creating, editing, and converting bitmap images
OpenProject
Open-source project management for secure SEO infrastructure
PrestaShop
Feature-rich open-source eCommerce solution with modular architecture for technical SEO control
ZMap
Open-source network scanner that enables researchers to easily perform Internet-wide surveys
ExifTool
Powerful command-line application for reading, writing, and editing document metadata, including PDF and images
Taiga
Open-source project management for agile SEO teams
Tota11y
An accessibility visualization toolkit that helps visualize how a page performs with assistive technology
Oscar
Open-source eCommerce framework for Django designed for domain-driven SEO architecture
Countly
Product analytics for mobile and web that can be self-hosted for complete data control
SiteSpeed.io
Open-source tool suite that helps you measure and monitor the performance of your web pages
PDF-Miner
Data extraction tool that helps SEOs turn PDF content into crawlable HTML or text formats
Varnish Cache
High-performance HTTP accelerator for caching content and improving response times
MozJpeg
Open-source JPEG encoder that improves compression while maintaining full compatibility
Solidus
A powerful open-source eCommerce framework for high-volume retailers built on Ruby on Rails
A11y Project
A community-driven effort to make digital accessibility easier to understand and implement
WooCommerce SEO
Open-source SEO extension for WordPress e-commerce stores
Jovo Framework
Open-source framework for building professional voice apps across multiple platforms and devices
Singer
Open-source standard for writing scripts that move SEO data
Accessibility Developer Tools
Library of accessibility audit rules used by Google in various testing environments
Common Crawl
Open repository of web crawl data that can be used for large-scale link graph analysis
Open-Web-Analytics
Open-source web analytics that provides features similar to Google Analytics
Apache Traffic Server
Scalable HTTP intermediate and forward proxy server for high-performance routing
Wasabi
Open-source A/B testing platform originally developed by Intuit for high-performance experiments
Webhint
Linting tool for the web that helps improve accessibility, speed, and cross-browser compatibility
Tab Wrangler
Automatically closes inactive tabs to save memory and stay organized
Ghostery
Privacy extension that shows trackers and allows you to block them to see page speed impact
Metarank
Open-source ranking service for personalized search results and LTR
Dimensions
A tool for designers to measure screen dimensions in the browser for pixel perfect layouts
GMB API Tracker
Custom scripts and tools for tracking GMB ranks via official Google APIs
JSONView
Formats and validates JSON responses in the browser for easier API and structured data debugging
Open-Source-GEO
Library for managing geographic coordinates and local business schema for SEO
EditThisCookie
Manage cookies for SEO testing, including location spoofing and session management
Picfit
An image resizing server written in Go that can use different storage backends like S3 or GCS
Tab Manager Plus
Efficiently manage hundreds of tabs with search, filter, and deduplication features
Baidu MIP (Mobile Instant Pages)
Open-source framework to speed up mobile pages for instant loading on Baidu Search results
GrandNode
Open-source cross-platform eCommerce solution based on .NET Core and MongoDB for SEO speed
LinkClump
Allows you to open, copy, or bookmark multiple links at once by dragging a selection box
Yoast Real-time Content Analysis
SDK for developers to integrate Yoast-style SEO analysis into custom publishing platforms
CSP Evaluator
Evaluates Content Security Policies to detect bypasses and improve site security
Extruct
Python library for extracting structured data from HTML files supporting Microdata and JSON-LD
Image Downloader
Bulk download all images on a page with filtering by size and type
WhatFont
The easiest way to identify fonts on web pages by simply hovering over the text
Linkclump
Drag a box around links to open them all in new tabs or copy them to clipboard
Clear Cache
One-click cache and data clearing to test live SEO changes without browser lag
DuckDuckHack
Open-source community for building Instant Answers and plugins for DuckDuckGo
SeoPanel
Open-source SEO control panel for managing multiple sites and tracking keyword rankings in one place
Check My Links
Link crawler that finds broken links on a webpage to assist in technical SEO cleanup
SerpApi Client Libraries
Open-source wrappers for SerpApi across 10+ languages to simplify SEO data integration
Clear Cache Shortcut
One-click button to clear browser cache and cookies for testing fresh site changes
Koko Analytics
Privacy-friendly analytics plugin for WordPress that does not use cookies or external services
Eye Dropper
Color picker that allows you to pick colors from any webpage or color picker interface
Open Multiple URLs
Paste a list of URLs and open them all in new tabs simultaneously
Link Gopher
Browser extension to extract all links from a webpage and sort them for outreach lists
AMP Validator
Official tool to validate Accelerated Mobile Pages for news performance and compliance
Drupal Simple XML Sitemap
The standard module for Drupal to generate multilingual and custom entity sitemaps automatically
Gnu-Wget
Command-line utility for retrieving files over HTTP/HTTPS, frequently used for site mirroring
HeidiSQL
Lightweight, open-source SQL client for Windows to manage databases and servers
NVDA Screen Reader
Free and open-source screen reader for Windows enabling blind people to use computers
OpenStreetMap
The free wiki world map often used as a data source for local geo tools
OptiPNG
An advanced PNG optimizer that recompresses image files to a smaller size without losing information
Scribus
Open-source desktop publishing software capable of creating highly structured and SEO-ready PDFs
SmokePing
Latency logging and graphing system for monitoring network connections
Zabbix Performance
Enterprise monitoring solution that can be configured to track website performance metrics
Payload CMS
Next-generation headless CMS with TypeScript support and extensible SEO configuration hooks
Postiz
Open-source social media and SEO content scheduling platform with AI insights
Kutt
Modern open-source URL shortener with custom domains and analytics routing
ScrapeGraphAI
Open-source Python library for scraping search results using Large Language Models
Linkwarden
Self-hosted collaborative bookmark manager with routing preservation and archiving
Perplexica
Advanced AI-powered search engine and open-source Perplexity alternative
Dub.co
Open-source link management infrastructure for modern marketing teams
Farfalle
Open-source AI search engine and answer engine for local or cloud LLMs
Serpbear
Self-hosted open-source search engine rank tracking software
Forwardmail
Privacy-focused open-source service for email and domain forwarding routing
Scribe
Open-source markdown-based documentation and SEO platform for developer-centric content marketing
SerpBear
Open-source search engine rank tracking software that allows you to track your website rankings for free
LinkStack
Self-hosted link-in-bio platform for managing backlink landing pages and professional profiles
Carbonalyser
Evaluate the environmental impact of your website based on data transfer
LinkExtractor
Open-source Python library for extracting and monitoring backlinks from raw HTML and web crawls
OpenPerplex
Open-source AI-powered search engine focused on citations and source transparency
TextGenerator
Open-source AI content generation and SEO tool with community-driven team templates
GEO-Bench
Open-source benchmark for evaluating Generative Engine Optimization strategies
Serp-Explorer
Open-source rank tracker that scrapes search results without expensive proxies
ALwrity
Open-source AI digital marketing platform supporting RAG and SERP factual content generation
LLM-Explorer-Open
Open-source framework to test how different prompt structures affect site visibility
SEOEstate
Open-source website crawler and technical SEO analyzer for developers and technical consultants
Zeno
Open-source crawler and SEO toolset for developers built with Node.js
SEO Audits Toolkit
All-in-one open-source collection for website health and security
SEOnaut
Comprehensive open-source SEO audit tool for deep site scans
Swetrix
Fully open-source and privacy-focused analytics with deep focus on performance and SEO tracking
Dino DNS
Open-source, lightweight DNS record manager designed for fast SEO testing and domain mapping
Indexing API Tool
Open-source script for bulk-sending URLs to Google's Indexing API via Google Sheets
LLM Analytics
Open-source framework to track how LLMs cite and reference website content
RankTools
Open-source command line tool for monitoring keyword rankings across major engines
SEO Dash
Open-source dashboard template for tracking SEO performance across multiple domains
SEOlizer
Open-source tool for real-time log analysis and technical SEO monitoring
Search Engine Scanner
Fast open-source CLI for scraping search engine results
Serply
Open-source Python library for scraping search engine results for SaaS dev teams
Dalton
Open-source local search rank tracker and citation audit tool for developers and small agencies
PySEO Keyword Tool
Python-based open-source tool for programmatically generating keyword reports and clusters
RustySEO
Cross-platform marketing toolkit for deep crawl and server log analysis
Slowpoke
Open-source desktop tool to analyze and visualize website performance over time
LLM-Visibility-Library
Open-source repository for benchmarking site appearance in various Large Language Models
Favicon-Switcher
Small JS library to switch favicons based on media queries like dark mode or system theme changes
Sa11y
Accessibility checker that focuses on content and provides visual cues for editors within their CMS
Search-One Local SEO Tool
Open-source repository for scraping local search results and identifying competitors
SEO Python Library
Open-source library for automating technical SEO tasks like redirect checks and meta analysis
Local-LLM-SEO
Open-source scripts for bulk generating localized service pages using Llama 3 or GPT-4
Dynamic Favicon
JavaScript library that allows developers to change favicons dynamically based on user status or site notifications
Keywell
Open-source utility for semantic keyword grouping and basic topic modeling for developers
Site-Audit-CLI
High-performance CLI tool for technical SEO audits and dead link detection
ContentSwift
Open-source content research and optimization tool for semantic SEO
Free-GPB-Auditor
Open-source script for auditing Google Business Profiles for local ranking factors
SEOPivot
Emerging open-source desktop dashboard for monitoring site health and schema
Search engine keywords
CLI tool for fetching keyword suggestions from various search engine autocomplete APIs
Crux.vis
A clean dashboard for visualizing your domain's historical Chrome User Experience Report data trends
Ferryman
Open-source tool for handling large-scale e-commerce SEO redirects and path mapping
Gmerlin
Open-source email verification tool designed for high-speed checking of link building lists
Greenflare
Open-source desktop SEO crawler for privacy-conscious technical website audits
Hugging Face SEO Description
AI tool to generate meta descriptions and titles using LLMs
LLM-Benchmark-Crawler
Scripts to automate querying multiple LLMs and tracking brand citation frequency
Local Search Data
Open-source scraper for extracting local business data from maps
Mallard Duck
Lightweight open-source eCommerce SEO auditing tool for small online shop owners
PySEO
Open-source Python framework for custom scheduled SEO crawling and data extraction
Serpident
Open-source lightweight rank tracker for personal or agency use
Slim SEO
Automated and lightweight SEO plugin with zero configuration required for beginners
Ant-Team Yandex XML Tools
Specialized scripts and tools for managing Yandex XML limits and parsing tasks
Map-Tracker
Open-source Python script to track local search rankings on Google Maps via SerpApi
Naver-SEO-Python
Python scripts for automating Naver Search Advisor tasks and checking indexing status via CLI
LinkBuildingTools
Open-source repository of scripts and workflows for link prospecting and domain validation
SEO.js
Open-source JavaScript library for checking SEO best practices in code
Klustr
Open-source Python library for hierarchical keyword clustering and automated taxonomy generation
Scry
Open-source oracle search explorer for tracking price feed SEO accuracy
Photon
Incredibly fast crawler designed for OSINT and domain profiling
Ackee
Self-hosted privacy-focused analytics tool for developers who care about their own data
Anvaka Sayit
Interactive graph visualization of subreddit relationships to identify niche clusters and link opportunities
GoatCounter
Open-source web analytics service that is easy to use and meaningful without tracking individuals
SocioBoard
Open-source social media management with white-label capabilities for marketing agencies
YouTube Transcript Extractor
Utility for developers to programmatically pull video transcripts for SEO analysis
Staticman
Open-source static site comment engine for building community on JAMstack sites
Wapiti
Open-source web security auditor that identifies SEO-impacting vulnerabilities
Shynet
Modern self-hosted web analytics that works without cookies or JavaScript for extreme privacy
BestIcon
A simple favicon service written in Go that can find and proxy favicons for any website with a single API call
Yellow Lab Tools
Open-source online test that detects performance smells and front-end code quality issues
A11y Machine
Automated accessibility testing tool that crawls a website and generates a visual report of issues
Sitemap Generator CLI
Node.js command-line interface tool to generate sitemaps for any website directly from the terminal
Abot
Open-source crawler for developers to build custom scheduled technical SEO audits
Ally.js
Library to help modern web applications with accessibility, focus management, and keyboard navigation
Broken Link Checker
Open-source CLI tool to find broken links and server errors across entire domains
Offen
Fair web analytics that requires user consent and treats website owners and users as equals
Puppeteer-Renderer
Simple web service to render web pages as HTML or PDF using Puppeteer backend
Static Jinja
Open-source library to build static websites with Jinja2 templates
Domain MOD
Open-source application to manage your domains and registrars
Web Archives
Browser extension to view archived and cached versions of web pages
Favicon Grabber
Open-source API and tool to extract the highest resolution favicon from any website URL for competitive research
Anchor Image
Lightweight image server for self-hosting with built-in on-the-fly resizing and cropping
LinkSource
Open-source link prospect management tool for organizing potential domains and tracking contact status
MobiMetrics
Open-source script collection for basic mobile rank tracking
PDF-XRay
Python-based tool for extracting and analyzing PDF structures to identify SEO issues or hidden data
PySEO Tools
Open-source Python scripts for automating repetitive SEO tasks and audits
SEO Mapper
Open-source tool to visualize internal linking and site architecture
SEO-Checker
Node.js-based open-source SEO checker for developers
SEO-Go
Lightweight open-source CLI for checking meta tags and broken links
Sitemap Generator
Open-source tool to generate massive XML sitemaps for enterprise eCommerce catalogs
Sitemapper
High-performance open-source sitemap generator and validator for large-scale enterprise web architectures
Taudio
Open-source repository for audio processing and metadata extraction for SEO
Text-to-JSON Schema
Open-source utility to convert unstructured text into valid Schema.org JSON-LD snippets
Ubersuggest-Open
Community maintained scraper for keyword suggestions and search volume data
YouTube Comment Suite
Open-source tool for managing and analyzing thousands of YouTube video comments
Content-Auditor
Open-source Python script for analyzing site-wide content quality and planning updates
Page Ruler Redux
Measure page elements in pixels for layout and UX optimization
Hemingway-Open
Open-source implementation of writing analysis for readability and SEO
SEO-Scraper-JS
High-performance JavaScript scraper designed to extract SEO metrics and metadata from large batches of URLs
Serp-Api (Open Source)
Self-hosted SERP scraping API using Puppeteer and Node.js for private data pipelines
Site-Inspector
Open-source tool to find broken links and common technical SEO errors
User-Agent Switcher
Switch between different user-agents to test mobile and bot crawling
Wordsmith
An open-source script alternative for Google Sheets to handle bulk keyword research and SERP analysis via API
Quick Source Viewer
A refined view of the page source with syntax highlighting and resource grouping
Imager
Simple and lightweight image manipulation library for PHP with a focus on ease of use
Tanaguru
Open-source automated accessibility assessment tool focused on high-volume website testing
Extensions Manager
Toggle browser extensions on and off quickly to avoid memory bloat and conflicts
Copy All URLs
Quickly copy all open tab URLs to the clipboard in multiple formats for research sharing
Page Load Time
Minimalist extension that displays the exact loading time of a webpage in the browser toolbar
Stet
Open-source privacy-first website analytics server designed for high-performance self-hosting
Alt Text Tester
Hover over images to see their alt text and identify missing attributes for accessibility
Link Grabber
Extracts all links from a webpage and displays them in a sortable list
A11ygator
Web tool and API that analyzes a page and provides a clear list of accessibility improvements
Analog
Extremely fast log file analyzer that provides basic traffic and bot statistics
BacklinkChecker
Open-source Python utility for monitoring backlink status and detecting removed links
Delta
Simple desktop app to monitor web elements and get native notifications
Go-SEOSpider
CLI-based website crawler written in Go for high-speed technical SEO audits
Keywords Every Day
Open-source alternative for keyword and competitor lookup
NAP Hunter
Chrome extension to search for Name, Address, and Phone inconsistencies across the web
NapHunter
Open-source script for finding NAP inconsistencies for local SEO audits
SSEO
Open-source technical SEO analyzer for identifying onsite optimizations
Schema.org Scanner
Open-source CLI tool for scanning entire domains for schema health
Skeleton
Open-source crawler for identifying broken links and SEO issues in a CLI environment
View Image Info
Extracts image metadata including alt text, file size, and dimensions directly from the browser
Wiki-Audit Tool
Open-source script for checking Wikipedia and Wikidata mentions to audit brand authority
Word Counter Plus
Simple tool to count words and characters in selected text on any web page
NoFollow Simple
Highlights all nofollow and sponsored links on a page to visualize internal link equity
WordCounter
Quickly count words and characters of selected text on any webpage
Quick SEO
A lightweight toolbar for quick access to domain and page SEO metrics
Simple URL Copy
One-click to copy current URL or all URLs from a window with clean formatting
Suckless SEO Checker
Ultra-minimalist CLI tool for checking basic on-page SEO factors
HARO SEO Filters
Open-source script to filter HARO emails for high-relevance link opportunities
SeoToaster
Open-source CMS and e-commerce platform with built-in scheduled SEO crawling
WP Local SEO
WordPress plugin for local business schema and map integration
hreflang-php
Lightweight PHP library for generating and managing SEO-compliant hreflang HTML tags in backend applications
Favicon Grabber API
Fast and reliable JSON API to retrieve a website's favicon and touch icons for use in applications
Sitemap XML Editor
Open-source desktop editor for Windows to manage and edit extremely large XML sitemaps offline