.. getsitemap documentation master file, created by
   sphinx-quickstart on Sun Oct 9 17:01:22 2022.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to getsitemap's documentation!
======================================
`getsitemap` is a simple Python library that retrieves all the URLs in the sitemaps associated with a website.
This library may be useful when building a web search crawler, an SEO validation tool, or a sitemap monitor.
You can install `getsitemap` using the following command:

.. code-block:: bash

   pip install getsitemap

See the documentation for `getsitemap` below.
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Get all URLs in a website's sitemaps
------------------------------------
The `retrieve_sitemap_urls()` function returns all URLs found in a website's sitemaps.
This function:
1. Checks for `Sitemap` directives in a website's robots.txt file. All sitemaps found are crawled recursively.
2. Checks for the presence of a sitemap.xml file. If one is found, it is crawled recursively.
3. Merges the results of all checks to return either a flat list of all URLs or a dictionary that maps each sitemap to the URLs found in it.
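
As a rough illustration of step 1, the sketch below extracts `Sitemap` directives from the text of a robots.txt file using Python's standard `urllib.robotparser` module (the `site_maps()` method requires Python 3.8 or later). It is not how `getsitemap` is implemented; the helper name `sitemaps_from_robots` and the example URLs are made up for this example.

.. code-block:: python

   from urllib.robotparser import RobotFileParser


   def sitemaps_from_robots(robots_txt: str) -> list:
       """Return the sitemap URLs declared in the text of a robots.txt file."""
       parser = RobotFileParser()
       # parse() accepts an iterable of lines, so no network request is made here.
       parser.parse(robots_txt.splitlines())
       # site_maps() returns None when no Sitemap directive is present.
       return parser.site_maps() or []


   # Hypothetical robots.txt content, used only for illustration.
   example_robots = """\
   User-agent: *
   Disallow: /private
   Sitemap: https://example.com/sitemap.xml
   Sitemap: https://example.com/news-sitemap.xml
   """

   found = sitemaps_from_robots(example_robots)

Each URL in `found` would then be a starting point for the recursive crawl described in step 1.
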
.. autofunction:: getsitemap.retrieve_sitemap_urls
To get a list of all sitemaps in a website, call `.keys()` on the result of this function, provided you pass `as_flat_list=False` as an argument.
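
The sketch below applies the `.keys()` tip to a hand-written sample rather than real output. It assumes the dictionary is keyed by sitemap URL, with a list of page URLs as each value, as the `.keys()` tip implies; the commented-out call and the `example.com` URLs are placeholders.

.. code-block:: python

   # In real use, the dictionary would come from a call such as:
   #   import getsitemap
   #   urls_by_sitemap = getsitemap.retrieve_sitemap_urls(
   #       "https://example.com", as_flat_list=False
   #   )
   # The hand-written sample below stands in for that result (assumed shape).
   urls_by_sitemap = {
       "https://example.com/sitemap.xml": [
           "https://example.com/",
           "https://example.com/about/",
       ],
       "https://example.com/news-sitemap.xml": [
           "https://example.com/news/launch/",
       ],
   }

   # The keys are the sitemaps that were crawled.
   sitemaps = list(urls_by_sitemap.keys())

   # Flattening the values gives every page URL, with duplicates removed.
   all_urls = sorted({url for urls in urls_by_sitemap.values() for url in urls})
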
Note that this function may take a while to run if there are many sitemaps to crawl, because a separate network request has to be made for each sitemap.
Get all URLs in a single sitemap
--------------------------------
The `get_individual_sitemap()` function returns all URLs found in a single sitemap.
With `recurse=True`, this function also crawls every sitemap referenced in the provided sitemap, and does so recursively.
If `recurse=False`, this function returns only the list of URLs in the provided sitemap file. If the provided file is a sitemap index, that list consists of the sitemap files it references.
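
To make the difference between a regular sitemap and a sitemap index concrete, here is a minimal sketch that reads the `<loc>` entries from the text of one sitemap file using the standard library. It is an illustration only, not the library's implementation; the helper name `locations_in_sitemap` and the sample XML are hypothetical.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # XML namespace defined by the sitemaps.org protocol.
   SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


   def locations_in_sitemap(xml_text: str) -> tuple:
       """Return (is_index, locations) for the text of one sitemap file."""
       root = ET.fromstring(xml_text)
       # A sitemap index has <sitemapindex> as its root element;
       # a regular sitemap has <urlset>.
       is_index = root.tag == SITEMAP_NS + "sitemapindex"
       # Both kinds of file list their entries in <loc> elements.
       locations = [
           loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text
       ]
       return is_index, locations


   # Hypothetical sitemap index, used only for illustration.
   example_index = """\
   <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
     <sitemap><loc>https://example.com/posts-sitemap.xml</loc></sitemap>
     <sitemap><loc>https://example.com/pages-sitemap.xml</loc></sitemap>
   </sitemapindex>
   """

   is_index, locations = locations_in_sitemap(example_index)

Here `locations` holds sitemap files rather than pages, which is the non-recursive behaviour described above for a sitemap index. A recursive crawl would fetch each of those files and repeat the same step.
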
.. autofunction:: getsitemap.get_individual_sitemap
Changelog
=========
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project
adheres to Semantic Versioning.
[0.1.1] - 2022-10-09
--------------------
Added
~~~~~
- Refactored ``get_individual_sitemap`` to allow use as a public function.
- Documentation for the ``get_individual_sitemap`` function.
.. _section-1:
[0.1.0] - 2022-10-09
--------------------
.. _added-1:
Added
~~~~~
- Initial release of ``getsitemap``.
- ``retrieve_sitemap_urls`` to retrieve all the URLs from a sitemap.
- Documentation for the ``retrieve_sitemap_urls`` function.