haloman30


Graphic/Web Designer, System Administrator, Programmer, Community Manager, Texture Artist



Configuration & Switches


DiscourseDownloader features a wide range of configuration options, allowing you to customize your archive in as many ways as possible.

For most users, the default options will be perfectly sufficient. However, if you'd like to further customize your archive, read below for the complete list of options available.

website.cfg

This is the primary configuration file used to control most aspects of the application. The available options are grouped into "sections", making the file easier to read.

Website Config Basic Download Settings (website_config)

This section contains most of the options that a typical user would need to adjust.

Name Type Default Value Description
website_url string https://forums.halowaypoint.com The base URL of the forum to download.
site_directory_root string ./forums.halowaypoint.com/ The path to store all downloaded JSON and built HTML content. Can be relative or absolute.
skip_download bool false Whether or not to skip the download step. Set this to true if you've already performed a complete download and simply want to build (or rebuild) the HTML website from that original data.
download_users bool true Whether or not to download user profiles and related data.
download_topics bool true Whether or not to download forum topics and related data.
download_misc bool true Whether or not to download miscellaneous site information.
perform_html_build bool false Whether or not to perform the HTML website build step. If you wish to download content from a website, but do not want to build the HTML website, set this to false.
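
To illustrate how these switches interact, here is a minimal Python sketch of which steps would run for a given set of values. It is not DiscourseDownloader's actual code: the `WebsiteConfig` class, the `planned_steps` function, the step names, and the step order are all hypothetical, and only the option names and defaults come from the table above.

```python
from dataclasses import dataclass


@dataclass
class WebsiteConfig:
    # Option names and defaults are taken from the website_config table.
    skip_download: bool = False
    download_users: bool = True
    download_topics: bool = True
    download_misc: bool = True
    perform_html_build: bool = False


def planned_steps(cfg: WebsiteConfig) -> list:
    """Return the steps that would run. Step names and order are assumptions."""
    steps = []
    if not cfg.skip_download:
        # Each download_* option toggles one part of the download step.
        if cfg.download_users:
            steps.append("download_users")
        if cfg.download_topics:
            steps.append("download_topics")
        if cfg.download_misc:
            steps.append("download_misc")
    if cfg.perform_html_build:
        steps.append("build_html")
    return steps


# Defaults: download everything, no HTML build.
print(planned_steps(WebsiteConfig()))
# Rebuild-only run: skip the download and build HTML from existing data.
print(planned_steps(WebsiteConfig(skip_download=True, perform_html_build=True)))
```

The second call models the rebuild case described for skip_download: the data is already on disk, so only the HTML build runs.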

Networking Settings (networking)

This section contains options relating to networking features of the application. These are used to control and alter how the application interacts with the forum's server, handles errors, and so on.

Name Type Default Value Description
max_http_retries int 60 The maximum number of times a single request will be retried before giving up and returning an error. Typically, if this limit is reached and an error code is returned, the requested content is skipped.

Note that this option's value can be overridden in some cases by other options (such as fail_on_403, fail_on_404, and max_404s).
http_retry_use_backoff bool true Whether or not to use a backoff factor when requests fail. If enabled, the application will wait an increasing amount of time before retrying a request. The retry delay is calculated as follows:

delay = http_backoff_increment * (retry_count + 1)
  • delay - The total delay time
  • http_backoff_increment - The value of the http_backoff_increment option
  • retry_count - The number of retries performed for a given request
http_backoff_increment int 5 The amount of time added to the retry delay after each failed retry. Only applies when http_retry_use_backoff is set to true.
override_user_agent bool false Whether or not to use a custom user agent. While generally unnecessary, this can help when a forum blocks unrecognized user agents, like the one that DiscourseDownloader uses by default.

The default user agent for DiscourseDownloader is DiscourseDL v{VER}, where {VER} is replaced with the application version, such as 1.0.0.
user_agent string (empty) The custom user agent string to use. Only used if override_user_agent is set to true.
request_retry_delay int 5 The standard delay for retrying a request. If http_retry_use_backoff is enabled, then this will only be used for the first failed retry. Otherwise, this delay will be used for each failed retry.
fail_on_403 bool true If enabled, an HTTP 403 response will automatically be treated as a failure, and will not be retried. This defaults to true, because the most likely cause of a 403 is contacting an API which the user does not have access to.

If you are downloading your own forum and experience a 403, you may look into enabling certain API features for guest users. Alternatively, you may also try using the cookie settings (detailed further down).
fail_on_404 bool false If enabled, an HTTP 404 response will automatically be treated as a failure, and will not be retried.
max_404s int 5 The maximum number of HTTP 404 responses that can be encountered before treating it as a failure. Has no effect if fail_on_404 is enabled.
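
The retry options above can be read as a small decision procedure. The Python sketch below is one possible interpretation, not the application's real implementation: the `NetworkingConfig` class and both function names are hypothetical. It also assumes that `request_retry_delay` covers the first retry (`retry_count == 0`) when backoff is enabled, and that `not_found_count` counts the 404 responses seen so far for the request, including the current one.

```python
from dataclasses import dataclass


@dataclass
class NetworkingConfig:
    # Option names and defaults are taken from the networking table.
    max_http_retries: int = 60
    http_retry_use_backoff: bool = True
    http_backoff_increment: int = 5
    request_retry_delay: int = 5
    fail_on_403: bool = True
    fail_on_404: bool = False
    max_404s: int = 5


def retry_delay(cfg: NetworkingConfig, retry_count: int) -> int:
    """Delay before the next retry, given the retries performed so far."""
    if not cfg.http_retry_use_backoff or retry_count == 0:
        # Assumption: the standard delay applies to the first retry,
        # and to every retry when backoff is disabled.
        return cfg.request_retry_delay
    # Documented formula: delay = http_backoff_increment * (retry_count + 1)
    return cfg.http_backoff_increment * (retry_count + 1)


def should_retry(cfg: NetworkingConfig, status: int,
                 retry_count: int, not_found_count: int) -> bool:
    """Decide whether a failed request is retried (assumed semantics)."""
    if status == 403 and cfg.fail_on_403:
        return False
    if status == 404:
        if cfg.fail_on_404:
            return False
        if not_found_count >= cfg.max_404s:
            return False
    return retry_count < cfg.max_http_retries


cfg = NetworkingConfig()
print([retry_delay(cfg, n) for n in range(4)])  # [5, 10, 15, 20]
print(should_retry(cfg, 403, 0, 0))             # False
```

With the default values, the documented formula and the standard delay both give 5 for the first retry, so the two readings of request_retry_delay only diverge once the two options are set to different values.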

Download Settings (download)

This section contains options for fine-tuning how content is downloaded, as well as options for controlling how partial downloads are handled.

Name Type Default Value Description
resume_download bool true Whether or not to attempt resuming a partial download.
enable_url_caching bool true Whether or not to cache URL lists to disk after they are collected.

When downloading certain content (such as topics), the application will build a large initial list of item URLs before downloading any individual items. In order to save time during a resume, this URL list can be stored on disk. Keep in mind, however, that if this URL cache is used, the application will NOT attempt to fetch any newer URLs, so there is the possibility of missing content.
enable_data_caching bool true Whether or not to cache certain data to disk during the download process.

When downloading a very large forum, the application's memory usage can become extreme by default. Enabling this option slightly increases the overall runtime, but allows the application to free up memory after certain steps are finished (e.g., once a forum category is fully downloaded). This data is then loaded from disk again later during the sanity checks as needed (detailed below).
delete_caches_on_finish bool false Whether or not to delete URL or data caches after a download has completed. Not yet implemented.
redownload_if_missing_cache bool false Not used.
sanity_check_on_finish bool true Whether or not to perform sanity checks after certain download steps are completed. By default, this option only enables the basic sanity check, which simply verifies that the topic and post counts in the downloaded topic list (in memory) match the topic and post counts reported by the Discourse API.

A more in-depth sanity check can be performed by enabling thorough_sanity_check. See below for more information.
thorough_sanity_check bool true Whether or not to perform the in-depth sanity check.

This more advanced check will go through all data in memory and ensure that a .json file exists for each topic and post within each category. In the event that topics or posts are missing, the application will try to re-download that content.
download_skip_existing_categories bool false Whether or not to skip existing category folders when downloading.

If enabled, the application will check for a category folder prior to any download steps. If the folder exists, the category will be skipped. This is disabled by default, as it could potentially result in categories being incomplete, and thus, result in an incomplete download.
download_skip_existing_topics bool false Whether or not to skip existing topic folders when downloading.

If enabled, the application will check for a topic folder prior to downloading the topic (and its post data). If the folder exists, the topic will be skipped. This is disabled by default, as it could potentially result in topics being incomplete and missing posts, and thus, result in an incomplete download.
download_skip_existing_posts bool true Whether or not to skip existing topic posts when downloading.

If enabled, existing post files will be skipped when downloading a topic. While this poses the risk of skipping edited posts, it can reduce download times significantly. If you are not concerned about download times, or simply want to be sure that any edits made to posts are downloaded, set this to false. Note that setting this to false will effectively disable download resuming, as all existing content will be downloaded again.
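
As a rough illustration of what the thorough sanity check describes, the Python sketch below looks for missing post files under a category folder. The on-disk layout used here (`<category_dir>/<topic_id>/<post_id>.json`) and the `find_missing_posts` function are hypothetical, chosen only for the example; the real application's folder structure and re-download logic may differ.

```python
import tempfile
from pathlib import Path


def find_missing_posts(category_dir: Path, expected: dict) -> list:
    """Return (topic_id, post_id) pairs that have no .json file on disk.

    `expected` maps each topic ID to the post IDs held in memory.
    The <topic_id>/<post_id>.json layout is an assumption for this sketch.
    """
    missing = []
    for topic_id, post_ids in expected.items():
        for post_id in post_ids:
            post_file = category_dir / str(topic_id) / f"{post_id}.json"
            if not post_file.is_file():
                missing.append((topic_id, post_id))
    return missing


# Demo: one post exists on disk, two are missing.
with tempfile.TemporaryDirectory() as tmp:
    category = Path(tmp)
    (category / "101").mkdir()
    (category / "101" / "1.json").write_text("{}")
    print(find_missing_posts(category, {101: [1, 2], 102: [3]}))
    # [(101, 2), (102, 3)]
```

Anything returned by such a check would be a candidate for the re-download that the thorough sanity check attempts.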

Forum Topic Download Settings (forums)

This section contains options specific to downloading forum categories, topics, and posts.

Name Type Default Value Description
max_get_more_topics int -1 Unused.
max_posts_per_request int 20 Unused.
topic_url_collection_notify_interval int 15 Controls how often update messages are printed to console and the log file when downloading topic URLs. After this many topic URL requests have been performed, a notification will be posted.
download_subcategory_topics bool false Whether or not to include subcategory topic URLs when building a topic URL list for a category. This should generally be left disabled, as subcategories are downloaded separately into their own folders. With this enabled, a potentially large amount of content will be duplicated, both in the downloaded JSON data and the resulting HTML archive website.
use_category_id_filter bool false Whether or not to use the configured category ID filter when downloading categories.

If enabled, only the category IDs listed in the category_id_filter option will be downloaded. Note that this behavior is reversed when use_filter_as_blacklist is enabled.
category_id_filter string 5,10 A list of category IDs to download, separated by commas. Any other categories are skipped. Only used if use_category_id_filter is enabled.
use_filter_as_blacklist bool false If enabled, reverses the behavior of the category ID filter. The filter will instead act as a blacklist - meaning that any categories listed will be excluded from the download, and all other categories will be downloaded.
strict_topic_count_checks bool false Whether or not topic counts should match exactly when performing topic count checks.

If disabled, a downloaded topic count that is larger than the topic count reported by the API will not be treated as a mismatch. This option is disabled by default, as it appears that the Discourse API will sometimes not report pinned topics within the total topic count for a category.
download_all_tag_extras bool false Whether or not to download the complete list of topics that have a particular tag. This is usually unnecessary, as each topic is already downloaded separately.
max_skipped_topic_urls int 100 The maximum number of topic URLs to skip before stopping topic URL list building. Topic URLs are only counted as skipped when the category ID does not match (e.g., when downloading a category with subcategories). After this many skipped URLs, it is assumed that all remaining topic URLs belong to subcategories, rather than the parent category.
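
The category filter and the topic count check described above can be sketched in Python as follows. The function names are hypothetical and this is not the application's own code. It assumes that `use_filter_as_blacklist` only matters when `use_category_id_filter` is enabled, and that whitespace around IDs in `category_id_filter` is ignored.

```python
def parse_category_filter(raw: str) -> set:
    """Parse a comma-separated ID list such as "5,10" into a set of ints."""
    return {int(part) for part in raw.split(",") if part.strip()}


def should_download_category(category_id: int, use_filter: bool,
                             filter_ids: set, as_blacklist: bool) -> bool:
    """Apply the category ID filter as a whitelist or a blacklist."""
    if not use_filter:
        return True  # Filter disabled: every category is downloaded.
    listed = category_id in filter_ids
    return not listed if as_blacklist else listed


def topic_counts_match(downloaded: int, reported: int, strict: bool) -> bool:
    """Compare the downloaded topic count with the API-reported count."""
    if strict:
        return downloaded == reported
    # Non-strict: extra downloaded topics (e.g. pinned ones) are tolerated.
    return downloaded >= reported


ids = parse_category_filter("5,10")
print(should_download_category(5, True, ids, False))  # True  (whitelisted)
print(should_download_category(5, True, ids, True))   # False (blacklisted)
print(topic_counts_match(102, 100, strict=False))     # True
```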

User Profile Download Settings (users)

This section contains options for controlling how user profiles are downloaded.

Name Type Default Value Description
download_all_user_actions bool true Whether or not to download all user actions. If enabled on a large forum, this could substantially increase the time required to download all profiles.
download_all_avatar_sizes bool true Whether or not to download all avatar sizes. If disabled, only the highest resolution avatar available (360x360) will be downloaded.
download_private_messages bool false Whether or not to attempt downloading a user's private messages. Not yet implemented.

Local Directory Settings (paths)

This section contains options for determining where downloaded content is stored on disk.

Name Type Default Value Description
html_dir string export/ The directory used to store generated HTML archive content. This is relative to site_directory_root.
json_dir string json/ The directory used to store downloaded JSON data from the API. This is relative to site_directory_root.
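
Since both directories are relative to site_directory_root, the final locations can be worked out by joining the paths. The Python sketch below shows this with the default values; `resolve_output_dirs` is a hypothetical helper, not part of DiscourseDownloader.

```python
from pathlib import Path


def resolve_output_dirs(site_directory_root: str,
                        html_dir: str = "export/",
                        json_dir: str = "json/"):
    """Join html_dir and json_dir onto site_directory_root."""
    root = Path(site_directory_root)
    return root / html_dir, root / json_dir


html_path, json_path = resolve_output_dirs("./forums.halowaypoint.com/")
print(html_path.as_posix())  # forums.halowaypoint.com/export
print(json_path.as_posix())  # forums.halowaypoint.com/json
```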

Cookie Settings (cookies)

This section contains options for specifying cookies. These can be used to provide authentication with the API under a specific user account. This may be desired in cases where you want to download content that is only available when logged in, or only accessible to certain groups.

Name Type Default Value Description
cookie_name string _t The name of the cookie to provide to the server.
cookie string The value of the cookie to provide to the server.
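
For context, a cookie name and value pair like this is typically sent to the server in a `Cookie` request header. The Python sketch below shows that general mechanism using the standard library. It is an assumption about how the values are used rather than a description of the application's internals, and the `build_request` helper, the example URL, and the example cookie value are placeholders.

```python
import urllib.request


def build_request(url: str, cookie_name: str = "_t",
                  cookie_value: str = "") -> urllib.request.Request:
    """Create a request that carries the configured cookie, if one is set."""
    request = urllib.request.Request(url)
    if cookie_value:
        # Standard "name=value" form of the Cookie header.
        request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


# Placeholder URL and cookie value; no network request is made here.
req = build_request("https://forums.example.com/", cookie_value="example-token")
print(req.get_header("Cookie"))  # _t=example-token
```

Leaving the cookie value empty produces a request with no `Cookie` header, which corresponds to downloading as a guest.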

Miscellaneous Settings (misc)

This section contains miscellaneous options that don't fit into the other categories.

Name Type Default Value Description
disable_long_finish_message bool false Whether or not to disable the long message shown after the application has finished all tasks. The extended message provides information on uploading and distributing the resulting archive on websites such as archive.org, as well as other useful information for those new to the application and/or to website archival.

If set to true, a shorter message is printed to the console instead.
log_level_debug bool false Whether or not to print log level debug messages to the console during startup. This will show a sample message for each log level. Used for debug/development purposes.

Command-Line Switches

In addition to the configuration file, the application also allows for certain options to be controlled via command-line switches. All available switches are detailed below.

Name Flags Description
-config_debug Instructs the application to show additional debug messages when reading configuration files.

Copyright (c) haloman30 1999 - 2025