site_access_1

There is no [span class=”hue”]requirement[/span] to do anything with this feature. You could simply leave it as it is after you install kiwitrees. It was designed so that kiwitrees will work perfectly if you just ignore it.

The page consists of two parts:

  • The top half of the page lists the rules currently in place. In this image (right) they are the defaults that come with kiwitrees, which allow “normal” browsers to access the site. Each part of a rule can be edited, and a whole rule can be deleted.
    • [span class=”hue”]Do not delete the default browser access rules[/span] as this may prevent access to your site from standard browsers, including your own!
    • At the bottom of the table is a “Reset” button. This will delete ALL the rules you have in place and replace them with the defaults that come with a new installation of kiwitrees.
  • The bottom half lists the “unrecognised” visitors that [span class=”hue”]might[/span] later need a rule to manage their access. You convert one of them into a rule in the top half by clicking one of the three green ticks to the right, which sets a rule to allow, deny, or limit (robots) access for that IP address or address range.

Exactly what you do with this information depends on how aggressively you want to restrict access to the site.

If you do want to actively manage things, you should start by finding out a little more information about each of your “unrecognised visitors”:

  • Use whois.net to find out who owns the IP address.
  • Use user-agent-string.info to find out a little more about the user-agent string.
  • Check your webserver logs to see if it visits too frequently, and what sort of requests it makes.
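
If your web host gives you access to the raw webserver logs, even a small script can show which IP addresses and user-agent strings are hitting the site hardest. The sketch below is one possible approach in Python; it assumes the common Apache/Nginx “combined” log format, and the log file path is only a placeholder you would need to change for your own server.

    # Minimal sketch: count requests per IP address and per user-agent string.
    # Assumes the Apache/Nginx "combined" log format; the path is a placeholder.
    import re
    from collections import Counter

    LOG_FILE = "/var/log/apache2/access.log"  # change to your own log file

    # IP - - [date] "request" status size "referer" "user-agent"
    LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)" \d+ \S+ "[^"]*" "([^"]*)"')

    ip_counts = Counter()
    ua_counts = Counter()

    with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE.match(line)
            if match:
                ip, _request, user_agent = match.groups()
                ip_counts[ip] += 1
                ua_counts[user_agent] += 1

    print("Busiest IP addresses:")
    for ip, hits in ip_counts.most_common(10):
        print(f"{hits:6d}  {ip}")

    print("Busiest user-agent strings:")
    for ua, hits in ua_counts.most_common(10):
        print(f"{hits:6d}  {ua[:80]}")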

Remember also that user-agent strings are easily faked. Anyone can send a request with a UA string of: Mozilla/5.0 (compatible; Googlebot/2.1; +www.google.com/bot.html)
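
Because the string itself proves nothing, the reliable way to confirm that a visitor claiming to be Googlebot really belongs to Google is to check its IP address: do a reverse DNS lookup, then confirm the resulting name resolves back to the same address. Here is a hedged sketch using only Python’s standard library; the address you test should come from your own logs.

    # Minimal sketch: verify a "Googlebot" visitor by reverse-then-forward DNS lookup.
    import socket

    def is_real_googlebot(ip_address):
        try:
            host_name = socket.gethostbyaddr(ip_address)[0]
        except (socket.herror, socket.gaierror):
            return False  # no reverse DNS record for this address
        # Genuine Google crawlers resolve to googlebot.com or google.com host names
        if not host_name.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            # Forward-confirm: the host name must resolve back to the same IP
            return socket.gethostbyname(host_name) == ip_address
        except socket.gaierror:
            return False

    # Replace with an address taken from your own access log:
    print(is_real_googlebot("203.0.113.10"))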

[small_full]Robots.txt file[/small_full]

site_access_3

Most robots are harmless. They only tend to cause problems when they visit too frequently and consume too many server resources. But it is important to identify them. For example, you don’t want them to have access to the calendar page: by following its links they would end up crawling an almost infinite number of pages. The “Site Access Rules” page helps to control access by universally allowing, denying, or limiting access for IP address ranges, but you also need a “robots.txt” file in place to control more specifically which pages robots are denied access to.

[span class=”hue”]If you have had a robots.txt file in place for some time you should review it. The latest version of kiwitrees (3.1) has a new example robots.txt file (robots-example.txt) included in its installation package.[/span] This reverses the previous file’s “white-list” rules in favour of a simpler “black-list” approach. This ensures search engines such as Google can access all site resources, so they can accurately assess SEO rankings, while still denying them access to pages they do not need and that might consume excessive server resources if crawled repeatedly. These are mainly pages containing large numbers of links, for example to hundreds of individual pages that the robot will find anyway.

This file needs to be copied, renamed, and placed in your web site domain root directory,  such as “www.example.com/robots.txt”. It will not work in a subdirectory, such as “www.example.com/webtrees/robots.txt”. If you do need to move it, remember to adjust the paths as well, e.g. “Disallow: /login.php” becomes “Disallow: /kiwitrees/login.php”.