RobotUA — specify user-agents that will be classified as crawler bots (search engines)
The RobotUA directive defines a list of user-agent strings that will be
classified as crawler robots (search engines), and causes Interchange to
alter its behavior to improve the chance of Interchange-served content
being crawled and indexed.
Note that this directive (and all other work done to identify robots)
serves only to improve the way Interchange pages are indexed, and to
reduce server overhead for clients that don't require the full attention
a human visitor does (for example, session information is not kept for
spider bots). Using it to "tune" the actual page content served to a
visiting crawler earns you no extra points, and may in fact be detected
by the robot and penalized.
The directive accepts a wildcard list with glob-like semantics: *
represents any number of characters, while ? represents a single
character.
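For example, the following standalone Perl sketch shows how such glob
patterns can be turned into a case-insensitive regular expression. The
glob_to_regex name and its details are illustrative only; Interchange's
own conversion is done internally by get_wildcard_list and may differ.

use strict;
use warnings;

# Illustrative glob-to-regex conversion (not Interchange's actual code).
sub glob_to_regex {
    my ($pattern) = @_;
    my $re = quotemeta $pattern;   # escape regex metacharacters first
    $re =~ s/\\\*/.*/g;            # '*' matches any number of characters
    $re =~ s/\\\?/./g;             # '?' matches exactly one character
    return qr/$re/i;               # unanchored, case-insensitive match
}

my $matcher = glob_to_regex('Digital*Integrity');
print "robot\n" if 'Digital Data Integrity Bot/1.0' =~ $matcher;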
Example: Defining RobotUA
RobotUA <<EOR
ATN_Worldwide, AltaVista, Arachnoidea, Aranha, Architext, Ask, Atomz,
BackRub, Builder, CMC, Contact, Digital*Integrity, Directory, EZResult,
Excite, Ferret, Fireball, Google, Gromit, Gulliver, Harvest, Hubater,
H?m?h?kki, INGRID, IncyWincy, Jack, KIT*Fireball, Kototoi, LWP, Lycos,
MegaSheep, Mercator, Nazilla, NetMechanic, NetResearchServer, NetScoop,
ParaSite, Refiner, RoboDude, Rover, Rutgers, Scooter, Slurp, Spyder,
T-H-U-N-D-E-R-S-T-O-N-E, Toutatis, Tv*Merc, Valkyrie, Voyager, WIRE,
Walker, Wget, WhizBang, Wire, Wombat, Yahoo, Yandex, ZyBorg, appie,
asterias, bot, contact, crawl, collector, fido, find, gazz, grabber,
griffon, archiver, legs, marvin, mirago, moget, newscan, seek, speedy,
spider, suke, tarantula, agent, topiclink, whowhere, winona, worm,
xtreme
EOR
Interchange 5.9.0:
Source: lib/Vend/Config.pm, lines 3853-3857
sub parse_list_wildcard {
	my $value = get_wildcard_list(@_,0);
	return '' unless length($value);
	return qr/$value/i;
}
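The compiled pattern is then matched (case-insensitively and unanchored)
against the client's User-Agent header. As a rough sketch, assuming
get_wildcard_list joins the RobotUA entries into one alternation, the
check amounts to something like this ($robot_ua below is a hand-written
stand-in, not the real parsed list):

use strict;
use warnings;

# Stand-in for the qr// object parse_list_wildcard() would build from
# the full RobotUA list above.
my $robot_ua = qr/Google|Slurp|Wget|spider|crawl/i;

my $ua = $ENV{HTTP_USER_AGENT} || '';
if ($ua =~ $robot_ua) {
    print "classified as robot\n";    # e.g. no session is kept
}
else {
    print "classified as human\n";
}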