<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head><!--
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              This file is generated from xml source: DO NOT EDIT
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
      -->
<title>URL Rewriting Guide - Advanced topics - Apache HTTP Server</title>
9 <link href="../style/css/manual.css" rel="stylesheet" media="all" type="text/css" title="Main stylesheet" />
10 <link href="../style/css/manual-loose-100pc.css" rel="alternate stylesheet" media="all" type="text/css" title="No Sidebar - Default font size" />
11 <link href="../style/css/manual-print.css" rel="stylesheet" media="print" type="text/css" />
12 <link href="../images/favicon.ico" rel="shortcut icon" /></head>
13 <body id="manual-page"><div id="page-header">
14 <p class="menu"><a href="../mod/">Modules</a> | <a href="../mod/directives.html">Directives</a> | <a href="../faq/">FAQ</a> | <a href="../glossary.html">Glossary</a> | <a href="../sitemap.html">Sitemap</a></p>
15 <p class="apache">Apache HTTP Server Version 2.0</p>
16 <img alt="" src="../images/feather.gif" /></div>
<div class="up"><a href="./"><img title="&lt;-" alt="&lt;-" src="../images/left.gif" /></a></div>
<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://httpd.apache.org/">HTTP Server</a> &gt; <a href="http://httpd.apache.org/docs/">Documentation</a> &gt; <a href="../">Version 2.0</a></div><div id="page-content"><div id="preamble"><h1>URL Rewriting Guide - Advanced topics</h1>
21 <p><span>Available Languages: </span><a href="../en/rewrite/rewrite_guide_advanced.html" title="English"> en </a></p>
25 <p>This document supplements the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
26 <a href="../mod/mod_rewrite.html">reference documentation</a>.
27 It describes how one can use Apache's <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
28 to solve typical URL-based problems with which webmasters are
29 commonly confronted. We give detailed descriptions on how to
30 solve each problem by configuring URL rewriting rulesets.</p>
<div class="warning">ATTENTION: Depending on your server configuration
it may be necessary to adjust the examples for your
situation, for example by adding the <code>[PT]</code> flag when
using <code class="module"><a href="../mod/mod_alias.html">mod_alias</a></code> and
<code class="module"><a href="../mod/mod_userdir.html">mod_userdir</a></code>, or by rewriting a ruleset
to work in <code>.htaccess</code> context instead
of per-server context. Always try to understand what a
particular ruleset really does before you use it; this
avoids many problems.</div>
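<p>As a rough illustration of the second point, the sketch below shows one
hypothetical rule written first for per-server context and then for
<code>.htaccess</code> context. The <code>/old/</code> and <code>/new/</code>
paths are placeholders that do not appear elsewhere in this guide, and the
sketch assumes that the <code>.htaccess</code> file sits in the
<code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>;
adjust it to your own layout.</p>

<div class="example"><pre>
# Per-server context (httpd.conf): the pattern is matched against the
# full URL path, including the leading slash. [PT] passes the result
# on to URL mapping so that mod_alias directives can still apply.
RewriteEngine on
RewriteRule ^/old/(.*)$ /new/$1 [PT]

# Per-directory context (.htaccess): the directory prefix, including
# the leading slash, is stripped before the pattern is matched.
# (hypothetical equivalent of the rule above)
RewriteEngine on
RewriteBase /
RewriteRule ^old/(.*)$ new/$1
</pre></div>

<p>Per-directory rewriting additionally requires that
<code>FollowSymLinks</code> is enabled for the directory in question.</p>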
43 <div id="quickview"><ul id="toc"><li><img alt="" src="../images/down.gif" /> <a href="#cluster">Web Cluster with Consistent URL Space</a></li>
44 <li><img alt="" src="../images/down.gif" /> <a href="#structuredhomedirs">Structured Homedirs</a></li>
45 <li><img alt="" src="../images/down.gif" /> <a href="#filereorg">Filesystem Reorganization</a></li>
46 <li><img alt="" src="../images/down.gif" /> <a href="#redirect404">Redirect Failing URLs to Another Web Server</a></li>
47 <li><img alt="" src="../images/down.gif" /> Archive Access Multiplexer</li>
48 <li><img alt="" src="../images/down.gif" /> <a href="#content">Content Handling</a></li>
49 <li><img alt="" src="../images/down.gif" /> <a href="#access">Access Restriction</a></li>
50 </ul><h3>See also</h3><ul class="seealso"><li><a href="../mod/mod_rewrite.html">Module
51 documentation</a></li><li><a href="rewrite_intro.html">mod_rewrite
52 introduction</a></li><li><a href="rewrite_tech.html">Technical details</a></li></ul></div>
53 <div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
55 <h2><a name="cluster" id="cluster">Web Cluster with Consistent URL Space</a></h2>
63 <p>We want to create a homogeneous and consistent URL
64 layout across all WWW servers on an Intranet web cluster, i.e.,
65 all URLs (by definition server-local and thus
66 server-dependent!) become server <em>independent</em>!
67 What we want is to give the WWW namespace a single consistent
68 layout: no URL should refer to
69 any particular target server. The cluster itself
70 should connect users automatically to a physical target
71 host as needed, invisibly.</p>
77 <p>First, the knowledge of the target servers comes from
78 (distributed) external maps which contain information on
79 where our users, groups, and entities reside. They have the
82 <div class="example"><pre>
88 <p>We put them into files <code>map.xxx-to-host</code>.
89 Second we need to instruct all servers to redirect URLs
92 <div class="example"><pre>
100 <div class="example"><pre>
101 http://physical-host/u/user/anypath
102 http://physical-host/g/group/anypath
103 http://physical-host/e/entity/anypath
<p>whenever the URL path is not valid on the local server. The
following ruleset does this for us with the help of the map
files (assuming that server0 is a default server which
will be used if a user, group, or entity has no entry in the map):</p>
111 <div class="example"><pre>
# lookup tables: user, group, or entity name to physical host
RewriteMap user-to-host txt:/path/to/map.user-to-host
RewriteMap group-to-host txt:/path/to/map.group-to-host
RewriteMap entity-to-host txt:/path/to/map.entity-to-host

# redirect to the physical host, falling back to server0
RewriteRule ^/u/<strong>([^/]+)</strong>/?(.*) http://<strong>${user-to-host:$1|server0}</strong>/u/$1/$2
RewriteRule ^/g/<strong>([^/]+)</strong>/?(.*) http://<strong>${group-to-host:$1|server0}</strong>/g/$1/$2
RewriteRule ^/e/<strong>([^/]+)</strong>/?(.*) http://<strong>${entity-to-host:$1|server0}</strong>/e/$1/$2

# map the URL onto the .www subdirectory
RewriteRule ^/([uge])/([^/]+)/?$ /$1/$2/.www/
RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3
128 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
129 <div class="section">
130 <h2><a name="structuredhomedirs" id="structuredhomedirs">Structured Homedirs</a></h2>
135 <dt>Description:</dt>
138 <p>Some sites with thousands of users use a
139 structured homedir layout, <em>i.e.</em> each homedir is in a
140 subdirectory which begins (for instance) with the first
141 character of the username. So, <code>/~foo/anypath</code>
142 is <code>/home/<strong>f</strong>/foo/.www/anypath</code>
143 while <code>/~bar/anypath</code> is
144 <code>/home/<strong>b</strong>/bar/.www/anypath</code>.</p>
150 <p>We use the following ruleset to expand the tilde URLs
151 into the above layout.</p>
153 <div class="example"><pre>
155 RewriteRule ^/~(<strong>([a-z])</strong>[a-z0-9]+)(.*) /home/<strong>$2</strong>/$1/.www$3
160 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
161 <div class="section">
162 <h2><a name="filereorg" id="filereorg">Filesystem Reorganization</a></h2>
167 <dt>Description:</dt>
170 <p>This really is a hardcore example: a killer application
171 which heavily uses per-directory
172 <code>RewriteRules</code> to get a smooth look and feel
173 on the Web while its data structure is never touched or
174 adjusted. Background: <strong><em>net.sw</em></strong> is
175 my archive of freely available Unix software packages,
176 which I started to collect in 1992. It is both my hobby
177 and job to do this, because while I'm studying computer
178 science I have also worked for many years as a system and
179 network administrator in my spare time. Every week I need
180 some sort of software so I created a deep hierarchy of
181 directories where I stored the packages:</p>
183 <div class="example"><pre>
184 drwxrwxr-x 2 netsw users 512 Aug 3 18:39 Audio/
185 drwxrwxr-x 2 netsw users 512 Jul 9 14:37 Benchmark/
186 drwxrwxr-x 12 netsw users 512 Jul 9 00:34 Crypto/
187 drwxrwxr-x 5 netsw users 512 Jul 9 00:41 Database/
188 drwxrwxr-x 4 netsw users 512 Jul 30 19:25 Dicts/
189 drwxrwxr-x 10 netsw users 512 Jul 9 01:54 Graphic/
190 drwxrwxr-x 5 netsw users 512 Jul 9 01:58 Hackers/
191 drwxrwxr-x 8 netsw users 512 Jul 9 03:19 InfoSys/
192 drwxrwxr-x 3 netsw users 512 Jul 9 03:21 Math/
193 drwxrwxr-x 3 netsw users 512 Jul 9 03:24 Misc/
194 drwxrwxr-x 9 netsw users 512 Aug 1 16:33 Network/
195 drwxrwxr-x 2 netsw users 512 Jul 9 05:53 Office/
196 drwxrwxr-x 7 netsw users 512 Jul 9 09:24 SoftEng/
197 drwxrwxr-x 7 netsw users 512 Jul 9 12:17 System/
198 drwxrwxr-x 12 netsw users 512 Aug 3 20:15 Typesetting/
199 drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
202 <p>In July 1996 I decided to make this archive public to
203 the world via a nice Web interface. "Nice" means that I
204 wanted to offer an interface where you can browse
205 directly through the archive hierarchy. And "nice" means
206 that I didn't want to change anything inside this
207 hierarchy - not even by putting some CGI scripts at the
208 top of it. Why? Because the above structure should later be
209 accessible via FTP as well, and I didn't want any
210 Web or CGI stuff mixed in there.</p>
216 <p>The solution has two parts: The first is a set of CGI
217 scripts which create all the pages at all directory
218 levels on-the-fly. I put them under
219 <code>/e/netsw/.www/</code> as follows:</p>
221 <div class="example"><pre>
222 -rw-r--r-- 1 netsw users 1318 Aug 1 18:10 .wwwacl
223 drwxr-xr-x 18 netsw users 512 Aug 5 15:51 DATA/
224 -rw-rw-rw- 1 netsw users 372982 Aug 5 16:35 LOGFILE
225 -rw-r--r-- 1 netsw users 659 Aug 4 09:27 TODO
226 -rw-r--r-- 1 netsw users 5697 Aug 1 18:01 netsw-about.html
227 -rwxr-xr-x 1 netsw users 579 Aug 2 10:33 netsw-access.pl
228 -rwxr-xr-x 1 netsw users 1532 Aug 1 17:35 netsw-changes.cgi
229 -rwxr-xr-x 1 netsw users 2866 Aug 5 14:49 netsw-home.cgi
230 drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
231 -rwxr-xr-x 1 netsw users 24050 Aug 5 15:49 netsw-lsdir.cgi
232 -rwxr-xr-x 1 netsw users 1589 Aug 3 18:43 netsw-search.cgi
233 -rwxr-xr-x 1 netsw users 1885 Aug 1 17:41 netsw-tree.cgi
234 -rw-r--r-- 1 netsw users 234 Jul 30 16:35 netsw-unlimit.lst
237 <p>The <code>DATA/</code> subdirectory holds the above
238 directory structure, <em>i.e.</em> the real
239 <strong><em>net.sw</em></strong> stuff, and gets
240 automatically updated via <code>rdist</code> from time to
241 time. The second part of the problem remains: how to link
242 these two structures together into one smooth-looking URL
243 tree? We want to hide the <code>DATA/</code> directory
244 from the user while running the appropriate CGI scripts
245 for the various URLs. Here is the solution: first I put
246 the following into the per-directory configuration file
247 in the <code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>
248 of the server to rewrite the public URL path
249 <code>/net.sw/</code> to the internal path
250 <code>/e/netsw</code>:</p>
252 <div class="example"><pre>
RewriteRule ^net\.sw$ net.sw/ [R]
RewriteRule ^net\.sw/(.*)$ e/netsw/$1
257 <p>The first rule is for requests which miss the trailing
258 slash! The second rule does the real thing. And then
259 comes the killer configuration which stays in the
260 per-directory config file
261 <code>/e/netsw/.www/.wwwacl</code>:</p>
263 <div class="example"><pre>
264 Options ExecCGI FollowSymLinks Includes MultiViews
268 # we are reached via /net.sw/ prefix
271 # first we rewrite the root dir to
272 # the handling cgi script
273 RewriteRule ^$ netsw-home.cgi [L]
274 RewriteRule ^index\.html$ netsw-home.cgi [L]
276 # strip out the subdirs when
277 # the browser requests us from perdir pages
278 RewriteRule ^.+/(netsw-[^/]+/.+)$ $1 [L]
280 # and now break the rewriting for local files
281 RewriteRule ^netsw-home\.cgi.* - [L]
282 RewriteRule ^netsw-changes\.cgi.* - [L]
283 RewriteRule ^netsw-search\.cgi.* - [L]
284 RewriteRule ^netsw-tree\.cgi$ - [L]
285 RewriteRule ^netsw-about\.html$ - [L]
286 RewriteRule ^netsw-img/.*$ - [L]
288 # anything else is a subdir which gets handled
289 # by another cgi script
290 RewriteRule !^netsw-lsdir\.cgi.* - [C]
291 RewriteRule (.*) netsw-lsdir.cgi/$1
294 <p>Some hints for interpretation:</p>
297 <li>Notice the <code>L</code> (last) flag and no
298 substitution field ('<code>-</code>') in the fourth part</li>
300 <li>Notice the <code>!</code> (not) character and
301 the <code>C</code> (chain) flag at the first rule
302 in the last part</li>
304 <li>Notice the catch-all pattern in the last rule</li>
309 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
310 <div class="section">
311 <h2><a name="redirect404" id="redirect404">Redirect Failing URLs to Another Web Server</a></h2>
316 <dt>Description:</dt>
319 <p>A typical FAQ about URL rewriting is how to redirect
320 failing requests on webserver A to webserver B. Usually
321 this is done via <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI scripts in Perl, but
322 there is also a <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> solution.
323 But note that this performs more poorly than using an
324 <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code>
331 <p>The first solution has the best performance but less
332 flexibility, and is less safe:</p>
334 <div class="example"><pre>
336 RewriteCond /your/docroot/%{REQUEST_FILENAME} <strong>!-f</strong>
337 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
340 <p>The problem here is that this will only work for pages
341 inside the <code class="directive"><a href="../mod/core.html#documentroot">DocumentRoot</a></code>. While you can add more
342 Conditions (for instance to also handle homedirs, etc.)
343 there is a better variant:</p>
345 <div class="example"><pre>
347 RewriteCond %{REQUEST_URI} <strong>!-U</strong>
348 RewriteRule ^(.+) http://<strong>webserverB</strong>.dom/$1
351 <p>This uses the URL look-ahead feature of <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>.
352 The result is that this will work for all types of URLs
353 and is safe. But it does have a performance impact on
354 the web server, because for every request there is one
355 more internal subrequest. So, if your web server runs on a
powerful CPU, use this one. If it is a slow machine, use
the first approach or, better, an <code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code> CGI script.</p>
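<p>For comparison, the
<code class="directive"><a href="../mod/core.html#errordocument">ErrorDocument</a></code>
approach mentioned above could be wired up roughly as follows. This is only a
sketch: the script name <code>/cgi-bin/redirect-to-b.cgi</code> is
hypothetical, and the script itself (not shown) would have to issue the
redirect to <code>webserverB.dom</code>, for instance based on the
<code>REDIRECT_URL</code> environment variable that Apache sets when it
invokes an error document.</p>

<div class="example"><pre>
# hand every "not found" error to a local CGI script (hypothetical name)
ErrorDocument 404 /cgi-bin/redirect-to-b.cgi
</pre></div>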
361 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
362 <div class="section">
363 <h2>Archive Access Multiplexer</h2>
368 <dt>Description:</dt>
371 <p>Do you know the great CPAN (Comprehensive Perl Archive
372 Network) under <a href="http://www.perl.com/CPAN">http://www.perl.com/CPAN</a>?
373 CPAN automatically redirects browsers to one of many FTP
374 servers around the world (generally one near the requesting
375 client); each server carries a full CPAN mirror. This is
376 effectively an FTP access multiplexing service.
377 CPAN runs via CGI scripts, but how could a similar approach
378 be implemented via <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>?</p>
384 <p>First we notice that as of version 3.0.0,
385 <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code> can
386 also use the "<code>ftp:</code>" scheme on redirects.
387 And second, the location approximation can be done by a
388 <code class="directive"><a href="../mod/mod_rewrite.html#rewritemap">RewriteMap</a></code>
389 over the top-level domain of the client.
390 With a tricky chained ruleset we can use this top-level
391 domain as a key to our multiplexing map.</p>
393 <div class="example"><pre>
395 RewriteMap multiplex txt:/path/to/map.cxan
396 RewriteRule ^/CxAN/(.*) %{REMOTE_HOST}::$1 [C]
397 RewriteRule ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$ ${multiplex:<strong>$1</strong>|ftp.default.dom}$2 [R,L]
400 <div class="example"><pre>
402 ## map.cxan -- Multiplexing Map for CxAN
405 de ftp://ftp.cxan.de/CxAN/
406 uk ftp://ftp.cxan.uk/CxAN/
407 com ftp://ftp.cxan.com/CxAN/
414 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
415 <div class="section">
416 <h2><a name="content" id="content">Content Handling</a></h2>
420 <h3>Browser Dependent Content</h3>
425 <dt>Description:</dt>
<p>At least for important top-level pages it is sometimes
necessary to serve content tailored to the requesting
browser, i.e., one has to provide one version for
current browsers, a different version for Lynx and other text-mode
browsers, and another for all remaining browsers.</p>
<p>We cannot use content negotiation because the browsers do
not provide their type in that form. Instead we have to
act on the HTTP header "User-Agent". The following ruleset
works like this: if the HTTP header "User-Agent"
begins with "Mozilla/3", the page <code>foo.html</code>
is rewritten to <code>foo.NS.html</code> and the
rewriting stops. If the browser is "Lynx" or "Mozilla" of
version 1 or 2, the URL becomes <code>foo.20.html</code>.
All other browsers receive page <code>foo.32.html</code>:</p>
449 <div class="example"><pre>
450 RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/3</strong>.*
451 RewriteRule ^foo\.html$ foo.<strong>NS</strong>.html [<strong>L</strong>]
453 RewriteCond %{HTTP_USER_AGENT} ^<strong>Lynx/</strong>.* [OR]
454 RewriteCond %{HTTP_USER_AGENT} ^<strong>Mozilla/[12]</strong>.*
455 RewriteRule ^foo\.html$ foo.<strong>20</strong>.html [<strong>L</strong>]
457 RewriteRule ^foo\.html$ foo.<strong>32</strong>.html [<strong>L</strong>]
464 <h3>Dynamic Mirror</h3>
469 <dt>Description:</dt>
472 <p>Assume there are nice web pages on remote hosts we want
473 to bring into our namespace. For FTP servers we would use
474 the <code>mirror</code> program which actually maintains an
475 explicit up-to-date copy of the remote data on the local
476 machine. For a web server we could use the program
477 <code>webcopy</code> which runs via HTTP. But both
478 techniques have a major drawback: The local copy is
479 always only as up-to-date as the last time we ran the program. It
480 would be much better if the mirror was not a static one we
481 have to establish explicitly. Instead we want a dynamic
482 mirror with data which gets updated automatically
483 as needed on the remote host(s).</p>
489 <p>To provide this feature we map the remote web page or even
490 the complete remote web area to our namespace by the use
491 of the <dfn>Proxy Throughput</dfn> feature
492 (flag <code>[P]</code>):</p>
494 <div class="example"><pre>
497 RewriteRule ^<strong>hotsheet/</strong>(.*)$ <strong>http://www.tstimpreso.com/hotsheet/</strong>$1 [<strong>P</strong>]
500 <div class="example"><pre>
503 RewriteRule ^<strong>usa-news\.html</strong>$ <strong>http://www.quux-corp.com/news/index.html</strong> [<strong>P</strong>]
510 <h3>Reverse Dynamic Mirror</h3>
515 <dt>Description:</dt>
522 <div class="example"><pre>
524 RewriteCond /mirror/of/remotesite/$1 -U
525 RewriteRule ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
532 <h3>Retrieve Missing Data from Intranet</h3>
537 <dt>Description:</dt>
540 <p>This is a tricky way of virtually running a corporate
541 (external) Internet web server
542 (<code>www.quux-corp.dom</code>), while actually keeping
543 and maintaining its data on an (internal) Intranet web server
544 (<code>www2.quux-corp.dom</code>) which is protected by a
545 firewall. The trick is that the external web server retrieves
546 the requested data on-the-fly from the internal
553 <p>First, we must make sure that our firewall still
554 protects the internal web server and only the
555 external web server is allowed to retrieve data from it.
556 On a packet-filtering firewall, for instance, we could
557 configure a firewall ruleset like the following:</p>
559 <div class="example"><pre>
<strong>ALLOW</strong> Host www.quux-corp.dom Port &gt;1024 --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong> Host * Port * --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
564 <p>Just adjust it to your actual configuration syntax.
565 Now we can establish the <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
566 rules which request the missing data in the background
567 through the proxy throughput feature:</p>
569 <div class="example"><pre>
570 RewriteRule ^/~([^/]+)/?(.*) /home/$1/.www/$2
571 RewriteCond %{REQUEST_FILENAME} <strong>!-f</strong>
572 RewriteCond %{REQUEST_FILENAME} <strong>!-d</strong>
573 RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
580 <h3>Load Balancing</h3>
585 <dt>Description:</dt>
588 <p>Suppose we want to load balance the traffic to
589 <code>www.foo.com</code> over <code>www[0-5].foo.com</code>
590 (a total of 6 servers). How can this be done?</p>
596 <p>There are many possible solutions for this problem.
597 We will first discuss a common DNS-based method,
598 and then one based on <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>:</p>
602 <strong>DNS Round-Robin</strong>
604 <p>The simplest method for load-balancing is to use
Here you just configure <code>www[0-5].foo.com</code>
607 as usual in your DNS with A (address) records, e.g.,</p>
609 <div class="example"><pre>
618 <p>Then you additionally add the following entries:</p>
620 <div class="example"><pre>
628 <p>Now when <code>www.foo.com</code> gets
629 resolved, <code>BIND</code> gives out <code>www0-www5</code>
- but in a permuted (rotated) order every time.
631 This way the clients are spread over the various
632 servers. But notice that this is not a perfect load
633 balancing scheme, because DNS resolutions are
634 cached by clients and other nameservers, so
635 once a client has resolved <code>www.foo.com</code>
636 to a particular <code>wwwN.foo.com</code>, all its
637 subsequent requests will continue to go to the same
638 IP (and thus a single server), rather than being
639 distributed across the other available servers. But the
641 okay because the requests are collectively
642 spread over the various web servers.</p>
646 <strong>DNS Load-Balancing</strong>
648 <p>A sophisticated DNS-based method for
649 load-balancing is to use the program
650 <code>lbnamed</code> which can be found at <a href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
651 http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
It is a Perl 5 program which, in conjunction with auxiliary
653 tools, provides real load-balancing via
658 <strong>Proxy Throughput Round-Robin</strong>
660 <p>In this variant we use <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
661 and its proxy throughput feature. First we dedicate
662 <code>www0.foo.com</code> to be actually
663 <code>www.foo.com</code> by using a single</p>
665 <div class="example"><pre>
666 www IN CNAME www0.foo.com.
669 <p>entry in the DNS. Then we convert
670 <code>www0.foo.com</code> to a proxy-only server,
671 i.e., we configure this machine so all arriving URLs
672 are simply passed through its internal proxy to one of
673 the 5 other servers (<code>www1-www5</code>). To
674 accomplish this we first establish a ruleset which
675 contacts a load balancing script <code>lb.pl</code>
678 <div class="example"><pre>
680 RewriteMap lb prg:/path/to/lb.pl
681 RewriteRule ^/(.+)$ ${lb:$1} [P,L]
684 <p>Then we write <code>lb.pl</code>:</p>
686 <div class="example"><pre>
## lb.pl -- load balancing script

$| = 1;               # unbuffered output, needed by prg: map programs

$name   = "www";      # the hostname base
$first  = 1;          # the first server (not 0 here, because 0 is myself)
$last   = 5;          # the last server in the round-robin
$domain = "foo.com";  # the domainname

$cnt = 0;             # offset of the server used for the last request
while (&lt;STDIN&gt;) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}
<div class="note">A last note: why is this useful? Isn't
<code>www0.foo.com</code> still overloaded? The
answer is yes, it is overloaded, but only with plain proxy
throughput requests! All SSI, CGI, ePerl, etc.
processing is done on the other machines.
For a complicated site, this may work well. The biggest
risk here is that www0 is now a single point of failure:
if it crashes, the other servers are inaccessible.</div>
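<p>If strict round-robin order is not required, the external script can
be avoided. The following sketch swaps in a different technique,
<code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>'s
randomized plain text map type (<code>rnd:</code>), which picks one of
several alternatives at random on each lookup. The map file name
<code>map.lb</code> and the key <code>servers</code> are assumptions made
for this illustration only.</p>

<div class="example"><pre>
## map.lb -- hypothetical map file
## one key, alternatives separated by |
servers www1.foo.com|www2.foo.com|www3.foo.com|www4.foo.com|www5.foo.com
</pre></div>

<div class="example"><pre>
RewriteEngine on
RewriteMap lb rnd:/path/to/map.lb
# proxy each request to a randomly chosen backend
RewriteRule ^/(.*)$ http://${lb:servers}/$1 [P,L]
</pre></div>

<p>This spreads requests evenly only on average, and
<code>www0.foo.com</code> remains a single point of failure just as
with <code>lb.pl</code>.</p>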
720 <strong>Dedicated Load Balancers</strong>
722 <p>There are more sophisticated solutions, as well. Cisco,
723 F5, and several other companies sell hardware load
724 balancers (typically used in pairs for redundancy), which
725 offer sophisticated load balancing and auto-failover
726 features. There are software packages which offer similar
727 features on commodity hardware, as well. If you have
728 enough money or need, check these out. The <a href="http://vegan.net/lb/">lb-l mailing list</a> is a
729 good place to research.</p>
737 <h3>New MIME-type, New Service</h3>
742 <dt>Description:</dt>
745 <p>On the net there are many nifty CGI programs. But
746 their usage is usually boring, so a lot of webmasters
747 don't use them. Even Apache's Action handler feature for
748 MIME-types is only appropriate when the CGI programs
749 don't need special URLs (actually <code>PATH_INFO</code>
and <code>QUERY_STRING</code>) as their input. First,
751 let us configure a new file type with extension
752 <code>.scgi</code> (for secure CGI) which will be processed
753 by the popular <code>cgiwrap</code> program. The problem
754 here is that for instance if we use a Homogeneous URL Layout
755 (see above) a file inside the user homedirs might have a URL
756 like <code>/u/user/foo/bar.scgi</code>, but
757 <code>cgiwrap</code> needs URLs in the form
758 <code>/~user/foo/bar.scgi/</code>. The following rule
759 solves the problem:</p>
761 <div class="example"><pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3 [NS,<strong>T=application/x-httpd-cgi</strong>]
766 <p>Or assume we have some more nifty programs:
767 <code>wwwlog</code> (which displays the
768 <code>access.log</code> for a URL subtree) and
769 <code>wwwidx</code> (which runs Glimpse on a URL
770 subtree). We have to provide the URL area to these
771 programs so they know which area they are really working with.
772 But usually this is complicated, because they may still be
773 requested by the alternate URL form, i.e., typically we would
run the <code>wwwidx</code> program from within
<code>/u/user/foo/</code> via hyperlink to</p>
<div class="example"><pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
781 <p>which is ugly, because we have to hard-code
782 <strong>both</strong> the location of the area
783 <strong>and</strong> the location of the CGI inside the
784 hyperlink. When we have to reorganize, we spend a
785 lot of time changing the various hyperlinks.</p>
791 <p>The solution here is to provide a special new URL format
792 which automatically leads to the proper CGI invocation.
793 We configure the following:</p>
795 <div class="example"><pre>
796 RewriteRule ^/([uge])/([^/]+)(/?.*)/\* /internal/cgi/user/wwwidx?i=/$1/$2$3/
797 RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
800 <p>Now the hyperlink to search at
801 <code>/u/user/foo/</code> reads only</p>
803 <div class="example"><pre>
807 <p>which internally gets automatically transformed to</p>
809 <div class="example"><pre>
810 /internal/cgi/user/wwwidx?i=/u/user/foo/
813 <p>The same approach leads to an invocation for the
814 access log CGI program when the hyperlink
815 <code>:log</code> gets used.</p>
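<p>To make the second rule concrete, here is how a <code>:log</code>
request would be rewritten, worked out by hand from the pattern above.
The URL is the same illustrative <code>/u/user/foo/</code> area used
earlier.</p>

<div class="example"><pre>
# the rule from above, repeated for reference
RewriteRule ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3

# request:    /u/user/foo/:log
#   $1 = u, $2 = user, $3 = /foo/
# rewritten:  /internal/cgi/user/wwwlog?f=/u/user/foo/
</pre></div>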
821 <h3>On-the-fly Content-Regeneration</h3>
826 <dt>Description:</dt>
829 <p>Here comes a really esoteric feature: Dynamically
830 generated but statically served pages, i.e., pages should be
831 delivered as pure static pages (read from the filesystem
832 and just passed through), but they have to be generated
833 dynamically by the web server if missing. This way you can
834 have CGI-generated pages which are statically served unless an
admin (or a <code>cron</code> job) removes the static content. Then the
content gets regenerated.</p>
842 This is done via the following ruleset:
844 <div class="example"><pre>
845 RewriteCond %{REQUEST_FILENAME} <strong>!-s</strong>
846 RewriteRule ^page\.<strong>html</strong>$ page.<strong>cgi</strong> [T=application/x-httpd-cgi,L]
<p>Here a request for <code>page.html</code> leads to an
internal run of a corresponding <code>page.cgi</code> if
<code>page.html</code> is missing or is empty (zero
size). The trick here is that <code>page.cgi</code> is a
CGI script which, in addition to writing to <code>STDOUT</code>,
writes its output to the file <code>page.html</code>.
Once the file exists, the server sends out
<code>page.html</code> directly. When the webmaster wants to force
a refresh of the content, it is enough to remove
<code>page.html</code> (typically from <code>cron</code>).</p>
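<p>The same idea can be extended from a single page to every
<code>.html</code> file in a directory. The following per-directory
sketch is a variant under stated assumptions, not part of the recipe
above: it presumes that each <code>name.html</code> has a matching
<code>name.cgi</code> script which regenerates it.</p>

<div class="example"><pre>
RewriteEngine on
# regenerate any missing or empty .html page via the CGI of the same name
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^(.+)\.html$ $1.cgi [T=application/x-httpd-cgi,L]
</pre></div>

<p>If a page has no corresponding script, the rewritten request simply
fails with a "not found" error, so this variant only suits directories
where the pairing is complete.</p>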
864 <h3>Document With Autorefresh</h3>
869 <dt>Description:</dt>
872 <p>Wouldn't it be nice, while creating a complex web page, if
873 the web browser would automatically refresh the page every
874 time we save a new version from within our editor?
881 <p>No! We just combine the MIME multipart feature, the
882 web server NPH feature, and the URL manipulation power of
883 <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>. First, we establish a new
884 URL feature: Adding just <code>:refresh</code> to any
885 URL causes the 'page' to be refreshed every time it is
886 updated on the filesystem.</p>
888 <div class="example"><pre>
889 RewriteRule ^(/[uge]/[^/]+/?.*):refresh /internal/cgi/apache/nph-refresh?f=$1
892 <p>Now when we reference the URL</p>
894 <div class="example"><pre>
895 /u/foo/bar/page.html:refresh
898 <p>this leads to the internal invocation of the URL</p>
900 <div class="example"><pre>
901 /internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
904 <p>The only missing part is the NPH-CGI script. Although
905 one would usually say "left as an exercise to the reader"
906 ;-) I will provide this, too.</p>
908 <div class="example"><pre>
## nph-refresh -- NPH/CGI script for auto refreshing pages
## Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
# split the QUERY_STRING variable
@pairs = split(/&amp;/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' . $name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    eval "\$$name = \"$value\"";
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
print "HTTP/1.0 200 OK\n";
print "Content-type: text/html\n\n";
print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &amp;print_http_headers_multipart_next;
sub print_http_headers_multipart_next {
    print "\n--$bound\n";
sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
$len = length($buffer);
print "Content-type: text/html\n";
print "Content-length: $len\n\n";
local(*FP, $size, $buffer, $bytes);
($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
$size = sprintf("%d", $size);
open(FP, "&lt;$file");
$bytes = sysread(FP, $buffer, $size);
$buffer = &amp;readfile($QS_f);
&amp;print_http_headers_multipart_begin;
&amp;displayhtml($buffer);
local($file) = $_[0];
($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
$mtimeL = &amp;mystat($QS_f);
for ($n = 0; $n &lt; $QS_n; $n++) {
$mtime = &amp;mystat($QS_f);
if ($mtime ne $mtimeL) {
$buffer = &amp;readfile($QS_f);
&amp;print_http_headers_multipart_next;
&amp;displayhtml($buffer);
$mtimeL = &amp;mystat($QS_f);
&amp;print_http_headers_multipart_end;
1016 <h3>Mass Virtual Hosting</h3>
1021 <dt>Description:</dt>
<p>The <code class="directive"><a href="../mod/core.html#virtualhost">&lt;VirtualHost&gt;</a></code> feature of Apache is nice
1025 and works great when you just have a few dozen
1026 virtual hosts. But when you are an ISP and have hundreds of
1027 virtual hosts, this feature is suboptimal.</p>
<p>To provide this feature we map each virtual host name to
its document root through an external rewrite map,
<code>vhost.map</code>, which is looked up by a single ruleset:</p>
1037 <div class="example"><pre>
1041 www.vhost1.dom:80 /path/to/docroot/vhost1
1042 www.vhost2.dom:80 /path/to/docroot/vhost2
1044 www.vhostN.dom:80 /path/to/docroot/vhostN
1047 <div class="example"><pre>
1052 # use the canonical hostname on redirects, etc.
1056 # add the virtual host in front of the CLF-format
1057 CustomLog /path/to/access_log "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
1060 # enable the rewriting engine in the main server
1063 # define two maps: one for fixing the URL and one which defines
1064 # the available virtual hosts with their corresponding
1066 RewriteMap lowercase int:tolower
1067 RewriteMap vhost txt:/path/to/vhost.map
1069 # Now do the actual virtual host mapping
1070 # via a huge and complicated single rule:
1072 # 1. make sure we don't map for common locations
1073 RewriteCond %{REQUEST_URI} !^/commonurl1/.*
1074 RewriteCond %{REQUEST_URI} !^/commonurl2/.*
1076 RewriteCond %{REQUEST_URI} !^/commonurlN/.*
1078 # 2. make sure we have a Host header, because
1079 # currently our approach only supports
1080 # virtual hosting through this header
1081 RewriteCond %{HTTP_HOST} !^$
1083 # 3. lowercase the hostname
1084 RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
1086 # 4. lookup this hostname in vhost.map and
1087 # remember it only when it is a path
1088 # (and not "NONE" from above)
1089 RewriteCond ${vhost:%1} ^(/.*)$
1091 # 5. finally we can map the URL to its docroot location
1092 # and remember the virtual host for logging purposes
1093 RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
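<p>Because this single rule chains several conditions, it can help to
trace how each one is evaluated while testing. The following is a
minimal sketch, assuming Apache 2.0's rewrite logging directives and a
placeholder log path; it is not part of the original ruleset:</p>

<div class="example"><pre>
# debugging aid only (assumed setup): log every rewriting step
# the log path below is a placeholder, adjust it to your layout
RewriteLog      /path/to/rewrite.log
# levels above 2 are verbose and slow the server down noticeably
RewriteLogLevel 3
</pre></div>

<p>With the sample map above, a request for <code>/index.html</code>
whose <code>Host</code> header is exactly <code>www.vhost1.dom:80</code>
should show up in the log as being rewritten to
<code>/path/to/docroot/vhost1/index.html</code>. The lookup key has to
match the map entry literally after lowercasing, port suffix included.
Set <code>RewriteLogLevel</code> back to <code>0</code> once the
mapping works.</p>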
1101 </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
1102 <div class="section">
1103 <h2><a name="access" id="access">Access Restriction</a></h2>
1112 <dt>Description:</dt>
1115 <p>How can we forbid a list of externally configured hosts
1116 from using our server?</p>
<p>For Apache &gt;= 1.3b6:</p>
1124 <div class="example"><pre>
1126 RewriteMap hosts-deny txt:/path/to/hosts.deny
1127 RewriteCond ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
1128 RewriteCond ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
1129 RewriteRule ^/.* - [F]
<p>For Apache &lt;= 1.3b6:</p>
1134 <div class="example"><pre>
1136 RewriteMap hosts-deny txt:/path/to/hosts.deny
1137 RewriteRule ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
1138 RewriteRule !^NOT-FOUND/.* - [F]
1139 RewriteRule ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
1140 RewriteRule !^NOT-FOUND/.* - [F]
1141 RewriteRule ^NOT-FOUND/(.*)$ /$1
1144 <div class="example"><pre>
1148 ## ATTENTION! This is a map, not a list, even when we treat it as such.
1149 ## mod_rewrite parses it for key/value pairs, so at least a
1150 ## dummy value "-" must be present for each entry.
1167 <dt>Description:</dt>
1170 <p>How can we forbid a certain host or even a user of a
1171 special host from using the Apache proxy?</p>
1177 <p>We first have to make sure <code class="module"><a href="../mod/mod_rewrite.html">mod_rewrite</a></code>
1178 is below(!) <code class="module"><a href="../mod/mod_proxy.html">mod_proxy</a></code> in the Configuration
1179 file when compiling the Apache web server. This way it gets
1180 called <em>before</em> <code class="module"><a href="../mod/mod_proxy.html">mod_proxy</a></code>. Then we
1181 configure the following for a host-dependent deny...</p>
1183 <div class="example"><pre>
1184 RewriteCond %{REMOTE_HOST} <strong>^badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1188 <p>...and this one for a user@host-dependent deny:</p>
1190 <div class="example"><pre>
1191 RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>^badguy@badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.* - [F]
1199 <h3>Special Authentication Variant</h3>
1204 <dt>Description:</dt>
1207 <p>Sometimes very special authentication is needed, for
1208 instance authentication which checks for a set of
1209 explicitly configured users. Only these should receive
1210 access and without explicit prompting (which would occur
1211 when using Basic Auth via <code class="module"><a href="../mod/mod_auth.html">mod_auth</a></code>).</p>
1217 <p>We use a list of rewrite conditions to exclude all except
1220 <div class="example"><pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend1@client1\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend2@client2\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend3@client3\.quux-corp\.com$</strong>
1224 RewriteRule ^/~quux/only-for-friends/ - [F]
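<p>These conditions can only match if the server actually knows the
client's host name and ident user name. As a sketch, assuming both
lookups are still at their Apache 2.0 defaults (off), the main server
configuration would also need something like:</p>

<div class="example"><pre>
# resolve client addresses to names so that %{REMOTE_HOST}
# holds a host name instead of an IP address
HostnameLookups On
# query the client's identd (RFC 1413) so that %{REMOTE_IDENT} is set
IdentityCheck   On
</pre></div>

<p>Both lookups add latency to every request, and ident replies are
supplied by the client machine, so this variant is only as trustworthy
as the hosts it names.</p>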
1231 <h3>Referer-based Deflector</h3>
1236 <dt>Description:</dt>
1239 <p>How can we program a flexible URL Deflector which acts
1240 on the "Referer" HTTP header and can be configured with as
1241 many referring pages as we like?</p>
1247 <p>Use the following really tricky ruleset...</p>
1249 <div class="example"><pre>
1250 RewriteMap deflector txt:/path/to/deflector.map
1252 RewriteCond %{HTTP_REFERER} !=""
1253 RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
1254 RewriteRule ^.* %{HTTP_REFERER} [R,L]
1256 RewriteCond %{HTTP_REFERER} !=""
1257 RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
1258 RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
1261 <p>... in conjunction with a corresponding rewrite
1264 <div class="example"><pre>
1269 http://www.badguys.com/bad/index.html -
1270 http://www.badguys.com/bad/index2.html -
1271 http://www.badguys.com/bad/index3.html http://somewhere.com/
1274 <p>This automatically redirects the request back to the
1275 referring page (when "<code>-</code>" is used as the value
in the map) or to a specific URL (when a URL is specified
1277 in the map as the second argument).</p>
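<p>The second rule block can also be written so that the map is
consulted only once and its result is reused through the
<code>%1</code> back-reference. This is a hedged sketch rather than the
guide's own ruleset, and it assumes every target URL in the map starts
with <code>http://</code>:</p>

<div class="example"><pre>
# variant sketch: capture the looked-up target URL once
RewriteCond %{HTTP_REFERER} !=""
# an unknown referer yields an empty string and "-" is not a URL,
# so neither value matches and the rule is skipped
RewriteCond ${deflector:%{HTTP_REFERER}} ^(http://.+)$
# %1 is the URL captured by the last matched RewriteCond
RewriteRule ^.* %1 [R,L]
</pre></div>

<p>With the sample map, a request whose referer is
<code>http://www.badguys.com/bad/index3.html</code> would be redirected
to <code>http://somewhere.com/</code>, while the two "<code>-</code>"
entries are still handled by the first rule block.</p>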
<div id="footer">
1287 <p class="apache">Copyright 2009 The Apache Software Foundation.<br />Licensed under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.</p>
</div>