rubbos/app/tomcat-connectors-1.2.32-src/xdocs/generic_howto/timeouts.xml

   1 <?xml version="1.0" encoding="ISO-8859-1"?>
   2 <!DOCTYPE document [
   3   <!ENTITY project SYSTEM "project.xml">
   4 ]>
   5 <document url="timeouts.html">
   6
   7   &project;
   8 <copyright>
   9    Licensed to the Apache Software Foundation (ASF) under one or more
  10    contributor license agreements.  See the NOTICE file distributed with
  11    this work for additional information regarding copyright ownership.
  12    The ASF licenses this file to You under the Apache License, Version 2.0
  13    (the "License"); you may not use this file except in compliance with
  14    the License.  You may obtain a copy of the License at
  15
  16        http://www.apache.org/licenses/LICENSE-2.0
  17
  18    Unless required by applicable law or agreed to in writing, software
  19    distributed under the License is distributed on an "AS IS" BASIS,
  20    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  21    See the License for the specific language governing permissions and
  22    limitations under the License.
  23 </copyright>
  24 <properties>
  25 <title>Timeouts HowTo</title>
  26 <author email="rjung@apache.org">Rainer Jung</author>
  27 <date>$Date: 2009-03-22 00:11:39 +0100 (Sun, 22 Mar 2009) $</date>
  28 </properties>
  29 <body>
  30 <section name="Introduction">
  31 <br/>
  32 <p>Communication timeouts are important for keeping the communication
  33 between web server and backend robust. They help to detect problems and stabilise
  34 a distributed system. JK can use several different timeout types, which
  35 can be individually configured. For historical reasons, all of them are
  36 disabled by default. This HowTo explains their use and gives
  37 hints how to find appropriate values.
  38 </p>
  39 <p>All timeouts can be configured in the workers.properties file.
  40 For a complete reference of all worker configuration
  41 items, please consult the worker <a href="../reference/workers.html">reference</a>.
  42 This page assumes that you are using at least version 1.2.16 of JK.
  43 Dependencies on newer versions will be mentioned where necessary.
  44 </p>
  45 <warn>
  46 Do not set timeouts to extreme values. Very small timeouts will likely
  47 be counterproductive.
  48 </warn>
  49 <warn>
  50 Long Garbage Collection pauses on the backend do not make a good
  51 fit with some timeouts. Try to optimise your Java memory and GC settings.
  52 </warn>
  53 </section>
  54
  55 <section name="JK Timeout Attributes">
  56 <br/>
  57 <subsection name="CPing/CPong">
  58 <p>
  59 CPing/CPong is our term for using small test packets to check the
  60 status of backend connections. JK can use such test packets directly after establishing
  61 a new backend connection (connect mode) and also directly before each request gets
  62 sent to a backend (prepost mode).
  63 Starting with version 1.2.27 it can also be used when a connection was idle
  64 for a long time (interval mode).
  65 The maximum waiting time (timeout) for a CPong answer to a CPing and the idle
  66 time in interval mode can be configured.
  67 </p>
  68 <p>
  69 The backend answers the test packets very fast and with a minimal amount of
  70 processing resources. A positive answer tells us that the backend can be reached
  71 and is actively processing requests. It does not detect whether a context is deployed
  72 and working. The benefit of CPing/CPong is a fast detection of a communication
  73 problem with the backend. The downside is a slightly increased latency.
  74 </p>
  75 <p>
  76 The worker attribute <b>ping_mode</b> can be set to a combination of characters
  77 to determine in which situations test packets are used:
  78 <ul>
  79 <li><b>C</b>: connect mode, timeout <b>ping_timeout</b> overridden by <b>connect_timeout</b></li>
  80 <li><b>P</b>: prepost mode, timeout <b>ping_timeout</b> overridden by <b>prepost_timeout</b></li>
  81 <li><b>I</b>: interval mode, timeout <b>ping_timeout</b>, idle time <b>connection_ping_interval</b></li>
  82 <li><b>A</b>: all modes</li>
  83 </ul>
  84 </p>
  85 <p>
  86 Multiple values must be concatenated without any separator characters.
  87 We recommend using all CPing tests. If your application is very latency sensitive, then
  88 you should only use the combination of connect and interval mode.
  89 </p>
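<p>
As a minimal sketch of the two recommendations above, assuming JK 1.2.27 or newer
and two hypothetical workers named <code>node1</code> and <code>node2</code>,
the settings in workers.properties could look like this:
</p>
<source>
# Hypothetical worker node1: use all CPing tests
# (connect, prepost and interval mode)
worker.node1.ping_mode=A

# Hypothetical worker node2: latency sensitive application,
# only connect and interval mode, concatenated without separator
worker.node2.ping_mode=CI
</source>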
  90 <p>
  91 Activating the CPing probing via <b>ping_mode</b> was added in version 1.2.27.
  92 For older versions only the connect and prepost modes exist and must be activated by
  93 explicitly setting <b>connect_timeout</b> and <b>prepost_timeout</b>.
  94 </p>
  95 <p>
  96 The worker attribute <b>ping_timeout</b> sets the default wait timeout
  97 in milliseconds for CPong for all modes. By default the value is "10000"
  98 milliseconds. The value only gets used if you activate CPing/CPong probes
  99 via <b>ping_mode</b>. The default value should be fine, except if you experience
 100 very long Java garbage collection pauses.
 101 Depending on your network latency and stability, good custom values
 102 often are between 5000 and 15000 milliseconds.
 103 You can override the timeout used for connect and prepost mode with
 104 <b>connect_timeout</b> and <b>prepost_timeout</b>.
 105 Remember: don't use extremely small values.
 106 </p>
 107 <p>
 108 The worker attribute <b>connect_timeout</b> sets the wait timeout
 109 in milliseconds for CPong during connection establishment. You can use it
 110 if you want to override the general timeout set with <b>ping_timeout</b>.
 111 To use connect mode CPing, you need to enable it via <b>ping_mode</b>.
 112 Since JK usually uses persistent connections, opening new connections is a
 113 rare event. We therefore recommend activating connect mode.
 114 Depending on your network latency and stability, good values often
 115 are between 5000 and 15000 milliseconds.
 116 Remember: don't use extremely small values.
 117 </p>
 118 <p>
 119 The worker attribute <b>prepost_timeout</b> sets the wait timeout
 120 in milliseconds for CPong before request forwarding. You can use it
 121 if you want to override the general timeout set with <b>ping_timeout</b>.
 122 To use prepost mode CPing, you need to enable it via <b>ping_mode</b>.
 123 Activating this type of CPing/CPong adds a small latency to each
 124 request. Usually this is small enough and the benefit of CPing/CPong is more important.
 125 So in general we also recommend using <b>prepost_timeout</b>.
 126 Depending on your network latency and stability, good values often
 127 are between 5000 and 10000 milliseconds.
 128 Remember: don't use extremely small values.
 129 </p>
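<p>
The interplay of the three timeout attributes can be sketched as follows.
The worker name <code>node1</code> is hypothetical, and the values are only
illustrations taken from the ranges mentioned above, not tuned recommendations
for your network:
</p>
<source>
# Activate all CPing modes (needs JK 1.2.27 or newer)
worker.node1.ping_mode=A

# General CPong wait timeout in milliseconds for all modes
worker.node1.ping_timeout=10000

# Optional: different timeout for connect mode only (milliseconds)
worker.node1.connect_timeout=15000

# Optional: different timeout for prepost mode only (milliseconds)
worker.node1.prepost_timeout=5000
</source>
<p>
In this sketch interval mode keeps using <b>ping_timeout</b>, because only the
connect and prepost timeouts can be set individually.
</p>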
 130 <p>
 131 Before version 1.2.27 <b>ping_mode</b> and <b>ping_timeout</b> did not
 132 exist. To enable connect or prepost mode CPing you had to set <b>connect_timeout</b>
 133 or <b>prepost_timeout</b>, respectively, to some reasonable positive value.
 134 </p>
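<p>
For such an older version, a sketch of the equivalent configuration could look
like this. The worker name <code>node1</code> and the concrete values are
assumptions chosen for illustration:
</p>
<source>
# JK versions before 1.2.27: no ping_mode, no ping_timeout.
# Setting a positive value activates the respective CPing mode.

# Connect mode CPing, CPong wait timeout in milliseconds
worker.node1.connect_timeout=10000

# Prepost mode CPing, CPong wait timeout in milliseconds
worker.node1.prepost_timeout=5000
</source>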
 135 </subsection>
 136
 137 <subsection name="Low-Level TCP Timeouts">
 138 <p>
 139 Some platforms allow setting timeouts for all operations on TCP sockets.
 140 This is available on Linux and Windows; other platforms, e.g. Solaris,
 141 do not support it. If your platform supports TCP send and receive timeouts,
 142 you can set them using the worker attribute <b>socket_timeout</b>.
 143 You cannot set the two timeouts to different values.
 144 </p>
 145 <p>
 146 JK will accept this attribute even if your platform does not support
 147 socket timeouts. In this case setting the attribute will have no effect.
 148 By default the value is "0" and the timeout is disabled.
 149 The attribute takes a value in seconds (not milliseconds).
 150 JK will then set the send and the receive timeouts of the backend
 151 connections to this value. The timeout is low-level: it is
 152 used for each read and write operation on the socket individually.
 153 </p>
 154 <p>
 155 Using this attribute will make JK react faster to some types of network problems.
 156 Unfortunately socket timeouts have negative side effects, because on most
 157 platforms there is no good way to recover from such a timeout once it has fired.
 158 JK has no way to decide whether this timeout fired because of real network
 159 problems, or only because it didn't receive an answer packet from a backend in time.
 160 So remember: don't use extremely small values.
 161 </p>
 162 <p>
 163 For the general case of connection establishment you can use
 164 <b>socket_connect_timeout</b>. It takes a millisecond value and works
 165 on most platforms, even if <b>socket_timeout</b> is not supported.
 166 We recommend using <b>socket_connect_timeout</b> because in some network
 167 failure situations failure detection during connection establishment
 168 can take several minutes due to TCP retransmits. Depending on the quality
 169 of your network a timeout somewhere between 1000 and 5000 milliseconds
 170 should be fine. Note that <code>socket_timeout</code> is in seconds, and
 171 <code>socket_connect_timeout</code> in milliseconds.
 172 </p>
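<p>
Because the two attributes use different units, a short sketch may help.
The worker name <code>node1</code> and the value chosen for
<b>socket_timeout</b> are assumptions for illustration only:
</p>
<source>
# Send and receive timeout for each socket operation, in SECONDS.
# Has no effect on platforms without socket timeout support.
worker.node1.socket_timeout=60

# Timeout for connection establishment, in MILLISECONDS
worker.node1.socket_connect_timeout=5000
</source>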
 173 </subsection>
 174
 175 <subsection name="Connection Pools and Idle Timeouts">
 176 <p>
 177 JK handles backend connections in a connection pool per web server process.
 178 The connections are used in a persistent mode. After a request has completed
 179 successfully we keep the connection open and wait for the next
 180 request to forward. The connection pool is able to grow according
 181 to the number of threads that want to forward requests in parallel.
 182 </p>
 183 <p>
 184 Most applications have a varying load depending on the hour of the day
 185 or the day of the month. Other reasons for a growing connection pool
 186 would be temporary slowness of backends, leading to an increasing
 187 congestion of the frontends like web servers. Many backends use a dedicated
 188 thread for each incoming connection they handle. So usually one wants the
 189 connection pool to shrink, if the load diminishes.
 190 </p>
 191 <p>
 192 JK allows connections in the pool to get closed after some idle time.
 193 This maximum idle time can be configured with the attribute
 194 <b>connection_pool_timeout</b> which is given in units of seconds.
 195 The default value is "0", which disables closing idle connections.
 196 </p>
 197 <p>
 198 We generally recommend values around 10 minutes, so setting
 199 <b>connection_pool_timeout</b> to 600 (seconds). If you use this attribute,
 200 please also set the attribute <b>connectionTimeout</b> in the AJP
 201 Connector element of your Tomcat server.xml configuration file to
 202 an analogous value. <b>Caution</b>: connectionTimeout is in milliseconds.
 203 So if you set JK connection_pool_timeout to 600, you should set Tomcat
 204 connectionTimeout to 600000.
 205 </p>
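<p>
The unit conversion between the two sides can be sketched as follows.
The worker name <code>node1</code> and the Connector port are assumptions;
adapt them to your own workers.properties and server.xml:
</p>
<source>
# workers.properties: close connections idle for 10 minutes (SECONDS)
worker.node1.connection_pool_timeout=600
</source>
<source>
&lt;!-- server.xml: analogous idle timeout on the AJP Connector (MILLISECONDS) --&gt;
&lt;Connector port="8009" protocol="AJP/1.3" connectionTimeout="600000" /&gt;
</source>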
 206 <p>
 207 JK connections do not get closed immediately after the timeout has passed.
 208 Instead there is an automatic internal maintenance task
 209 that runs every 60 seconds and checks the idle status of all connections.
 210 The 60 seconds interval
 211 can be adjusted with the global attribute worker.maintain. We do not
 212 recommend changing this value, because it has a lot of side effects.
 213 Up to and including version 1.2.26, the maintenance task only runs if requests get
 214 processed. So if your web server has processes that do not receive any
 215 requests for a long time, there is no way to close the idle connections
 216 in its pool. Starting with version 1.2.27 you can configure an independent
 217 watchdog thread when using Apache 2.x with threaded APR or IIS.
 218 </p>
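<p>
For Apache httpd, such a watchdog thread could be enabled roughly as follows.
This sketch assumes the mod_jk directive <code>JkWatchdogInterval</code>
(value in seconds), which is not described on this page. Please check the
Apache httpd configuration reference of your JK version before relying on it.
</p>
<source>
# httpd.conf (assumed directive, JK 1.2.27 or newer):
# run the JK maintenance task from a separate watchdog thread
# every 60 seconds, independent of request processing
JkWatchdogInterval 60
</source>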
 219 <p>
 220 The maximum connection pool size can be configured with the
 221 attribute <b>connection_pool_size</b>. We generally do not recommend
 222 using this attribute in combination with Apache httpd. For
 223 Apache httpd we automatically detect the number of threads per
 224 process and set the maximum pool size to this value. For IIS we use
 225 a default value of 250 (before version 1.2.20: 10),
 226 for the Sun Web Server the default is "1".
 227 We strongly recommend adjusting this value for IIS and the Sun Web Server
 228 to the number of requests one web server process should
 229 be able to send to a backend in parallel. You should measure how many connections
 230 you need during peak hours without performance problems, and then add some
 231 percentage depending on your growth rate etc. Finally you should check
 232 whether your web server processes are able to use at least as many threads
 233 as you configured as the pool size.
 234 </p>
 235 <p>
 236 The JK attribute <b>connection_pool_minsize</b> defines
 237 how many idle connections remain when the pool shrinks.
 238 By default this is half of the maximum pool size.
 239 </p>
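<p>
As an illustration for IIS or the Sun Web Server, the pool could be sized like
this. The worker name <code>node1</code> and both numbers are hypothetical;
the real values should come from your own peak hour measurements:
</p>
<source>
# Up to 100 parallel backend connections per web server process
worker.node1.connection_pool_size=100

# Keep 25 idle connections when the pool shrinks,
# instead of the default of half the maximum pool size (50)
worker.node1.connection_pool_minsize=25
</source>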
 240 </subsection>
 241
 242 <subsection name="Firewall Connection Dropping">
 243 <p>
 244 One particular problem with idle connections comes from firewalls that
 245 are often deployed between the web server layer and the backend.
 246 Depending on their configuration, they will silently drop
 247 connections from their status table if they are idle for too long.
 248 </p>
 249 <p>
 250 From the point of view of JK and of the web server, the other side
 251 simply doesn't answer any traffic. Since TCP is a reliable protocol
 252 it detects the missing TCP ACKs and tries to resend the packets for
 253 a relatively long time, typically several minutes.
 254 </p>
 255 <p>
 256 Many firewalls will allow connection closing, even if they dropped
 257 the connection for normal traffic. Therefore you should always use
 258 <a href="#Connection Pools and Idle Timeouts">connection_pool_timeout and
 259 connection_pool_minsize</a> on the JK side
 260 and connectionTimeout on the Tomcat side.
 261 </p>
 262 <p>
 263 Furthermore using the boolean attribute <b>socket_keepalive</b> you can
 264 set a standard socket option that automatically sends TCP keepalive packets
 265 after some idle time on each connection. By default this is set to "False".
 266 If you suspect idle connection drops by firewalls you should set this to
 267 "True".
 268 </p>
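<p>
A sketch of the corresponding setting, again for a hypothetical worker named
<code>node1</code>:
</p>
<source>
# Ask the operating system to send TCP keepalive packets
# on idle backend connections (default: False)
worker.node1.socket_keepalive=True
</source>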
 269 <p>
 270 Unfortunately the default intervals and algorithms for these packets
 271 are platform specific. You might need to inspect TCP tuning options for
 272 your platform on how to control TCP keepalive.
 273 Often the default intervals are much longer than the firewall timeouts
 274 for idle connections. Nevertheless we recommend talking to your firewall
 275 administration and your platform administration in order to make them agree
 276 on good configuration values for the firewall and the platform TCP tuning.
 277 </p>
 278 <p>
 279 In case none of our recommendations helps and you are definitely having
 280 problems with idle connection drops, you can disable the use of persistent
 281 connections when using JK together with Apache httpd. For this you set
 282 "JkOptions +DisableReuse" in your Apache httpd configuration.
 283 This will have a huge negative performance impact!
 284 </p>
 285 </subsection>
 286
 287 <subsection name="Reply Timeout">
 288 <p>
 289 JK can also use a timeout on request replies. This timeout does not
 290 measure the full processing time of the response. Instead it controls
 291 how much time is allowed between consecutive response packets.
 292 </p>
 293 <p>
 294 In most cases, this is what one actually wants. Consider for example
 295 long running downloads. You would not be able to set an effective global
 296 reply timeout, because downloads could last for many minutes.
 297 Most applications though have limited processing time before starting
 298 to return the response. For those applications you could set an explicit
 299 reply timeout. Applications that do not harmonise with reply timeouts
 300 are batch type applications, data warehouse and reporting applications
 301 which are expected to observe long processing times.
 302 </p>
 303 <warn>
 304 If JK aborts waiting for a response because a reply timeout fired,
 305 there is no way to stop processing on the backend. Although you free
 306 processing resources in your web server, the request
 307 will continue to run on the backend, without any way to send back a
 308 result once the reply timeout has fired.
 309 </warn>
 310 <p>
 311 JK uses the worker attribute <b>reply_timeout</b> to set reply timeouts.
 312 The default value is "0" (timeout disabled) and you can set it to any
 313 millisecond value.
 314 </p>
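<p>
For example, a reply timeout of two minutes could be sketched like this.
The worker name <code>node1</code> and the value are assumptions; choose a
value that fits the longest normal pause between response packets of your application:
</p>
<source>
# Maximum wait time between consecutive response packets, in milliseconds
worker.node1.reply_timeout=120000
</source>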
 315 <p>
 316 In combination with Apache httpd, you can also set a more flexible reply_timeout
 317 using an httpd environment variable. If you set the variable JK_REPLY_TIMEOUT
 318 to some integer value, this value will be used instead of the value in
 319 the worker configuration. This way you can set reply timeouts more flexibly
 320 with mod_setenvif and mod_rewrite depending on URI, query string etc.
 321 If the environment variable JK_REPLY_TIMEOUT is not set, or is set to a
 322 negative value, the default reply timeout of the worker will be used. If
 323 JK_REPLY_TIMEOUT contains the value "0", then the reply timeout will be disabled
 324 for the request.
 325 </p>
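<p>
A sketch of such a per-URI setting with mod_setenvif could look as follows.
The URI patterns are hypothetical, and the values are interpreted like the
worker attribute, i.e. in milliseconds:
</p>
<source>
# httpd.conf: allow ten minutes for a hypothetical reporting area
SetEnvIf Request_URI "^/myapp/reports/" JK_REPLY_TIMEOUT=600000

# Disable the reply timeout for hypothetical long running batch requests
SetEnvIf Request_URI "^/myapp/batch/" JK_REPLY_TIMEOUT=0
</source>
<p>
Requests that match neither pattern keep using the <b>reply_timeout</b> of the worker.
</p>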
 326 <p>
 327 In combination with a load balancing worker, JK will disable a member
 328 worker of the load balancer if a reply timeout fires. The worker will then
 329 no longer be used until it gets recovered during the next automatic
 330 maintenance task. Starting with JK 1.2.24 you can improve this behaviour using
 331 <b><a href="../reference/workers.html">max_reply_timeouts</a></b>. This
 332 attribute will allow occasional long running requests without disabling the
 333 worker. Only if those requests happen too often, the worker gets disabled by the
 334 load balancer.
 335 </p>
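<p>
The following sketch shows how the two attributes could be combined. It assumes
that <b>max_reply_timeouts</b> is set on the load balancer worker, as described
in the worker reference. The worker names and the values are hypothetical:
</p>
<source>
# Hypothetical load balancer with two members
worker.balancer.type=lb
worker.balancer.balance_workers=node1,node2

# Tolerate a few reply timeouts before a member gets disabled
# (JK 1.2.24 or newer)
worker.balancer.max_reply_timeouts=10

# Reply timeout of the members, in milliseconds
worker.node1.reply_timeout=120000
worker.node2.reply_timeout=120000
</source>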
 336 </subsection>
 337 </section>
 338
 339 <section name="Load Balancer Error Detection">
 340 <br/>
 341 <subsection name="Local and Global Error States">
 342 <p>
 343 A load balancer worker does not only have the ability to balance load.
 344 It also handles stickiness and failover of requests in case of errors.
 345 When a load balancer detects an error on one of its members, it needs to
 346 decide whether the error is serious, or only a temporary error or maybe
 347 only related to the actual request that was processed. Temporary errors
 348 are called local errors; serious errors are called global errors.
 349 </p>
 350 <p>
 351 If the load balancer decides that a backend should be put into the global error
 352 state, then the web server will not send any more requests there. If no session
 353 replication is used, this means that all user sessions located on the respective
 354 backend are no longer available. The users will be sent to another backend
 355 and will have to log in again. So the global error state is not transparent to the
 356 users. The application is still available, but users might lose some work.
 357 </p>
 358 <p>
 359 In some cases the decision between local error and global error is easy.
 360 For instance if there is an error sending back the response to the client (browser),
 361 then it is very unlikely that the backend is broken.
 362 So this situation is a typical example of a local error.
 363 </p>
 364 <p>
 365 Some situations are harder to decide though. If the load balancer can't establish
 366 a new connection to a backend, it could be because of a temporary overload situation
 367 (so no more free threads in the backend), or because the backend isn't alive any more.
 368 Depending on the details, the right state could either be local error or global error.
 369 </p>
 370 </subsection>
 371 <subsection name="Error Escalation Time">
 372 <p>
 373 Until version 1.2.26 most errors were interpreted as global errors.
 374 Starting with version 1.2.27 many errors which were previously interpreted as global
 375 were switched to being local whenever the backend is still busy. Busy means that
 376 other concurrent requests are sent to the same backend (successful or not).
 377 </p>
 378 <p>
 379 In many cases there is no perfect way of making the decision
 380 between local and global error. The load balancer simply doesn't have enough information.
 381 Starting with version 1.2.28 you can tune how fast the load balancer switches from local error to
 382 global error. If a member of a load balancer stays in local error state for too long,
 383 the load balancer will escalate it into global error state.
 384 </p>
 385 <p>
 386 The time tolerated in local error state is controlled by the load balancer attribute
 387 <b>error_escalation_time</b> (in seconds). The default value is half of <b>recover_time</b>,
 388 so unless you changed <b>recover_time</b> the default is 30 seconds.
 389 </p>
 390 <p>
 391 Using a smaller value for <b>error_escalation_time</b> will make the load balancer react
 392 faster to serious errors, but also carries the risk of losing sessions more often
 393 in not so serious situations. You can lower <b>error_escalation_time</b> down to 0 seconds,
 394 which means all local errors which are potentially serious are escalated to global errors
 395 immediately.
 396 </p>
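<p>
As a sketch, the relation between the two attributes could be made explicit
like this. The worker names are hypothetical, and the values shown for
<b>recover_time</b> and <b>error_escalation_time</b> are just the defaults
described above, written out for clarity:
</p>
<source>
# Hypothetical load balancer with two members
worker.balancer.type=lb
worker.balancer.balance_workers=node1,node2

# Seconds before a member in global error state is tried again
worker.balancer.recover_time=60

# Seconds a member may stay in local error state before it is
# escalated to global error state (JK 1.2.28 or newer).
# A value of 0 escalates potentially serious local errors immediately.
worker.balancer.error_escalation_time=30
</source>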
 397 <p>
 398 Note that without good basic error detection the whole escalation procedure is useless.
 399 So you should definitely use <b>socket_connect_timeout</b> and activate CPing/CPong
 400 with <b>ping_mode</b> and <b>ping_timeout</b> before thinking about also tuning
 401 <b>error_escalation_time</b>.
 402 </p>
 403 </subsection>
 404 </section>
 405
 406 </body>
 407 </document>