rubbos/app/tomcat-connectors-1.2.32-src/native/TODO.txt

   1   Licensed to the Apache Software Foundation (ASF) under one or more
   2   contributor license agreements.  See the NOTICE file distributed with
   3   this work for additional information regarding copyright ownership.
   4   The ASF licenses this file to You under the Apache License, Version 2.0
   5   (the "License"); you may not use this file except in compliance with
   6   the License.  You may obtain a copy of the License at
   7
   8       http://www.apache.org/licenses/LICENSE-2.0
   9
  10   Unless required by applicable law or agreed to in writing, software
  11   distributed under the License is distributed on an "AS IS" BASIS,
  12   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  13   See the License for the specific language governing permissions and
  14   limitations under the License.
  15
  16 TODO for tomcat-connectors
  17
  18 $Id: TODO.txt 920112 2010-03-07 20:57:35Z timw $
  19
  20 1) Optimize "distance"
  21 ======================
  22
   23 Sorting the list of balanced workers by distance would be nice, but
   24 how do we combine the sorting with the offset implementation (especially
   25 useful for strategy BUSYNESS under low load)?
  26
  27 Local error states and likely other features will also break in case
  28 we do naive reordering.
  29
  30 2) Reduce number of string comparisons in most_suitable
  31 ========================================================
  32
  33 a) redirect/domains
  34
   35 It would be easy to supplement the redirect string with an integer, giving the
   36 index of the worker in the lb. Then lb would not need to search for the redirect worker.
  37
   38 In the same way, one could add a list of indices of the workers in the same domain.
  39 Whenever domain names are managed (init and status worker update) one would
  40 scan the worker list and update the index list.
  41
  42 Finally one could have a list of workers, whose domain is the same as the redirect
  43 attribute of the worker, because that's also something we consider.
  44
  45 What I'm not sure about, even in the existing code, is the locking between updates
  46 by the status worker and the process local information about the workers,
  47 especially in the case, when status updates a redirect or domain attribute.
  48
  49 I would like to keep these attributes and the new index arrays process local,
  50 and the processes should find out about changes made by status to shm (redirect/domain)
  51 and then rebuild their data. No need to get these on every request from the shm,
  52 only the check for up-to-date should be made.
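
A minimal sketch of the index idea, assuming a simplified worker record.
The names lb_worker_t, redirect_idx and rebuild_redirect_index() are
hypothetical and do not exist in mod_jk. The redirect name is resolved to
an index whenever names change, so the request path needs no string search.

```c
#include <string.h>

#define NO_REDIRECT (-1)

/* Hypothetical, simplified worker record (not the real mod_jk struct). */
typedef struct {
    const char *name;      /* worker name */
    const char *redirect;  /* name of the redirect worker, or NULL */
    int redirect_idx;      /* process-local cache: index of that worker */
} lb_worker_t;

/* Rebuild the cached indices. Intended to run at init and whenever the
 * status worker has changed a redirect attribute. */
static void rebuild_redirect_index(lb_worker_t *w, int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        w[i].redirect_idx = NO_REDIRECT;
        if (w[i].redirect == NULL)
            continue;
        for (j = 0; j < n; j++) {
            if (strcmp(w[i].redirect, w[j].name) == 0) {
                w[i].redirect_idx = j;
                break;
            }
        }
    }
}
```

A redirect that names no existing worker is left at NO_REDIRECT, so the
caller can treat it like a missing redirect.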
  53
  54 b) exact matches for jvmRoutes
  55
  56 Could we use hashes instead of string comparisons all the time?
  57 I'm not sure, if a good enough hash takes longer than a string comparison though.
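
One way to try this out, as a sketch only: store a hash next to each route
at init, hash the session route once per request, and call strcmp() only
when the hashes are equal. The 32-bit FNV-1a hash is an assumption chosen
for illustration, and route_entry_t and find_route() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a; used here only as an example hash function. */
static uint32_t route_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical record: route name plus its hash, computed once at init. */
typedef struct {
    const char *route;
    uint32_t hash;
} route_entry_t;

/* Returns the index of the exact match, or -1 if there is none.
 * strcmp() is only reached when the hashes are equal. */
static int find_route(const route_entry_t *tab, int n, const char *route)
{
    uint32_t h = route_hash(route);
    int i;
    for (i = 0; i < n; i++) {
        if (tab[i].hash == h && strcmp(tab[i].route, route) == 0)
            return i;
    }
    return -1;
}
```

Whether this beats plain string comparison depends on route length and the
number of workers, so it would need to be measured.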
  58
  59 3) Code separation between factory, validate and init in lb
  60 ============================================================
  61
  62 The factory contains:
  63
  64         private_data->worker.retries = JK_RETRIES;
  65         private_data->s->recover_wait_time = WAIT_BEFORE_RECOVER;
  66
  67 I think, this should move to validate() or init().
  68 It might even be obsolete, because in init, we already have:
  69
  70     pThis->retries = jk_get_worker_retries(props, p->s->name,
  71     p->s->retries = pThis->retries;
  72     p->s->recover_wait_time = jk_get_worker_recover_timeout(props, p->s->name, WAIT_BEFORE_RECOVER);
  73     if (p->s->recover_wait_time < WAIT_BEFORE_RECOVER)
  74         p->s->recover_wait_time = WAIT_BEFORE_RECOVER;
  75
  76 Then: In validate there is
  77
  78                 p->lb_workers[i].s->error_time = 0;
  79
  80 So shouldn't there also be
  81
  82                 p->lb_workers[i].s->maintain_time = time(NULL);
  83
  84 4) Logging
  85 ==========
  86
  87 a) Allow logging of request url or uuid in jk log to ease matching with access log.
  88
  89 b) Implement log rotation for IIS. (done in 1.2.31)
  90
  91 c) Allow adding of log notes for IIS like we do with Apache.
  92
  93 d) Add error type info to access log notes
  94
  95 e) Refactor: Use the same code files for the request logging functions in Apache 1.3 and 2.0.
  96
  97 f) Refactor: Use the same code files for piped logging in Apache 1.3 and 2.0.
  98
  99 5) ajpget
 100 ==========
 101
 102 Combine ajplib and Apache ab to an ajp13 commandline client ajpget.
 103
 104 6) Parsing workers.properties
 105 =============================
 106
  107 Parsing workers.properties in addition to just looking up attributes
  108 would help users to detect syntax errors in the file. At the moment
  109 nothing is logged when, for example, attribute names contain typos.
 110
 111 Example: worker.list vs. worker.lists.
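
A sketch of such a check, assuming keys have the form "worker.list" or
"worker.<name>.<attribute>". The attribute table is a small illustrative
subset and is_known_key() is a hypothetical helper, not existing code.

```c
#include <string.h>

/* Illustrative subset of attribute names; the real list is much longer. */
static const char *known_attrs[] = {
    "type", "host", "port", "lbfactor", "balance_workers", NULL
};

/* Returns 1 if the key is "worker.list" or "worker.<name>.<known attr>",
 * otherwise 0. A 0 result would be the place to log a warning. */
static int is_known_key(const char *key)
{
    const char *attr;
    int i;

    if (strncmp(key, "worker.", 7) != 0)
        return 0;
    if (strcmp(key, "worker.list") == 0)
        return 1;
    attr = strrchr(key, '.');   /* the last dot precedes the attribute */
    if (attr == key + 6)        /* only one dot: no worker name given */
        return 0;
    attr++;
    for (i = 0; known_attrs[i] != NULL; i++) {
        if (strcmp(attr, known_attrs[i]) == 0)
            return 1;
    }
    return 0;
}
```

With this, "worker.lists" is rejected while "worker.list" is accepted.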
 112
 113 7) Persisting workers.properties
 114 ================================
 115
 116 Make workers.properties persist from inside status worker.
 117
  118 Add an additional properties file that contains a journal of property changes done
  119 via the status worker. During initialization, its entries overwrite the initial workers.properties.
  120
  121 Update actions in the status worker would optionally add a change to this journal.
 122 We can also add a comment with timestamp etc. to each journal line.
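
A possible shape for one journal entry, as a sketch. The layout (a comment
line with timestamp and note, then the assignment) is an assumption, and
format_journal_entry() is a hypothetical helper.

```c
#include <stdio.h>

/* Formats one journal entry into buf: a comment line with timestamp and
 * note, followed by the property assignment.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_journal_entry(char *buf, size_t len, long when,
                                const char *note, const char *key,
                                const char *value)
{
    int n = snprintf(buf, len, "# %ld %s\n%s=%s\n", when, note, key, value);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Because later entries follow earlier ones in the file, replaying the
journal in order at init would leave the most recent value in effect.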
 123
 124 8) Reduce number of uses of time(NULL)
 125 ======================================
 126
  127 We use time(NULL) a lot. Since it only has a resolution of one second,
  128 I'm asking myself if we could update the actual time in only a few
  129 places and get the time out of some variables when needed. The same does
  130 not hold true for millisecond time, but in several cases where we use the
  131 time, it's not very critical that it is exact.
  132
  133 Some of this has already been done; the remaining cases are:
 134
 135 - last_access for usage against timeout value that is ~minutes
 136 - error_time for usage against retry timeout that is ~minutes
 137 - uri_worker_map checked for usage against JK_URIMAP_RELOAD=1 minute
 138
 139 So I think, it would suffice to set an actual time at the beginning of
 140 the request/response cycle (used by everything before the request is being
  141 sent over the socket) and maybe after the response shows up or an error occurs
  142 (for everything else, if there is anything).
 143
 144 For which cases would it be OK, to use the time before sending to TC:
 145 - uri_worker_map "checked" (uri map lookup starts early)
 146 - setting/testing last_access in
 147   - jk_ajp_common.c:ajp_connect_to_endpoint()
 148   - jk_ajp_common.c:ajp_get_endpoint()
 149   - jk_ajp_common.c:ajp_maintain()
 150
 151 What about the others:
 152 - setting last_access in init should use the actual time in
 153   jk_ajp_common.c:ajp_create_endpoint_cache()
 154
 155 - setting last_access again after the service could also use the
 156   actual time in jk_ajp_common.c:ajp_done()
 157 - setting error_time should better use the actual time
 158   jk_lb_worker.c service(): rec->s->error_time = time(NULL);
 159
 160 The last two cases could again use the same time, which then would be needed
 161 to be generated at the end or directly after service.
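
The idea could be sketched as follows. req_clock_t and its helpers are
hypothetical and not part of mod_jk; they only show taking the time once
before the request is sent and once after it has finished.

```c
#include <time.h>

/* Hypothetical per-request clock; not an existing mod_jk structure. */
typedef struct {
    time_t start_time;  /* taken once, before the request is sent */
    time_t end_time;    /* taken once, after response or error; 0 if unset */
} req_clock_t;

static void req_clock_begin(req_clock_t *c)
{
    c->start_time = time(NULL);
    c->end_time = 0;
}

static void req_clock_end(req_clock_t *c)
{
    c->end_time = time(NULL);
}

/* Idle check against a timeout in the range of minutes. It reuses the
 * cached start time instead of calling time(NULL) again. */
static int is_idle_too_long(const req_clock_t *c, time_t last_access,
                            int timeout_secs)
{
    return difftime(c->start_time, last_access) >= (double)timeout_secs;
}
```

start_time would serve the uri map check and the last_access tests, while
end_time would serve last_access after the service and error_time.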
 162
 163 9) Access/Modification Time in shm
 164 ==================================
 165
 166 a) [Discussion] What will this generally be used for? At the moment,
 167 only jk_status "uses" it, but it only sets the values, it never asks for them.
 168
 169 b) [Improvement, minor] jk_shm_set_workers_time() implicitly calls
 170 jk_shm_sync_access_time(), but jk_status does:
 171
 172             jk_shm_set_workers_time(time(NULL));
 173             /* Since we updated the config no need to reload
 174              * on the next request
 175              */
 176             jk_shm_sync_access_time();
 177
  178 two times. So depending on the idea of the functionality of these calls,
  179 either set_workers_time and sync_access_time should be independent,
  180 or the second call in jk_status could be removed.
 181
 182 10) "Destroy" functionality
 183 ===========================
 184
 185 [Hint] Destroy on a worker never seems to free shm,
 186 but I think that was already a flaw without shm.
 187
 188 11) Locks against shm
 189 =====================
 190
 191 It might be an interesting experiment to implement an improved locking structure.
 192 It looks like one would need in fact different types of locks.
 193 In shm we have as read/write information:
 194
 195 Changed only by status worker:
 196 - redirect, domain, lb_factor, sticky_session, sticky_session_force,
 197   recover_wait_time, retries, status (is_disabled, is_stopped).
 198
 199 These changes need some kind of reconfiguration in the threads after
 200 change and before the next request/response. Since changes are rare,
  201 here we would be better off with a simple detect change and copy from
  202 shm to process procedure. status updates the data in shm and after that
  203 the time stamp on the shm. Each process checks the time stamp before
  204 doing a request, and when the time stamp changed it does a writer CS
  205 lock and updates its local copy. All threads always do a reader CS
 206 lock when doing a request/response cycle. Reader CS locks are concurrent,
 207 writers are exclusive. So readers are not allowed, when the config data is being updated.
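
A sketch of the detect change and copy step, without the locking itself.
The structs are simplified, hypothetical stand-ins for the shm segment and
the process-local copy. As an assumption, a sequence counter replaces the
time stamp so that two updates within the same second are not missed.

```c
/* Hypothetical stand-in for the config data kept in shm. */
typedef struct {
    unsigned long sequence;   /* bumped by status after each update */
    int lb_factor;
    int retries;
} shm_config_t;

/* Hypothetical process-local copy of the same data. */
typedef struct {
    unsigned long sequence;   /* sequence of the last copy taken */
    int lb_factor;
    int retries;
} local_config_t;

/* Called before each request. Returns 1 if the local copy was refreshed.
 * In a threaded server the copy would run under the writer lock and the
 * request itself under a reader lock; both are left out of this sketch. */
static int sync_local_config(local_config_t *local, const shm_config_t *shm)
{
    if (local->sequence == shm->sequence)
        return 0;             /* cheap up-to-date check, nothing copied */
    local->lb_factor = shm->lb_factor;
    local->retries = shm->retries;
    local->sequence = shm->sequence;
    return 1;
}
```

In the common case only the sequence comparison runs, which matches the
assumption that changes by status are rare.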
 208
 209 Changed by the threads themselves (and via reset by the status worker):
 210 - counters needed by routing decisions (lb_value, readed, transferred, busy)
  211 - timers needed by maintenance functions (error_time, service_time/maintain_time)
 212 - status is_busy, in_error_state
 213 - uncritical data with no influence on routing decisions (max_busy, elected, errors,
 214   in_recovering)
 215
 216 Here again we could improve by using reader/writer locks. I have a
 217 tendency for the PESSIMISTIC side of locking, but I think we could
  218 shrink the code blocks that should be locked. At the moment they are
 219 pretty big (most of get_most_suitable_worker).
 220
 221 Read-only: name and id.
 222
 223 By the way: at several places we don't check for errors on getting the lock.
 224
 225 12) Global locks
 226 ================
 227
 228 We might want to make the lock technology choosable, like httpd does.
  229 E.g. on Solaris the default lock type is fcntl, and we can easily
 230 get an invalid EDEADLOCK for our jk_log_lock.
 231
 232 The following pthread based non global locks are used:
 233
 234 - one mutex for each AJP worker, synchronizing access to the connection
 235 pool, which exists per process
 236
 237 - one mutex for each lb worker
 238
 239 - a mutex used during dynamic update of uriworkermap.properties to
 240 prevent concurrent updates. Updates are done per process.
 241
 242 - a mutex to prevent concurrent execution of the process local internal
 243 maintenance task
 244
 245 - a mutex for access to the shared memory when changing or reading
 246 configuration parameters. That might be a little unsafe, because it
 247 actually should be a global mutex, not a process local, but those config
 248 changes are only done due to interaction with the status worker, so
 249 there's very little chance for unwanted concurrency here. All dynamic
 250 runtime data are already marked as being volatile.
 251
 252 All except the last seem to be safe. The last might need some hybrid model
 253 using thread local mostly and process global when doing updates.
 254
 255 See also: http://marc.info/?t=123394059800003&r=1&w=2
 256
 257 13) Understand the exact behaviour of shm and restarts
 258 ======================================================
 259
 260 Furthermore: rotatelogs (?) and gzip (mod_mime_magic) seem to close
 261 the (non-existing) shm file. Maybe a problem on some platforms?
 262
 263 14) What I didn't yet check
 264 ===========================
 265
 266 a) Correctness of is_busy handling
 267
 268 b) Correctness of the reset values after reset by status worker
 269
 270 c) What would be the exact behaviour, if shm does not work (memory case).
 271    Will this be a critical failure, or will we only experience a
 272    degradation in routing decisions.
 273
 274 d) How complete is mod_proxy_ajp/mod_proxy_balancer.
 275    Port changes from mod_jk to them.
 276
 277 15) Status worker
 278 =================
 279
 280 Allow managing pool and connection parameters. Add flags to
 281 pool and connections to signal workers and maintenance whether
 282 existing connections should be closed and renewed.
 283
 284 Check completeness of attribute manageability for AJP workers.
 285
 286 Check completeness of runtime data display, e.g.
 287 reset_time, recover_time, etc. Maybe also add "last error type".
 288
 289 Work on a global display of process local data, e.g. state of the
 290 process local connection pools (sizes, num connected/idle).
 291
 292 Rework the GUI:
 293
 294 - Basic overview start page with links to the workers,
 295   maybe a checkbox to decide whether you want to see config data too.
 296 - Detail view (what type of error was last and when etc.), detail view
 297   for connection pools.
 298
 299 16) URI mapping
 300 ===============
 301
 302 Add more extensions?
 303
 304 17) Connection Pool
 305 ===================
 306
  307 What would a good global maximum count look like?
 308 Simply limit busyness? Soft limit (applies only to non-sticky requests)
 309 and a hard limit (applies to all requests)?
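
The two-limit variant could look like this sketch. may_dispatch() and its
parameters are hypothetical; nothing like it exists in the code yet.

```c
/* Returns 1 if a new request may be dispatched, 0 if it should be refused.
 * soft_limit applies to non-sticky requests only, hard_limit to all. */
static int may_dispatch(int busy, int is_sticky, int soft_limit,
                        int hard_limit)
{
    if (busy >= hard_limit)
        return 0;
    if (!is_sticky && busy >= soft_limit)
        return 0;
    return 1;
}
```

With soft_limit 10 and hard_limit 20, a sticky request is still accepted
at busy 15, while a non-sticky one is refused.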
 310
 311 18) IP V6
 312 =========
 313
 314 There's a Bugzilla with a patch.
 315
 316 19) IIS Chunked Encoding
 317 ========================
 318
 319 Move from alternative build to default.
 320
 321 What about the other ifdef'd features?
 322
  323 20) Add REMOTE_PORT as a default JkEnvVar
 324 =========================================
 325
 326 ... and port to IIS to fix the getRemotePort() problem.
 327
 328 21) Rework HowTo docs
 329 =====================
 330
 331 22) Add better example config
 332 =============================
 333
 334 23) Remove JNI worker
 335 =====================
 336
 337 24) Watchdog reload of uriworkermap.properties
 338 ==============================================
 339
 340 Questions about uriworkermap.properties watchdog reload (r745898):
 341 - shm lock needed?
 342 - Apache port (need to iterate over vhosts)
 343
 344 25) Watchdog backend probing
 345 ============================
 346
 347 Configurable probe URL to test backend, e.g. to decide about
 348 recovery instead of using random real requests.
 349
 350 26) Commandline shm
 351 ===================
 352
 353 Commandline tool to read data from shm file.
 354
 355 27) Status worker property format
 356 =================================
 357
 358 Check whether we return list properties as one line
 359 (because Java doesn't allow the same property key multiple
 360 times).
 361
 362 Applies possibly to:
 363
 364 "list",
 365 BALANCE_WORKERS,
 366 MOUNT_OF_WORKER,
 367 USER_OF_WORKER,
 368 GOOD_RATING_OF_WORKER,
 369 BAD_RATING_OF_WORKER,
 370 STATUS_FAIL_OF_WORKER,
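
If list properties turn out to be emitted once per member, joining them
into one line could be sketched as below. join_list() is a hypothetical
helper, and the comma separator is an assumption.

```c
#include <stdio.h>

/* Joins n values with commas into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int join_list(char *buf, size_t len, const char **vals, int n)
{
    size_t used = 0;
    int i;

    if (len == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int w = snprintf(buf + used, len - used, "%s%s",
                         i ? "," : "", vals[i]);
        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

The result is a single value such as node1,node2,node3 that can be written
under one property key.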
 371