
Thoughts about this server ...
(Experimental) Four-Calorie-Servers/HTTP
As of v3.1, this is still a fairly low-volume server, capable of only about
20 pages per second. Future design changes should improve that figure
considerably. Even at the current page rate, we are using this little
server to run all of the domains affiliated with this site. Obviously, all of
our sites carry relatively low volume, and nearly one hundred percent of that
volume comes from search engines. (The last real human click-through was about
two months ago, which makes me wonder why I bother with the sites, except that
it helps me organize my thoughts, projects, and musings.)
How do I know the last human click-through was two months ago? The server has a built-in
logger that is accessible via the internet. Its output looks something like:
Current Maximum Outer Timeout 3 , Current First Chunk Timeout 3
Multi Threaded (1 = true) 1 Absolute Timeout (Seconds) 180 Maximum Simultaneous Threads 100
| Process Thread | Socket | Address | Resource |
| 000 | 000 | [ ] | ... |
| 001 | 004 | [ 66.233.116.31 ] | ... GET /servers/testhints.html |
| 002 | 004 | [ 66.233.116.31 ] | ... GET /servers/images/dig-fan.png |
| 003 | 005 | [ 66.233.116.31 ] | ... GET /pages/sfpages.html |
| 004 | 004 | [ 66.233.116.31 ] | ... GET /pages/ie.js |
| 005 | 004 | [ 66.233.116.31 ] | ... GET /pages/ff.css |
| 006 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/logo1.png |
| 007 | 005 | [ 66.233.116.31 ] | ... GET /pages/images/wordsep16.png |
| 008 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/wordsep50.png |
| 009 | 005 | [ 66.233.116.31 ] | ... GET /pages/images/oss.png |
| 010 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/uncommon-bricks-base.png |
| 011 | 004 | [ 66.233.116.31 ] | ... GET /pages/sfpages.html |
| 012 | 004 | [ 66.233.116.31 ] | ... GET /pages/ie.js |
| 013 | 004 | [ 66.233.116.31 ] | ... GET /pages/ff.css |
| 014 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/wordsep16.png |
| 015 | 005 | [ 66.233.116.31 ] | ... GET /pages/images/logo1.png |
| 016 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/wordsep50.png |
| 017 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/uncommon-bricks-base.png |
| 018 | 004 | [ 66.233.116.31 ] | ... GET /images/bartwo.png |
| 019 | 004 | [ 66.233.116.31 ] | ... GET /servers/images/ovaltitleback.png |
| 020 | 004 | [ 66.233.116.31 ] | ... GET /pages/images/oss.png |
The address shown (66.233.116.31) is one of mine. Notice how each HTML request is followed by requests for the JavaScript,
stylesheet, and image files that the page depends on. This series of log entries is what a human click-through
looks like in the log. (These entries were caused by my own visit to the site with my Firefox browser.)
When a search engine bot visits the site, the only log entry is the HTML file itself, or perhaps a robots file
or two, which gives the visit away as a search engine. (Plus, I know the addresses of most of the search engine
bots on the internet.) Thanks go out to MSN, Googlebot, and Yahoo Slurp for their dozens of hits per day, which
are so useful for the HTTP server refinement efforts that are part of my forte on the internet.
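The heuristic described above can be sketched in C. This is only a minimal sketch, assuming the log has already been reduced to (address, resource) pairs; the struct and function names are hypothetical and are not taken from the server's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified log record: one request from one address. */
struct log_entry {
    const char *address;   /* e.g. "66.233.116.31" */
    const char *resource;  /* e.g. "/pages/sfpages.html" */
};

/* Return 1 if path ends with suffix, else 0. */
static int has_suffix(const char *path, const char *suffix)
{
    size_t plen = strlen(path), slen = strlen(suffix);
    return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/* A page dependency is something a browser fetches automatically. */
static int is_dependency(const char *path)
{
    return has_suffix(path, ".png") || has_suffix(path, ".js") ||
           has_suffix(path, ".css");
}

/*
 * Heuristic from the text: a visit looks human when the same address
 * requests an HTML page and at least one page dependency. A bot
 * typically fetches only the HTML file (or a robots file).
 */
static int looks_like_click_through(const struct log_entry *log, size_t n,
                                    const char *address)
{
    int saw_html = 0, saw_dependency = 0;

    assert(log != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(log[i].address, address) != 0)
            continue;
        if (has_suffix(log[i].resource, ".html"))
            saw_html = 1;
        else if (is_dependency(log[i].resource))
            saw_dependency = 1;
    }
    return saw_html && saw_dependency;
}
```

A real classifier would also want a time window per address, since this sketch treats every request from one address as a single visit.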
Amazingly, I seem to be able to find my pages in Google, as they are ranked highly enough to be provided as a search
result when using a search term containing only a few words.
Now (for the benefit of any of those very rare and wandering visitors), let us get back to our discussion. The initial aim of the project was to create an HTTP server and make it as simple as possible while still maintaining some reasonable level of functionality.
Once again, simple is not always fast, quick, or capable of scaling tall buildings in a single leap. But, initially, it is good because it helps one to find potential trouble spots more readily.

The simple design of the server at this point plugs the received queries directly into
the pthread pool for processing. Since the pthread pool is limited by memory (the thread stack
plus dynamic allocations for each thread are not trivial), only a finite number of threads (currently 100)
within it can be utilized at any particular instant in time. When the thread pool is exhausted, some
requests have to wait until threads are available.
The direct-to-processing-thread-queue design means that "bursts" of traffic may not be well
accommodated if a particular burst lasts long enough to exhaust the thread queue.
As a means of creating higher throughput, a modification may be made that incorporates
an input queue many times larger than the thread queue. Such an input queue could
accommodate "burst" traffic of perhaps thousands of connections. The current code base would
receive some minor
performance benefit from such an arrangement due to the way the queue could be
very speedily populated.
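One way such an input queue might look is a fixed-size ring buffer of accepted socket descriptors, filled by the accept loop and drained by worker threads. This is a minimal sketch, not the server's code: the names and the capacity of 4096 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define INPUT_QUEUE_CAPACITY 4096  /* assumed: many times the 100-thread pool */

/* Hypothetical ring buffer of accepted socket descriptors. */
struct input_queue {
    int fds[INPUT_QUEUE_CAPACITY];
    size_t head;            /* index of the oldest queued descriptor */
    size_t count;           /* number of descriptors currently queued */
    pthread_mutex_t lock;   /* protects head, count, and fds */
};

static void queue_init(struct input_queue *q)
{
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Called by the accept loop. Returns 0 on success, -1 if the queue is full. */
static int queue_push(struct input_queue *q, int fd)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < INPUT_QUEUE_CAPACITY) {
        q->fds[(q->head + q->count) % INPUT_QUEUE_CAPACITY] = fd;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Called by a worker. Stores the oldest descriptor in *fd and returns 0,
 * or returns -1 if the queue is empty. */
static int queue_pop(struct input_queue *q, int *fd)
{
    int rc = -1;

    assert(fd != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *fd = q->fds[q->head];
        q->head = (q->head + 1) % INPUT_QUEUE_CAPACITY;
        q->count--;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}
```

Pushing is cheap (one lock and one array store), which is what lets the queue be populated quickly during a burst. A fuller version would probably add a condition variable so that idle workers sleep instead of polling queue_pop.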
At some point, you have to admit that you just can't have your cake and eat it too.
Higher simultaneous requests-processing-power just requires the addition of more simultaneously
available threads. To this end, a future
enhancement may be to make the thread pool dynamic, and create a dynamic pool size that is reasonable
relative to the
resources of the machine that it is running on. v3.1 already does this to some limited extent, in
an indirect fashion.
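As an illustration of sizing the pool relative to machine resources, the calculation could be as simple as dividing a memory budget by the per-thread cost and clamping the result. This is a sketch under assumed names and parameters; it is not how v3.1 does it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sizing rule: spend at most memory_budget bytes on threads,
 * where each thread costs its stack plus an estimate of its dynamic
 * allocations, then clamp the result to [min_threads, max_threads].
 */
static size_t pool_size_for(size_t memory_budget, size_t stack_bytes,
                            size_t heap_bytes_per_thread,
                            size_t min_threads, size_t max_threads)
{
    size_t per_thread = stack_bytes + heap_bytes_per_thread;
    size_t n;

    assert(per_thread > 0 && min_threads <= max_threads);
    n = memory_budget / per_thread;
    if (n < min_threads)
        n = min_threads;
    if (n > max_threads)
        n = max_threads;
    return n;
}
```

With a 64 Mbyte budget and 500 kbyte stacks, pool_size_for(64UL * 1024 * 1024, 500 * 1024, 0, 10, 1000) yields 131 threads. Where the budget comes from is left open here; on systems that provide it, a fraction of sysconf(_SC_PHYS_PAGES) times the page size is one possible source.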

Downloads
Fourcaloriehttpd: fourcaloriehttpd-3.1.tar.gz (FreeBSD 7.0 binary executable)
Server version 3.0.7 (FreeBSD 7.0 package)
Server version 3.0.7 (FreeBSD 6.3 package)
The pthread library uses a large default stack size for new threads on most *nix systems. The "Fourcaloriehttpd" program makes use of the pthread_attr_setstacksize function to make sure that the server's memory requirements don't get too far out of hand:
pthread_attr_t pthrServerAttribute;
pthread_attr_init(&pthrServerAttribute);
pthread_attr_setdetachstate(&pthrServerAttribute, PTHREAD_CREATE_JOINABLE);
pthread_attr_setstacksize(&pthrServerAttribute, THREAD_STACK_SIZE);
Notice that the "fourcaloriehttpd" stack size has been set to 500 kbytes, a seemingly large value made necessary by liberal use of local structs, but much less than the 10 Mbyte default stack size. Notice also that the JOINABLE attribute has been set on pthrServerAttribute so that the thread can be disposed of handily when we are through with it.
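The pthread attribute calls return an error number rather than setting errno, so a more defensive version of the setup above might check each one. The sketch below assumes THREAD_STACK_SIZE is 500 * 1024 bytes (the text gives only "500 kbytes") and uses a hypothetical helper name.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define THREAD_STACK_SIZE (500 * 1024)  /* assumed value for "500 kbytes" */

/*
 * Initialize attr as joinable with a reduced stack size.
 * Returns 0 on success or the first pthread error number encountered.
 */
static int init_server_attr(pthread_attr_t *attr)
{
    int rc;

    assert(attr != NULL);
    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(attr, PTHREAD_CREATE_JOINABLE);
    if (rc == 0)
        rc = pthread_attr_setstacksize(attr, THREAD_STACK_SIZE);
    if (rc != 0)
        pthread_attr_destroy(attr);  /* do not leak a half-built attribute */
    return rc;
}
```

pthread_attr_setstacksize can fail with EINVAL if the requested size is below PTHREAD_STACK_MIN, and some systems also require a multiple of the page size, so the return value is worth checking.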
The SO_KEEPALIVE option is set, depending upon the options selected at the command line:
socServer = socket(AF_INET, SOCK_STREAM, 0);
int optSet = stcParams.intKeepAlive;
setsockopt(socServer, SOL_SOCKET, SO_KEEPALIVE, (char*) &optSet, sizeof(optSet));
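Since setsockopt can fail, it may be worth checking its return value and reading the option back. A minimal sketch, with hypothetical helper names that are not part of the server:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Enable or disable SO_KEEPALIVE on a socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
static int set_keepalive(int sock, int enable)
{
    int opt = enable ? 1 : 0;

    assert(sock >= 0);
    return setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &opt, sizeof(opt));
}

/* Read the option back. Returns 1 if enabled, 0 if disabled, -1 on error. */
static int get_keepalive(int sock)
{
    int opt = 0;
    socklen_t len = sizeof(opt);

    if (getsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &opt, &len) != 0)
        return -1;
    return opt != 0;
}
```

The read-back uses opt != 0 rather than opt == 1 because some systems report an enabled boolean option as a nonzero value other than 1.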
Immediately after the BSD bind and listen functions have been called on the server socket, we drop root privilege:
setuid((uid_t) stcParams.shrtLowPrivUID);
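An unchecked setuid call is a classic hazard: if it fails, the server keeps running as root. A more cautious privilege drop might look like the sketch below. The helper name and the gid parameter are assumptions; the snippet above changes only the user ID.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop from root to an unprivileged user and group.
 * The group is changed first, because once setuid() succeeds the process
 * no longer has the privilege to change its group.
 * Returns 0 on success, -1 if any step fails or the drop could be undone.
 */
static int drop_privileges(uid_t uid, gid_t gid)
{
    /* (uid_t)-1 is a reserved "no change" value and never a real user. */
    assert(uid != (uid_t)-1);

    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Paranoia check: regaining root must not be possible afterwards. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

If drop_privileges returns -1, the safest response is usually to exit rather than continue serving requests. A production version would likely also clear supplementary groups with setgroups before the setgid call.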
Inside the main connection handler portion of code ...
// Inside the main connection handler, each select response is handed off
// to a connection processor launched on a new thread
...
FD_SET(socServer, &readSelset);
intResponse = select(MAX_DESCRIPTOR, &readSelset, NULL, NULL, &tvTime);
if (intResponse > 0)
{
    if (FD_ISSET(socServer, &readSelset))
    {
        intIndex = getNextThread(); // pseudo
        if (arraypthreads[intIndex] == NULL)
        {
            // pthread_create takes a pointer to the attribute object
            intResponse = pthread_create(&arraypthreads[intIndex],
                &pthrServerAttribute, connection_processor, &stcProcessor);
            ...
        }
        else
            ...
    }
    else
        ...
}
else
    ...
...
Note the use of an array for the pthreads. The "FourcalorieServer/HTTP" program uses a separate "process garbage collector" to periodically test and destroy thread remnants.
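The "process garbage collector" itself is not shown on this page. One way such a collector might work is sketched below: each worker marks its slot as done just before returning, and a single collector periodically joins the finished threads and frees their slots. All names here are hypothetical and the real program may do this quite differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 100  /* matches "Maximum Simultaneous Threads 100" */

/* Hypothetical bookkeeping for one slot in the thread array. */
struct thread_slot {
    pthread_t thread;
    int in_use;   /* slot holds a thread that has not been joined yet */
    int done;     /* set by the worker just before it returns */
};

static struct thread_slot slots[MAX_THREADS];
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker body: real request handling would go where the comment is. */
static void *connection_processor(void *arg)
{
    struct thread_slot *slot = arg;

    assert(slot != NULL);
    /* ... process the connection ... */
    pthread_mutex_lock(&slots_lock);
    slot->done = 1;
    pthread_mutex_unlock(&slots_lock);
    return NULL;
}

/* Start a worker in the first free slot. Returns the slot index or -1. */
static int start_worker(void)
{
    int index = -1;

    pthread_mutex_lock(&slots_lock);
    for (int i = 0; i < MAX_THREADS; i++) {
        if (slots[i].in_use)
            continue;
        slots[i].done = 0;
        if (pthread_create(&slots[i].thread, NULL,
                           connection_processor, &slots[i]) == 0) {
            slots[i].in_use = 1;
            index = i;
        }
        break;  /* stop at the first free slot, whether or not create worked */
    }
    pthread_mutex_unlock(&slots_lock);
    return index;
}

/* Join every finished worker and free its slot. Returns the number reaped.
 * Assumes a single collector calls this function. */
static int reap_finished(void)
{
    int reaped = 0;

    for (int i = 0; i < MAX_THREADS; i++) {
        pthread_t t;
        int ready;

        pthread_mutex_lock(&slots_lock);
        ready = slots[i].in_use && slots[i].done;
        t = slots[i].thread;
        pthread_mutex_unlock(&slots_lock);
        if (!ready)
            continue;
        pthread_join(t, NULL);  /* short wait: the worker is already exiting */
        pthread_mutex_lock(&slots_lock);
        slots[i].in_use = 0;
        pthread_mutex_unlock(&slots_lock);
        reaped++;
    }
    return reaped;
}
```

Joining outside the lock keeps the collector from stalling start_worker while it waits for a thread to finish exiting. This sketch uses default thread attributes for brevity; the server would pass its own attribute object to pthread_create.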
-- Back to Log Analysis ...
As was mentioned earlier, the logs for this site show mostly search engine "bot" visits, day after day, and the end result of all the search bot activity is that these pages get ranked highly enough to be retrieved in the title/description lists that are the result of the submission of a few simple search-words to any of the major search engines.
Of particular concern to this author is the fact that there are no significant numbers of human click-through events. (Click-through events are typically represented by numerous html file dependency downloads, as described at the top of this page).
Unreasonable
A real human click-through occurs approximately once every month or two.
First, consider the size of the population of
internet users (billions of people). Secondly, consider my claim that it is relatively trivial to retrieve Title/Description
lists that include this site as the
result of a search query to any of the major search engines, and that these line-items are often listed in the first or second
pages of those results.
The two assertions (first and second) made in the previous paragraph cannot be reconciled with each other. The statistical
odds against a scenario wherein billions of people may, at their option, (easily) return page listings for this site,
but never actually click-through to the site are enormous.
My descriptions and titles are not enough of a turn-off to discourage billions of people. In any case, many random persons
would certainly often misinterpret the short listings from the search engines and click through to the site, not
really knowing what kind of content they might be selecting. Therefore, there is one and only one conclusion that I can
make. What it really boils down to is censorship. This page is being censored. All of these pages (the entire site) are
being censored. Blocked, totally, except for the search engines, for which traffic is allowed to pass.
I am not laying blame on any particular ISP, or any particular sub-system of the internet at this juncture. I am not
saying that I know how it is done, or who is responsible for this censorship, but I know it is happening. I do not know
if ISPs are being involuntarily hacked, or are complicit. I do not know if government systems are being hacked, or if governments
are complicit. But whatever is happening, however it is done, it is world wide.
I have several sites, each
of which addresses an entirely different subject material, and each suffers from the same censorship. It does not seem to
matter which country I select, which IP addresses I use, or which ISP services the accounts. The result is the same. It does not
matter what type of software I run, what type of operating systems I use, or what type of hosting service I select.
This author has attempted to outrun the censorship, by purchasing internet bandwidth and hosting services
in various countries, including
the United States, Ireland, The Netherlands, Australia, and France.
In the United States, after a new service has been selected and paid for, the traffic seems normal for a few days, and then
transforms to the aforementioned (search bot only) regimen. When the site is relocated to other countries, the traffic
maintains some semblance of normalcy for a few weeks or even a month before transforming to the aforementioned level.
There is wholesale censorship occurring on the internet. It's been going on for many years.
The sites being censored are not terroristic. Is this site a terror site? Is this site immoral?
Is this site advocating sedition? Mayhem? Mass murder? I think not.
My option at this point seems to be to use some type of anonymizing software such as Freenet or Tor. Both of those
options have very serious downsides.
Site content copyright © 2006, 2007, 2008, 2009 Datazygte, Inc. contact: ron scheckelhoff rscheckelhoff@fourcalorieservers.com