Thoughts about this server ...
(Experimental) Four-Calorie-Servers/HTTP



As of v3.1, the server is still a fairly low-volume server, capable of only 20 pages per second or so. Future design changes should improve that figure considerably. Even at that page rate, we use this little server to run all of the domains affiliated with this site. Obviously, all of our sites see relatively low volume, and nearly all of that volume comes from search engines. (The last real human click-through was about two months ago, which makes me wonder why I bother with the sites, except that they help me organize my thoughts, projects, and musings.)


How do I know the last human click-through was two months ago? The server has a built-in logger that is accessible via the internet. Its output looks something like this:


Threading Report for fourcaloriehttpd-3.1


Current Maximum Outer Timeout 3, Current First Chunk Timeout 3
Multi Threaded (1 = true) 1
Absolute Timeout (Seconds) 180
Maximum Simultaneous Threads 100


Process Thread Socket Address Resource
000 000 [ ] ...
001 004 [ 66.233.116.31 ] ... GET /servers/testhints.html
002 004 [ 66.233.116.31 ] ... GET /servers/images/dig-fan.png
003 005 [ 66.233.116.31 ] ... GET /pages/sfpages.html
004 004 [ 66.233.116.31 ] ... GET /pages/ie.js
005 004 [ 66.233.116.31 ] ... GET /pages/ff.css
006 004 [ 66.233.116.31 ] ... GET /pages/images/logo1.png
007 005 [ 66.233.116.31 ] ... GET /pages/images/wordsep16.png
008 004 [ 66.233.116.31 ] ... GET /pages/images/wordsep50.png
009 005 [ 66.233.116.31 ] ... GET /pages/images/oss.png
010 004 [ 66.233.116.31 ] ... GET /pages/images/uncommon-bricks-base.png
011 004 [ 66.233.116.31 ] ... GET /pages/sfpages.html
012 004 [ 66.233.116.31 ] ... GET /pages/ie.js
013 004 [ 66.233.116.31 ] ... GET /pages/ff.css
014 004 [ 66.233.116.31 ] ... GET /pages/images/wordsep16.png
015 005 [ 66.233.116.31 ] ... GET /pages/images/logo1.png
016 004 [ 66.233.116.31 ] ... GET /pages/images/wordsep50.png
017 004 [ 66.233.116.31 ] ... GET /pages/images/uncommon-bricks-base.png
018 004 [ 66.233.116.31 ] ... GET /images/bartwo.png
019 004 [ 66.233.116.31 ] ... GET /servers/images/ovaltitleback.png
020 004 [ 66.233.116.31 ] ... GET /pages/images/oss.png

The address shown (66.233.116.31) is one of mine. Notice how each HTML request is followed by requests for the JavaScript, CSS, and image files associated with that page. This series of log entries is what a human click-through looks like. (These entries were generated by my own visit to the site with Firefox.)


When a search engine bot visits the site, the only log entry is the HTML file itself, or perhaps a robots.txt request or two, which gives it away as a search engine visit. (Besides, I know the addresses of most of the search engine bots on the internet.) Thanks go out to MSN, Googlebot, and Yahoo Slurp for their dozens of hits per day, which are very useful for my ongoing HTTP server refinement efforts. Remarkably, I can find my pages in Google: they rank highly enough to appear in the results for search terms of only a few words.



Now, for the benefit of any of those very rare wandering visitors, let us get back to our discussion. The initial aim of the project was to create an HTTP server that is as simple as possible while still maintaining a reasonable level of functionality.

Once again, simple is not always fast, or able to leap tall buildings in a single bound. Initially, though, simple is good, because it makes potential trouble spots easier to find.




At this point, the server's simple design hands received requests directly to the pthread pool for processing. Since the pthread pool is limited by memory (the thread stack plus each thread's dynamic allocations are not trivial), only a finite number of threads (currently 100) can be in use at any one time. When the thread pool is exhausted, further requests have to wait until threads become available.


The direct-to-processing-thread design means that bursts of traffic may not be handled well if a burst lasts long enough to exhaust the thread pool. To raise throughput, the server could be modified to add an input queue many times larger than the thread pool. Such an input queue could absorb bursts of perhaps thousands of connections. The current code base would also gain a modest performance benefit from this arrangement, because the queue can be populated very quickly.





At some point, you have to admit that you can't have your cake and eat it too. Processing more requests simultaneously simply requires more simultaneously available threads. To this end, a future enhancement may make the thread pool dynamic, with a pool size that is reasonable relative to the resources of the machine it runs on. v3.1 already does this to a limited extent, indirectly.




Downloads


Fourcaloriehttpd: fourcaloriehttpd-3.1.tar.gz (FreeBSD 7.0 binary executable)

Server version 3.0.7 (FreeBSD 7.0 package)

Server version 3.0.7 (FreeBSD 6.3 package)

On most *nix systems, the pthread library uses a large default stack size for new threads. The "Fourcaloriehttpd" program uses the pthread_attr_setstacksize function to keep the server's memory requirements from getting out of hand:


              pthread_attr_t pthrServerAttribute;
              pthread_attr_init(&pthrServerAttribute);

              pthread_attr_setdetachstate(&pthrServerAttribute, PTHREAD_CREATE_JOINABLE);

              pthread_attr_setstacksize(&pthrServerAttribute, THREAD_STACK_SIZE);


           

Notice that the "fourcaloriehttpd" stack size (THREAD_STACK_SIZE) has been set to 500 KB, a seemingly large value made necessary by liberal use of local structs, but much less than the 10 MB default stack size. Notice also that the JOINABLE attribute has been set on pthrServerAttribute, so that each thread can be reaped with pthread_join when we are through with it.



The SO_KEEPALIVE option is set or cleared according to the options selected on the command line:


                socServer = socket(AF_INET, SOCK_STREAM, 0);

                int optSet = stcParams.intKeepAlive;
                setsockopt(socServer, SOL_SOCKET, SO_KEEPALIVE, (char*) &optSet, sizeof(optSet));

           


Immediately after the BSD bind and listen functions have been called on the server socket, we drop root privileges:


              setuid((uid_t) stcParams.shrtLowPrivUID);

           


Inside the main connection handler portion of code ...


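              // Inside the main connection handler, each select response is handed off
              // to a connection processor launched on a new thread

              ...

              FD_ZERO(&readSelset);              // select() modifies the set, so rebuild it every pass
              FD_SET(socServer, &readSelset);

              // MAX_DESCRIPTOR must exceed socServer (nfds is the highest descriptor + 1);
              // tvTime should be re-initialized before each call, since select() may modify it
              intResponse = select(MAX_DESCRIPTOR, &readSelset, NULL, NULL, &tvTime);

              if (intResponse > 0)
              {
                     if (FD_ISSET(socServer, &readSelset))
                     {

                          intIndex = getNextThread();  // pseudo

                          if (arraypthreads[intIndex] == NULL)   // relies on pthread_t being a pointer (FreeBSD)
                          {
                                  // the attribute object is passed by address
                                  intResponse = pthread_create(&arraypthreads[intIndex],
                                      &pthrServerAttribute, connection_processor, &stcProcessor);
                                  ...
                          }
                          else
                              ...
                     }
                     else
                          ...
              }
              else
                  ...

              ...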

             

Note the use of an array for the pthreads. The "FourcalorieServer/HTTP" program uses a separate "process garbage collector" to periodically check for finished threads and dispose of their remnants.



-- Back to Log Analysis ...

As mentioned earlier, the logs for this site show mostly search engine "bot" visits, day after day, and the end result of all that activity is that these pages rank highly enough to appear in the title/description lists returned when a few simple search words are submitted to any of the major search engines.

Of particular concern to this author is that there is no significant number of human click-through events. (Click-through events typically show up as numerous downloads of an HTML file's dependencies, as described at the top of this page.)

Unreasonable

A real human click-through occurs approximately once every month or two.


First, consider the size of the population of internet users (billions of people). Second, consider my claim that it is relatively trivial to retrieve title/description lists that include this site by querying any of the major search engines, and that these line items often appear on the first or second page of results.


The two assertions in the previous paragraph cannot be reconciled with each other. The statistical odds against a scenario in which billions of people can easily retrieve listings for this site, yet almost never click through to it, are enormous.


My descriptions and titles are not enough of a turn-off to discourage billions of people. In any case, many people would surely misinterpret the short search engine listings and click through to the site without really knowing what kind of content they were selecting. Therefore, there is one and only one conclusion I can draw, and it boils down to censorship. This page is being censored. All of these pages (the entire site) are being censored: blocked completely, except for search engine traffic, which is allowed to pass.


I am not laying blame on any particular ISP, or any particular subsystem of the internet, at this juncture. I am not saying that I know how it is done, or who is responsible for this censorship, but I know it is happening. I do not know whether ISPs are being involuntarily hacked or are complicit. I do not know whether government systems are being hacked, or whether governments are complicit. But whatever is happening, however it is done, it is worldwide.


I have several sites, each addressing an entirely different subject, and each suffers from the same censorship. It does not seem to matter which country I select, which IP addresses I use, or which ISP services the accounts; the result is the same. Nor does it matter what software I run, what operating systems I use, or what hosting service I select.


This author has attempted to outrun the censorship by purchasing internet bandwidth and hosting services in various countries, including the United States, Ireland, the Netherlands, Australia, and France.


In the United States, after a new service has been selected and paid for, the traffic seems normal for a few days and then drops to the aforementioned search-bot-only regimen. When the site is moved to other countries, the traffic maintains some semblance of normalcy for a few weeks, or even a month, before dropping to the same level.

There is wholesale censorship occurring on the internet, and it has been going on for many years. The sites being censored are not terroristic. Is this site a terror site? Is this site immoral? Is this site advocating sedition? Mayhem? Mass murder? I think not.


My option at this point seems to be some type of anonymizing software, such as Freenet or Tor. Both of those options have very serious downsides.

Site content copyright © 2006, 2007, 2008, 2009 Datazygte, Inc. contact: ron scheckelhoff rscheckelhoff@fourcalorieservers.com