Other HTTP Proxy Solutions

Date: 17.09.2020

Traffic Server is not a new invention: plenty of similar solutions exist, both in the Open Source community and as commercial products. This paper does not detail all of them; instead, we focus on Free and Open Source solutions.

All of these existing intermediaries provide the basic features necessary for proxying HTTP requests. Each piece of software has its own pros and cons - some are optimized for a smaller set of applications, while others are more generic. Performance differs wildly between the different implementations, but in all honesty, performance is usually the least important piece in the decision-making process. The next sections will discuss several of the more common intermediary solutions that are currently available.

Squid

Squid[4] is probably the best known, and oldest, of the popular HTTP proxy servers currently in use. It originated from the Harvest project, and has since gone through many updates and even large rewrites. The code base is very mature, and it is used in a large number of mission-critical applications.

Squid typically runs as a single-process, single-threaded, asynchronous event processor. This means that it is somewhat limited in scalability on modern multi-core systems; however, there is work being done to try to alleviate this problem. When it comes to features and support for all the various extensions to HTTP and HTTP intermediaries, Squid shines. There is really no other Open Source server that is as feature rich as Squid right now, and it should definitely be considered when evaluating servers.
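To make the single-process, single-threaded, asynchronous model concrete, the following is a minimal Python sketch of one thread multiplexing several connections with the standard selectors module. It is not Squid code: the function name run_event_loop and the upper-casing "handler" are illustrative assumptions that stand in for real request processing.

```python
import selectors
import socket


def run_event_loop(server_ends):
    """Serve every socket in server_ends from one thread until all peers close.

    Data read from a socket is echoed back in upper case; this stands in for
    "handle the request" in a real proxy. Returns the number of reads handled.
    """
    sel = selectors.DefaultSelector()
    for sock in server_ends:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)

    handled = 0
    while sel.get_map():                    # loop while any socket is registered
        for key, _events in sel.select():   # block until something is readable
            sock = key.fileobj
            data = sock.recv(4096)
            if data:
                sock.sendall(data.upper())  # tiny payload, so this will not block
                handled += 1
            else:                           # empty read: the peer closed its side
                sel.unregister(sock)
                sock.close()
    sel.close()
    return handled
```

Because a single thread runs this loop, only one CPU core does any work no matter how many connections are open, which is the scalability limit described above.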

Varnish

Varnish[5] is an HTTP intermediary, which takes advantage of modern kernel features in Linux, FreeBSD and Solaris in order to simplify the code, while at the same time achieving very high performance. A fundamental design decision is that all caching is done using the virtual memory provided by the Operating System, and each active connection uses up a thread. The latter means that Varnish can (and probably will) run with a large number of threads.

The core code in Varnish is fairly small. Instead, the system comes with its own configuration language, VCL, which is very flexible. The downside is that almost any configuration or setup with Varnish will require some VCL coding or tweaking. There are a large number of contributed VCL scripts, which solve many common problems and configuration requirements.

But Varnish wasn’t built to be a general-purpose intermediary. As an example, Varnish will buffer the entire response before sending it to the client, which might not work for all types of HTTP services.

nginx

nginx[6] is an HTTP web server that also can function as a proxy and cache, which puts it in the same category as Apache HTTPD. In fact, nginx is quickly becoming a contender in the HTTP arena, already having grabbed a significant portion of the market share. This jack-of-all-trades design also means that nginx is not a general-purpose intermediary either.

nginx uses a concurrency model similar to Apache Traffic Server, except that it uses multiple processes instead of threads. In addition to HTTP, it can proxy several other TCP protocols, and it also has a flexible plugin interface for extensions and additions.

HAProxy

HAProxy[7] implements a proxy server that is primarily tailored for HTTP (and possibly other TCP protocols) load-balancing and request routing. It is an event-driven, single-process application, with a rich feature set for making interesting Layer 7 routing decisions. With only a single process, it does not scale particularly well on modern multi-core CPUs. It has a limited feature set as a generic HTTP intermediary, but is very robust and reliable as a proxy. The HAProxy official website points out that the server has never crashed in a production environment, which is quite a feat if true.

Feature Comparison

The following table summarizes and compares common features implemented by a few popular HTTP intermediaries:

Feature	ATS	HAProxy	nginx	Squid	Varnish
Worker threads	Yes	No	No	No	Yes
Multi-process	No	Yes¹	Yes	Yes²	Yes
Event-driven	Yes	Yes	Yes	Yes	No
Plugin APIs	Yes	No	Yes³	Yes⁴	Yes⁵
Forward proxy	Yes	No	No	Yes	No
Reverse proxy	Yes	Yes	Yes	Yes	Yes
Transparent proxy	No⁶	Yes	Yes	Yes	No
Load balancer	Yes⁷	Yes	Yes	Yes	Yes⁸
Cache	Yes	No	Yes	Yes	Yes
ESI	Yes	No	No	Yes	Yes
ICP	Yes⁹	No	No	Yes	No
Keep-Alive	Yes	No	Yes	Yes	Yes
SSL	Yes	No	Yes	Yes	No
Pipeline	Yes¹⁰	No	Yes	Yes	No

Comparing HTTP intermediaries

Content Delivery Networks

A Content Delivery Network, or CDN, is a service or infrastructure used to deliver certain types of HTTP content. This content is usually static by nature, where Edge caches can effectively store the objects locally for some time. Examples of CDN-type content are JavaScript, CSS, and all types of images and other static media content. Serving such content out of a caching HTTP intermediary makes deployment and management significantly easier, since the content distribution is automatic.
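The idea of an Edge cache holding objects "for some time" can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Traffic Server or any other cache is implemented: the class name EdgeCache, the fetch_from_origin callable, and the single fixed time-to-live are all hypothetical, and real caches derive freshness from HTTP headers such as Cache-Control and Expires.

```python
import time


class EdgeCache:
    """Tiny time-to-live cache placed in front of a slower origin fetch."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin   # hypothetical callable: url -> bytes
        self._ttl = ttl_seconds           # assumption: one fixed TTL for everything
        self._clock = clock               # injectable so the example is testable
        self._objects = {}                # url -> (expiry time, body)
        self.origin_fetches = 0

    def get(self, url):
        now = self._clock()
        cached = self._objects.get(url)
        if cached is not None and cached[0] > now:
            return cached[1]              # fresh copy: served from the edge
        body = self._fetch(url)           # missing or expired: go back to the origin
        self.origin_fetches += 1
        self._objects[url] = (now + self._ttl, body)
        return body
```

With static content and a reasonable TTL, almost every request is answered from the local copy, so the origin sees only one fetch per object per TTL period from each cache.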

A CDN automates content distribution to many colocation sites, simplifying operational tasks and reducing costs. To improve the end-user experience, a CDN is commonly deployed on the Edge networks, ensuring that the content is as close as possible to the users.

There are several reasons this is beneficial:

Cost reductions, and more effective utilization of resources
Faster page load times
Redundancy and resilience to network outages

The biggest question you face when deciding on a CDN is whether to build it yourself or to buy it as a service from one of the many commercial CDN vendors. In most cases, you are probably better off buying CDN services initially. There are initial costs associated with setting up your own private CDN on the Edge, and this should be considered when doing these evaluations.

Notwithstanding the above limitations, I am a strong proponent of building your own CDN, particularly if your traffic is large enough that the costs of buying the services from a CDN vendor are considerable. Further, to be blunt, building a CDN is not rocket science. Any organization with a good infrastructure and operations team can easily do it. All you need is to configure and deploy a (small) number of servers running as reverse proxy servers for HTTP (and sometimes HTTPS).

Building a CDN with Apache TS

Apache Traffic Server is an excellent choice for building your own CDN. Why? First of all, it scales incredibly well on a large number of CPUs, and well beyond Gigabit network cards. Additionally, the technology behind Traffic Server is well-geared toward a CDN:

The Traffic Server cache is fast and scales very well. It is also very resilient to corruption and crashes. In over 4 years of use in the Yahoo! CDN, there has not been a single (known) data corruption in the cache.
The server is easy to deploy and manage as a reverse proxy server. The most common configuration tasks and changes can be done on live systems, and never require server restarts.
It scales well for a large number of concurrent connections, and supports all the necessary HTTP/1.1 protocol features (such as Keep-Alive), as well as SSL.

As a proven technology, Traffic Server delivers over 350,000 requests/second, and over 30Gbps in the Yahoo! CDN alone. This is an unusually large private CDN, with over 100 servers deployed worldwide. Most setups will be much smaller.
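A quick back-of-the-envelope calculation puts these figures in perspective. The Python sketch below uses only the numbers quoted above; since they are stated as lower bounds ("over") and may not have been measured at the same moment, the results are rough estimates, and the function names are illustrative.

```python
REQUESTS_PER_SECOND = 350_000   # "over 350,000 requests/second"
BITS_PER_SECOND = 30e9          # "over 30Gbps"
SERVERS = 100                   # "over 100 servers"


def average_response_bytes(requests_per_second, bits_per_second):
    """Average bytes delivered per request, if both rates describe the same traffic."""
    return bits_per_second / 8 / requests_per_second


def per_server_load(requests_per_second, bits_per_second, servers):
    """Average (requests/second, bits/second) per server, assuming an even spread."""
    return requests_per_second / servers, bits_per_second / servers


print(round(average_response_bytes(REQUESTS_PER_SECOND, BITS_PER_SECOND)))  # 10714
print(per_server_load(REQUESTS_PER_SECOND, BITS_PER_SECOND, SERVERS))
```

Under these assumptions the average response is roughly 10.7 KB, which is consistent with a workload dominated by small static objects, and each server averages about 3,500 requests/second at 0.3 Gbps.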

Of course, many of the other existing HTTP caches can be used to build a CDN. We believe Traffic Server is a serious contender in this area, but there is healthy competition.

Configuration

We are not going to go into great detail about how to configure Apache Traffic Server for building your CDN. There are primarily two configuration files relevant for setting up Traffic Server as a caching intermediary:

records.config – This file holds a number of key-value pairs, and in most situations the defaults are good enough (but we will tweak this for a CDN).
remap.config – This configuration file, which is empty by default, holds the mapping rules so that TS can function as a reverse proxy.

Out of the box, Traffic Server configuration is very restricted; in order to build a basic CDN server we will need to modify both of these files. Let’s start with records.config:

CONFIG proxy.config.http.server_port INT 80
CONFIG proxy.config.cache.ram_cache.size LLONG 512MB

And then remap.config (these are just examples of a “dummy” CDN):

map http://cdn.example.com/js http://js.example.com
map http://cdn.example.com/css http://css.example.com
map http://cdn.example.com/img http://img.example.com

Some example URLs that will work with the above configurations:

http://cdn.example.com/js/cool-stuff.js

http://cdn.example.com/img/thumbnail/ogre.png
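To illustrate what these mapping rules do to a request, here is a small Python sketch of prefix-based URL rewriting. It models only the simple "map" rules shown above and is not Traffic Server's actual matching logic; the function names parse_remap and remap, and the choice to try rules in file order, are assumptions made for this example.

```python
REMAP_CONFIG = """
map http://cdn.example.com/js http://js.example.com
map http://cdn.example.com/css http://css.example.com
map http://cdn.example.com/img http://img.example.com
"""


def parse_remap(text):
    """Collect (from_url, to_url) pairs from 'map <from> <to>' lines."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "map":
            rules.append((fields[1], fields[2]))
    return rules


def remap(url, rules):
    """Return the origin URL for a request URL, or None if no rule matches."""
    for src, dst in rules:                  # assumption: the first matching rule wins
        src = src.rstrip("/")
        if url == src or url.startswith(src + "/"):
            # swap the matched prefix for the origin prefix, keep the rest of the path
            return dst.rstrip("/") + url[len(src):]
    return None


rules = parse_remap(REMAP_CONFIG)
print(remap("http://cdn.example.com/js/cool-stuff.js", rules))
# http://js.example.com/cool-stuff.js
```

A request that matches no rule returns None here; in a reverse proxy that corresponds to a request the server is not configured to serve.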

Of course, much more complex configurations are possible, particularly in the remap configuration, but these examples demonstrate how little configuration is required to get a functional CDN using Apache Traffic Server.

Connection Management with ATS

Connection management is very similar to a CDN; in fact, many CDN vendors provide such services as well. The purpose of such a service is primarily to reduce latency for the end-user. Living on the Edge, the connection management service can effectively fight two enemies of web performance:

TCP 3-way handshake. Being on the Edge, the latency introduced by the handshake is reduced. Allowing for long-lived Keep-Alive connections can eliminate such latency entirely.
TCP congestion control (e.g. “Slow Start”). The farther away a user is from the server, the more visible the congestion control mechanisms become. Being on the Edge, users will always connect to an HTTP server (an Origin Server or another intermediary) that is close.
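The effect of both mechanisms can be estimated with a deliberately idealized model, sketched below in Python. The assumptions are mine, not measurements from this paper: no packet loss, a congestion window that doubles every round trip, an initial window of 10 segments (the value proposed in RFC 6928; older stacks start smaller), a 1460-byte segment size, and a reused connection that simply skips the handshake.

```python
import math


def slow_start_round_trips(size_bytes, initial_window=10, mss=1460):
    """Round trips needed to deliver size_bytes while the window doubles each RTT."""
    segments = math.ceil(size_bytes / mss)
    rounds, window, sent = 0, initial_window, 0
    while sent < segments:
        sent += window      # one full window is delivered per round trip
        window *= 2         # idealized slow start: the window doubles every RTT
        rounds += 1
    return rounds


def fetch_time_ms(size_bytes, rtt_ms, new_connection=True):
    """Estimated time to fetch an object: optional handshake plus data round trips."""
    handshake = rtt_ms if new_connection else 0   # SYN / SYN-ACK costs one RTT
    return handshake + slow_start_round_trips(size_bytes) * rtt_ms


print(fetch_time_ms(100_000, rtt_ms=150))                       # 600: distant origin
print(fetch_time_ms(100_000, rtt_ms=20))                        # 80: nearby Edge server
print(fetch_time_ms(100_000, rtt_ms=20, new_connection=False))  # 60: reused connection
```

Every term in the estimate is a multiple of the round-trip time, which is why moving the server closer to the user shrinks both the handshake cost and the slow-start cost at once.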

The following picture shows how users in various areas of the world connect to different servers. Some users might connect directly to the HTTP web server (the “service”), while others might connect to an intermediary server that is close to the user.

Connection management

Connections between the intermediaries (the connection managers) and Origin Servers (“web site”) are long-lived, thanks to HTTP Keep-Alive. Reducing the distance between a user and the server, as well as eliminating many new TCP connections, will reduce page-load times significantly. In some cases, we have measured reductions of 1 second or more in first page-load time simply by introducing the connection manager intermediaries.
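The saving from Keep-Alive can be observed locally with Python's standard library. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface and counts how many distinct TCP connections a client needs for a series of requests. It demonstrates the protocol behavior only and says nothing about Traffic Server itself; the names Handler and count_connections are illustrative.

```python
import http.client
import http.server
import threading


class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enables persistent (Keep-Alive) connections
    seen_ports = set()              # one client source port per TCP connection

    def do_GET(self):
        Handler.seen_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass


def count_connections(requests, reuse):
    """Issue `requests` GETs to a local server; return the number of TCP connections used."""
    Handler.seen_ports = set()
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port)
    try:
        for _ in range(requests):
            if not reuse:
                conn.close()        # force a new handshake; the client reconnects on request()
            conn.request("GET", "/")
            conn.getresponse().read()
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return len(Handler.seen_ports)
```

Calling count_connections(5, reuse=True) should report a single connection, while count_connections(5, reuse=False) should report five, each of which would pay the handshake and slow-start costs again on a real network.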
