Intellectual Property and the Internet/Proxy servers

In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server. The proxy server evaluates the request according to its filtering rules. For example, it may filter traffic by IP address or protocol. If the request is validated by the filter, the proxy provides the resource by connecting to the relevant server and requesting the service on behalf of the client. A proxy server may optionally alter the client's request or the server's response, and sometimes it may serve the request without contacting the specified server. In this case, it 'caches' responses from the remote server, and returns subsequent requests for the same content directly.

Diagram of two computers connected only via a proxy server. The first computer says to the proxy server: "ask the second computer what the time is".
Communication between two computers (shown in grey) connected through a third computer (shown in red) acting as a proxy.

The proxy concept was invented in the early days of distributed systems[1] as a way to simplify and control their complexity. Today, most proxies are a web proxy, allowing access to content on the World Wide Web.

Uses

edit

A proxy server has a large variety of potential purposes, including:

  • To keep machines behind it anonymous, mainly for security.[2]
  • To speed up access to resources (using caching). Web proxies are commonly used to cache web pages from a web server.[3]
  • To apply access policy to network services or content, e.g. to block undesired sites.
  • To access sites prohibited or filtered by your ISP or institution.
  • To log / audit usage, i.e. to provide company employee Internet usage reporting.
  • To bypass security / parental controls.
  • To circumvent Internet filtering to access content otherwise blocked by governments.[4]
  • To scan transmitted content for malware before delivery.
  • To scan outbound content, e.g., for data loss prevention.
  • To allow a web site to make web requests to externally hosted resources (e.g. images, music files, etc.) when cross-domain restrictions prohibit the web site from linking directly to the outside domains.

A proxy server that passes requests and responses unmodified is usually called a gateway or sometimes tunneling proxy.

A proxy server can be placed in the user's local computer or at various points between the user and the destination servers on the Internet.

A reverse proxy is (usually) an Internet-facing proxy used as a front-end to control and protect access to a server on a private network, commonly also performing tasks such as load-balancing, authentication, decryption or caching.

Types of proxy

edit

Forward proxies

edit
 
A forward proxy taking requests from an internal network and forwarding them to the Internet.

Forward proxies are proxies where the client server names the target server to connect to.[5] Forward proxies are able to retrieve from a wide range of sources (in most cases anywhere on the Internet).

The terms "forward proxy" and "forwarding proxy" are a general description of behavior (forwarding traffic) and thus ambiguous. Except for Reverse proxy, the types of proxies described in this article are more specialized sub-types of the general forward proxy concept.

Open proxies

edit
 
An open proxy forwarding requests from and to anywhere on the Internet.

An open proxy is a forwarding proxy server that is accessible by any Internet user. Gordon Lyon estimates there are "hundreds of thousands" of open proxies on the Internet.[6] An anonymous open proxy allows users to conceal their IP address while browsing the Web or using other Internet services. There are varying degrees of anonymity however, as well as a number of methods of 'tricking' the client into revealing itself regardless of the proxy being used.

Reverse proxies

edit
 
A reverse proxy taking requests from the Internet and forwarding them to servers in an internal network. Those making requests connect to the proxy and may not be aware of the internal network.

A reverse proxy (or surrogate) is a proxy server that appears to clients to be an ordinary server. Requests are forwarded to one or more origin servers which handle the request. The response is returned as if it came directly from the proxy server.[5]

Reverse proxies are installed in the neighborhood of one or more web servers. All traffic coming from the Internet and with a destination of one of the neighborhood's web servers goes through the proxy server. The use of "reverse" originates in its counterpart "forward proxy" since the reverse proxy sits closer to the web server and serves only a restricted set of websites.

There are several reasons for installing reverse proxy servers:

  • Encryption / SSL acceleration: when secure web sites are created, the SSL encryption is often not done by the web server itself, but by a reverse proxy that is equipped with SSL acceleration hardware. See Secure Sockets Layer. Furthermore, a host can provide a single "SSL proxy" to provide SSL encryption for an arbitrary number of hosts; removing the need for a separate SSL Server Certificate for each host, with the downside that all hosts behind the SSL proxy have to share a common DNS name or IP address for SSL connections. This problem can partly be overcome by using the SubjectAltName feature of X.509 certificates.
  • Load balancing: the reverse proxy can distribute the load to several web servers, each web server serving its own application area. In such a case, the reverse proxy may need to rewrite the URLs in each web page (translation from externally known URLs to the internal locations).
  • Serve/cache static content: A reverse proxy can offload the web servers by caching static content like pictures and other static graphical content.
  • Compression: the proxy server can optimize and compress the content to speed up the load time.
  • Spoon feeding: reduces resource usage caused by slow clients on the web servers by caching the content the web server sent and slowly "spoon feeding" it to the client. This especially benefits dynamically generated pages.
  • Security: the proxy server is an additional layer of defense and can protect against some OS and WebServer specific attacks. However, it does not provide any protection to attacks against the web application or service itself, which is generally considered the larger threat.
  • Extranet Publishing: a reverse proxy server facing the Internet can be used to communicate to a firewalled server internal to an organization, providing extranet access to some functions while keeping the servers behind the firewalls. If used in this way, security measures should be considered to protect the rest of your infrastructure in case this server is compromised, as its web application is exposed to attack from the Internet.

Performance Enhancing Proxies

edit

A proxy that is designed to mitigate specific link related issues or degradations. PEPs (Performance Enhancing Proxies) are typically used to improve TCP performance in the presence of high Round Trip Times (RTTs) and wireless links with high packet loss. They are also frequently used for highly asynchronous links featuring very different upload and download rates.

Uses of proxy servers

edit

Filtering

edit

A content-filtering web proxy server provides administrative control over the content that may be relayed through the proxy. It is commonly used in both commercial and non-commercial organizations (especially schools) to ensure that Internet usage conforms to acceptable use policy. In some cases users can circumvent the proxy, since there are services designed to proxy information from a filtered website through a non filtered site to allow it through the user's proxy.[7]

A content filtering proxy will often support user authentication, to control web access. It also usually produces logs, either to give detailed information about the URLs accessed by specific users, or to monitor bandwidth usage statistics. It may also communicate to daemon-based and/or ICAP-based antivirus software to provide security against virus and other malware by scanning incoming content in real time before it enters the network.

Many work places, schools, and colleges restrict the web sites and online services that are made available in their buildings. This is done either with a specialized proxy, called a content filter (both commercial and free products are available), or by using a cache-extension protocol such as ICAP, that allows plug-in extensions to an open caching architecture.

Some common methods used for content filtering include: URL or DNS blacklists, URL regex filtering, MIME filtering, or content keyword filtering. Some products have been known to employ content analysis techniques to look for traits commonly used by certain types of content providers.

Requests made to the open internet must first pass through an outbound proxy filter. The web-filtering company provides a database of URL patterns (regular expressions) with associated content attributes. This database is updated weekly by site-wide subscription, much like a virus filter subscription. The administrator instructs the web filter to ban broad classes of content (such as sports, pornography, online shopping, gambling, or social networking). Requests that match a banned URL pattern are rejected immediately.

Assuming the requested URL is acceptable, the content is then fetched by the proxy. At this point a dynamic filter may be applied on the return path. For example, JPEG files could be blocked based on fleshtone matches, or language filters could dynamically detect unwanted language. If the content is rejected then an HTTP fetch error is returned and nothing is cached.

Extranet Publishing: a reverse proxy server facing the Internet can be used to communicate to a firewalled server internal to an organization, providing extranet access to some functions while keeping the servers behind the firewalls. If used in this way, security measures should be considered to protect the rest of your infrastructure in case this server is compromised, as its web application is exposed to attack from the Internet

Most web filtering companies use an internet-wide crawling robot that assesses the likelihood that a content is a certain type. The resultant database is then corrected by manual labor based on complaints or known flaws in the content-matching algorithms.

Web filtering proxies are not able to peer inside secure sockets HTTP transactions, assuming the chain-of-trust of SSL/TLS has not been tampered with. As a result, users wanting to bypass web filtering will typically search the internet for an open and anonymous HTTPS transparent proxy. They will then program their browser to proxy all requests through the web filter to this anonymous proxy. Those requests will be encrypted with https. The web filter cannot distinguish these transactions from, say, a legitimate access to a financial website. Thus, content filters are only effective against unsophisticated users.

As mentioned above, the SSL/TLS chain-of-trust does rely on trusted root certificate authorities; in a workplace setting where the client is managed by the organization, trust might be granted to a root certificate whose private key is known to the proxy. Concretely, a root certificate generated by the proxy is installed into the browser CA list by IT staff. In such scenarios, proxy analysis of the contents of a SSL/TLS transaction becomes possible. The proxy is effectively operating a man-in-the-middle attack, allowed by the client's trust of a root certificate the proxy owns.

A special case of web proxies is "CGI proxies". These are web sites that allow a user to access a site through them. They generally use PHP or CGI to implement the proxy functionality. These types of proxies are frequently used to gain access to web sites blocked by corporate or school proxies. Since they also hide the user's own IP address from the web sites they access through the proxy, they are sometimes also used to gain a degree of anonymity, called "Proxy Avoidance".

Caching

edit

A caching proxy server accelerates service requests by retrieving content saved from a previous request made by the same client or even other clients. Caching proxies keep local copies of frequently requested resources, allowing large organizations to significantly reduce their upstream bandwidth usage and costs, while significantly increasing performance. Most ISPs and large businesses have a caching proxy. Caching proxies were the first kind of proxy server.

Some poorly-implemented caching proxies have had downsides (e.g., an inability to use user authentication). Some problems are described in RFC 3143 (Known HTTP Proxy/Caching Problems).

Another important use of the proxy server is to reduce the hardware cost. An organization may have many systems on the same network or under control of a single server, prohibiting the possibility of an individual connection to the Internet for each system. In such a case, the individual systems can be connected to one proxy server, and the proxy server connected to the main server. An example of a software caching proxy is Squid.

DNS proxy

edit

A DNS proxy server takes DNS queries from a (usually local) network and forwards them to an Internet Domain Name Server. It may also cache DNS records.

Bypassing filters and censorship

edit

If the destination server filters content based on the origin of the request, the use of a proxy can circumvent this filter. For example, a server using IP-based geolocation to restrict its service to a certain country can be accessed using a proxy located in that country to access the service.

Likewise, a badly configured proxy can provide access to a network otherwise isolated from the Internet.[6]

Logging and eavesdropping

edit

Proxies can be installed in order to eavesdrop upon the data-flow between client machines and the web. All content sent or accessed – including passwords submitted and cookies used – can be captured and analyzed by the proxy operator. For this reason, passwords to online services (such as webmail and banking) should always be exchanged over a cryptographically secured connection, such as SSL.

By chaining proxies which do not reveal data about the original requester, it is possible to obfuscate activities from the eyes of the user's destination. However, more traces will be left on the intermediate hops, which could be used or offered up to trace the user's activities. If the policies and administrators of these other proxies are unknown, the user may fall victim to a false sense of security just because those details are out of sight and mind.

In what is more of an inconvenience than a risk, proxy users may find themselves being blocked from certain Web sites, as numerous forums and Web sites block IP addresses from proxies known to have spammed or trolled the site. Proxy bouncing can be used to maintain your privacy.


Gateways to private networks

edit

Proxy servers can perform a role similar to a network switch in linking two networks.

Accessing services anonymously

edit

An anonymous proxy server (sometimes called a web proxy) generally attempts to anonymize web surfing. There are different varieties of anonymizers. The destination server (the server that ultimately satisfies the web request) receives requests from the anonymizing proxy server, and thus does not receive information about the end user's address. However, the requests are not anonymous to the anonymizing proxy server, and so a degree of trust is present between the proxy server and the user. Many of them are funded through a continued advertising link to the user.

Access control: Some proxy servers implement a logon requirement. In large organizations, authorized users must log on to gain access to the web. The organization can thereby track usage to individuals.

Some anonymizing proxy servers may forward data packets with header lines such as HTTP_VIA, HTTP_X_FORWARDED_FOR, or HTTP_FORWARDED, which may reveal the IP address of the client. Other anonymizing proxy servers, known as elite or high anonymity proxies, only include the REMOTE_ADDR header with the IP address of the proxy server, making it appear that the proxy server is the client. A website could still suspect a proxy is being used if the client sends packets which include a cookie from a previous visit that did not use the high anonymity proxy server. Clearing cookies, and possibly the cache, would solve this problem.

Implementations of proxies

edit

Web proxy

edit

A web proxy passes along http protocol requests like any other proxy server. However, the web proxy accepts target URLs within a user's browser window, processes the request, and then displays the contents of the requested URL immediately back within the users browser. This is generally quite different than a corporate intranet proxy which some people mistakenly refer to as a web proxy.

Suffix proxy

edit

A suffix proxy allows a user to access web content by appending the name of the proxy server to the URL of the requested content (e.g. "en.wikipedia.org.SuffixProxy.com"). Suffix proxy servers are easier to use than regular proxy servers. But do not offer anonymity and the primary use is bypassing web filters; however, this is rarely used due to more advanced web filters.

Transparent proxy

edit

Also known as an intercepting proxy or forced proxy, a transparent proxy intercepts normal communication without requiring any special client configuration. Clients need not be aware of the existence of the proxy. A transparent proxy is normally located between the client and the Internet, with the proxy performing some of the functions of a gateway or router.[8]

RFC 2616 (Hypertext Transfer Protocol—HTTP/1.1) offers standard definitions:

"A 'transparent proxy' is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification".
"A 'non-transparent proxy' is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering".

In 2009 a security flaw in the way that transparent proxies operate was published by Robert Auger,[9] and the Computer Emergency Response Team issued an advisory listing dozens of affected transparent and intercepting proxy servers.[10]

Purpose

edit

Intercepting proxies are commonly used in businesses to prevent avoidance of acceptable use policy, and to ease administrative burden, since no client browser configuration is required. This second reason however is mitigated by features such as Active Directory group policy, or DHCP and automatic proxy detection.

Intercepting proxies are also commonly used by ISPs in some countries to save upstream bandwidth and improve customer response times by caching. This is more common in countries where bandwidth is more limited (e.g. island nations) or must be paid for.

Issues

edit

The diversion / interception of a TCP connection creates several issues. Firstly the original destination IP and port must somehow be communicated to the proxy. This is not always possible (e.g. where the gateway and proxy reside on different hosts). There is a class of cross site attacks which depend on certain behaviour of intercepting proxies that do not check or have access to information about the original (intercepted) destination. This problem can be resolved by using an integrated packet-level and application level appliance or software which is then able to communicate this information between the packet handler and the proxy.

Intercepting also creates problems for HTTP authentication, especially connection-oriented authentication such as NTLM, since the client browser believes it is talking to a server rather than a proxy. This can cause problems where an intercepting proxy requires authentication, then the user connects to a site which also requires authentication.

Finally intercepting connections can cause problems for HTTP caches, since some requests and responses become uncacheble by a shared cache.

Therefore intercepting connections is generally discouraged. However due to the simplicity of deploying such systems, they are in widespread use.

Implementation methods

edit

Interception can be performed using Cisco's WCCP (Web Cache Control Protocol). This proprietary protocol resides on the router and is configured from the cache, allowing the cache to determine what ports and traffic is sent to it via transparent redirection from the router. This redirection can occur in one of two ways: GRE Tunneling (OSI Layer 3) or MAC rewrites (OSI Layer 2).

Once traffic reaches the proxy machine itself interception is commonly performed with NAT (Network Address Translation). Such setups are invisible to the client browser, but leave the proxy visible to the web server and other devices on the internet side of the proxy. Recent Linux and some BSD releases provide TPROXY (transparent proxy) which performs IP-level (OSI Layer 3) transparent interception and spoofing of outbound traffic, hiding the proxy IP address from other network devices.

Detection

edit

There are several methods that can often be used to detect the presence of an intercepting proxy server:

  • By comparing the client's external IP address to the address seen by an external web server, or sometimes by examining the HTTP headers received by a server. A number of sites have been created to address this issue, by reporting the user's IP address as seen by the site back to the user in a web page.[1]
  • By comparing the sequence of network hops reported by a tool such as traceroute for a proxied protocol such as http (port 80) with that for a non proxied protocol such as SMTP (port 25). [2],[3]
  • By attempting to make a connection to an IP address at which there is known to be no server. The proxy will accept the connection and then attempt to proxy it on. When the proxy finds no server to accept the connection it may return an error message or simply close the connection to the client. This difference in behaviour is simple to detect. For example most web browsers will generate a browser created error page in the case where they cannot connect to an HTTP server but will return a different error in the case where the connection is accepted and then closed.[11]
  • By serving the end-user specially programmed flash files that send HTTP calls back to their server.

Tor onion proxy software

edit
 
The Vidalia Tor-network map.

Tor (short for The Onion Router) is a system intended to enable online anonymity.[12] Tor client software routes Internet traffic through a worldwide volunteer network of servers in order to conceal a user's location or usage from someone conducting network surveillance or traffic analysis. Using Tor makes it more difficult to trace Internet activity, including "visits to Web sites, online posts, instant messages and other communication forms", back to the user.[12] It is intended to protect users' personal freedom, privacy, and ability to conduct confidential business by keeping their internet activities from being monitored.

"Onion routing" refers to the layered nature of the encryption service: The original data are encrypted and re-encrypted multiple times, then sent through successive Tor relays, each one of which decrypts a "layer" of encryption before passing the data on to the next relay and ultimately the destination. This reduces the possibility of the original data being unscrambled or understood in transit.[13]

The Tor client is free software, and there are no additional charges to use the network.

I2P anonymous proxy

edit

The I2P anonymous network ('I2P') is a proxy network aiming at online anonymity. It implements garlic routing, which is an enhancement of Tor's onion routing. I2P is fully distributed and works by encrypting all communications in various layers and relaying them through a network of routers run by volunteers in various locations. By keeping the source of the information hidden, I2P offers censorship resistance. The goals of I2P are to protect users' personal freedom, privacy, and ability to conduct confidential business.

Each user of I2P runs an I2P router on their computer (node). The I2P router takes care of finding other peers and building anonymizing tunnels through them. I2P provides proxies for all protocols (HTTP, irc, SOCKS, ...).

The software is free and open-source, and the network is free of charge to use.

References

edit
  1. Shapiro, Marc (1986). "Structure and encapsulation in distributed systems: the Proxy Principle" (PDF). Int. Conf. on Distributed Computer Sys.: 198–204. Retrieved 4 September 2011. {{cite journal}}: Unknown parameter |month= ignored (help)
  2. "Firewall and Proxy Server HOWTO". tldp.org. Retrieved 4 September 2011. The proxy server is, above all, a security device.
  3. Thomas, Keir (2006). Beginning Ubuntu Linux: From Novice to Professional. Apress. ISBN 9781590596272. A proxy server helps speed up Internet access by storing frequently accessed pages
  4. "2010 Circumvention Tool Usage Report" (PDF). The Berkman Center for Internet & Society at Harvard University. October 2010.
  5. a b "Forward and Reverse Proxies". httpd mod_proxy. Apache. Retrieved 20 December 2010.
  6. a b Lyon, Gordon (2008). Nmap network scanning. US: Insecure. p. 270. ISBN 9780979958717.
  7. "Using a Ninjaproxy to get through a filtered proxy". advanced filtering mechanics. TSNP. Retrieved 17 September 2011.
  8. "What is an intercepting proxy?". uCertify. 28 February 2010. Retrieved 4 September 2011.
  9. "Socket Capable Browser Plugins Result In Transparent Proxy Abuse". The Security Practice. 9 March 2009. Retrieved 14 August 2010.
  10. "Vulnerability Note VU#435052". US CERT. 23 February 2009. Retrieved 14 August 2010.
  11. Wessels, Duane (2004). Squid The Definitive Guide. O'Reilly. p. 130. ISBN 9780596001629.
  12. a b Glater, Jonathan (25 January 2006). "Privacy for People Who Don't Show Their Navels". The New York Times. http://www.nytimes.com/2006/01/25/technology/techspecial2/25privacy.html?_r=1. Retrieved 4 August 2011. 
  13. The Tor Project. "Tor: anonymity online". Retrieved 9 January 2011.
edit