.. meta::
  :navigation.order: 0
  :navigation.name: WebCleaner

===================================
WebCleaner - a filtering HTTP proxy
===================================


Features
========

- remove unwanted HTML (adverts, flash, etc.)
- popup blocker
- disable animated GIFs
- filter images by size, remove banner adverts
- compress documents on-the-fly (with gzip)
- reduce images to low-bandwidth JPEGs
- remove/add/modify arbitrary HTTP headers
- configurable through a web interface
- support for SquidGuard blacklists
- antivirus filter module
- detection and correction of known HTML security flaws
- Basic, Digest and (untested) NTLM proxy authentication support
- per-host access control
- HTTP/1.1 support (persistent connections, pipelining)
- HTTPS proxy CONNECT and optional SSL gateway support

WebCleaner was featured in `Linux Magazine Issue 43`_.
The `article`_ is available as a PDF download.

.. _Linux Magazine Issue 43:
   http://www.linux-magazine.com/issue/43
.. _article:
   http://www.linux-magazine.com/issue/43/Charly_Column.pdf


Download
========

Download the latest packages from the `WebCleaner download section`_.

.. _WebCleaner download section:
   http://sourceforge.net/project/showfiles.php?group_id=7692

Requirements and installation instructions are in the
`install documentation`_. To see what has changed between releases,
look at the ChangeLog_.

.. _install documentation: install.html
.. _ChangeLog: http://cvs.sourceforge.net/viewcvs.py/webcleaner/webcleaner2/ChangeLog?view=markup


Screenshots
===========

   +----------------------------+----------------------------+
   | .. image:: shot1_thumb.jpg | .. image:: shot2_thumb.jpg |
   |      :align: middle        |      :align: middle        |
   |      :target: shot1.png    |      :target: shot2.png    |
   +----------------------------+----------------------------+
   | Proxy configuration        | Filter configuration       |
   +----------------------------+----------------------------+


Why should I use WebCleaner?
============================

The first feature that sets WebCleaner apart from other proxies is
exact HTML filtering, which removes a lot of advertising.
The filter does not just replace strings; the proxy parses all HTML
data. The parser is fast (written in C) and is designed to cope with
broken HTML pages: if it does not recognize an HTML structure, it
passes the data through unchanged until it recognizes a tag again.
No valid HTML data is ever discarded or dropped.
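
The difference between parsing and plain string replacement can be
sketched in a few lines of Python. This is only an illustration using
the standard library's ``html.parser``; it is not WebCleaner's C parser,
and the names ``BlockRemover`` and ``remove_elements`` are hypothetical.
It drops whole container elements (here ``object``) and re-emits
everything else as it appeared in the input.

.. code-block:: python

   from html.parser import HTMLParser


   class BlockRemover(HTMLParser):
       """Re-emit HTML while dropping the container elements in `blocked`.

       Hypothetical helper for illustration. It assumes blocked elements
       have a matching end tag, so void elements such as <img> are not
       suitable candidates.
       """

       def __init__(self, blocked):
           # Keep character references as written instead of decoding them.
           super().__init__(convert_charrefs=False)
           self.blocked = set(blocked)
           self.depth = 0   # greater than zero while inside a blocked element
           self.out = []

       def handle_starttag(self, tag, attrs):
           if tag in self.blocked:
               self.depth += 1
           elif self.depth == 0:
               # Emit the start tag exactly as it appeared in the source.
               self.out.append(self.get_starttag_text())

       def handle_startendtag(self, tag, attrs):
           # Self-closing tags such as <br/> have no separate end tag.
           if tag not in self.blocked and self.depth == 0:
               self.out.append(self.get_starttag_text())

       def handle_endtag(self, tag):
           if tag in self.blocked:
               self.depth = max(0, self.depth - 1)
           elif self.depth == 0:
               self.out.append("</%s>" % tag)

       def handle_data(self, data):
           if self.depth == 0:
               self.out.append(data)

       def handle_entityref(self, name):
           if self.depth == 0:
               self.out.append("&%s;" % name)

       def handle_charref(self, name):
           if self.depth == 0:
               self.out.append("&#%s;" % name)


   def remove_elements(html_text, blocked=("object",)):
       """Return html_text without the blocked elements and their content."""
       parser = BlockRemover(blocked)
       parser.feed(html_text)
       parser.close()
       return "".join(parser.out)

Because the filter tracks element nesting, the ``param`` tag inside a
removed ``object`` element disappears together with its parent, which a
simple search-and-replace on the string ``<object`` would not achieve.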

Another feature is JavaScript filtering: JavaScript code is
executed in the integrated SpiderMonkey JavaScript engine, which is also
used by the Mozilla browser suite.
This eliminates JavaScript obfuscation, popups, and ``document.write()``
tricks, while other JavaScript functions still work as usual.

Exact HTML filtering has another useful side effect: it makes it
possible to detect and prevent known security flaws in HTML processors.
Several (but not all!) known buffer overflow exploits and
denial-of-service attacks are detected and fixed by the ``HtmlSecurity``
class.

If you find an HTML exploit that is not covered by the security filter,
please let me know.

Furthermore, WebCleaner can filter SSL traffic used in ``https://`` URLs.
See the `SSL gateway`_ documentation for more info.

.. _SSL gateway: devel/sslgateway.html


Configuration
=============

Assuming your proxy runs on port *8080*, point your browser to
http://localhost:8080/ to configure the proxy.
The configuration is stored in a custom XML format, which is
described in ``config/filter.dtd`` and ``config/webcleaner.dtd``.
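
The same host and port apply when a client is told to send its requests
through the proxy. As a minimal sketch, assuming the proxy listens on
``localhost`` port ``8080`` as in the example above, a Python client
could be set up with the standard library as follows. The helper names
``proxy_url``, ``make_opener`` and ``fetch_status`` are hypothetical.

.. code-block:: python

   import urllib.request


   def proxy_url(host="localhost", port=8080):
       """Build the proxy URL; host and port are assumed defaults."""
       return "http://%s:%d" % (host, port)


   def make_opener(host="localhost", port=8080):
       """Return an opener that routes plain HTTP requests via the proxy."""
       handler = urllib.request.ProxyHandler({"http": proxy_url(host, port)})
       return urllib.request.build_opener(handler)


   def fetch_status(url, host="localhost", port=8080):
       """Fetch url through the proxy and return the HTTP status code.

       This needs a running proxy, so it is not called automatically.
       """
       with make_opener(host, port).open(url, timeout=10) as response:
           return response.status

Calling ``fetch_status("http://example.com/")`` only works while the
proxy is running; the URL is a placeholder.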


Running
=======

Please note that the web configuration interface needs write permissions
in the configuration directory.


Running under Unix/Linux
------------------------

The proxy is supervised and automatically (re-)started by the runit
package.
See the `runit homepage`_ for more information.

.. _runit homepage: http://smarden.org/runit/

Running under Windows
---------------------

The proxy is a normal NT service and can be started and stopped from
the "Administrative Tools" entry in the Control Panel.

Setting access permissions
--------------------------

To let hosts other than the one the proxy runs on use it, edit the
allowed host list in the configuration interface.

For example, to allow access from your LAN at ``192.168.1.*``,
you would add ``192.168.1.1/8`` to the allowed host list.
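
For comparison, in standard CIDR notation the suffix counts *network*
bits, so ``192.168.1.*`` is written ``192.168.1.0/24`` and a ``/8``
suffix covers every address starting with ``192.``. Whether the allowed
host list interprets its suffix in the standard way is not stated here,
so verify the resulting access rules after changing the list. The sketch
below only illustrates the standard arithmetic with Python's
``ipaddress`` module; ``describe`` and ``is_allowed`` are hypothetical
helpers, not part of WebCleaner.

.. code-block:: python

   import ipaddress


   def describe(network):
       """Return the normalized network and the number of addresses in it."""
       # strict=False accepts host bits in the address, e.g. 192.168.1.1/8.
       net = ipaddress.ip_network(network, strict=False)
       return str(net), net.num_addresses


   def is_allowed(client, network):
       """Check whether a client address falls inside a CIDR network."""
       net = ipaddress.ip_network(network, strict=False)
       return ipaddress.ip_address(client) in net

With standard semantics, ``is_allowed("192.168.2.42", "192.168.1.0/24")``
is false, while the same client matches ``192.168.1.1/8``.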

If you do allow access from hosts other than your own, please do not
remove the password protection.
Otherwise you will be running an `open proxy`_, which is a security risk.

.. _open proxy:
   http://en.wikipedia.org/wiki/Open_proxy


Bug reports and mailing list
============================

For help and bug reports you can join the
webcleaner-users@lists.sourceforge.net
mailing list at the `subscription page`_ or read the `list archives`_.

.. _subscription page:
   http://lists.sourceforge.net/lists/listinfo/webcleaner-users

.. _list archives:
   http://sourceforge.net/mailarchive/forum.php?forum=webcleaner-users

Notes
=====

WebCleaner is *not* an HTTP-compliant proxy because it modifies requests,
headers and data. Modifications aside, the proxy tries to follow
the HTTP/1.1 specification in RFC 2616.

Browsing performance will decrease, especially with the Rewriter and
the Replacer modules enabled. It will decrease further with JavaScript
parsing enabled, since the proxy downloads and parses the scripts
referenced by ``<script src="">`` tags in the background.

The Rewriter module parses the HTML. It normalizes HTML by making
tag and attribute names lowercase and removing some (but not all)
ignorable whitespace.
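
The lowercase step can be illustrated with a small Python sketch. It is
not the Rewriter module itself; it relies on the fact that the standard
library's ``html.parser`` already reports tag and attribute names in
lowercase, and it leaves whitespace untouched. The names
``LowercaseRewriter`` and ``lowercase_tags`` are hypothetical.

.. code-block:: python

   import html
   from html.parser import HTMLParser


   class LowercaseRewriter(HTMLParser):
       """Re-serialize HTML with lowercase tag and attribute names."""

       def __init__(self):
           # Keep character references in text as written.
           super().__init__(convert_charrefs=False)
           self.out = []

       def handle_starttag(self, tag, attrs):
           # HTMLParser passes tag and attribute names already lowercased.
           parts = [tag]
           for name, value in attrs:
               if value is None:
                   parts.append(name)   # attribute without a value
               else:
                   # Attribute values arrive unescaped, so escape them again.
                   parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
           self.out.append("<%s>" % " ".join(parts))

       def handle_endtag(self, tag):
           self.out.append("</%s>" % tag)

       def handle_data(self, data):
           self.out.append(data)

       def handle_entityref(self, name):
           self.out.append("&%s;" % name)

       def handle_charref(self, name):
           self.out.append("&#%s;" % name)


   def lowercase_tags(html_text):
       """Return html_text with tag and attribute names in lowercase."""
       parser = LowercaseRewriter()
       parser.feed(html_text)
       parser.close()
       return "".join(parser.out)

Attribute values and text content keep their original case; only the
names of tags and attributes change.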

When the warning "unsupported content encoding" appears, the affected
HTML pages may be corrupted. WebCleaner tries to filter content even if
its encoding is unknown, to prevent Denial of Service attacks (e.g. web
servers that always send an unknown content encoding).
Currently, this affects only the "compress" or "x-compress" encoding,
because the LZW algorithm to uncompress such content is patented and
therefore not included in WebCleaner. See http://burnallgifs.org/.

.. image:: http://sourceforge.net/sflogo.php?group_id=7692&type=1
   :align: right
   :target: http://sourceforge.net/projects/webcleaner/
   :alt: SourceForge Logo
   :width: 88
   :height: 31

