
mechanize

Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.

An example:

import re
from mechanize import Browser

br = Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response = br.follow_link(text_regex=re.compile(r"cheese\s*shop"), nr=1)
assert br.viewing_html()
print(br.title())
print(response.geturl())
print(response.info())  # headers
print(response.read())  # body
response.close()

br.select_form(name="order")
# Browser passes through unknown attributes (including methods)
# to the selected HTMLForm (from ClientForm).
br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
response2 = br.submit()  # submit current form

response3 = br.back()  # back to cheese shop
# the history mechanism uses cached requests and responses
assert response3 is response
# we can still use the response, even though we closed it:
response3.seek(0)
response3.read()
response4 = br.reload()
assert response4 is not response3

for form in br.forms():
    print(form)
# .links() optionally accepts the keyword args of .follow_/.find_link()
for link in br.links(url_regex=re.compile("python.org")):
    print(link)
    br.follow_link(link)  # takes EITHER Link instance OR keyword args
    br.back()
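The link-selection keywords used in the example (text_regex, url_regex and nr) can be illustrated with a toy link finder built on the standard library's html.parser. This is a hypothetical sketch in Python 3, not mechanize's implementation: LinkCollector and find_link are invented names, matching with re.search (rather than a whole-string match) is an assumption, and nr is treated as a zero-based index, consistent with the example's comment that nr=1 follows the second matching link.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (url, text) for every <a href=...>."""

    def __init__(self):
        super().__init__()
        self.links = []     # (url, text) tuples, in document order
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html, text_regex=None, url_regex=None, nr=0):
    """Return the nr-th (zero-based) link satisfying every given criterion."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    matches = [
        (url, text)
        for url, text in parser.links
        if (text_regex is None or text_regex.search(text))
        and (url_regex is None or url_regex.search(url))
    ]
    if nr >= len(matches):
        raise LookupError("no matching link")  # assumption: toy error type
    return matches[nr]


page = """
<a href="/shop1">cheese shop</a>
<a href="/about">About us</a>
<a href="/shop2">cheese  shop annex</a>
<a href="http://www.python.org/">Python</a>
"""
print(find_link(page, text_regex=re.compile(r"cheese\s*shop"), nr=1))
```

Because every supplied criterion must match, passing both text_regex and url_regex narrows the candidate list before nr picks from it.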

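The comments in the example say that back() hands back the cached response object, and that a response can still be read after close() by seeking to the start. A toy history stack showing that behaviour is sketched below in Python 3. It is hypothetical throughout: Response, ToyBrowser and the fetch callable are invented for illustration and do not reproduce mechanize's classes.

```python
import io


class Response:
    """Hypothetical stand-in for a response whose body survives close()."""

    def __init__(self, url, body):
        self.url = url
        self._body = io.BytesIO(body)  # body is buffered in memory

    def read(self):
        return self._body.read()

    def seek(self, pos):
        self._body.seek(pos)

    def close(self):
        # Deliberately a no-op: the buffered body is kept so that a
        # response cached in the history can be re-read later.
        pass


class ToyBrowser:
    """Keeps a stack of previously viewed (url, response) pairs."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable: url -> body bytes (an assumption)
        self._history = []    # pages we can go back to
        self._current = None  # (url, response) currently being viewed

    def open(self, url):
        if self._current is not None:
            self._history.append(self._current)
        self._current = (url, Response(url, self._fetch(url)))
        return self._current[1]

    def back(self):
        # No new request: return the cached response object itself.
        self._current = self._history.pop()
        return self._current[1]

    def reload(self):
        # Re-fetch the current URL as a fresh response; history is unchanged.
        url = self._current[0]
        self._current = (url, Response(url, self._fetch(url)))
        return self._current[1]


# Usage, mirroring the example: view a page, move on, go back, reload.
pages = {"/shop": b"cheese shop", "/order": b"order received"}
fetched = []  # records every URL actually fetched


def fetch(url):
    fetched.append(url)
    return pages[url]


br = ToyBrowser(fetch)
response = br.open("/shop")
first_read = response.read()
response.close()
br.open("/order")
response3 = br.back()
assert response3 is response     # the cached object comes back
response3.seek(0)
second_read = response3.read()   # still readable after close()
response4 = br.reload()
assert response4 is not response3
```

Going back costs no request in this toy, while reload() does; the list of fetched URLs makes that visible.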
You may control the browser's policy by using the methods of mechanize.Browser's base class, mechanize.UserAgent. For example:

br = Browser()
# Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML).
br.set_handle_equiv(False)
# Ignore robots.txt.  Do not do this without thought and consideration.
br.set_handle_robots(False)
# Don't handle cookies
br.set_cookiejar(None)
# Supply your own ClientCookie.CookieJar (NOTE: cookie handling is ON by
# default: no need to do this unless you have some reason to use a
# particular cookiejar)
import ClientCookie
cj = ClientCookie.CookieJar()
br.set_cookiejar(cj)
# Print information about HTTP redirects and Refreshes.
br.set_debug_redirects(True)
# Print HTTP response bodies (i.e. the HTML, most of the time).
br.set_debug_responses(True)
# Print HTTP headers.
br.set_debug_http(True)
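The comment on set_handle_equiv() describes HTTP-EQUIV headers as HTTP headers embedded in HTML, that is, <meta http-equiv="..." content="..."> elements. A rough Python 3 sketch of just the extraction step, using the standard library's html.parser, is below. HttpEquivParser and http_equiv_headers are hypothetical names, and what mechanize then does with the extracted pairs is not modelled here.

```python
from html.parser import HTMLParser


class HttpEquivParser(HTMLParser):
    """Hypothetical: collect <meta http-equiv=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.headers = []  # (header-name, value) tuples, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        # Self-closing <meta ... /> also arrives here, via the default
        # handle_startendtag().
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("http-equiv")
        content = attr_map.get("content")
        if name is not None and content is not None:
            self.headers.append((name, content))


def http_equiv_headers(html):
    parser = HttpEquivParser()
    parser.feed(html)
    parser.close()
    return parser.headers


page = """<html><head>
<meta http-equiv="Refresh" content="5; url=http://www.example.com/next">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="not a header">
</head><body></body></html>"""

for name, value in http_equiv_headers(page):
    print(name + ": " + value)
```

Ordinary <meta name=...> elements are skipped, since only http-equiv elements stand in for headers.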

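set_handle_robots(False) switches off mechanize's observance of robots.txt, which the comparison with webunit lists as one of mechanize's features. The kind of check being disabled can be sketched with the standard library's urllib.robotparser (Python 3). This is not mechanize's code; the robots.txt content, the "my-bot" user-agent string and the allowed() helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="my-bot"):
    """True if the parsed robots.txt lets `agent` fetch `url` (hypothetical helper)."""
    return rp.can_fetch(agent, url)


for url in ("http://www.example.com/", "http://www.example.com/private/data"):
    print(url, "->", "allowed" if allowed(url) else "disallowed")
```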
Full documentation is in the docstrings.

Thanks to Ian Bicking, for persuading me that a UserAgent class would be useful.

Todo

Download

All documentation (including this web page) is included in the distribution.

This is an alpha release: interfaces may change, and there will be bugs.

Development release.

For installation instructions, see the INSTALL file included in the distribution.

Subversion

The Subversion (SVN) trunk is http://codespeak.net/svn/wwwsearch/mechanize/trunk, so to check out the source:

svn co http://codespeak.net/svn/wwwsearch/mechanize/trunk mechanize

See also

Richard Jones' webunit (this is not the same as Steven Purcell's code of the same name). webunit and mechanize are quite similar. On the minus side, webunit is missing things like browser history, high-level forms and links handling, thorough cookie handling, refresh redirection, adding of the Referer header, observance of robots.txt and easy extensibility. On the plus side, webunit has a bunch of utility functions bound up in its WebFetcher class, which look useful for writing tests (though they'd be easy to duplicate using mechanize). In general, webunit has more of a frameworky emphasis, with aims limited to writing tests, whereas mechanize and the modules it depends on try hard to be general-purpose libraries.

There are many related links in the General FAQ page, too.

FAQs

I prefer questions and comments to be sent to the mailing list rather than direct to me.

John J. Lee, November 2005.