Python http module is “a package that collects several modules for working with
the HyperText Transfer Protocol”. The module is a middle level module that sits
between low level socket and high level urllib modules. It is relatively simpler
than the other two modules.
I can’t find many documentation for the http module. The standard library has one page for each file in this package. The Python 3 Module of the Week site has two pages for http.server and http.cookies.
The files are saved in this directory.
~/.pyenv/versions/3.9.7/lib/python3.9/http
The module has five files and the statistics of the files are shown below.
$ find . -maxdepth 1 -name '*.py' -exec wc -l '{}' + | sort -n
149 ./__init__.py
612 ./cookies.py
1295 ./server.py
1519 ./client.py
2113 ./cookiejar.py
5688 total
Those modules appear to have different authors so they are loosely related.
The __init__.py module defines an HTTPStatus class which “defines
a number of HTTP status codes, reason phrases and long descriptions”.
The http.server can be invoked directly on the command line. We can setup a simple http server via this command serving the current directory.
python -m http.server 8080
In the web browser we can access a static website by typing the address
http://localhost:8080/. The http server will start serving the
index.html file of a website.
The command will invoke the server.py script and start a server. The
code after the ... = '__main__': line starts running. The code looks
like the concept shown below.
# concept code
addr = (host, port)
with DualStackServer(addr, SimpleHTTPRequestHandler) as httpd:
try:
httpd.serve_forever()
except KeyboardInterrupt:
print('...')
sys.exit(0)
The DualStackSever is derived from ThreadingHTTPServer, which is
in turn derived from HTTPServer. The whole hierarchy of the classes is
shown below.
socketserver.BaseServer
socketserver.TCPServer
HTTPServer
ThreadingHTTPServer
DualStackServer
socketserver.ThreadingMixIn
ThreadingHTTPServer
The http.server module is closer to the socketserver module than
other modules in the http package.
The serve_forever command can be thought as the entry point of the
module. It is defined in the BaseServer class of the socketserver
module. The code is shown below.
# serve_forever method in BaseServer
def serve_forever(self, poll_interval=0.5):
"""Handle one request at a time until shutdown.
Polls for shutdown every poll_interval seconds. Ignores
self.timeout. If you need to do periodic tasks, do them in
another thread.
"""
self.__is_shut_down.clear()
try:
# XXX: Consider using another file descriptor or connecting to the
# socket to wake this up instead of polling. Polling reduces our
# responsiveness to a shutdown request and wastes cpu at all other
# times.
with _ServerSelector() as selector:
selector.register(self, selectors.EVENT_READ)
while not self.__shutdown_request:
ready = selector.select(poll_interval)
# bpo-35017: shutdown() called during select(),
# exit immediately.
if self.__shutdown_request:
break
if ready:
self._handle_request_noblock()
self.service_actions()
finally:
self.__shutdown_request = False
self.__is_shut_down.set()
The serve_forever method uses the selector module. The register method is
called only once here, so the selector only have one server socket registered.
The selector does not include client sockets. I haven’t seen code like this in
other places. The method then calls _handle_request_noblock, which in turn
calls other methods in BaseServer and TCPServer classes. The self.socket
shown below is a network socket initialized in __init__ method of TCPServer class.
The request variable is normally named conn or connection which is a
client socket.
# concept code in _handle_request_noblock, simplified
request, client_address = self.socket.accept()
self.RequestHandleClass(request, client_address, self)
request.close()
The RequestHandler class also has a hierarchy. Two of the classes
are defined in the socketserver module and other two are defined in
the http.server module. Those four classes are well organized.
socketserver.BaseRequestHandler
socketserver.StreamRequestHandler
BaseHTTPRequestHandler
SimpleHTTPRequestHandler
Here is the code in the __init__ method of BaseRequestHandler class.
def __init__(self, request, client_address, server):
self.request = request
self.client_address = client_address
self.server = server
self.setup()
try:
self.handle()
finally:
self.finish()
The setup method is defined in the StreamRequestHandler class. It
sets up two file streams rfile and wfile on the client socket, and
derived classes can use those two streams to accept request from client
and write response to the client.
The rfile is the return value of the makefile call on the socket.
The wfile is a class _SocketWrite object which derives from
BufferedIOBase class, and it has a method write which calls
sendall method of the socket. So those classes are all built on
top of the socket module.
The handle method in BaseHTTPRequestHandler class calls the
handle_one_request method. It in turn calls parse_request
method and method. The parse_request will parse the first
line of the request (e.g., “GET / HTTP/1.1”) and the HTTP headers.
The method refers to one of the methods defined in SimpleHTTPRequestHandler
class such as do_GET and do_HEAD. The do... method will
write to the wfile stream and send the response back.
The SimpleHTTPRequestHandler does not define a do_POST method,
so it can’t handle any POST method.
That’s basically how the http.server handles a request.
The http.client module “defines classes implement the client side of the
HTTP and HTTPS protocols”. The documentation says that “it is normally not
used directly” and implies that urllib is recommended. Here is an example
on how to use the module.
conn = http.client.HTTPSConnection('www.python.org')
conn.request('GET', '/')
resp = conn.getresponse()
print(resp.status, resp.reason)
data = resp.read() # return content in a byte string
If you already read the http.server module code, the source code in this
module is not difficult to understand. The request method will call the
_send_request method of the HTTPConnection class, which in turn calls
putrequest, putheader, and endheaders methods. The send method
of the class calls sendall method of the socket to actually transfer
the request.
The getresponse method creates an instance of the HTTPResponse class,
calls the begin method, and returns the instance. Then we can call
the read method on the instance to get the actual HTTP response.
The http cookies module is relatively independent from
other modules. The cookies.py module only imports three standard modules,
re, string, and types. It defines Morsel, BaseCookie, and SimpleCookie
classes. The examples in the Python documentation do not show how to
use this module with other server or client module.
The http cookiejar module defines a Cookie class and CookieJar and
FileCookieJar classes. This Cookie class is not related to the SimpleCookie
class. The examples on Python documentation shows how to use this module
with the urllib module.
I spent quite some time reading the http module source code and trying to
understand how they work. The code is a good resource for studying Python
network related topics.