A5 - HTTP Server
350 - 600 points
How many websites do you visit per day? Do you need a specialized program for every one of them? Probably not. It turns out that this has not always been the case. Before the invention of the HTTP protocol (the backbone of the web, created by Tim Berners-Lee), people used different programs to access different parts of the internet. What made a single program sufficient was the unification of application-layer web protocols.
However, the complexities of dealing with HTTP are usually hidden behind the abstractions of web frameworks and runtimes such as Node.js, Flask, and Django. Not today! In this assignment, you will explore what happens behind the scenes by implementing a simplified HTTP server.
Your job is to implement a simplified HTTP 1.1 server supporting the GET method. The website, based on HTML files and images we provide, should be loadable from your server and be visible in any browser. Please note that while browsers can render an HTML document that was not delivered over HTTP (for example, a local file, or FTP in older browsers), for this assignment you are required to use HTTP for all client-server communication.
We provide a minimal webpage that your implementation must be able to serve statically. Your server must support GET requests and be able to serve the different types of files typically found on a webpage, such as JPEG images, HTML, and CSS files. Additionally, your server shall support different status codes, such as 200, 201, 400, and 404, and serve the correct HTML file depending on the response status code.
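One way to serve "the correct file with the correct type" is to map each request target to a status code, a file path, and a Content-Type in one place. The sketch below does that with the standard `mimetypes` module. It is a minimal illustration, not the required design: the `DOC_ROOT` folder, the `index.html` page served for `/`, and the `404.html` error page are assumptions (the latter by analogy with `400.html`), and `resolve` is a hypothetical helper name.

```python
import mimetypes
import os
from urllib.parse import unquote

DOC_ROOT = "www"             # hypothetical folder holding the provided webpage
INDEX_PAGE = "index.html"    # assumed name of the page served for "/"
NOT_FOUND_PAGE = "404.html"  # assumed name, by analogy with 400.html


def resolve(target: str, root: str = DOC_ROOT) -> tuple:
    """Map a request target to (status, file path, Content-Type)."""
    # Ignore any query string and undo percent-encoding ("%20" -> " ").
    path = unquote(target.split("?", 1)[0])
    if path == "/":
        path = "/" + INDEX_PAGE
    root_abs = os.path.abspath(root)
    candidate = os.path.abspath(os.path.join(root_abs, path.lstrip("/")))
    # Reject targets such as "/../secret" that would escape the document root.
    inside = os.path.commonpath([root_abs, candidate]) == root_abs
    if inside and os.path.isfile(candidate):
        status = 200
    else:
        status, candidate = 404, os.path.join(root_abs, NOT_FOUND_PAGE)
    # Derive the Content-Type from the file extension: .html, .css, .jpg, ...
    content_type, _ = mimetypes.guess_type(candidate)
    return status, candidate, content_type or "application/octet-stream"
```

The status returned here decides which HTML page is sent, so the 404 case reuses the same code path as a normal file.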
The main document you will be working with is RFC 2616, the specification of the HTTP/1.1 protocol. You will not have to implement the entire HTTP/1.1 specification; however, some sections of interest are:
Section 1 - Introduction: A general overview of the HTTP/1.1 protocol and its purpose.
Section 5.1 & 5.2 - Request Line: Learn how to parse requests sent by browsers, including the message format and its standard contents.
Section 9.3 - The GET method
Section 3.1 - HTTP Version
Section 7.2.1 - Type: Specify the type of content served, e.g., JPEG, HTML, and so on.
Section 7.2.2 - Entity Length: Specify the content length, which the client needs to determine where the transmitted data ends.
Sections 10.2.1 - 10.4.1 - Status Codes: Descriptions of the status codes you will have to support. Note that we only require a subset of the presented status codes.
Section 8 - Connections: Support for persistent connections (ask yourself: why were they important to add?). Without support for persistent connections, your browser will likely not load your HTTP/1.1 page.
Section 8.2.3 & 8.2.4 - Use of the 100 (Continue) Status & Client Behavior if Server Prematurely Closes Connection: Learn how to deal with connections depending on their state.
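To make the Request-Line and Entity Length sections concrete, here is a minimal parsing sketch. Per RFC 2616 section 5.1, a request starts with `Method SP Request-URI SP HTTP-Version CRLF`, followed by header lines and an empty line; `Content-Length` then says how many body bytes follow. The function name `parse_request`, decoding header bytes as ISO-8859-1, and the omission of chunked transfer-coding are assumptions of this sketch. Because a persistent connection can carry several requests back to back, it also reports how many bytes of the buffer it consumed.

```python
def parse_request(buf: bytes):
    """Try to parse one request from the start of buf.

    Returns None while the request is still incomplete, otherwise
    (method, target, version, headers, body, consumed), where `consumed`
    is the number of bytes of buf that belonged to this request.
    Raises ValueError on a malformed request (a candidate for 400).
    """
    head_end = buf.find(b"\r\n\r\n")
    if head_end == -1:
        return None  # header section not complete yet: keep reading
    lines = buf[:head_end].decode("iso-8859-1").split("\r\n")
    # Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {lines[0]!r}")
    method, target, version = parts
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    body_start = head_end + 4
    length = int(headers.get("content-length", "0"))  # ValueError if not a number
    if len(buf) - body_start < length:
        return None  # body not fully received yet
    body = buf[body_start:body_start + length]
    return method, target, version, headers, body, body_start + length
```

The caller would drop the first `consumed` bytes from its receive buffer and call the function again, which handles pipelined requests on a keep-alive connection.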
RFCs may seem overly complex, holding tremendous amounts of information that one cannot process in one sitting. The key to using RFCs to their full potential is to treat them as a tool rather than a book to read. Use RFCs to guide your development process rather than as required reading before you start writing code.
That being said, we recommend reading the Introduction section at least once to familiarize yourself with the protocol.
The following requirements specify the parts of the protocol the server must adhere to, as well as technical requirements for the implementation.
The server must support HTTP 1.1.
The server must support the GET method.
The server must support HTTP 1.1 persistent connections.
The implementation must not use the Python functions exit() or sendall().
The server must support multiple concurrent connections.
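Since sendall() is off-limits, the server has to call socket.send() in a loop, because a single send() may write only part of the buffer. The sketch below shows such a loop together with a response builder that sets Content-Type and Content-Length (so the browser knows where the body ends on a persistent connection) and a keep-alive check. The names `build_response`, `send_all`, and `wants_keep_alive` are hypothetical, and treating HTTP/1.0 as "close unless the client asks for keep-alive" follows common practice rather than anything stated in this assignment.

```python
import socket

REASONS = {200: "OK", 201: "Created", 400: "Bad Request", 404: "Not Found"}


def build_response(status: int, body: bytes, content_type: str,
                   keep_alive: bool = True) -> bytes:
    """Serialize a response with the headers a browser needs to delimit the body."""
    head = (
        f"HTTP/1.1 {status} {REASONS[status]}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        f"Connection: {'keep-alive' if keep_alive else 'close'}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def send_all(sock: socket.socket, data: bytes) -> None:
    """Write every byte using repeated send() calls (no sendall()).

    Assumes a blocking socket; with epoll you would instead keep the unsent
    remainder and wait until the socket becomes writable again.
    """
    view = memoryview(data)
    while view:
        sent = sock.send(view)
        view = view[sent:]


def wants_keep_alive(version: str, headers: dict) -> bool:
    """HTTP/1.1 connections are persistent unless the client sends 'Connection: close'."""
    conn = headers.get("connection", "").lower()
    if version == "HTTP/1.1":
        return conn != "close"
    return conn == "keep-alive"  # HTTP/1.0 clients have to opt in explicitly
```

After sending, a server that got `wants_keep_alive(...) == True` keeps reading from the same socket instead of closing it.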
Your task is to adapt your HTTP/1.1 server to serve a dynamic endpoint that is updated through a form submitted with a POST request. In doing so, consider performance, specifically how to handle many concurrent connections efficiently. Practice shows that spawning a new thread for each connection will not get us far.
POST requests
The personal_cats.html file contains a <table> entry, which is currently served statically. You will have to update the file to include a special placeholder that you will then replace with each user's personal cat image links. Each connection may submit different pictures to be included in the table; therefore, each connection should see a different version of the personal_cats.html file when requesting it.
Image links should be submitted through a POST /data request, using the application/x-www-form-urlencoded encoding with the following fields:
cat_url: The URL (link) of a cat picture.
description: The description of the picture.
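With application/x-www-form-urlencoded, a browser sends the form as `name=value` pairs joined by `&`, with spaces encoded as `+` and reserved characters as `%XX` escapes. Python's `urllib.parse.parse_qs` reverses this encoding. The sketch below extracts the two prescribed fields and signals a missing or empty field with `None`; the function name `parse_cat_form` is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs

REQUIRED_FIELDS = ("cat_url", "description")


def parse_cat_form(body: bytes) -> Optional[dict]:
    """Decode a POST /data body into {"cat_url": ..., "description": ...}.

    Returns None if a field is missing or empty (whitespace only), which the
    caller can turn into a 400 response.
    """
    # keep_blank_values=True keeps "cat_url=" as an empty string instead of
    # dropping it, so empty fields and missing fields are handled alike.
    fields = parse_qs(body.decode("ascii", errors="replace"), keep_blank_values=True)
    result = {}
    for name in REQUIRED_FIELDS:
        value = fields.get(name, [""])[0].strip()
        if not value:
            return None
        result[name] = value
    return result
```

For example, `cat_url=https%3A%2F%2Fexample.com%2Fcat.jpg&description=A+sleepy+cat` decodes to the URL `https://example.com/cat.jpg` and the description `A sleepy cat`.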
The table entry in the personal_cats.html file should contain the TABLE_PLACEHOLDER placeholder, which your implementation replaces with the <tr><td>...</td></tr> entries containing the images stored by each user.
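A minimal sketch of the dynamic rendering step: keep the submitted `(cat_url, description)` pairs in memory per client and substitute the generated rows for every occurrence of `TABLE_PLACEHOLDER`. How a "client" is identified (here an opaque `client_key`, for example a per-connection id) and the two-cell row layout are assumptions; the assignment only prescribes `<tr><td>...</td></tr>` entries. Escaping with `html.escape` is a design choice that stops a submitted description from injecting markup into the page.

```python
import html
from collections import defaultdict

PLACEHOLDER = "TABLE_PLACEHOLDER"

# In-memory only (nothing is persisted): client key -> list of (cat_url, description).
# The key is hypothetical; use whatever identifies "a connection" in your server.
cats_by_client = defaultdict(list)


def render_rows(entries) -> str:
    """Turn stored submissions into <tr><td>...</td></tr> rows, escaping user input."""
    rows = []
    for url, description in entries:
        rows.append(
            f'<tr><td><img src="{html.escape(url, quote=True)}" alt=""></td>'
            f"<td>{html.escape(description)}</td></tr>"
        )
    return "\n".join(rows)


def render_personal_cats(template: str, client_key) -> str:
    """Replace every occurrence of the placeholder, wherever it appears in the file."""
    return template.replace(PLACEHOLDER, render_rows(cats_by_client[client_key]))
```

Because the template is re-rendered on every request, two clients requesting the same file each get only their own rows.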
The state should not be stored persistently: it is fine if the images are gone when the user restarts their browser or when the server is restarted.
If you are on macOS, the epoll interface is not available to you; therefore, you will have to implement a thread pool instead.
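A thread pool can be organized as a producer-consumer system: the main thread (producer) accepts connections and puts the sockets on a `queue.Queue`, while a fixed number of long-lived worker threads (consumers) take sockets off the queue. When a connection closes, its worker simply picks up the next one, so threads are reused rather than created per connection. In this sketch the pool size, the idle timeout, and the names `handle_connection`, `worker`, and `serve` are assumptions, and `handle_connection` only shows where your request handling would go.

```python
import queue
import socket
import threading

NUM_WORKERS = 32    # assumed pool size; tune it for your machine
IDLE_TIMEOUT = 5.0  # assumed: close keep-alive connections idle this long (seconds)


def handle_connection(conn: socket.socket) -> None:
    """Hypothetical handler: read from a persistent connection until it closes."""
    conn.settimeout(IDLE_TIMEOUT)  # an idle client must not block a worker forever
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break  # client closed the connection
            # ... parse the request(s) in `data` and send the response(s) here ...


def worker(jobs: "queue.Queue[socket.socket]") -> None:
    # Consumer: lives for the whole server lifetime and handles one connection
    # after another, instead of a new thread being spawned per connection.
    while True:
        conn = jobs.get()
        try:
            handle_connection(conn)
        except OSError:
            pass  # timeouts and resets end the connection, not the worker
        finally:
            jobs.task_done()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    jobs: "queue.Queue[socket.socket]" = queue.Queue()
    for _ in range(NUM_WORKERS):
        threading.Thread(target=worker, args=(jobs,), daemon=True).start()
    with socket.create_server((host, port)) as server:
        while True:
            conn, _addr = server.accept()
            jobs.put(conn)  # producer: hand the accepted socket to the pool
```

Note that a keep-alive connection occupies its worker until it closes or times out, which is why the idle timeout matters with a fixed-size pool.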
Your implementation will be assessed by the grading teaching assistant. If it is not considered to meet the performance requirements of the bonus assignment, the submission may be rejected. CodeGrade does not run any automated performance tests.
The following requirements describe the expected behavior of the dynamic endpoints for the first part of the bonus. Please note that the personal_cats.html file is treated as an endpoint subject to these requirements.
The server must support POST requests on the /data endpoint.
The /data endpoint must correctly decode application/x-www-form-urlencoded request bodies containing the prescribed fields.
The server must support dynamic GET requests on the /personal_cats.html endpoint; that is, each connection should see its own images but not those of others.
The server must implement the 201 status code and serve the success.html page with it upon a successful request to the /data endpoint.
The server must support the 400 status code and serve the 400.html page with it upon a failed request to the /data endpoint.
The /data endpoint must check whether the fields are empty and respond with an appropriate error if invalid field values are submitted.
The /personal_cats.html endpoint must support replacing the TABLE_PLACEHOLDER placeholder, regardless of the file or context in which it appears.
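The /data requirements boil down to one decision: store the submission and answer 201 with success.html, or reject it and answer 400 with 400.html. The sketch below makes that decision; rejecting any Content-Type other than application/x-www-form-urlencoded, the in-memory `cats_by_client` store, and the name `handle_post_data` are assumptions of this illustration.

```python
from collections import defaultdict
from urllib.parse import parse_qs

# Non-persistent, per-client storage; the key type is an assumption.
cats_by_client = defaultdict(list)


def handle_post_data(client_key, headers: dict, body: bytes) -> tuple:
    """Validate a POST /data submission and return (status, page to serve)."""
    # Ignore parameters such as "; charset=UTF-8" when checking the media type.
    ctype = headers.get("content-type", "").split(";")[0].strip().lower()
    if ctype != "application/x-www-form-urlencoded":
        return 400, "400.html"
    fields = parse_qs(body.decode("ascii", errors="replace"), keep_blank_values=True)
    cat_url = fields.get("cat_url", [""])[0].strip()
    description = fields.get("description", [""])[0].strip()
    if not cat_url or not description:
        return 400, "400.html"  # missing or empty field
    cats_by_client[client_key].append((cat_url, description))
    return 201, "success.html"
```

Only a fully valid submission touches the stored state, so a rejected request never leaves a half-filled row behind.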
To qualify for the 50 bonus points for code quality, your implementation must scale easily to an arbitrary number of endpoints and must be able to demonstrate its performance with several hundred parallel curl requests (at least as performant as Python allows).
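One common way to make adding endpoints cheap is a routing table: handlers register themselves for a (method, path) pair, and the dispatcher never changes when a new endpoint appears. The decorator-based registry below is one possible sketch; the handler signature (body in, `(status, content type, body)` out), the example `/hello` endpoint, and the 404 fallback are hypothetical. A real server would probably pass more context (headers, a client key) and fall back to static files before answering 404.

```python
from typing import Callable, Dict, Tuple

# A handler receives the request body and returns (status, content type, body).
Handler = Callable[[bytes], Tuple[int, str, bytes]]
ROUTES: Dict[Tuple[str, str], Handler] = {}


def route(method: str, path: str):
    """Decorator that registers a handler; new endpoints need no dispatcher changes."""
    def register(func: Handler) -> Handler:
        ROUTES[(method, path)] = func
        return func
    return register


@route("GET", "/hello")  # hypothetical example endpoint
def hello(body: bytes) -> Tuple[int, str, bytes]:
    return 200, "text/plain", b"hello"


def dispatch(method: str, path: str, body: bytes) -> Tuple[int, str, bytes]:
    """Look up the handler in O(1); unknown routes fall back to 404 here."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, "text/html", b"<h1>404 Not Found</h1>"
    return handler(body)
```

Adding, say, the POST /data endpoint then amounts to writing one decorated function.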
Execute the script below to test your server's performance. Adjust the NUM_CONNECTIONS variable to the scale of your test. For comparison, our sample implementation handles 3000 connections with the following time output: 6.37s user 12.62s system 839% cpu 2.263 total.
The test mentioned above is not part of the automated testing environment. To meet the performance criteria, you must demonstrate during the sign-off that your implementation can serve the required number of parallel connections.
Your assignment must implement a subset of the features described in RFC 2616. In short, your server should be able to statically serve any webpage. We define the following requirements as a smaller specification based on the RFC to guide your development process.
As HTTP is adopted by more and more application-layer software, the need for performant HTTP implementations becomes apparent. Namely, we want to handle hundreds of thousands of concurrent connections and serve all of them with high throughput and low latency. Many web servers are currently competing for the , but can we do the same?
To handle many connections concurrently without losing performance, you will have to either implement a producer-consumer algorithm with a thread pool or implement your HTTP server without any threads at all, using the epoll interface instead. Epoll is a high-performance I/O event notification interface used by most top-tier web servers. You may choose which approach to use; however, if you choose a thread pool, we expect it to be cleverly designed so that threads are reused after a connection closes.
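The thread-free alternative can be sketched with Python's `select.epoll` (Linux only): a single loop waits for readiness events, accepts new clients, reads whatever bytes are available, and writes queued responses with plain send() calls only when the socket is writable. This is a minimal level-triggered sketch under simplifying assumptions: every header block ending in a blank line is treated as one GET request, request bodies are ignored, and a fixed `RESPONSE` stands in for real request handling. The name `event_loop` is hypothetical.

```python
import select
import socket

# Fixed reply keeps the sketch short; a real server would build one per request.
RESPONSE = (b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
            b"Content-Length: 2\r\nConnection: keep-alive\r\n\r\nok")


def event_loop(server: socket.socket) -> None:
    """Serve every client from one thread with level-triggered epoll (Linux only)."""
    server.setblocking(False)
    sfd = server.fileno()
    ep = select.epoll()
    ep.register(sfd, select.EPOLLIN)
    conns, inbuf, outbuf = {}, {}, {}  # fd -> socket / unparsed input / unsent output

    def close(fd):
        ep.unregister(fd)
        conns.pop(fd).close()
        del inbuf[fd], outbuf[fd]

    while True:
        for fd, events in ep.poll():
            if fd == sfd:  # listening socket is readable: a client is waiting
                try:
                    conn, _ = server.accept()
                except BlockingIOError:
                    continue
                conn.setblocking(False)
                cfd = conn.fileno()
                conns[cfd], inbuf[cfd], outbuf[cfd] = conn, b"", b""
                ep.register(cfd, select.EPOLLIN)
                continue
            if fd not in conns:
                continue  # already closed earlier in this batch of events
            if events & (select.EPOLLIN | select.EPOLLHUP | select.EPOLLERR):
                try:
                    data = conns[fd].recv(65536)
                except BlockingIOError:
                    data = None  # spurious wake-up: nothing to read yet
                except OSError:
                    data = b""   # reset or error: treat as closed
                if data == b"":
                    close(fd)    # client closed the persistent connection
                    continue
                if data:
                    inbuf[fd] += data
                    # One response per complete request, so pipelined requests
                    # on a keep-alive connection are all answered in order.
                    while b"\r\n\r\n" in inbuf[fd]:
                        inbuf[fd] = inbuf[fd].split(b"\r\n\r\n", 1)[1]
                        outbuf[fd] += RESPONSE
                    if outbuf[fd]:
                        ep.modify(fd, select.EPOLLIN | select.EPOLLOUT)
            if events & select.EPOLLOUT and outbuf[fd]:
                try:
                    sent = conns[fd].send(outbuf[fd])  # send(), never sendall()
                except BlockingIOError:
                    continue
                except OSError:
                    close(fd)
                    continue
                outbuf[fd] = outbuf[fd][sent:]
                if not outbuf[fd]:
                    ep.modify(fd, select.EPOLLIN)  # stop polling for writability
```

To run it, pass a listening socket, e.g. `event_loop(socket.create_server(("0.0.0.0", 8080)))`. Registering EPOLLOUT only while output is pending avoids a busy loop, since an idle socket is almost always writable.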
To test the performance of your HTTP server, create several thousand connections in parallel and measure your server's response time. If your server cannot handle many connections in parallel, the total wait time will be longer. The time command measures the execution time of all NUM_CONNECTIONS parallel requests, and each request is started in the background with the & control operator so that they all run in parallel.
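If you want a quick check from Python during development, the sketch below approximates the same idea: it fires `NUM_CONNECTIONS` GET requests through a client-side thread pool, each on its own TCP connection, and times the whole batch. It is not the course's script and its numbers are not directly comparable to the reference timing above; the URL, the default parallelism of 200, and the names `fetch` and `load_test` are assumptions, and the Python client itself may become the bottleneck before your server does.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

NUM_CONNECTIONS = 1000          # adjust to the magnitude of your test
URL = "http://localhost:8080/"  # assumed address of your server


def fetch(url: str) -> int:
    """One request on its own connection, roughly like one backgrounded curl."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


def load_test(url: str = URL, n: int = NUM_CONNECTIONS, parallelism: int = 200) -> tuple:
    """Run n requests with up to `parallelism` in flight; return (statuses, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        statuses = list(pool.map(fetch, [url] * n))
    return statuses, time.perf_counter() - start
```

Usage: `statuses, seconds = load_test()` against a running server; checking that every status is 200 confirms no request was dropped.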