Webserv is a custom HTTP/1.1 web server implementation written in C++98. This project aims to provide a deep understanding of the HTTP protocol by building a fully functional web server from scratch. The server handles multiple simultaneous client connections using non-blocking I/O and the poll() system call, supports various HTTP methods, serves static files, executes CGI scripts, and provides comprehensive configuration through custom configuration files.
This implementation follows the HTTP/1.1 specification and provides features similar to production web servers like NGINX, including virtual hosts, location blocks, custom error pages, file uploads, and automatic directory listings.
• HTTP/1.1 Protocol: HTTP/1.1 request and response handling with support for persistent connections
• Multiple Request Methods: GET, POST, and DELETE methods with proper request body handling
• Configuration Files: NGINX-style configuration with server blocks, location directives, and various settings
• Virtual Hosts: Support for multiple server names and ports on the same instance
• CGI Support: Execute CGI scripts (Python, Ruby, etc.) with proper environment variable handling
• File Operations: Upload, download, and delete files with multipart/form-data support
• Directory Listing: Automatic index generation (autoindex) for directories
• Non-blocking I/O: Efficient handling of multiple connections using poll() with no blocking operations
• Error Handling: Custom error pages and comprehensive HTTP status code support
• Request Body Limits: Configurable maximum client body size to prevent memory overflow
• Redirections: HTTP redirections with custom status codes (301, 302, etc.)
• Range Requests: Support for partial content delivery (HTTP 206)
• Request Timeout: Automatic timeout handling for slow or stalled clients
- C++ compiler with C++98 support (g++ or clang++)
- UNIX-like operating system (Linux, macOS)
- Make build tool
git clone https://github.com/whoismtrx/42_webserv.git webserv
cd webserv
make
Create a configuration file or use the default one in conf/default.conf:
server {
listen 127.0.0.1:8080;
server_name localhost;
root /var/www/html;
client_max_body_size 20M;
error_page 404 /error/404.html;
error_page 500 502 503 504 /error/50x.html;
location / {
methods GET POST DELETE;
root /var/www/html;
index index.html index.htm;
autoindex on;
}
location /uploads {
methods GET POST DELETE;
root /var/www/uploads;
autoindex off;
}
location /cgi-bin {
methods GET POST;
root /var/www/cgi-bin;
cgi .py /usr/bin/python3;
cgi .rb /usr/bin/ruby;
}
}
# Run with default configuration
./webServer
# Run with custom configuration
./webServer path/to/config.conf
The server will start and listen on the configured host:port combinations.
Test the server with various tools:
# Test with curl
curl http://localhost:8080/
curl -X POST -F "file=@test.txt" http://localhost:8080/uploads
curl -X DELETE http://localhost:8080/file.txt
# Test with browser
open http://localhost:8080
# Stress testing with siege
siege -c 100 -t 30s http://localhost:8080/
# Test with Postman or similar tools
1. Server Management (server/)
- Server: Main server orchestrator managing multiple HTTP server instances
- HttpServer: Individual server instance bound to a specific host:port combination
- Uses poll() for multiplexed I/O operations across all connections
2. Configuration Parsing (configfile/)
- configFile: Main configuration parser and validator
- serverData: Server block configuration (listen, server_name, root, etc.)
- Location: Location block configuration (methods, root, index, CGI, etc.)
- HttpStatusPars: HTTP status code mapping and error page handling
3. HTTP Request/Response (HttpRequest/)
- HttpRequest: Request parsing and validation
- HttpRequestParse: HTTP header and request line parsing
- HttpResponse: Response generation with appropriate headers
- Methods: GET, POST, DELETE method implementations
- Content: Content-Type detection and file serving
- Delete: File deletion logic
- MatchingLocation: Location matching algorithm
- html: HTML generation for directory listings and error pages
4. Utilities (utils.cpp/hpp)
- Helper functions for string manipulation, file operations, and validation
The server reads NGINX-style configuration files. Parsing proceeds in the following stages:
1. File Reading and Validation
- Configuration file is read into memory as a single string
- Validates file existence and readability
- Checks for balanced braces and proper syntax
2. Server Block Extraction
- Identifies and extracts individual server { } blocks
- Each server block is parsed independently
- Supports multiple server blocks in a single configuration file
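The brace check and block extraction from stages 1 and 2 can be sketched as a single scan. This is a minimal illustration, not the project's parser: `extractServerBlocks` is a hypothetical name, and the sketch assumes the configuration contains no comments or quoted strings with braces in them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the text between the braces of every
// top-level "server { ... }" block. An empty vector is returned when a
// block is never closed (unbalanced braces).
std::vector<std::string> extractServerBlocks(const std::string& conf) {
    std::vector<std::string> blocks;
    const std::string keyword = "server";
    std::size_t pos = 0;
    while ((pos = conf.find(keyword, pos)) != std::string::npos) {
        std::size_t open = conf.find_first_not_of(" \t\r\n", pos + keyword.size());
        if (open == std::string::npos || conf[open] != '{') {
            pos += keyword.size();   // e.g. "server_name", not a block start
            continue;
        }
        // Walk forward, tracking nesting depth so that nested
        // "location { }" blocks stay inside their server block.
        int depth = 0;
        std::size_t i = open;
        for (; i < conf.size(); ++i) {
            if (conf[i] == '{') ++depth;
            else if (conf[i] == '}' && --depth == 0) break;
        }
        if (i == conf.size())
            return std::vector<std::string>();   // unbalanced braces
        blocks.push_back(conf.substr(open + 1, i - open - 1));
        pos = i + 1;
    }
    return blocks;
}
```

Each returned string can then be handed to the directive parser on its own, which matches the "parsed independently" step above.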
3. Directive Parsing
- listen: Validates IP:port format, checks port range (0-65535), validates IP address format
- server_name: Parses multiple server names (virtual hosts)
- root: Validates directory path existence
- client_max_body_size: Parses size units (K, M, G) and converts to bytes
- error_page: Maps HTTP status codes to custom error page paths
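Two of these directives involve small, self-contained conversions. The sketch below shows one way to handle them; `parseBodySize` and `parseListen` are hypothetical names. It assumes 1024-based units (as NGINX uses) and a strict IPv4:port form for listen, neither of which is confirmed by the project source.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Converts "20M", "512K", "1G" or a plain byte count to bytes.
// Assumption: K/M/G are powers of 1024. Overflow is not checked here.
bool parseBodySize(const std::string& text, unsigned long& bytes) {
    std::size_t i = 0;
    unsigned long value = 0;
    while (i < text.size() && std::isdigit(static_cast<unsigned char>(text[i]))) {
        value = value * 10 + static_cast<unsigned long>(text[i] - '0');
        ++i;
    }
    if (i == 0) return false;                      // no leading digits
    unsigned long factor = 1;
    if (i < text.size()) {
        if (i + 1 != text.size()) return false;    // at most one suffix character
        switch (std::toupper(static_cast<unsigned char>(text[i]))) {
            case 'K': factor = 1024UL; break;
            case 'M': factor = 1024UL * 1024UL; break;
            case 'G': factor = 1024UL * 1024UL * 1024UL; break;
            default:  return false;
        }
    }
    bytes = value * factor;
    return true;
}

// Parses a string of 1 to 5 digits and checks it against an upper bound.
static bool parseBounded(const std::string& s, long max, long& out) {
    if (s.empty() || s.size() > 5) return false;
    long v = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
        v = v * 10 + (s[i] - '0');
    }
    if (v > max) return false;
    out = v;
    return true;
}

// Splits "127.0.0.1:8080" into host and port, requiring four octets in
// 0-255 and a port in the 0-65535 range listed above.
bool parseListen(const std::string& text, std::string& host, int& port) {
    std::size_t colon = text.find(':');
    if (colon == std::string::npos) return false;
    std::string ip = text.substr(0, colon);
    long portValue = 0;
    if (!parseBounded(text.substr(colon + 1), 65535, portValue)) return false;
    std::size_t start = 0;
    int octets = 0;
    while (true) {
        std::size_t dot = ip.find('.', start);
        std::size_t len = (dot == std::string::npos) ? std::string::npos : dot - start;
        long octet = 0;
        if (!parseBounded(ip.substr(start, len), 255, octet)) return false;
        ++octets;
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    if (octets != 4) return false;
    host = ip;
    port = static_cast<int>(portValue);
    return true;
}
```

With the sample configuration, `client_max_body_size 20M` would become 20971520 bytes under the 1024-based assumption.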
4. Location Block Parsing
- Extracts nested location { } blocks within server blocks
- Determines location type (exact match, prefix match)
- Parses location-specific directives:
- methods: Validates HTTP method names (GET, POST, DELETE)
- root: Location-specific document root
- index: List of default index files
- autoindex: Boolean flag for directory listings
- cgi: CGI extension to interpreter path mapping
- return: HTTP redirect status code and URL
5. Configuration Validation
- Ensures required directives are present (listen is mandatory)
- Checks for conflicting directives
- Validates directory paths and file existence
- Verifies CGI interpreter paths are executable
6. Data Structure Organization
- Builds internal data structures for fast lookup:
- Map of ports to server configurations
- Map of (host, port) pairs to server data
- Map of server names to server instances
- Enables efficient request routing to correct server block
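As an illustration of these lookup tables, the sketch below keys server blocks by (host, port) and then by server name. `ServerConfig` and `ServerTable` are hypothetical, reduced types. Treating the first block registered for a host:port as the default is an assumption borrowed from NGINX behavior, not something the project states.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct ServerConfig {            // hypothetical, reduced to two fields
    std::string serverName;
    std::string root;
};

typedef std::pair<std::string, int> HostPort;

class ServerTable {
public:
    // Assumption: the first block added for a host:port is its default.
    void add(const std::string& host, int port, const ServerConfig& cfg) {
        HostPort key(host, port);
        if (defaults_.find(key) == defaults_.end())
            defaults_[key] = cfg;
        byName_[key][cfg.serverName] = cfg;
    }

    // Returns the block whose server_name equals the Host header value,
    // else the default block for that host:port, else NULL.
    const ServerConfig* find(const std::string& host, int port,
                             const std::string& hostHeader) const {
        HostPort key(host, port);
        std::map<HostPort, std::map<std::string, ServerConfig> >::const_iterator
            names = byName_.find(key);
        if (names == byName_.end()) return NULL;
        std::map<std::string, ServerConfig>::const_iterator it =
            names->second.find(hostHeader);
        if (it != names->second.end()) return &it->second;
        return &defaults_.find(key)->second;
    }

private:
    std::map<HostPort, ServerConfig> defaults_;
    std::map<HostPort, std::map<std::string, ServerConfig> > byName_;
};
```

Both lookups are logarithmic in the number of configured blocks, which is more than enough for a handful of virtual hosts.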
Config File → Read → Tokenize → Extract Server Blocks
↓
Parse Directives (listen, server_name, root, etc.)
↓
Extract Location Blocks
↓
Parse Location Directives
↓
Validate Configuration
↓
Build Internal Data Structures
The HTTP request parser processes each request in the following stages:
1. Request Line Parsing
- Extracts HTTP method (GET, POST, DELETE)
- Parses request URI and separates path from query string
- Validates HTTP version (HTTP/1.1, HTTP/1.0)
- Example:
GET /path/to/resource?key=value HTTP/1.1
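A request line like the example can be split with a string stream. The sketch below is one possible shape, not the project's `HttpRequestParse` code: `RequestLine` and `parseRequestLine` are hypothetical names, and returning 0 for success and an HTTP status code for failure is a convention chosen for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct RequestLine {             // hypothetical result type
    std::string method;
    std::string path;
    std::string query;
    std::string version;
};

// Returns 0 on success, otherwise the HTTP status code to answer with.
// Note: operator>> accepts any run of whitespace between the tokens,
// which is more lenient than the single space HTTP/1.1 requires.
int parseRequestLine(const std::string& line, RequestLine& out) {
    std::istringstream in(line);
    std::string method, target, version, extra;
    if (!(in >> method >> target >> version) || (in >> extra))
        return 400;                              // wrong number of tokens
    if (target[0] != '/')
        return 400;                              // only origin-form targets
    if (version.compare(0, 5, "HTTP/") != 0)
        return 400;
    if (version != "HTTP/1.1" && version != "HTTP/1.0")
        return 505;
    if (method != "GET" && method != "POST" && method != "DELETE")
        return 501;
    std::size_t q = target.find('?');
    out.method = method;
    out.path = target.substr(0, q);              // npos keeps the whole target
    out.query = (q == std::string::npos) ? "" : target.substr(q + 1);
    out.version = version;
    return 0;
}
```

For the example line, this yields the path `/path/to/resource` and the query string `key=value`.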
2. Header Parsing
- Reads headers line by line until empty line (\r\n\r\n)
- Splits each header into key-value pairs
- Normalizes header names (case-insensitive)
- Stores headers in a map for efficient lookup
Key Headers Processed:
- Host: Server name and port (for virtual host routing)
- Content-Length: Size of request body
- Content-Type: Body format (form data, JSON, multipart, etc.)
- Transfer-Encoding: Chunked transfer encoding support
- Range: Partial content requests (bytes=start-end)
- Cookie: Session and state information
- Connection: Keep-alive or close
- User-Agent: Client identification
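Header parsing as described can be sketched with a map keyed by lowercased names, which makes lookups case-insensitive. `HeaderMap` and `parseHeaders` are hypothetical names. The sketch assumes the whole header block is already in memory, whereas the server reads it incrementally.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> HeaderMap;

static std::string toLower(std::string s) {
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses "Name: value\r\n" lines up to the blank line that ends the
// header block. Returns false on a line with no name or no colon.
bool parseHeaders(const std::string& block, HeaderMap& headers) {
    std::istringstream in(block);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);         // strip the CR of CRLF
        if (line.empty()) break;                 // blank line: headers done
        std::size_t colon = line.find(':');
        if (colon == std::string::npos || colon == 0) return false;
        headers[toLower(line.substr(0, colon))] = trim(line.substr(colon + 1));
    }
    return true;
}
```

Only the first colon splits the line, so a value such as `localhost:8080` in the Host header is kept whole.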
3. Body Reading
- Determined by Content-Length or Transfer-Encoding: chunked
- Non-blocking incremental reading for large bodies
- Validates body size against client_max_body_size limit
- Handles different content types:
a) Multipart Form Data
- Extracts boundary from Content-Type header
- Parses multiple parts separated by boundary markers
- Extracts filename, content-type, and binary data for file uploads
- Supports multiple file uploads in a single request
b) URL-Encoded Form Data
- Parses application/x-www-form-urlencoded format
- Decodes percent-encoded characters (%20, %2F, etc.)
- Splits key=value pairs separated by &
c) Raw Binary Data
- Stores body as-is for CGI processing or custom handling
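The URL-encoded case is the easiest of the three to show in isolation. The sketch below uses hypothetical helpers `percentDecode` and `parseForm`. Copying invalid escapes through unchanged and treating `+` as a space in form bodies are choices made for this example, not confirmed project behavior.

```cpp
#include <cstddef>
#include <map>
#include <string>

static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes. '+' becomes a space when plusAsSpace is true.
// Escapes that are truncated or not hexadecimal are copied unchanged.
std::string percentDecode(const std::string& in, bool plusAsSpace) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
            && hexValue(in[i + 1]) >= 0 && hexValue(in[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
            i += 2;
        } else if (in[i] == '+' && plusAsSpace) {
            out += ' ';
        } else {
            out += in[i];
        }
    }
    return out;
}

// Splits "a=1&b=2" on '&', then on the first '=', decoding both sides.
// A field without '=' is stored with an empty value.
std::map<std::string, std::string> parseForm(const std::string& body) {
    std::map<std::string, std::string> fields;
    std::size_t start = 0;
    while (start <= body.size()) {
        std::size_t amp = body.find('&', start);
        if (amp == std::string::npos) amp = body.size();
        std::string pair = body.substr(start, amp - start);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            std::string key = pair.substr(0, eq);
            std::string value = (eq == std::string::npos) ? "" : pair.substr(eq + 1);
            fields[percentDecode(key, true)] = percentDecode(value, true);
        }
        start = amp + 1;
    }
    return fields;
}
```

Decoding happens after splitting, so an encoded `&` or `=` inside a value cannot be mistaken for a separator.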
4. URI Processing
- Separates URI into path and query string
- Decodes percent-encoded characters in path
- Normalizes path (removes .., ., multiple slashes)
- Validates against directory traversal attacks
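Path normalization and the traversal check fit naturally into one pass over the path segments. The following is a sketch under the assumption that percent-decoding has already been applied, so encoded dots are visible to the check. `normalizePath` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collapses repeated slashes, drops "." segments and resolves ".."
// segments. Returns false when ".." would climb above the root, which
// is treated as a directory traversal attempt.
bool normalizePath(const std::string& path, std::string& out) {
    if (path.empty() || path[0] != '/') return false;
    std::vector<std::string> parts;
    std::size_t i = 0;
    while (i < path.size()) {
        std::size_t next = path.find('/', i);
        if (next == std::string::npos) next = path.size();
        std::string seg = path.substr(i, next - i);
        if (seg == "..") {
            if (parts.empty()) return false;     // would escape the root
            parts.pop_back();
        } else if (!seg.empty() && seg != ".") {
            parts.push_back(seg);
        }
        i = next + 1;
    }
    out = "/";
    for (std::size_t k = 0; k < parts.size(); ++k) {
        if (k > 0) out += "/";
        out += parts[k];
    }
    // Keep a trailing slash so directory requests stay recognizable.
    if (!parts.empty() && path[path.size() - 1] == '/') out += "/";
    return true;
}
```

Rejecting the request outright, rather than clamping the path to the root, makes a traversal attempt show up as an error instead of silently serving a different file.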
5. Location Matching
- Iterates through location blocks to find the best match
- Exact match: Location path equals request path exactly
- Prefix match: Location path is a prefix of request path
- Selects the longest matching location
- Falls back to server root if no location matches
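The matching rules above can be sketched as a single loop. The `Location` struct here is a reduced, hypothetical stand-in for the project's class, and `matchLocation` returns an index rather than whatever the real `MatchingLocation` code returns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Location {                // hypothetical, reduced to the matching fields
    std::string path;
    bool exact;                  // true for an exact-match location
};

// Returns the index of the best location for requestPath, or -1 when
// none matches and the caller should fall back to the server root.
int matchLocation(const std::vector<Location>& locations,
                  const std::string& requestPath) {
    int best = -1;
    std::size_t bestLen = 0;
    for (std::size_t i = 0; i < locations.size(); ++i) {
        const Location& loc = locations[i];
        if (loc.exact) {
            if (loc.path == requestPath)
                return static_cast<int>(i);      // exact match wins outright
            continue;
        }
        // Prefix test: the request path must start with the location path.
        if (requestPath.compare(0, loc.path.size(), loc.path) != 0)
            continue;
        if (best == -1 || loc.path.size() > bestLen) {
            best = static_cast<int>(i);
            bestLen = loc.path.size();
        }
    }
    return best;
}
```

With the sample configuration, a request for `/uploads/a.txt` matches both `/` and `/uploads`, and the longer prefix `/uploads` is selected. The plain prefix test also lets `/uploads` match `/uploadsX`. A stricter variant would require a `/` boundary after the prefix.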
6. Request Validation
- Checks if HTTP method is allowed for the matched location
- Validates Content-Length doesn't exceed limits
- Verifies request format is valid (no malformed headers)
- Returns appropriate error codes for invalid requests:
- 400 Bad Request: Malformed syntax
- 405 Method Not Allowed: Method not permitted
- 413 Payload Too Large: Body too large
- 414 URI Too Long: URI exceeds limits
- 505 HTTP Version Not Supported: Invalid version
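Three of these checks depend only on the matched location's limits, so they can be grouped into one function. This is a sketch with hypothetical names (`Limits`, `validateRequest`). The order of the checks and the use of 0 for "valid" are choices made for this example. The document does not state a URI length limit, so it is a parameter here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Limits {                              // hypothetical per-location limits
    std::vector<std::string> allowedMethods;
    unsigned long maxBodySize;               // from client_max_body_size
    std::size_t maxUriLength;                // assumption: configurable limit
};

// Returns 0 when the request passes, otherwise the status code to send.
int validateRequest(const std::string& method, const std::string& uri,
                    unsigned long contentLength, const Limits& limits) {
    if (uri.size() > limits.maxUriLength)
        return 414;                          // URI Too Long
    if (std::find(limits.allowedMethods.begin(), limits.allowedMethods.end(),
                  method) == limits.allowedMethods.end())
        return 405;                          // Method Not Allowed
    if (contentLength > limits.maxBodySize)
        return 413;                          // Payload Too Large
    return 0;
}
```

Checking Content-Length before reading the body lets the server reject an oversized upload without buffering it.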
Raw Socket Data → Read Line → Parse Request Line (Method, URI, Version)
↓
Parse Headers (Key: Value pairs)
↓
Detect Body (Content-Length / Chunked)
↓
Read Body Incrementally (Non-blocking)
↓
Parse Body (Multipart / URL-encoded / Raw)
↓
Match Location Block
↓
Validate Request (Method, Size, Format)
↓
Route to Request Handler (GET/POST/DELETE)
- Each connection has a timestamp updated on data receipt
- If no data received for 8 seconds, connection times out
- Server generates 408 Request Timeout response
- Connection is closed to free resources
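The timeout bookkeeping amounts to comparing a stored timestamp against the current time on each pass of the event loop. The sketch below uses a hypothetical `Connection` struct and helper names. Only the 8-second value comes from the text above.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

const double kIdleTimeoutSeconds = 8.0;      // value stated above

struct Connection {                          // hypothetical, reduced
    int fd;
    std::time_t lastActivity;                // refreshed whenever data arrives
};

bool hasTimedOut(const Connection& c, std::time_t now) {
    return std::difftime(now, c.lastActivity) >= kIdleTimeoutSeconds;
}

// Returns the descriptors that should receive a 408 response and be closed.
std::vector<int> collectExpired(const std::vector<Connection>& conns,
                                std::time_t now) {
    std::vector<int> expired;
    for (std::size_t i = 0; i < conns.size(); ++i) {
        if (hasTimedOut(conns[i], now))
            expired.push_back(conns[i].fd);
    }
    return expired;
}
```

Passing `now` in as a parameter, instead of calling `std::time` inside the helper, keeps the check deterministic and easy to test.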
The parser supports chunked transfer encoding for requests without a known Content-Length:
- Read chunk size in hexadecimal
- Read chunk data
- Repeat until chunk size is 0 (end marker)
- Read optional trailing headers
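These steps can be sketched as a decoder over a buffer that already holds the complete chunked body. The server reads incrementally, so this is a simplification, and `decodeChunked` is a hypothetical name. Chunk extensions after `;` are ignored and trailer headers are skipped rather than stored.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Decodes a complete chunked body. Returns false when the input is
// malformed or ends before the terminating blank line.
bool decodeChunked(const std::string& raw, std::string& body) {
    std::size_t pos = 0;
    body.clear();
    while (true) {
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;
        std::string sizeLine = raw.substr(pos, eol - pos);
        sizeLine = sizeLine.substr(0, sizeLine.find(';'));   // drop extensions
        if (sizeLine.empty()) return false;
        for (std::size_t i = 0; i < sizeLine.size(); ++i) {
            if (!std::isxdigit(static_cast<unsigned char>(sizeLine[i])))
                return false;
        }
        unsigned long size = std::strtoul(sizeLine.c_str(), NULL, 16);
        pos = eol + 2;
        if (size == 0) {
            // Skip optional trailer lines until the final blank line.
            while (true) {
                std::size_t e = raw.find("\r\n", pos);
                if (e == std::string::npos) return false;
                if (e == pos) return true;
                pos = e + 2;
            }
        }
        std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        if (raw.compare(pos + size, 2, "\r\n") != 0) return false;
        body.append(raw, pos, size);
        pos += size + 2;                     // chunk data plus its CRLF
    }
}
```

Each chunk's data is followed by its own CRLF, which the decoder checks before moving to the next size line.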
- Accept Connection: Server accepts incoming connection on listening socket
- Read Request: Non-blocking read of HTTP request data
- Parse Request: Parse request line, headers, and body
- Match Location: Find matching location block based on URI
- Validate Method: Check if method is allowed for the location
- Process Request:
- GET: Serve file, generate listing, or execute CGI
- POST: Handle upload or execute CGI with POST data
- DELETE: Remove file from filesystem
- Generate Response: Create HTTP response with appropriate headers
- Send Response: Non-blocking send of response data
- Close Connection: Clean up and close socket
The server uses poll() for efficient multiplexed I/O:
// Poll events
POLLIN - Data available for reading
POLLOUT - Socket ready for writing
POLLHUP - Connection closed by peer
POLLERR - Error condition
Advantages:
- Single thread handles all connections
- No blocking on I/O operations
- Efficient resource utilization
- Scalable to hundreds of connections
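A single poll() call over all descriptors is the core of this model. The sketch below shows only the readiness check. `readableFds` is a hypothetical helper, and a real event loop would also request POLLOUT for connections with pending response data and set O_NONBLOCK on each socket.

```cpp
#include <cstddef>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Waits up to timeoutMs milliseconds and returns the descriptors that
// have data to read, have been closed by the peer, or are in error.
std::vector<int> readableFds(const std::vector<int>& fds, int timeoutMs) {
    std::vector<int> ready;
    if (fds.empty()) return ready;
    std::vector<struct pollfd> set(fds.size());
    for (std::size_t i = 0; i < fds.size(); ++i) {
        set[i].fd = fds[i];
        set[i].events = POLLIN;              // POLLHUP/POLLERR are always reported
        set[i].revents = 0;
    }
    int n = poll(&set[0], static_cast<nfds_t>(set.size()), timeoutMs);
    if (n <= 0) return ready;                // timeout or error: nothing to service
    for (std::size_t i = 0; i < set.size(); ++i) {
        if (set[i].revents & (POLLIN | POLLHUP | POLLERR))
            ready.push_back(set[i].fd);
    }
    return ready;
}
```

The caller would then read from each returned descriptor once and go back to poll(), so no single slow client can stall the others.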
The server implements comprehensive error handling:
Client Errors (4xx):
- 400 Bad Request: Malformed request
- 403 Forbidden: Permission denied
- 404 Not Found: Resource not found
- 405 Method Not Allowed: Method not permitted
- 408 Request Timeout: Client timeout (8 seconds)
- 413 Payload Too Large: Body exceeds limit
Server Errors (5xx):
- 500 Internal Server Error: Server-side error
- 501 Not Implemented: Unsupported feature
- 505 HTTP Version Not Supported: Invalid HTTP version
Custom error pages can be configured per server block.
• HTTP/1.1 Specification (RFC 2616, since superseded by RFC 9110 and RFC 9112)
• CGI Specification (RFC 3875)
• Beej's Guide to Network Programming
• Webserv request flow
• NGINX Configuration Guide
• The NGINX Handbook
webserv/
├── conf/
│ ├── default.conf # Default configuration file
│ ├── conf.conf # Alternative configuration
│ ├── file.conf # File-specific configuration
│ └── httpStatusCodes.conf # HTTP status code mappings
├── configfile/
│ ├── configFile.cpp/hpp # Main configuration parser
│ ├── serverData.cpp/hpp # Server block parser
│ ├── location.cpp/hpp # Location block parser
│ └── HttpStatusPars.cpp/hpp # Status code handler
├── HttpRequest/
│ ├── HttpRequest.cpp/hpp # Request handler
│ ├── HttpRequestParse.cpp # Request parsing
│ ├── HttpResponse.cpp # Response generation
│ ├── Methods.cpp # HTTP methods implementation
│ ├── Content.cpp # Content handling
│ ├── Delete.cpp # DELETE method
│ ├── MatchingLocation.cpp # Location matching
│ └── html.cpp # HTML generation
├── server/
│ ├── Server.cpp/hpp # Server orchestrator
│ └── HttpServer.cpp/hpp # HTTP server instance
├── main.cpp # Entry point
├── utils.cpp/hpp # Utility functions
└── Makefile # Build configuration
This repository is for educational purposes only, documenting my work on the 42 curriculum. These solutions are intended as a reference for students who have already completed or are actively working on the project.