whoismtrx/42_webserv

Webserv

Overview

Webserv is a custom HTTP/1.1 web server written in C++98. The project aims to build a deep understanding of the HTTP protocol by implementing a working web server from scratch. The server handles multiple simultaneous client connections using non-blocking I/O and the poll() system call, supports the GET, POST, and DELETE methods, serves static files, executes CGI scripts, and is configured through NGINX-style configuration files.

This implementation follows the HTTP/1.1 specification and provides features similar to production web servers like NGINX, including virtual hosts, location blocks, custom error pages, file uploads, and automatic directory listings.

Key Features

HTTP/1.1 Protocol: HTTP/1.1 request and response handling with support for persistent connections
Multiple Request Methods: GET, POST, and DELETE methods with proper request body handling
Configuration Files: NGINX-style configuration with server blocks, location directives, and various settings
Virtual Hosts: Support for multiple server names and ports on the same instance
CGI Support: Execute CGI scripts (Python, Ruby, etc.) with proper environment variable handling
File Operations: Upload, download, and delete files with multipart/form-data support
Directory Listing: Automatic index generation (autoindex) for directories
Non-blocking I/O: Efficient handling of multiple connections using poll() with no blocking operations
Error Handling: Custom error pages and comprehensive HTTP status code support
Request Body Limits: Configurable maximum client body size to prevent memory overflow
Redirections: HTTP redirections with custom status codes (301, 302, etc.)
Range Requests: Support for partial content delivery (HTTP 206)
Request Timeout: Automatic timeout handling for slow or stalled clients

Getting Started

Prerequisites

  • C++ compiler with C++98 support (g++ or clang++)
  • UNIX-like operating system (Linux, macOS)
  • Make build tool

Compilation

git clone https://github.com/whoismtrx/42_webserv.git webserv
cd webserv
make

Configuration

Create a configuration file or use the default one in conf/default.conf:

server {
    listen 127.0.0.1:8080;
    server_name localhost;
    root /var/www/html;
    client_max_body_size 20M;
    error_page 404 /error/404.html;
    error_page 500 502 503 504 /error/50x.html;

    location / {
        methods GET POST DELETE;
        root /var/www/html;
        index index.html index.htm;
        autoindex on;
    }

    location /uploads {
        methods GET POST DELETE;
        root /var/www/uploads;
        autoindex off;
    }

    location /cgi-bin {
        methods GET POST;
        root /var/www/cgi-bin;
        cgi .py /usr/bin/python3;
        cgi .rb /usr/bin/ruby;
    }
}

Running the Server

# Run with default configuration
./webServer

# Run with custom configuration
./webServer path/to/config.conf

The server will start and listen on the configured host:port combinations.

Testing

Test the server with various tools:

# Test with curl
curl http://localhost:8080/
curl -X POST -F "file=@test.txt" http://localhost:8080/uploads
curl -X DELETE http://localhost:8080/file.txt

# Test with browser (open is the macOS command; use xdg-open on Linux)
open http://localhost:8080

# Stress testing with siege
siege -c 100 -t 30s http://localhost:8080/

# Test with Postman or similar tools

Architecture

Core Components

1. Server Management (server/)

  • Server: Main server orchestrator managing multiple HTTP server instances
  • HttpServer: Individual server instance bound to a specific host:port combination
  • Uses poll() for multiplexed I/O operations across all connections

2. Configuration Parsing (configfile/)

  • configFile: Main configuration parser and validator
  • serverData: Server block configuration (listen, server_name, root, etc.)
  • Location: Location block configuration (methods, root, index, CGI, etc.)
  • HttpStatusPars: HTTP status code mapping and error page handling

3. HTTP Request/Response (HttpRequest/)

  • HttpRequest: Request parsing and validation
  • HttpRequestParse: HTTP header and request line parsing
  • HttpResponse: Response generation with appropriate headers
  • Methods: GET, POST, DELETE method implementations
  • Content: Content-Type detection and file serving
  • Delete: File deletion logic
  • MatchingLocation: Location matching algorithm
  • html: HTML generation for directory listings and error pages

4. Utilities (utils.cpp/hpp)

  • Helper functions for string manipulation, file operations, and validation

Configuration File Parsing

The server reads NGINX-style configuration files. Parsing proceeds in six stages:

Parsing Architecture

1. File Reading and Validation

  • Configuration file is read into memory as a single string
  • Validates file existence and readability
  • Checks for balanced braces and proper syntax
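
The brace check can be sketched as a single pass with a depth counter. This is a minimal illustration, not the project's actual code: the function name bracesBalanced is hypothetical, and the sketch does not skip braces inside comments or quoted values.

```cpp
#include <string>

// Hypothetical helper: returns true when every '{' has a matching '}'
// and no '}' appears before its opening brace.
bool bracesBalanced(const std::string& text) {
    int depth = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '{') {
            ++depth;
        } else if (text[i] == '}') {
            if (--depth < 0)
                return false; // closing brace with no opener
        }
    }
    return depth == 0; // anything left open is an error
}
```

Running this check before block extraction lets the parser reject a truncated file early, before any directive is interpreted.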

2. Server Block Extraction

  • Identifies and extracts individual server { } blocks
  • Each server block is parsed independently
  • Supports multiple server blocks in a single configuration file

3. Directive Parsing

  • listen: Validates IP:port format, checks port range (0-65535), validates IP address format
  • server_name: Parses multiple server names (virtual hosts)
  • root: Validates directory path existence
  • client_max_body_size: Parses size units (K, M, G) and converts to bytes
  • error_page: Maps HTTP status codes to custom error page paths
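
As an illustration of the client_max_body_size conversion, the sketch below turns values such as 20M into a byte count. The name parseBodySize is hypothetical, the units are assumed to be powers of 1024 (as in NGINX), and the sketch does not guard against arithmetic overflow.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts "1024", "512K", "20M" or "1G" to bytes.
// Returns false on malformed input; on success stores the result in `bytes`.
bool parseBodySize(const std::string& value, unsigned long& bytes) {
    std::string::size_type i = 0;
    unsigned long number = 0;
    while (i < value.size() && std::isdigit(static_cast<unsigned char>(value[i]))) {
        number = number * 10 + static_cast<unsigned long>(value[i] - '0');
        ++i;
    }
    if (i == 0)
        return false; // no leading digits
    unsigned long multiplier = 1;
    if (i < value.size()) {
        switch (std::toupper(static_cast<unsigned char>(value[i]))) {
            case 'K': multiplier = 1024UL; break;
            case 'M': multiplier = 1024UL * 1024UL; break;
            case 'G': multiplier = 1024UL * 1024UL * 1024UL; break;
            default:  return false; // unknown unit
        }
        ++i;
    }
    if (i != value.size())
        return false; // trailing characters after the unit
    bytes = number * multiplier;
    return true;
}
```

With this sketch, the value 20M from the sample configuration becomes 20971520 bytes.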

4. Location Block Parsing

  • Extracts nested location { } blocks within server blocks
  • Determines location type (exact match, prefix match)
  • Parses location-specific directives:
    • methods: Validates HTTP method names (GET, POST, DELETE)
    • root: Location-specific document root
    • index: List of default index files
    • autoindex: Boolean flag for directory listings
    • cgi: CGI extension to interpreter path mapping
    • return: HTTP redirect status code and URL

5. Configuration Validation

  • Ensures required directives are present (listen is mandatory)
  • Checks for conflicting directives
  • Validates directory paths and file existence
  • Verifies CGI interpreter paths are executable

6. Data Structure Organization

  • Builds internal data structures for fast lookup:
    • Map of ports to server configurations
    • Map of (host, port) pairs to server data
    • Map of server names to server instances
  • Enables efficient request routing to correct server block
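
One way to picture the (host, port) lookup is a std::map keyed by a pair. The types ServerBlock, HostPort, and ServerTable and the function findServer are stand-ins invented for this sketch; the project's serverData class will differ.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed server block.
struct ServerBlock {
    std::string serverName;
    std::string root;
};

typedef std::pair<std::string, int> HostPort;
typedef std::map<HostPort, ServerBlock> ServerTable;

// Returns the block registered for (host, port), or a null pointer when none exists.
const ServerBlock* findServer(const ServerTable& table,
                              const std::string& host, int port) {
    ServerTable::const_iterator it = table.find(std::make_pair(host, port));
    return it == table.end() ? 0 : &it->second;
}
```

A map keeps lookups logarithmic in the number of server blocks, which is enough for configuration-sized tables.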

Example Parsing Flow

Config File → Read → Tokenize → Extract Server Blocks
                                        ↓
                        Parse Directives (listen, server_name, root, etc.)
                                        ↓
                        Extract Location Blocks
                                        ↓
                        Parse Location Directives
                                        ↓
                        Validate Configuration
                                        ↓
                        Build Internal Data Structures

HTTP Request Parsing

The request parser takes a request from raw socket data to a validated, routed request in six stages:

Request Parsing Architecture

1. Request Line Parsing

  • Extracts HTTP method (GET, POST, DELETE)
  • Parses request URI and separates path from query string
  • Validates HTTP version (HTTP/1.1, HTTP/1.0)
  • Example: GET /path/to/resource?key=value HTTP/1.1
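
The request-line step can be sketched as follows. RequestLine and parseRequestLine are hypothetical names, the input is assumed to be the first line without its trailing CRLF, and the sketch is lenient about whitespace because it splits with stream extraction. A real parser would also distinguish the failure cases (for example 400 versus 505) instead of returning a single boolean.

```cpp
#include <sstream>
#include <string>

struct RequestLine {
    std::string method;
    std::string path;
    std::string query;
    std::string version;
};

// Hypothetical helper: parses "METHOD URI VERSION" into its parts.
bool parseRequestLine(const std::string& line, RequestLine& out) {
    std::istringstream in(line);
    std::string uri, extra;
    if (!(in >> out.method >> uri >> out.version) || (in >> extra))
        return false; // exactly three tokens are required
    if (out.method != "GET" && out.method != "POST" && out.method != "DELETE")
        return false;
    if (out.version != "HTTP/1.1" && out.version != "HTTP/1.0")
        return false;
    if (uri[0] != '/')
        return false; // only origin-form targets are handled here
    std::string::size_type q = uri.find('?');
    out.path = uri.substr(0, q); // whole URI when there is no '?'
    out.query = (q == std::string::npos) ? std::string() : uri.substr(q + 1);
    return true;
}
```

For the example line above, the sketch yields the path /path/to/resource and the query string key=value.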

2. Header Parsing

  • Reads headers line by line until the empty line that ends the header section (the headers end with \r\n\r\n)
  • Splits each header into key-value pairs
  • Normalizes header names (case-insensitive)
  • Stores headers in a map for efficient lookup
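
A minimal sketch of the header step is shown below. It assumes that normalization means lower-casing the header name and that a repeated header overwrites the earlier value. HeaderMap, lowerCopy, trimSpaces, and parseHeaders are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <string>

typedef std::map<std::string, std::string> HeaderMap;

// Lower-cases a header name so lookups are case-insensitive.
std::string lowerCopy(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    return s;
}

// Strips leading and trailing spaces and tabs from a header value.
std::string trimSpaces(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos)
        return std::string();
    std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parses "Name: value" lines separated by CRLF; stops at the first empty line.
bool parseHeaders(const std::string& block, HeaderMap& headers) {
    std::string::size_type pos = 0;
    while (pos < block.size()) {
        std::string::size_type eol = block.find("\r\n", pos);
        if (eol == std::string::npos)
            eol = block.size();
        std::string line = block.substr(pos, eol - pos);
        pos = (eol == block.size()) ? eol : eol + 2;
        if (line.empty())
            break; // blank line: end of the header section
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos || colon == 0)
            return false; // malformed header line
        headers[lowerCopy(line.substr(0, colon))] = trimSpaces(line.substr(colon + 1));
    }
    return true;
}
```

Splitting on the first colon only keeps values such as localhost:8080 intact.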

Key Headers Processed:

  • Host: Server name and port (for virtual host routing)
  • Content-Length: Size of request body
  • Content-Type: Body format (form data, JSON, multipart, etc.)
  • Transfer-Encoding: Chunked transfer encoding support
  • Range: Partial content requests (bytes=start-end)
  • Cookie: Session and state information
  • Connection: Keep-alive or close
  • User-Agent: Client identification

3. Body Reading

  • Body length is determined by the Content-Length header or by Transfer-Encoding: chunked
  • Non-blocking incremental reading for large bodies
  • Validates body size against client_max_body_size limit
  • Handles different content types:

a) Multipart Form Data

  • Extracts boundary from Content-Type header
  • Parses multiple parts separated by boundary markers
  • Extracts filename, content-type, and binary data for file uploads
  • Supports multiple file uploads in a single request

b) URL-Encoded Form Data

  • Parses application/x-www-form-urlencoded format
  • Decodes percent-encoded characters (%20, %2F, etc.)
  • Splits key=value pairs separated by &
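
The URL-encoded form handling can be sketched like this. The names hexDigit, formDecode, and parseForm are hypothetical, and the sketch assumes that '+' stands for a space, as is conventional for application/x-www-form-urlencoded bodies.

```cpp
#include <map>
#include <string>

// Returns the value of a hexadecimal digit, or -1 if `c` is not one.
int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes %XX escapes and '+'; returns false on a truncated or invalid escape.
bool formDecode(const std::string& in, std::string& out) {
    out.clear();
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%') {
            if (i + 2 >= in.size())
                return false; // fewer than two characters after '%'
            int hi = hexDigit(in[i + 1]);
            int lo = hexDigit(in[i + 2]);
            if (hi < 0 || lo < 0)
                return false;
            out += static_cast<char>(hi * 16 + lo);
            i += 2;
        } else {
            out += in[i];
        }
    }
    return true;
}

// Splits "k1=v1&k2=v2" into decoded key/value pairs.
bool parseForm(const std::string& body, std::map<std::string, std::string>& fields) {
    std::string::size_type pos = 0;
    while (pos <= body.size()) {
        std::string::size_type amp = body.find('&', pos);
        if (amp == std::string::npos)
            amp = body.size();
        std::string field = body.substr(pos, amp - pos);
        pos = amp + 1;
        if (field.empty())
            continue; // tolerate empty segments such as "a=1&&b=2"
        std::string::size_type eq = field.find('=');
        std::string key, value;
        if (!formDecode(field.substr(0, eq), key))
            return false;
        if (eq != std::string::npos && !formDecode(field.substr(eq + 1), value))
            return false;
        fields[key] = value;
    }
    return true;
}
```

Decoding happens after splitting, so an encoded & or = inside a value cannot be mistaken for a separator.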

c) Raw Binary Data

  • Stores body as-is for CGI processing or custom handling

4. URI Processing

  • Separates URI into path and query string
  • Decodes percent-encoded characters in path
  • Normalizes path (resolves . and .. segments, collapses repeated slashes)
  • Validates against directory traversal attacks
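
Path normalization and the traversal check can be combined in one pass over the path segments. The sketch below assumes the path has already been percent-decoded; normalizePath is a hypothetical name, and rejecting any .. that would climb above / is one possible policy rather than the project's confirmed behavior.

```cpp
#include <string>
#include <vector>

// Resolves "." and "..", collapses repeated slashes, and keeps a trailing slash.
// Returns false for relative paths and for ".." segments that escape the root.
bool normalizePath(const std::string& path, std::string& out) {
    if (path.empty() || path[0] != '/')
        return false;
    std::vector<std::string> parts;
    std::string::size_type pos = 0;
    while (pos < path.size()) {
        std::string::size_type slash = path.find('/', pos);
        if (slash == std::string::npos)
            slash = path.size();
        std::string seg = path.substr(pos, slash - pos);
        pos = slash + 1;
        if (seg.empty() || seg == ".")
            continue; // repeated slash or current directory
        if (seg == "..") {
            if (parts.empty())
                return false; // would leave the document root
            parts.pop_back();
        } else {
            parts.push_back(seg);
        }
    }
    out = "/";
    for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i) {
        if (i > 0)
            out += '/';
        out += parts[i];
    }
    if (!parts.empty() && path[path.size() - 1] == '/')
        out += '/'; // preserve the directory marker
    return true;
}
```

Normalizing before the path is joined to the location root means the filesystem never sees a .. segment.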

5. Location Matching

  • Iterates through location blocks to find the best match
  • Exact match: Location path equals request path exactly
  • Prefix match: Location path is a prefix of request path
  • Selects the longest matching location
  • Falls back to server root if no location matches
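
The longest-prefix rule can be sketched as a linear scan. matchLocation is a hypothetical name and locations are reduced to their path strings. The sketch uses a plain string prefix test, so /uploads would also match /uploadsX; whether the project enforces a segment boundary is not stated here.

```cpp
#include <string>
#include <vector>

// Returns the index of the longest location path that is a prefix of `path`,
// or -1 when no location matches (the caller then falls back to the server root).
int matchLocation(const std::vector<std::string>& locations, const std::string& path) {
    int best = -1;
    std::string::size_type bestLen = 0;
    for (std::vector<std::string>::size_type i = 0; i < locations.size(); ++i) {
        const std::string& loc = locations[i];
        if (loc.empty() || path.compare(0, loc.size(), loc) != 0)
            continue; // not a prefix of the request path
        if (best == -1 || loc.size() > bestLen) {
            best = static_cast<int>(i);
            bestLen = loc.size();
        }
    }
    return best;
}
```

An exact match is simply the longest possible prefix, so it wins without a separate code path.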

6. Request Validation

  • Checks if HTTP method is allowed for the matched location
  • Validates that Content-Length does not exceed client_max_body_size
  • Verifies request format is valid (no malformed headers)
  • Returns appropriate error codes for invalid requests:
    • 400 Bad Request: Malformed syntax
    • 405 Method Not Allowed: Method not permitted
    • 413 Payload Too Large: Body too large
    • 414 URI Too Long: URI exceeds limits
    • 505 HTTP Version Not Supported: Unsupported HTTP version

Request Parsing Flow

Raw Socket Data → Read Line → Parse Request Line (Method, URI, Version)
                                        ↓
                        Parse Headers (Key: Value pairs)
                                        ↓
                        Detect Body (Content-Length / Chunked)
                                        ↓
                        Read Body Incrementally (Non-blocking)
                                        ↓
                        Parse Body (Multipart / URL-encoded / Raw)
                                        ↓
                        Match Location Block
                                        ↓
                        Validate Request (Method, Size, Format)
                                        ↓
                        Route to Request Handler (GET/POST/DELETE)

Timeout Handling

  • Each connection has a timestamp updated on data receipt
  • If no data received for 8 seconds, connection times out
  • Server generates 408 Request Timeout response
  • Connection is closed to free resources
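
The idle check itself is a timestamp comparison. In the sketch below, Connection, kIdleTimeout, and hasTimedOut are hypothetical names, and a connection is treated as expired once a full 8 seconds have passed since the last received data.

```cpp
#include <ctime>

// Idle limit in seconds, taken from the description above.
const std::time_t kIdleTimeout = 8;

// Hypothetical per-connection state: the socket and the time of the last read.
struct Connection {
    int fd;
    std::time_t lastActivity;
};

// Returns true when the connection has been silent for kIdleTimeout seconds or more.
bool hasTimedOut(const Connection& c, std::time_t now) {
    return now - c.lastActivity >= kIdleTimeout;
}
```

For the check to run while no socket is active, poll() needs a finite timeout so the loop wakes up periodically and can sweep the connection list.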

Chunked Transfer Encoding

The parser supports chunked transfer encoding for request bodies whose length is not known in advance (no Content-Length header):

  1. Read chunk size in hexadecimal
  2. Read chunk data
  3. Repeat until chunk size is 0 (end marker)
  4. Read optional trailing headers
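
The steps above can be sketched for a body that is already fully buffered. decodeChunked is a hypothetical name. The sketch drops chunk extensions and ignores trailing headers rather than parsing them, while the server itself reads incrementally from a non-blocking socket.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Decodes a complete chunked body; returns false if it is malformed or truncated.
bool decodeChunked(const std::string& raw, std::string& body) {
    body.clear();
    std::string::size_type pos = 0;
    for (;;) {
        std::string::size_type eol = raw.find("\r\n", pos);
        if (eol == std::string::npos)
            return false; // size line not terminated
        std::string sizeLine = raw.substr(pos, eol - pos);
        sizeLine = sizeLine.substr(0, sizeLine.find(';')); // drop chunk extensions
        if (sizeLine.empty())
            return false;
        for (std::string::size_type i = 0; i < sizeLine.size(); ++i)
            if (!std::isxdigit(static_cast<unsigned char>(sizeLine[i])))
                return false; // chunk size must be hexadecimal
        unsigned long size = std::strtoul(sizeLine.c_str(), 0, 16);
        pos = eol + 2;
        if (size == 0)
            return true; // last chunk; trailers are ignored in this sketch
        if (size > raw.size() - pos || raw.size() - pos - size < 2)
            return false; // chunk data or its CRLF is missing
        body.append(raw, pos, size);
        if (raw.compare(pos + size, 2, "\r\n") != 0)
            return false; // chunk data must be followed by CRLF
        pos += size + 2;
    }
}
```

Checking the declared size against the remaining buffer before appending keeps a hostile chunk size from causing an out-of-range read.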

Request Processing Flow

  1. Accept Connection: Server accepts incoming connection on listening socket
  2. Read Request: Non-blocking read of HTTP request data
  3. Parse Request: Parse request line, headers, and body
  4. Match Location: Find matching location block based on URI
  5. Validate Method: Check if method is allowed for the location
  6. Process Request:
    • GET: Serve file, generate listing, or execute CGI
    • POST: Handle upload or execute CGI with POST data
    • DELETE: Remove file from filesystem
  7. Generate Response: Create HTTP response with appropriate headers
  8. Send Response: Non-blocking send of response data
  9. Close Connection: Clean up and close socket

Non-blocking I/O Architecture

The server uses poll() for efficient multiplexed I/O:

// Poll events
POLLIN  - Data available for reading
POLLOUT - Socket ready for writing
POLLHUP - Connection closed by peer
POLLERR - Error condition

Advantages:

  • Single thread handles all connections
  • No blocking on I/O operations
  • Efficient resource utilization
  • Scalable to hundreds of connections
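
One iteration of a poll()-driven loop might look like the sketch below. readableFds is a hypothetical helper that only reports descriptors with POLLIN set; a full server would also act on POLLOUT, POLLHUP, and POLLERR and would accept new clients when the listening socket is readable.

```cpp
#include <poll.h>
#include <unistd.h>
#include <vector>

// Polls all descriptors once and returns those that have data to read.
// An empty result means the timeout expired or poll() failed.
std::vector<int> readableFds(std::vector<struct pollfd>& fds, int timeoutMs) {
    std::vector<int> ready;
    if (fds.empty())
        return ready;
    int n = poll(&fds[0], fds.size(), timeoutMs);
    if (n <= 0)
        return ready; // 0: timeout, -1: error
    for (std::vector<struct pollfd>::size_type i = 0; i < fds.size(); ++i) {
        if (fds[i].revents & POLLIN)
            ready.push_back(fds[i].fd);
    }
    return ready;
}
```

Because a single call watches every descriptor, one thread can serve all clients, and a read is only attempted on a socket that poll() has reported as ready.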

Error Handling

The server implements comprehensive error handling:

Client Errors (4xx):

  • 400 Bad Request: Malformed request
  • 403 Forbidden: Permission denied
  • 404 Not Found: Resource not found
  • 405 Method Not Allowed: Method not permitted
  • 408 Request Timeout: Client timeout (8 seconds)
  • 413 Payload Too Large: Body exceeds limit

Server Errors (5xx):

  • 500 Internal Server Error: Server-side error
  • 501 Not Implemented: Unsupported feature
  • 505 HTTP Version Not Supported: Unsupported HTTP version

Custom error pages can be configured per server block.

Resources

HTTP/1.1 Specification (RFC 2616)
CGI Specification (RFC 3875)
Beej's Guide to Network Programming
Webserv request flow
NGINX Configuration Guide
The NGINX Handbook

Project Structure

webserv/
├── conf/
│   ├── default.conf          # Default configuration file
│   ├── conf.conf             # Alternative configuration
│   ├── file.conf             # File-specific configuration
│   └── httpStatusCodes.conf  # HTTP status code mappings
├── configfile/
│   ├── configFile.cpp/hpp    # Main configuration parser
│   ├── serverData.cpp/hpp    # Server block parser
│   ├── location.cpp/hpp      # Location block parser
│   └── HttpStatusPars.cpp/hpp # Status code handler
├── HttpRequest/
│   ├── HttpRequest.cpp/hpp   # Request handler
│   ├── HttpRequestParse.cpp  # Request parsing
│   ├── HttpResponse.cpp      # Response generation
│   ├── Methods.cpp           # HTTP methods implementation
│   ├── Content.cpp           # Content handling
│   ├── Delete.cpp            # DELETE method
│   ├── MatchingLocation.cpp  # Location matching
│   └── html.cpp              # HTML generation
├── server/
│   ├── Server.cpp/hpp        # Server orchestrator
│   └── HttpServer.cpp/hpp    # HTTP server instance
├── main.cpp                  # Entry point
├── utils.cpp/hpp             # Utility functions
└── Makefile                  # Build configuration

Disclaimer

This repository is for educational purposes only, documenting my work on the 42 curriculum. These solutions are intended as a reference for students who have already completed or are actively working on the project.
