Wednesday, July 9, 2014

mahimahi - webtools

Mahimahi is a set of lightweight tools that browser developers, website authors, and Web server developers can use to produce a usable benchmark for Web transport protocols and browser behaviors. Mahimahi can be used to record actual websites and replay them over various emulated link conditions. Mahimahi is free software and is available on Ubuntu (version 13.10 or higher).

http://mahimahi.mit.edu/#about

Getting Mahimahi

Ubuntu 13.04 and later

$ sudo add-apt-repository ppa:keithw/mahimahi
$ sudo apt-get update
$ sudo apt-get install mahimahi

Operating system logos are trademarks or registered trademarks and are displayed for identification only. The vendors shown aren't affiliated with and haven't endorsed Mahimahi.

Building from source

$ git clone https://github.com/ravinet/mahimahi
$ cd mahimahi
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install

Dependencies

Name                           Typical package
Protocol Buffers               protobuf-compiler, libprotobuf-dev
autotools                      autotools-dev
autoreconf                     dh-autoreconf
iptables                       iptables
pkg-config                     pkg-config
dnsmasq                        dnsmasq
Apache2                        apache2
debhelper (7.0.50 or later)    debhelper
OpenSSL                        libssl-dev, ssl-cert

Details

Delayshell

delayshell uses clone() to fork off a new shell in a distinct network namespace. All packets to and from an application running inside delayshell are stored in a packet queue (one per direction). When a packet arrives at a queue, it is assigned a delivery time, which is the sum of its arrival time and a user-specified fixed one-way delay. Packets are released from the queue at their delivery time. This technique enforces a fixed delay on a per-packet basis.

Linkshell

linkshell uses clone() to fork off a new shell in a distinct network namespace. linkshell uses packet delivery traces to emulate both time-varying links (e.g. cellular) and fixed-rate links (e.g. 12 Mbps). When a packet arrives at the link, it is directly placed into one of two packet queues depending on its intended direction.
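
As a rough illustration of the delayshell queueing rule described above (delivery time = arrival time + fixed one-way delay), here is a minimal Python sketch. It is not Mahimahi's actual code: the class name DelayQueue and its methods are hypothetical, and timestamps are plain millisecond integers supplied by the caller rather than read from a clock.

```python
from collections import deque


class DelayQueue:
    """Hypothetical model of one direction of a delayshell-style queue."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        # Each entry is (delivery_time_ms, packet). Because the delay is
        # fixed and packets arrive in time order, the deque stays sorted.
        self._queue = deque()

    def enqueue(self, packet, arrival_ms):
        # Delivery time is the arrival time plus the fixed one-way delay.
        self._queue.append((arrival_ms + self.delay_ms, packet))

    def release(self, now_ms):
        # Return every packet whose delivery time has been reached.
        released = []
        while self._queue and self._queue[0][0] <= now_ms:
            released.append(self._queue.popleft()[1])
        return released
```

Using one such queue per direction, each with a 50 ms delay, adds 100 ms to every round trip.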
linkshell releases packets from each queue based on the corresponding input packet-delivery trace. linkshell interprets trace files such that each line in the trace represents a packet delivery opportunity: the time (in milliseconds) at which 1500 bytes can be sent. Accounting is done at the byte level. Thus, a single line in the trace file can correspond to the delivery of several packets whose sizes sum to 1500 bytes. Delivery opportunities are wasted if bytes are unavailable at the instant of an opportunity. When linkshell reaches the end of an input trace file, it wraps around to the beginning of the trace file and resets its base timestamp value.

Recordshell

recordshell uses clone() to fork off a new shell in a distinct network namespace. recordshell adds a routing table rule to forward all TCP traffic from inside recordshell to a man-in-the-middle proxy. The proxy accepts each TCP connection request from programs run inside recordshell. It then determines the original destination address of the TCP connection request and connects to the original destination on the program's behalf. The proxy runs an HTTP parser which parses all TCP traffic to and from a program running in recordshell to differentiate HTTP requests and responses from TCP segments. Once an HTTP request and its corresponding response have been parsed, the proxy stores them as a request-response pair.

recordshell handles SSL traffic similarly, using OpenSSL. To establish a connection with the program running inside recordshell, the proxy uses a fake self-signed certificate with the common name "Mahimahi". The proxy also makes an SSL connection request to the original destination. Since a self-signed certificate is used (it is not signed by a browser-trusted Certificate Authority), the user must add an exception when loading a page in recordshell using HTTPS.
At the end of a recording session, the user-specified record folder consists of a set of files: one for each request-response pair seen during that recording session. Each file is stored using Google Protobufs and includes the complete HTTP request and response (headers and body) as well as the connection's original destination IP address and port number, and the protocol used (HTTP or HTTPS). recordshell is compatible with any unmodified browser because recording is done at the packet level.

The figure below illustrates the actions taken by recordshell to record a Web page:

1. Inside recordshell, a browser is used to load a Web page.
2. All HTTP requests from the browser are sent to the HTTP proxy running outside recordshell's isolated network namespace.
3. These HTTP requests are sent to their original destinations in the Internet and the corresponding responses are sent back to the HTTP proxy.
4. For each HTTP response received, the HTTP proxy sends the response back to the browser and sends the corresponding HTTP request-response pair to the recorded site folder.

Replayshell

replayshell uses unshare(CLONE_NEWNET) to create a shell in an isolated network namespace. This isolation prevents the accidental download of resources over the Internet when replaying page loads. Unlike Google's Web-Page-Replay, replayshell mimics the multi-origin nature of most web sites today by creating an Apache Web server for each distinct IP/port pair seen while recording. This prevents the browser from using a single TCP connection to download resources belonging to different web servers, which is impossible on the real Web because of the point-to-point nature of TCP. replayshell binds its web servers to the same IP address and port number as their recorded counterparts using dummy network interfaces. All HTTP requests made within replayshell are handled by one of replayshell's servers, each of which can access the user-specified recorded folder.
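
To make the "one server per distinct IP/port pair" idea concrete, the following is a small Python sketch of how recorded pairs could be grouped by origin. It is only an illustration under stated assumptions: RecordedPair and group_by_origin are hypothetical names, the fields are a simplified stand-in for what each recorded file holds, and the real tool reads Protobuf files and launches Apache rather than building a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedPair:
    """Hypothetical, simplified view of one recorded request-response pair."""
    ip: str            # original destination IP address
    port: int          # original destination port number
    protocol: str      # "HTTP" or "HTTPS"
    request_line: str  # e.g. "GET /index.html HTTP/1.1"


def group_by_origin(pairs):
    """Group recorded pairs by (ip, port); each key stands for one server."""
    by_origin = defaultdict(list)
    for pair in pairs:
        by_origin[(pair.ip, pair.port)].append(pair)
    return dict(by_origin)


# Example data using documentation-only IP addresses (assumed, not recorded).
pairs = [
    RecordedPair("192.0.2.1", 80, "HTTP", "GET / HTTP/1.1"),
    RecordedPair("192.0.2.1", 80, "HTTP", "GET /style.css HTTP/1.1"),
    RecordedPair("192.0.2.1", 443, "HTTPS", "GET /login HTTP/1.1"),
    RecordedPair("198.51.100.7", 80, "HTTP", "GET /ad.js HTTP/1.1"),
]
origins = group_by_origin(pairs)
```

In this example three servers would be needed, because the same IP address on ports 80 and 443 counts as two distinct origins.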
Each server is configured to use a CGI script, called Replay Server, which compares incoming HTTP requests to the set of all recorded request-response pairs to locate a matching HTTP response. When matching, replayshell considers the Host, Accept-Language, Accept-Encoding, Connection, and Referer HTTP header fields, and ignores time-sensitive header fields (e.g. If-Modified-Since, Cookie, If-Unmodified-Since) as they may change from run to run. replayshell handles query strings by first matching object names (ignoring query strings) and then finding the longest matched query string amongst the matching object names. If a requested object has no match in the user-specified recorded folder, Replay Server returns an HTTP "Connection: close".

Currently, replayshell supports HTTP and can be modified to support SPDY (using mod_spdy 0.9.3.3-386). HTTPS traffic is handled using Apache's mod_ssl.

The figure below illustrates the actions taken by replayshell to replay a recorded Web page:

1. Inside replayshell, a browser is used to load a recorded Web page.
2. Apache servers are set up inside the isolated network namespace to mimic the Web page's actual structure and sharding.
3. All HTTP requests from the browser are sent to the appropriate Apache server (based on destination IP address).
4. The Apache servers each execute the Replay Server CGI script to handle incoming HTTP requests.
5. Replay Servers have access to the recorded site and find the matching HTTP request and corresponding HTTP response.
6. Apache servers send the matching HTTP response back to the browser.

Example Commands

Emulating a link with 100 ms minimum RTT

$ delayshell 50

All programs run inside this shell will have a 100 ms round-trip delay for all their packets (50 ms in each direction).

Creating a trace file for a 3 Mbps link

0
4
8
.
.
.

Since each line in a trace file represents a packet delivery opportunity, you must first convert 3 Mbps into ms/packet. We assume a 1500-byte packet size, which is 12,000 bits; at 3,000,000 bits per second, one packet takes 12,000 / 3,000,000 s. This yields 4 ms/packet.
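
The conversion from a link rate to trace timestamps can also be scripted. The Python sketch below is one possible way to do it, not a tool shipped with Mahimahi: the function name constant_rate_trace is hypothetical, it assumes a whole-number rate in Mbps and 1500-byte delivery opportunities, and for rates that do not divide evenly it rounds each timestamp down to a whole millisecond, which is an assumption on my part.

```python
PACKET_BITS = 1500 * 8  # each delivery opportunity carries 1500 bytes


def constant_rate_trace(rate_mbps, num_lines):
    """Return millisecond timestamps for a constant-rate trace.

    rate_mbps must be a positive integer. 1 Mbps equals 1000 bits per
    millisecond, so integer arithmetic avoids floating-point rounding.
    """
    bits_per_ms = rate_mbps * 1000
    # The i-th opportunity occurs once i full packets could have been sent.
    return [(i * PACKET_BITS) // bits_per_ms for i in range(num_lines)]


# constant_rate_trace(3, 3)  -> [0, 4, 8]
# constant_rate_trace(12, 4) -> [0, 1, 2, 3]
trace_text = "\n".join(str(t) for t in constant_rate_trace(3, 3)) + "\n"
```

Writing trace_text to a file gives a trace with one timestamp per line, matching the 0, 4, 8 example for a 3 Mbps link.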
The trace file should thus increment each line by 4. The trace file should have at least two lines and can be arbitrarily large.

Emulating a 12 Mbps link

$ linkshell 12Mbps_trace 12Mbps_trace

All programs run inside this shell will have their packets traverse the link according to the input trace files. Since the desired link is 12 Mbps in each direction, the same trace file can be used for the uplink and downlink. To emulate a 12 Mbps link, the trace file should increment each line by 1 (representing a millisecond between packet delivery opportunities). It should start with 0, 1, ..., and can be of any length because linkshell, upon reaching the end of a trace file, continues from the beginning of the trace file.

Record a Web session into the /home/recorded_site/ directory

$ recordshell /home/recorded_site/

All pages loaded from inside this shell will be recorded into /home/recorded_site/ as HTTP request/response pairs.

Replay a Web session using the recorded content in /home/recorded_site/

$ replayshell /home/recorded_site/

All pages loaded when recording into "/home/recorded_site/" can be loaded from this shell. No other pages can be loaded.

Replay a Web session using the recorded content in /home/recorded_site/ over an emulated 12 Mbps link with 100 ms minimum RTT

$ delayshell 50
[delay 50 ms] $ linkshell 12Mbps_trace 12Mbps_trace
[delay 50 ms] [link, up=12Mbps_trace.txt, down=12Mbps_trace.txt] $ replayshell /home/recorded_site

All page loads from this inner-most shell will have their packets sent according to the input 12 Mbps traces and delayed with a 100 ms round-trip delay.

Exiting

To exit any shell, simply use exit. When nesting shells, exit will only exit the inner-most shell. You can verify that you have exited a given shell by checking that its prompt prefix (e.g. [delay 10 ms]) no longer appears in the terminal. replayshell has no prompt associated with it.
To verify you have exited replayshell, you can use ifconfig and verify that there are no network interfaces named "shardedX" (where X is the sharded server's identification number).

Manual

More details can be found in the delayshell(1), linkshell(1), recordshell(1), and replayshell(1) manual pages.

Contact

Mahimahi was written by Ravi Netravali, Anirudh Sivaraman, and Keith Winstein. For any questions or suggestions, please e-mail: mahimahi@mit.edu
