Friday, July 25, 2014

ovs-ofctl commands on OpenFlow 1.3 Mininet switch (ovsk)


ovs-ofctl is a command-line tool for monitoring and administering OpenFlow switches. It can also show the current state of an OpenFlow switch, including its features, configuration, and table entries. It should work with any OpenFlow switch, not just Open vSwitch.

Before pushing flows, we need to start the Mininet switch using the command below.
sudo mn --topo single,2 --controller remote,ip=192.168.56.103:6653 --switch ovsk,protocols=OpenFlow13
where,
192.168.56.103 is the openflowplugin controller's IP address, and protocols=OpenFlow13 states that we want to use OpenFlow protocol version 1.3. tcp/6653 is used for OF1.3 communication, while 6633 is used for OF1.0.
Note that Mininet and the controller are running on different virtual machines here.


If the above command executes successfully, we should see OF1.3 communication between OVSK (switch s1 here) and the SDN controller.
Flows can be added as follows:
sudo ovs-ofctl -O OpenFlow13 add-flow s1 in_port=1,actions=nw_ttl:2,output:2

sudo ovs-ofctl -O OpenFlow13 add-flow s1 priority=11,dl_type=0x0800,nw_src=10.0.0.1,action=mod_tp_dst:8888

If the above commands are configured successfully on OVSK, we should be able to dump the flows.
mininet@mininet-vm:~$ sudo ovs-ofctl -O OpenFlow13 dump-flows s1
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=7.443s, table=0, n_packets=0, n_bytes=0, priority=11,ip,nw_src=10.0.0.1 actions=mod_tp_dst:8888
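When scripting against this output, the dump-flows line above can be split into match fields and actions with a few lines of Python. This is a rough sketch, not a full parser, and parse_flow is a hypothetical helper of my own, not part of Open vSwitch:

```python
def parse_flow(line):
    # Split one "ovs-ofctl dump-flows" line into its comma-separated
    # match/metadata fields and its actions list.
    line = line.strip()
    match_part, actions = line.split(" actions=")
    fields = {}
    for token in match_part.split(","):
        token = token.strip()
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
        else:
            fields[token] = True   # bare keywords like "ip"
    return fields, actions.split(",")

sample = ("cookie=0x0, duration=7.443s, table=0, n_packets=0, n_bytes=0, "
          "priority=11,ip,nw_src=10.0.0.1 actions=mod_tp_dst:8888")
fields, actions = parse_flow(sample)
print(fields["nw_src"], actions)   # 10.0.0.1 ['mod_tp_dst:8888']
```

This naive split is enough for simple flows like the one above; actions that themselves contain commas (e.g. set_field) would need a smarter tokenizer.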


ovs-ofctl can connect to an OpenFlow switch over SSL, TCP (IP address and port), or a Unix domain socket file. ovs-ofctl talks to ovs-vswitchd, while ovs-vsctl talks to ovsdb-server.

Detailed options can be found at
http://openvswitch.org/cgi-bin/ovsman.cgi?page=utilities%2Fovs-ofctl.8

Big Switch SDN Fabric




http://bigswitch.com/blog/2014/07/22/announcing-big-cloud-fabric-the-first-data-center-bare-metal-sdn-fabric


http://www.amazon.com/The-Big-Switch-Rewiring-Edison/dp/039334522X/ref=cm_cr_pr_pb_t









Next Generation Monitoring Fabric diagram

Wednesday, July 23, 2014

Tcpdump usage examples


March 13, 2010
In most cases you will need root permission to be able to capture packets on an interface. Using tcpdump (with root) to capture the packets and saving them to a file to analyze with Wireshark (using a regular account) is recommended over using Wireshark with a root account to capture packets on an "untrusted" interface. See the Wireshark security advisories for reasons why.
See the list of interfaces on which tcpdump can listen:
tcpdump -D
Listen on interface eth0:
tcpdump -i eth0
Listen on any available interface (this cannot be done in promiscuous mode and requires Linux kernel 2.2 or later):
tcpdump -i any
Be verbose while capturing packets:
tcpdump -v
Be more verbose while capturing packets:
tcpdump -vv
Be very verbose while capturing packets:
tcpdump -vvv
Be less verbose (than the default) while capturing packets:
tcpdump -q
Limit the capture to 100 packets:
tcpdump -c 100
Record the packet capture to a file called capture.cap:
tcpdump -w capture.cap
Record the packet capture to a file called capture.cap but display on-screen how many packets have been captured in real-time:
tcpdump -v -w capture.cap
Display the packets of a file called capture.cap:
tcpdump -r capture.cap
Display the packets using maximum detail of a file called capture.cap:
tcpdump -vvv -r capture.cap
Display IP addresses and port numbers instead of domain and service names when capturing packets:
tcpdump -n
Capture any packets where the destination host is 192.168.1.1. Display IP addresses and port numbers:
tcpdump -n dst host 192.168.1.1
Capture any packets where the source host is 192.168.1.1. Display IP addresses and port numbers:
tcpdump -n src host 192.168.1.1
Capture any packets where the source or destination host is 192.168.1.1. Display IP addresses and port numbers:
tcpdump -n host 192.168.1.1
Capture any packets where the destination network is 192.168.1.0/24. Display IP addresses and port numbers:
tcpdump -n dst net 192.168.1.0/24
Capture any packets where the source network is 192.168.1.0/24. Display IP addresses and port numbers:
tcpdump -n src net 192.168.1.0/24
Capture any packets where the source or destination network is 192.168.1.0/24. Display IP addresses and port numbers:
tcpdump -n net 192.168.1.0/24
Capture any packets where the destination port is 23. Display IP addresses and port numbers:
tcpdump -n dst port 23
Capture any packets where the destination port is between 1 and 1023 inclusive. Display IP addresses and port numbers:
tcpdump -n dst portrange 1-1023
Capture only TCP packets where the destination port is between 1 and 1023 inclusive. Display IP addresses and port numbers:
tcpdump -n tcp dst portrange 1-1023
Capture only UDP packets where the destination port is between 1 and 1023 inclusive. Display IP addresses and port numbers:
tcpdump -n udp dst portrange 1-1023
Capture any packets with destination IP 192.168.1.1 and destination port 23. Display IP addresses and port numbers:
tcpdump -n "dst host 192.168.1.1 and dst port 23"
Capture any packets with destination IP 192.168.1.1 and destination port 80 or 443. Display IP addresses and port numbers:
tcpdump -n "dst host 192.168.1.1 and (dst port 80 or dst port 443)"
Capture any ICMP packets:
tcpdump -v icmp
Capture any ARP packets:
tcpdump -v arp
Capture either ICMP or ARP packets:
tcpdump -v "icmp or arp"
Capture any packets that are broadcast or multicast:
tcpdump -n "broadcast or multicast"
Capture 500 bytes of data for each packet rather than the default of 68 bytes:
tcpdump -s 500
Capture all bytes of data within the packet:
tcpdump -s 0
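The longer filter expressions above (host plus port combinations) can be composed programmatically instead of hand-quoting them each time. A small Python sketch, with helper names of my own invention:

```python
# Compose tcpdump/BPF filter expressions from parts, mirroring the
# "dst host 192.168.1.1 and (dst port 80 or dst port 443)" example above.
def any_of(*exprs):
    # OR-combine subexpressions, parenthesized so they nest safely
    return "(" + " or ".join(exprs) + ")"

def all_of(*exprs):
    # AND-combine subexpressions
    return " and ".join(exprs)

def tcpdump_cmd(filter_expr, iface="eth0", numeric=True):
    # Build the argv for tcpdump; -n shows IPs/ports instead of names
    cmd = ["tcpdump", "-i", iface]
    if numeric:
        cmd.append("-n")
    cmd.append(filter_expr)
    return cmd

expr = all_of("dst host 192.168.1.1", any_of("dst port 80", "dst port 443"))
print(expr)
# dst host 192.168.1.1 and (dst port 80 or dst port 443)
```

The resulting argv list can be passed to subprocess.run with root privileges; passing the filter as a single argument avoids shell-quoting surprises.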

RYU SDN Framework Installation



Installing RYU

 % pip install ryu
or
% git clone git://github.com/osrg/ryu.git
% cd ryu; python ./setup.py install 

Issues faced
#sudo ryu-manager --verbose --observe-links ryu.app.ws_topology
Traceback (most recent call last):
  File "/usr/local/bin/ryu-manager", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2707, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 686, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 584, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: six>=1.4.0

#ryu -version
Traceback (most recent call last):
  File "/usr/local/bin/ryu", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2707, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 686, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 584, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: six>=1.4.0


Solution
#sudo easy_install Distribute
Searching for Distribute
Best match: distribute 0.6.24dev-r0
Adding distribute 0.6.24dev-r0 to easy-install.pth file
Installing easy_install script to /usr/local/bin
Installing easy_install-2.7 script to /usr/local/bin
Installing easy_install-2.6 script to /usr/local/bin

Using /usr/lib/python2.7/dist-packages
Processing dependencies for Distribute
Finished processing dependencies for Distribute


sudo easy_install -U Distribute
Searching for Distribute
Reading http://pypi.python.org/simple/Distribute/
Best match: distribute 0.7.3
Downloading https://pypi.python.org/packages/source/d/distribute/distribute-0.7.3.zip#md5=c6c59594a7b180af57af8a0cc0cf5b4a
Processing distribute-0.7.3.zip
Running distribute-0.7.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-IggX2I/distribute-0.7.3/egg-dist-tmp-Q36wpI
warning: install_lib: 'build/lib.linux-x86_64-2.7' does not exist -- no Python modules to install

Adding distribute 0.7.3 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/distribute-0.7.3-py2.7.egg
Processing dependencies for Distribute
Searching for setuptools>=0.7
Reading http://pypi.python.org/simple/setuptools/
Reading http://peak.telecommunity.com/snapshots/
Reading https://bitbucket.org/pypa/setuptools
Reading https://pypi.python.org/pypi/setuptools
Best match: setuptools 5.4.1
Downloading https://pypi.python.org/packages/source/s/setuptools/setuptools-5.4.1.zip#md5=96bd961ab481c78825a5be8546f42a66
Processing setuptools-5.4.1.zip
Running setuptools-5.4.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-GGJ3Yb/setuptools-5.4.1/egg-dist-tmp-AxT2X2
Adding setuptools 5.4.1 to easy-install.pth file
Installing easy_install script to /usr/local/bin
Installing easy_install-2.7 script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/setuptools-5.4.1-py2.7.egg
Finished processing dependencies for Distribute


 Solved

#ryu
usage: ryu [-h] [--config-dir DIR] [--config-file PATH] [--version]
           [subcommand] ...

positional arguments:
  subcommand          [rpc-cli|run|of-config-cli]
  subcommand_args     subcommand specific arguments

optional arguments:
  -h, --help          show this help message and exit
  --config-dir DIR    Path to a config directory to pull *.conf files from.
                      This file set is sorted, so as to provide a predictable
                      parse order if individual options are over-ridden. The
                      set is parsed after the file(s) specified via previous
                      --config-file, arguments hence over-ridden options in
                      the directory take precedence.
  --config-file PATH  Path to a config file to use. Multiple config files can
                      be specified, with values in later files taking
                      precedence. The default files used are: None.
  --version           show program's version number and exit

Wednesday, July 9, 2014

mahimahi - webtools

A set of lightweight tools for browser developers, website authors, and Web server developers to produce a usable benchmark for Web transport protocols and browser behaviors. Mahimahi can be used to record actual websites and replay them over various emulated link conditions. Mahimahi is free software and is available on Ubuntu (version 13.10 or higher). http://mahimahi.mit.edu/#about

Getting Mahimahi

Ubuntu 13.04 and later:
$ sudo add-apt-repository ppa:keithw/mahimahi
$ sudo apt-get update
$ sudo apt-get install mahimahi

Building from source:
$ git clone https://github.com/ravinet/mahimahi
$ cd mahimahi
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install

Dependencies (typical packages): Protocol Buffers (protobuf-compiler, libprotobuf-dev), autotools (autotools-dev), autoreconf (dh-autoreconf), iptables, pkg-config, dnsmasq, Apache2 (apache2), debhelper (7.0.50 or later), OpenSSL (libssl-dev, ssl-cert).

Delayshell

delayshell uses clone() to fork off a new shell in a distinct network namespace. All packets to and from an application running inside delayshell are stored in a packet queue (one per direction). When a packet arrives at a queue, it is assigned a delivery time: the sum of its arrival time and a user-specified fixed one-way delay. Packets are released from the queue at their delivery time, which enforces a fixed delay on a per-packet basis.

Linkshell

linkshell also uses clone() to fork off a new shell in a distinct network namespace. It uses packet-delivery traces to emulate both time-varying links (e.g. cellular) and fixed-rate links (e.g. 12 Mbps). When a packet arrives at the link, it is placed into one of two packet queues depending on its direction, and linkshell releases packets from each queue based on the corresponding input trace. Each line in a trace represents a packet delivery opportunity: the time at which 1500 bytes can be sent. Accounting is done at the byte level, so a single line can correspond to the delivery of several packets whose sizes sum to 1500 bytes. Delivery opportunities are wasted if no bytes are available at the instant of an opportunity. When linkshell reaches the end of an input trace file, it wraps around to the beginning and resets its base timestamp.

Recordshell

recordshell uses clone() to fork off a new shell in a distinct network namespace and adds a routing table rule to forward all TCP traffic from inside recordshell to a man-in-the-middle proxy. The proxy accepts each TCP connection request from programs run inside recordshell, determines the connection's original destination address, and connects to that destination on the program's behalf. An HTTP parser in the proxy separates HTTP requests and responses from the TCP segments; once a request and its corresponding response have been parsed, the proxy stores them as a request-response pair. SSL traffic is handled similarly using OpenSSL: to establish a connection with the program inside recordshell, the proxy presents a fake self-signed certificate with the common name 'Mahimahi' and makes its own SSL connection to the original destination. Because the certificate is not signed by a browser-trusted Certificate Authority, the user must add an exception when loading an HTTPS page in recordshell.

At the end of a recording session, the user-specified record folder contains one file for each request-response pair seen during the session. Each file is stored using Google Protobufs and includes the complete HTTP request and response (headers and body), the connection's original destination IP address and port number, and the protocol used (HTTP or HTTPS). recordshell is compatible with any unmodified browser because recording is done at the packet level. Recording a Web page proceeds as follows:
1. Inside recordshell, a browser is used to load a Web page.
2. All HTTP requests from the browser are sent to the HTTP proxy running outside recordshell's isolated network namespace.
3. The proxy forwards these requests to their original destinations on the Internet and receives the corresponding responses.
4. For each HTTP response received, the proxy sends the response back to the browser and writes the request-response pair to the recorded-site folder.

Replayshell

replayshell uses unshare(CLONE_NEWNET) to create a shell in an isolated network namespace; this isolation prevents the accidental download of resources over the Internet when replaying page loads. Unlike Google's Web-Page-Replay, replayshell mimics the multi-origin nature of most websites today by creating an Apache Web server for each distinct IP/port pair seen while recording. This prevents the browser from using a single TCP connection to download resources belonging to different web servers (impossible due to the point-to-point nature of TCP). Using dummy network interfaces, replayshell binds its web servers to the same IP addresses and port numbers as their recorded counterparts. All HTTP requests made within replayshell are handled by one of replayshell's servers, each of which has access to the user-specified recorded folder.

Each server is configured to use a CGI script, called Replay Server, which compares incoming HTTP requests to the set of recorded request-response pairs to locate a matching response. Replay Server considers the Host, Accept-Language, Accept-Encoding, Connection, and Referer HTTP header fields, and ignores time-sensitive fields (e.g. If-Modified-Since, Cookie, If-Unmodified-Since) as they may change from run to run. Query strings are handled by first matching object names (ignoring query strings) and then finding the longest matched query string among the matching object names. If a requested object has no match in the recorded folder, Replay Server returns an HTTP "Connection: close". Currently, replayshell supports HTTP and can be modified to support SPDY (using mod_spdy 0.9.3.3-386); HTTPS traffic is handled using Apache's mod_ssl. Replaying a recorded Web page proceeds as follows:
1. Inside replayshell, a browser is used to load a recorded Web page.
2. Apache servers are set up inside the isolated network namespace to mimic the Web page's actual structure and sharding.
3. All HTTP requests from the browser are sent to the appropriate Apache server, based on destination IP address.
4. Each Apache server executes the Replay Server CGI script to handle incoming HTTP requests.
5. Replay Server finds the matching HTTP request and corresponding HTTP response in the recorded site.
6. The Apache server sends the matching HTTP response back to the browser.

Example Commands

Emulating a link with 100ms minimum RTT:
$ delayshell 50
All programs run inside this shell will have a 100ms round-trip delay applied to their packets.

Creating a trace file for a 3 Mbps link:
0
4
8
...
Since each line in a trace file represents a packet delivery opportunity, first convert 3 Mbps into ms/packet, assuming a 1500-byte packet size; this yields 4 ms/packet. The trace file should therefore increment each line by 4. It must have at least two lines and can be arbitrarily large.

Emulating a 12 Mbps link:
$ linkshell 12Mbps_trace 12Mbps_trace
All programs run inside this shell will have their packets traverse the link according to the input trace files. Since the desired link is 12 Mbps in each direction, the same trace file can be used for the uplink and downlink. To emulate a 12 Mbps link, the trace file should increment each line by 1 (representing one millisecond between packet delivery opportunities), starting 0, 1, ...; it can be of any length, because linkshell, upon reaching the end of a trace file, continues from its beginning.

Recording a Web session into the /home/recorded_site/ directory:
$ recordshell /home/recorded_site/
All pages loaded from inside this shell are recorded into /home/recorded_site/ as HTTP request/response pairs.

Replaying a Web session using the recorded content in /home/recorded_site/:
$ replayshell /home/recorded_site/
Only the pages recorded into /home/recorded_site/ can be loaded from this shell; no other pages can be loaded.

Replaying a Web session over an emulated 12 Mbps link with 100ms minimum RTT:
$ delayshell 50
[delay 10 ms] $ linkshell 12Mbps_trace 12Mbps_trace
[delay 10 ms] [link, up=12Mbps_trace.txt, down=12Mbps_trace.txt] $ replayshell /home/recorded_site
All page loads from this innermost shell will have their packets sent according to the input 12 Mbps traces and delayed with a 100ms round-trip delay.

Exiting

To exit any shell, simply use exit. When nesting shells, exit leaves only the innermost shell. You can verify that you have exited a given shell by checking that its prompt (e.g. [delay 10 ms]) is gone from the terminal. replayshell has no prompt of its own; to verify you have exited it, run ifconfig and check that there are no network interfaces named "shardedX" (where X is the sharded server's identification number).

Manual

More details can be found in the delayshell(1), linkshell(1), recordshell(1), and replayshell(1) manual pages.

Contact

Mahimahi was written by Ravi Netravali, Anirudh Sivaraman, and Keith Winstein. For any questions or suggestions, please e-mail: mahimahi@mit.edu
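The ms-per-packet arithmetic used for the trace files above can be scripted. This sketch assumes the trace format described above (one integer millisecond timestamp per line, one 1500-byte delivery opportunity per line); ms_per_packet and make_trace are names of my own, not mahimahi tools:

```python
PACKET_BITS = 1500 * 8  # mahimahi accounts in 1500-byte packets

def ms_per_packet(mbps):
    # Time to deliver one 1500-byte packet at the given rate, in milliseconds
    return PACKET_BITS / (mbps * 1e6) * 1000

def make_trace(mbps, n_opportunities):
    # Timestamps (ms) of the first n delivery opportunities for a fixed-rate link
    step = ms_per_packet(mbps)
    return [round(i * step) for i in range(n_opportunities)]

print(ms_per_packet(3))    # 4.0 -> a 3 Mbps trace increments each line by 4
print(make_trace(12, 5))   # [0, 1, 2, 3, 4] -> a 12 Mbps trace increments by 1
```

Writing the list to a file, one number per line, yields a trace usable with linkshell; since linkshell wraps around at end of file, even a short trace emulates a steady link.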

Monday, July 7, 2014

Explaining L2 Multipath in terms of North/South, East West Bandwidth


Explaining L2 Multipath in terms of North/South, East West Bandwidth

http://etherealmind.com/layer-2-multipath-east-west-bandwidth-switch-designs/

In a number of Packet Pushers episodes, I've been referring to data centre designs shifting from "North-South" type designs to "East-West-North-South". Let's dig into this terminology a bit.

Spanning Tree is always North / South

I’m reasonably confident that most people who read this will comprehend how a switching network will use spanning tree to create a TREE.
[Diagram: North south east west 1]
It will look something like this, where the core switches are configured to act as the 'root' of the spanning tree, and traffic flows from the core to the edge. More correctly, traffic always flows from edge to core to edge, and always in a fixed direction. Because we tend to draw the core at the top of the diagram, showing connections to the distribution and access layers running down the hierarchy, we tend to see a 'top to bottom' or north-south distribution of data traffic flows.
Where this model fails is that bandwidth between servers on two different branches must cross the core of the network, as shown in this network diagram.
[Diagram: North south east west 2]

The Weakness is the Core Switch Interconnect

The challenge is that the connection between the core switches can become heavily overloaded, especially in networks where the server fanout is large, as commonly occurs in heavily virtualised networks. To some extent, this is a new problem. Previously, the core switches would be interconnected with an EtherChannel providing multi-gigabit connectivity, and more recently the introduction of 10GbE ports allowed further increases in core capacity.
Now that servers connect at 10GbE, and storage data adds to the load, sustained traffic flows have increased, and not just by twenty or fifty percent. Storage data (whether iSCSI, NFS or even FCoE) means that these designs won't last much longer.
Currently, it’s convention to locate the storage arrays close to the core network switches so as to reduce the workload in the branches of the tree which isn’t a bad strategy. But this doesn’t account for the East-West migration of virtual machines.

Layer 2 Multipath Switch Networking

Layer 2 Multipath (L2MP) refers to the recent developments in Data Centre networks where the core switch can no longer handle all the load. That is, if you have three hundred physical servers and each physical server hosts twenty virtual machines, then the gross data load, including storage traffic, will easily exceed the core interconnect. This is why we talk about developing data centre models that support east-west traffic flows.
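To put rough numbers on that claim, here is a back-of-the-envelope sketch. The per-VM rate and the size of the core interconnect are assumptions for illustration, not figures from the article:

```python
# Rough check: offered east-west load vs. a traditional core interconnect.
servers = 300                     # physical servers (from the text)
vms_per_server = 20               # VMs per server (from the text)
per_vm_mbps = 50                  # ASSUMED average per-VM load, incl. storage
core_interconnect_gbps = 2 * 10   # ASSUMED: two 10GbE links between cores

offered_gbps = servers * vms_per_server * per_vm_mbps / 1000
print(offered_gbps, core_interconnect_gbps)  # 300.0 Gbps offered vs. 20 Gbps core
```

Even at a modest assumed 50 Mbps per VM, the offered load is 15x the assumed core interconnect, which is the choke point the east-west designs below remove.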
[Diagram: North south east west 3]
In this type of design, we can see that a L2MP core, regardless of the type – Big Brother or Borg style, means that bandwidth does not choke around any specific point in the network. So not only does the network support the traditional North/South bandwidth alignment that we have today, which creates artificial limits on how we can locate and distribute servers inside existing data centre networks, we are now able to provide East/West bandwidth to support loads that are dynamically moved around the data centre with a lesser degree of concern for key choke points that exist in legacy designs.
This especially applies to converged network where the storage data creates new loads that increase the sustained usage of the Ethernet network.

Scale

Also, because hot spots can develop in the network core as traffic loads migrate around the network edge, L2MP allows additional connections to be added as needed. Note that adding connections does not carry the potential service impact and risk profile that changes to spanning tree present. Therefore, the network becomes more flexible (or less "crystalline", the term that I use).
[Diagram: North south east west 4]
Note that the terms Borg and Big Brother are fully described in http://blog.ioshints.info/2011/03/data-center-fabric-architectures.html blog post from Ivan Pepelnjak.

The EtherealMind View

It's worth noting that these changes are key to successfully addressing the networking requirements for virtualisation. Hopefully this helps to explain some of the reasons that the new fabric-oriented switch architectures from Juniper and Cisco are important.

Bisectional Bandwidth

It’s worth noting that this problem is also related to the topic of Bisectional Bandwidth and the measurement of the server to server bandwidth as a function of the architecture. I wrote about this in this blog post : http://etherealmind.com/bisectional-bandwidth-l2mp-trill-bridges-design-value/
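As a worked example of the measurement, consider an assumed two-tier leaf-spine fabric (an illustrative topology, not one from the article) in which every leaf has one uplink to every spine; the bisection bandwidth is the capacity of the minimum cut between two equal halves of the leaves:

```python
# Bisection bandwidth of an ASSUMED leaf-spine fabric: cut the fabric into
# two equal halves of leaves; all cross-half traffic must use the spine
# uplinks of the leaves on one side of the cut.
leaves = 4          # assumed number of leaf switches
spines = 2          # assumed number of spine switches
uplink_gbps = 40    # assumed capacity of each leaf-to-spine uplink

bisection_gbps = (leaves // 2) * spines * uplink_gbps
print(bisection_gbps)  # 160
```

Server-to-server bandwidth across the fabric scales with the number of spines, which is exactly why L2MP designs add spine links rather than fattening a single core interconnect.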
