Thursday, June 19, 2014

Facebook has built its own switch. And it looks a lot like a server

Summary: Facebook has built its own networking switch and developed a Linux-based operating system to run it. The goal is to create networking infrastructure that mimics a server in terms of how it’s managed and configured.
Not content to remake the server, Facebook’s engineers have taken on the humble switch, building their own version of the networking box and the software to go with it. The resulting switch, dubbed Wedge, and the software called FBOSS will be provided to the Open Compute Foundation as an open source design for others to emulate. Facebook is already testing it with production traffic in its data centers.
Jay Parikh, the VP of infrastructure engineering at Facebook, shared the news of the switch onstage at the Gigaom Structure event Wednesday, explaining that Facebook’s goal in creating this project was to eliminate the network engineer and run its networking operations in the same easily swapped out and dynamic fashion as its servers. In many ways Facebook’s efforts with designing its own infrastructure have stemmed from the need to build hardware that was as flexible as the software running on top of it. It makes no sense to be innovating all the time with your code if you can’t adjust the infrastructure to run that code efficiently.
And networking has long been a frustrating aspect of IT infrastructure because it has been a black box that both delivered packets and did the computing to figure out the path those packets should take. But as networks scaled out, that combination (along with the domination of the market by giants Cisco and Juniper) was becoming untenable. Thus the physical delivery of packets and the routing of those packets were split into two jobs, allowing networks to become software-defined and allowing other companies to start innovating.
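The split between deciding paths and delivering packets can be illustrated with a minimal sketch. It is not Facebook's or any vendor's implementation: the `compute_next_hops` function, the `Switch` class, and the four-node topology are all hypothetical names chosen for illustration. The control side computes next hops with a breadth-first search, and the switch only performs table lookups.

```python
from collections import deque


def compute_next_hops(topology, source):
    """Control plane (hypothetical): BFS from `source`, returning {destination: next_hop}."""
    next_hop = {}
    visited = {source}
    queue = deque()
    # Direct neighbors are their own next hop.
    for neighbor in topology[source]:
        visited.add(neighbor)
        next_hop[neighbor] = neighbor
        queue.append(neighbor)
    # Farther nodes inherit the next hop of the node they were reached through.
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                next_hop[neighbor] = next_hop[node]
                queue.append(neighbor)
    return next_hop


class Switch:
    """Data plane (hypothetical): forwards by table lookup and never computes paths."""

    def __init__(self, name):
        self.name = name
        self.table = {}

    def install(self, table):
        # The controller pushes a precomputed forwarding table.
        self.table = dict(table)

    def forward(self, destination):
        # None means there is no route, so the packet would be dropped.
        return self.table.get(destination)


# Illustrative topology: s1 connects to s2 and s3; s4 hangs off s2.
topology = {
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1"],
    "s4": ["s2"],
}
s1 = Switch("s1")
s1.install(compute_next_hops(topology, "s1"))
print(s1.forward("s4"))  # s2
```

Because the switch object holds no path logic, the routing policy can be changed in software without touching the forwarding code, which is the property the software-defined approach is after.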
The creation of a custom-designed switch that allows Facebook to control its networking like it currently manages its servers has been a long time coming. Facebook began the Open Compute effort with a redesigned server in 2011 and focused on servers and a bit of storage for the next two years. In May 2013 it called for vendors to submit designs for an open source switch, and at last year’s Structure event Parikh detailed Facebook’s new networking fabric that allowed the social networking giant to move large amounts of traffic more efficiently.
But the combination of the modular hardware approach to the Wedge switch and the Linux-based FBOSS operating system blows the switch apart in the same way Facebook blew the server apart. The switch will use the Group Hug microprocessor boards so any type of chip could slot into the box to control configuration and run the OS. The switch will still rely on a networking processor for routing and delivery of the packets and has a throughput of 640 Gbps, but eventually Facebook could separate the transport and decision-making processes.
The whole goal here is to turn the monolithic switch into something that is modular and controlled by the FBOSS software that can be updated as needed without having to learn proprietary networking languages required by other providers’ gear. The question with Facebook’s efforts here is how it will affect the larger market for networking products.
Facebook’s infrastructure is relatively unique in that the company wholly controls it and has the engineering talent to build software and new hardware to meet its computing needs. Google is another company that has built its own networking switch, but it didn’t open source those designs and keeps them close. But many enterprise customers don’t have the technical expertise of a web giant, so the tweaks that others contribute to the Open Compute Foundation to make the gear and the software will likely influence adoption.

Network Monitoring

http://www.sdncentral.com/news/aristas-tap-danzing-with-the-stars/2013/02/

Arista Networks today announced a cloud-scale tap aggregation solution appropriately called DANZ (Data ANalyZer). DANZ is an integrated switch-based tapping solution that enables advanced monitoring and port spanning capabilities on Arista’s 7000 series of switches—most notably the recently announced Intel/Fulcrum FM6000-based 7150. With DANZ, Arista eliminates the need for external proprietary fabrics that facilitate monitoring, capture and reporting, and leverages the existing switching infrastructure to create the necessary scale and favorable economics.
In addition to basic port-spanning and static filters, Arista has leveraged the switching chipset’s capabilities to enable an advanced programmable event-driven system that provides intelligent capture based on critical events such as queue lengths, and can track monitoring on VMs as they move around the infrastructure. DANZ also integrates with Arista’s enhanced LANZ (Latency ANalyZer) feature, allowing fine-grained time-stamping of packets as they are filtered, so 3rd-party monitoring solutions can use the precise time-stamps for their analysis. LANZ yields precision on the order of 10ns for each and every queue and transit link within a network of Arista switches—providing fine-grained data for identifying hotspots in the network.
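The idea of capturing traffic only while a queue is congested can be sketched as follows. This is a simplified illustration and not Arista's implementation: the `QueueMonitor` class, its `threshold` field, and the sample values are all hypothetical, and queue depth is treated as a plain integer.

```python
from dataclasses import dataclass, field


@dataclass
class QueueMonitor:
    """Hypothetical event-driven capture: record packets only while the queue is deep."""

    threshold: int                      # queue depth at which capture is active (illustrative unit)
    capturing: bool = False
    captured: list = field(default_factory=list)

    def on_sample(self, queue_depth, packet):
        # Capture is on whenever the sampled depth is at or above the threshold.
        self.capturing = queue_depth >= self.threshold
        if self.capturing:
            self.captured.append((queue_depth, packet))


monitor = QueueMonitor(threshold=100)
for depth, packet in [(10, "p1"), (120, "p2"), (150, "p3"), (40, "p4")]:
    monitor.on_sample(depth, packet)

print(monitor.captured)  # [(120, 'p2'), (150, 'p3')]
```

Only the packets seen during the congested interval are kept, which is what lets a monitoring tool focus on the moments that matter instead of recording everything.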
Arista DANZ’s feature set includes:
  • Simple CLI/XMPP Interface
  • Any:Any Replication
  • Symmetric Load Balancing / Intelligent Load Sharing
  • Time-Stamping
  • AEM Event Services
  • PTP 1588
  • Enhanced L2/3/4 Filtering
  • Source Port Identification
  • Packet Truncation
  • Agile Ports + 40G
  • Buffer Tuning
  • sFlow and LANZ
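Symmetric load balancing, one of the features listed above, generally means that both directions of a conversation are sent to the same monitoring tool port so an analyzer sees the whole exchange. The sketch below shows one common way to get that property, by ordering the two endpoints before hashing. It is an assumption about the general technique, not a description of how Arista's hardware does it, and `symmetric_port` is a hypothetical name.

```python
import zlib


def symmetric_port(src_ip, src_port, dst_ip, dst_port, num_tool_ports):
    """Pick a tool port so that A->B and B->A always land on the same one (illustrative)."""
    # Sorting the endpoints makes the key independent of traffic direction.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}".encode()
    # CRC32 is used here only because it is deterministic and in the standard library.
    return zlib.crc32(key) % num_tool_ports


forward = symmetric_port("10.0.0.1", 51000, "10.0.0.2", 443, 4)
reverse = symmetric_port("10.0.0.2", 443, "10.0.0.1", 51000, 4)
print(forward == reverse)  # True
```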
Doug Gourlay, VP of Marketing, tells SDNCentral that DANZ was driven by customer requirements to have cost-effective and scalable solutions for monitoring their network infrastructure as networks go from 10Gbps to 40Gbps speeds and beyond. Today’s monitoring solutions are priced as high as $4,000 per monitoring port versus $300-400 for a 10Gbps switch data-carrying port. This makes it infeasible for customers to place a monitoring probe on every port. Doug was shocked that vendors were able to charge 10X for what is essentially a switch port with L2/L3 functionality like BPDU, MAC learning, multicast, etc. turned off. With the DANZ approach, Doug claims that Arista can do it for 10X less with a 3X increase in density for a 30-fold improvement, providing dramatic CAPEX savings due to consolidation of production and monitoring networks. This also provides significant OPEX savings due to automation and event-driven programmability. As part of automation and programmability, DANZ also exposes a JSON API that allows programming of which ports to redirect on, what type of flows to redirect and where the flows should be directed to, providing programmatic access for scripting and partner control.
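To make the three pieces of a redirect request concrete (which ports, which flows, and where to send them), here is a sketch of what such a JSON payload might look like. The article does not give the real schema, so every field name, the `build_redirect_rule` helper, and the interface names below are hypothetical and should not be read as Arista's actual API.

```python
import json


def build_redirect_rule(source_ports, flow_match, destination_port):
    """Assemble a hypothetical redirect rule; field names are illustrative only."""
    if not source_ports:
        raise ValueError("at least one source port is required")
    return {
        "source_ports": list(source_ports),    # which ports to redirect on
        "match": dict(flow_match),             # what type of flows to redirect
        "destination_port": destination_port,  # where the flows should be directed
    }


rule = build_redirect_rule(
    ["Ethernet1", "Ethernet2"],               # illustrative interface names
    {"ip_protocol": "tcp", "dst_port": 443},  # illustrative match fields
    "Ethernet48",
)
payload = json.dumps(rule, indent=2)
print(payload)
```

A script or partner tool would send a payload like this to the switch over whatever transport the real API defines; that part is left out because the source does not describe it.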
The list of partners Arista has signed at launch reads like a veritable who’s who in network monitoring, including vendors like Corvil, Endace, and SolarFlare.
When asked why these partners would work with Arista when DANZ reduces the number of ports they can sell, Doug responded that these partners had deep visualization and analysis capabilities which Arista will not provide but that Arista could help broaden the market and reach for their analysis solutions. As for the network packet broker vendors, Doug’s view was that Arista could provide a solution appropriate for 80% of customers, but that the packet brokers with deep hardware capabilities built on NPUs and FPGAs would still have customers with special needs for their capabilities. And finally, for Arista SDN partners like Big Switch who also have their own Big Tap solution, Arista’s view was that the tap aggregation problem can be solved multiple ways with OpenFlow being one, and DANZ another.
All DANZ features are bundled in the Arista EOS “Z” licenses which provide multi-destination mirroring, precision timing, timestamping, packet filtering, TAP aggregation, LANZ, AEM, EOS API, OpenFlow Agent, Cloudvision XMPP, and existing features like ZTP. The Z licenses are priced at approximately $50 per port in addition to the price of the switch chassis.
The SDNCentral view:
  • This is a real improvement in datacenter switches. Built-in tapping within the datacenter switch fabric is not a new idea, and Cisco has announced something similar in its OnePK-based SDN solution. However, Arista has actually executed on it with a rich feature-set, including programmatic event-driven flow spanning, load-balancing, and fine-grained timestamping with no performance impact on capture ports. This is a significant leap forward for the datacenter switch and it fulfills one of the key pain points that SDNCentral has been hearing about from enterprise and service provider customers. It’s no surprise that one of the first SDN applications we’ve seen across multiple commercial controllers is the network tap.
  • SDN as an orchestration framework has benefits. Arista’s approach of not leveraging OpenFlow in their solution actually has quite a few benefits: it allows all existing switch logic to run unperturbed, providing the existing reliability and survivability that has already been proven. In addition, it utilizes the fine-grained timestamping that Arista built into their switches. And it still provides the programmatic API and orchestration that is driving customers towards SDN-based infrastructure. Essentially a win-win approach in line with what we are seeing as a shift in SDN approaches, moving from raw programmability of flows to orchestration of network applications.
  • Gigamons of the world beware. Arista may be the first, but will not be the last switching vendor to offer built-in tapping and this will put pressure on network packet brokers like Gigamon, VSS Monitoring/Danaher, Anue/IXIA, to rethink their strategy in an SDN world and adjust pricing models to keep themselves relevant.
  • A challenge to the 3rd-party controller. SDN controller vendors like Big Switch who have tapping applications will need to innovate to add value over what Arista has built in. This will be inherently an uphill battle as a vendor like Arista will probably not expose proprietary features like LANZ via OpenFlow controls, making their built-in solution superior to any based on OpenFlow.

Original press release follows:
Arista Networks Brings New Level of Precision Data Analysis
Redefines TAP Aggregation Applications for Software Defined Networking
 February 13, 2013 – Arista Networks today announced new EOS (Extensible Operating System) capabilities for advanced data analysis for the SDN (Software Defined Networking) market. Arista data analyzer, named DANZ, brings an advanced suite of TAP Aggregation functions integrated into the Arista 7000 series, providing IT departments the data they need at a fraction of the cost of competitive solutions.
Arista DANZ is the first integrated switch-based solution at cloud-scale that provides precise visibility of network conditions without additional hardware infrastructure offering uncompromised performance. With the inherent openness of EOS, Arista interoperates with a broad ecosystem of partners ensuring fine-grained visibility and traffic monitoring across the network.
“Organizations continue to consolidate data centers, creating increasingly large and complex network environments that are forced to handle massive amounts of traffic. In order to meet or exceed demanding service levels, it is imperative to have visibility into the environment,” said Bob Laliberte, senior analyst at ESG. “Arista leveraged its SDN capabilities in EOS to develop a compelling offering for organizations requiring precise and accurate network analytics reporting.”
Next Generation TAP Aggregation is Here
Arista DANZ’s advanced monitoring capabilities provide strategic and integrated network-wide analytics across leaf and spine-based cloud networks. State-of-the-art innovations such as multi-destination mirroring, packet filtering and manipulation, port-mirror source aggregation and forensics, work together to reduce the complexity and cost associated with deploying network analysis at multiple locations at multi-gigabit wire speed.
Enhancements to the current Latency Analyzer, LANZ, provide early alerts of application-level congestion and correlations of events. IT operators can now proactively profile their network, applications and cope with microbursts and congestion hot spots before critical business applications are impacted.
Arista DANZ makes next generation tap aggregation an SDN reality. Typically, monitoring the network costs network administrators time and money with slow, expensive probes and external monitoring devices. Arista DANZ is a compelling alternative, offering open APIs tightly coupled with advanced event management (AEM), giving customers tremendous programmability. This enables quick reactions at wire-speed performance without any human intervention.
Availability
Arista DANZ is available immediately on the Arista 7150 series as an EOS “Z” license option augmenting OpenFlow support on the Arista 7050 series. Additional examples of Arista SDCN applications are also available on EOS Central. 
About Arista
The company was founded to deliver software-defined cloud networking solutions for large data center and computing environments. Arista’s award-winning 10 GbE switches redefine scalability, robustness, and price-performance, with more than one million cloud networking ports being deployed worldwide. At the core of Arista’s platform is EOS, the world’s most advanced network operating system. Arista Networks products are available worldwide through distribution partners, systems integrators and resellers.
Additional information and resources on today’s announcement can be found at: http://www.aristanetworks.com

OpenFlow Messages
