CS-534: Packet Switch Architecture
Spring 2003
Department of Computer Science
© University of Crete, Greece

6.4   Backpressure in Buffered Switching Fabrics

[Copyright Notice: This section borrows material from the ASICCOM/ATLAS project of ICS-FORTH (1995-98); this material is owned (copyrighted) by ICS-FORTH and is presented here with the permission of ICS-FORTH.]

ATLAS I is a 10 Gb/s single-chip ATM switch that supports credit-based flow control (multilane backpressure).

6.4.1   Backpressure Buffered Fabrics versus other Architectures:

Output Queueing Architecture

Output queueing is the ideal switch architecture from the point of view of performance, but it becomes impractical for large-valency switches because its cost grows with the square of the number of links. Below we will see how ATLAS I based switching fabrics with internal backpressure emulate this architecture, providing comparatively high performance at a significantly lower cost. In this and the next two transparencies, the color of a cell indicates the fabric output port to which the cell is destined (in the ATLAS I based fabric, the color also identifies the flow group that the cell belongs to).
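
One way to see the quadratic cost is through buffer-memory throughput. The sketch below uses a simplified model of our own (not a cost model from the ATLAS project): in an N x N output-queued switch, every output buffer may have to accept a cell from each of the N inputs in the same cell time while also sending one cell, i.e. N+1 memory accesses per cell time per buffer, and there are N such buffers; an input buffer, by contrast, performs one write and one read per cell time. The function names are hypothetical.

```python
def output_queued_mem_ops(n_ports: int) -> int:
    """Aggregate buffer accesses per cell time in an N x N output-queued switch.

    Simplified model (assumption): each of the N output buffers must absorb up
    to N simultaneous arrivals plus one departure in every cell time.
    """
    return n_ports * (n_ports + 1)


def input_buffered_mem_ops(n_ports: int) -> int:
    """Aggregate buffer accesses per cell time in an N x N input-buffered switch.

    Simplified model (assumption): each of the N input buffers performs one
    write (arrival) and one read (departure) per cell time.
    """
    return n_ports * 2


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        oq, ib = output_queued_mem_ops(n), input_buffered_mem_ops(n)
        print(f"N={n:4d}  output queueing: {oq:6d}  input buffering: {ib:4d}  ratio: {oq / ib:6.1f}")
```

The ratio between the two grows linearly with N, which is the sense in which output queueing stops being affordable as valency increases.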

Input Buffering Architecture

Input buffering (also called advanced input queueing, or virtual output queues at the inputs) is one method by which designers have tried to provide high switch performance at reasonable cost. It avoids the head-of-line blocking problem of FIFO input queueing by providing multiple logical queues (one per output) in each input buffer. The switch part of this architecture must solve a matching problem during each cell time: for each input buffer, choose one of the cell colors present in it, so that no two input buffers have the same color chosen and so that the number of colors chosen is maximized (or some other performance criterion, e.g. fairness, is optimized). For large-valency switches, this matching problem is extremely hard to solve quickly and with good performance. Below we will see how ATLAS I based switching fabrics with internal backpressure emulate the solution of this matching problem in a progressive and distributed manner.
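
The matching problem above is a bipartite matching between input buffers and colors (outputs). As one illustration, the sketch below computes a maximum-size matching with Kuhn's augmenting-path algorithm, a textbook technique and not the scheduler of any particular switch; `requests` and `max_matching` are hypothetical names. Its work grows roughly as the number of inputs times the number of requests (up to about N^3 for a fully loaded N x N switch), which hints at why an exact solution within one cell time becomes hard for large valencies; practical schedulers commonly use iterative, approximate algorithms instead.

```python
from typing import Dict, List, Optional, Set


def max_matching(requests: Dict[int, Set[int]], n_outputs: int) -> Dict[int, int]:
    """Return a maximum input -> output matching for one cell time.

    requests[i] is the set of colors (output ports) that have at least one
    cell waiting in input buffer i. Uses Kuhn's augmenting-path algorithm.
    """
    owner: List[Optional[int]] = [None] * n_outputs  # owner[o] = input matched to output o

    def try_assign(i: int, visited: Set[int]) -> bool:
        # Give input i some output, re-routing earlier choices along an augmenting path.
        for o in sorted(requests.get(i, ())):
            if o in visited:
                continue
            visited.add(o)
            if owner[o] is None or try_assign(owner[o], visited):
                owner[o] = i
                return True
        return False

    for i in sorted(requests):
        try_assign(i, set())

    return {i: o for o, i in enumerate(owner) if i is not None}


if __name__ == "__main__":
    # Input 3 can only use color 2, which conflicts with the other inputs' needs.
    reqs = {0: {0, 1}, 1: {0}, 2: {1, 2}, 3: {2}}
    match = max_matching(reqs, n_outputs=4)
    print(len(match), "inputs served:", sorted(match.items()))
```

In the example, only three of the four inputs can be served in this cell time, because inputs 1 and 3 each request a single color that the others also need.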

Buffered Fabrics with Internal Backpressure: Best Price / Performance

MuqPro-ATLAS Switching Fabrics with Internal Backpressure

The effect of the ATLAS backpressure is to push most of the output-queue cells back near the input ports, while the head cells of the output queues remain close to their respective output ports. The fabric thus operates like an input-buffered switch in which the ATLAS chips implement the arbitration and scheduling function in a distributed, pipelined fashion. On every link, all connections going to a given output port of the fabric form one flow group (colors correspond to flow groups). Each MuqPro maintains separate logical queues, one per flow group. For simplicity, this transparency shows only a 4x4 fabric made of 2x2 switch elements; this architecture, however, scales perfectly well to very large switching fabrics.

In this switching fabric, the only external memories are those attached to the MuqPros, whose cost is the same as in input buffering. To keep the cost at that low level, the building blocks of the switching fabric (i.e. the ATLAS chips) must not use off-chip buffer memory. Since buffer memory in the switching elements is restricted to what fits on a single chip, backpressure is the method that keeps this small memory from overflowing all the time.

Performance-wise, this switching fabric offers properties comparable to those of output queueing. Saturation throughput is close to 1.0 even with few lanes (see below). No cells are lost inside the fabric; cells are dropped only when the (large) MuqPro queues fill up. Traffic to lightly loaded destination ports (e.g. G) is isolated from hot-spot outputs (e.g. R), as verified by simulation (see below).
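
How backpressure keeps a small on-chip buffer from overflowing can be illustrated with a minimal credit-based link model. This is a simplified single-flow sketch under our own assumptions, not the ATLAS I implementation: the hypothetical `CreditLink` class starts its upstream credit counter at the number of downstream buffer slots, and the hypothetical `simulate` function feeds it from a large upstream queue standing in for a MuqPro, with arbitrary arrival and drain probabilities chosen so that the link is congested.

```python
import random
from collections import deque


class CreditLink:
    """One link with credit-based backpressure (simplified, single-flow sketch).

    The upstream side holds `credits`, initialised to the number of cell slots
    in the downstream on-chip buffer; it may send only while credits > 0.
    The downstream side returns one credit whenever it forwards a cell onward.
    """

    def __init__(self, downstream_slots: int) -> None:
        self.slots = downstream_slots
        self.credits = downstream_slots
        self.downstream: deque = deque()

    def try_send(self, cell: int) -> bool:
        if self.credits == 0:
            return False  # backpressure: the cell stays upstream
        self.credits -= 1
        self.downstream.append(cell)
        assert len(self.downstream) <= self.slots  # credits make overflow impossible
        return True

    def drain(self) -> None:
        # The downstream switch forwards one cell (if any) and returns a credit.
        if self.downstream:
            self.downstream.popleft()
            self.credits += 1


def simulate(cycles: int = 10_000, slots: int = 4, upstream_capacity: int = 1_000,
             arrival_p: float = 0.9, drain_p: float = 0.5, seed: int = 1):
    """Large upstream queue feeding a congested, backpressured link."""
    rng = random.Random(seed)
    link = CreditLink(slots)
    upstream: deque = deque()
    dropped = 0
    max_fill = 0
    for t in range(cycles):
        if rng.random() < arrival_p:
            if len(upstream) < upstream_capacity:
                upstream.append(t)
            else:
                dropped += 1  # loss happens only where the large external queue is full
        if upstream and link.try_send(upstream[0]):
            upstream.popleft()
        if rng.random() < drain_p:
            link.drain()
        max_fill = max(max_fill, len(link.downstream))
    return max_fill, dropped, len(upstream)


if __name__ == "__main__":
    fill, lost, backlog = simulate()
    print(f"max on-chip occupancy={fill}, cells dropped upstream={lost}, backlog={backlog}")
```

Because arrivals outpace the drain rate, the upstream queue eventually fills and starts dropping cells, while the on-chip occupancy never exceeds `slots`: losses are confined to the large external memory, as described above.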

6.4.2   Backpressure Cost-Benefit Evaluation:

This section uses transparencies from the talk by M. Katevenis, "Implementation of ATLAS I: a Single-Chip ATM Switch with Backpressure", at the IEEE Hot Interconnects 6 Symposium, Stanford, CA, USA, Aug. 1998. For the full paper, see:
  • G. Kornaros, D. Pnevmatikatos, P. Vatsolaki, G. Kalokerinos, C. Xanthaki, D. Mavroidis, D. Serpanos, M. Katevenis: "ATLAS I: Implementing a Single-Chip ATM Switch with Backpressure", IEEE Micro Magazine, vol. 19, no. 1, Jan/Feb. 1999, pp. 30-41.
  • http://archvlsi.ics.forth.gr/atlasI/hoti98/

ATLAS I floorplan with functions colored

ATLAS I cost per function

Switching fabrics with internal backpressure

Alternative 1: large, off-chip buffers

Alternative 2: internal speedup

Cost of alternative 1 (large, off-chip buffers)

Cost of alternative 2 (internal speedup)

6.4.3   Credit Sharing among Multiple Flows:

QFC-like Credit Protocol of ATLAS I

Benefits from the Multilane Backpressure of ATLAS I

Credit Protocol Performance Simulation

We simulated Banyan networks like the ATLAS-MuqPro backpressured fabric above, with flow groups corresponding to destinations. The simulated switch elements implemented a credit flow control protocol that generalizes the ATLAS I protocol: the flowGroup (destination) credits could be initialized to any number, not just to 1 as in ATLAS I.
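
The generalized protocol can be sketched as follows. This is a simplified model under our own assumptions, not the ATLAS I hardware: we assume a cell may cross a link only when the sender holds both a credit for a free slot in the shared downstream buffer (called a pool credit here) and a credit for the cell's flowGroup, whose initial value `fg_init` is the parameter the simulations varied (1 in ATLAS I). The class `MultilaneSender`, its round-robin choice among flow groups, and `hot_spot_demo` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class MultilaneSender:
    """Upstream side of one link under a two-credit model (simplified sketch).

    pool_credits:   free cell slots in the shared downstream buffer
    fg_credits[fg]: further cells of flowGroup fg allowed downstream,
                    initialised to fg_init (fg_init = 1 models ATLAS I)
    """

    def __init__(self, buffer_slots: int, flow_groups: List[str], fg_init: int = 1) -> None:
        self.pool_credits = buffer_slots
        self.fg_credits: Dict[str, int] = {fg: fg_init for fg in flow_groups}
        self.queues: Dict[str, Deque[int]] = {fg: deque() for fg in flow_groups}
        self._rr: Deque[str] = deque(flow_groups)  # round-robin service order

    def eligible(self, fg: str) -> bool:
        # A cell may leave only if it holds BOTH a pool credit and a flowGroup credit.
        return bool(self.queues[fg]) and self.pool_credits > 0 and self.fg_credits[fg] > 0

    def send_one(self) -> Optional[Tuple[str, int]]:
        """Send one cell from the next eligible flowGroup, or return None."""
        for _ in range(len(self._rr)):
            fg = self._rr[0]
            self._rr.rotate(-1)
            if self.eligible(fg):
                self.pool_credits -= 1
                self.fg_credits[fg] -= 1
                return fg, self.queues[fg].popleft()
        return None

    def credit_return(self, fg: str) -> None:
        # Downstream forwarded one cell of fg: both kinds of credit come back.
        self.pool_credits += 1
        self.fg_credits[fg] += 1


def hot_spot_demo(cycles: int = 100, buffer_slots: int = 4, fg_init: int = 1):
    """Flow group R goes to a blocked hot-spot output; G goes to a free output."""
    s = MultilaneSender(buffer_slots, ["R", "G"], fg_init)
    for t in range(cycles):
        s.queues["R"].append(t)
        s.queues["G"].append(t)
    downstream = {"R": 0, "G": 0}  # cells of each flowGroup held downstream
    delivered_g = 0
    for _ in range(cycles):
        sent = s.send_one()
        if sent is not None:
            downstream[sent[0]] += 1
        # Output R never drains; output G forwards one cell per cycle.
        if downstream["G"]:
            downstream["G"] -= 1
            delivered_g += 1
            s.credit_return("G")
    return delivered_g, downstream["R"]


if __name__ == "__main__":
    for k in (1, 4):
        print("fg_init =", k, "-> (G cells delivered, R cells stuck downstream):", hot_spot_demo(fg_init=k))
```

With `fg_init=1`, the blocked hot-spot flow group R occupies a single downstream slot and G gets a cell through on almost every cycle; with `fg_init=4` (equal to the buffer size), R's cells fill the shared buffer and G stalls after a few cells.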

Saturation Throughput Simulation Results

We see that a modest buffer space, around 8 to 16 cells per incoming link for the low-priority class, suffices for the outgoing links to reach full utilization (presumably by low-priority traffic, which fills in whatever capacity remains unused by higher-priority traffic). The ATLAS protocol (red line) performs better than the traditional multilane wormhole protocol, for the reasons outlined below.

Non-Hot-Spot Delay, in the Presence of Hot-Spot Destinations

With the ATLAS protocol, when the number of lanes is larger than the number of hot-spot output ports of the fabric (1 or 2 hot-spot ports: the upper two red curves), the delay to the non-hot-spot outputs remains unaffected by the presence of hot spots; it is the same as when there are no hot-spot outputs (bottom red curve). This is precisely the ideal behavior desired for hot-spot tolerance.
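
A simple counting argument is consistent with this result, under assumptions of our own: treat each lane as one cell slot of a link's downstream buffer, let L be the number of lanes, H the number of hot-spot flow groups whose cells are blocked, and c the initial flowGroup credit (c = 1 in ATLAS I). Since a flowGroup can hold at most c cells downstream, the blocked hot-spot traffic occupies at most Hc lanes, so the lanes left for all other flow groups satisfy:

```latex
L_{\text{free}} \;\ge\; L - H\,c ,
\qquad
c = 1 \;\Rightarrow\; L_{\text{free}} \;\ge\; L - H \;\ge\; 1 \quad \text{whenever } L > H .
```

This bound only guarantees that non-hot-spot cells are never completely blocked; that their delay is actually unchanged is what the simulation shows.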

ATLAS versus Wormhole Protocol Performance: Explanation

Last updated: 16 May 2003, by M. Katevenis.