gwalla
16 February 2005 @ 12:53 pm
Topology is to Graph theory as the real numbers are to the integers.

EDIT: It'd be nice if Deepest Sender's linking feature FUCKING WORKED
 
Current Mood: dorky
 
 
gwalla
16 February 2005 @ 07:26 pm
Since IRC was first invented, the standard network topology for it has been a rooted tree. The advantage of this is its simplicity: it's easy to map, it's easy to configure, and if a server relays a message received from one connection to every other connection, messages can never be duplicated. The downside, as everyone who has spent significant time on IRC knows, is its fragility: if a single connection goes down, the network is split in two, and there is no way for messages on one half of the split to reach servers and clients on the other; if a hub goes down, the split can be even worse. This type of network gets more unstable the more layers of hubs you add (the longer a path between servers, the more likely it is to break, because it depends on more connections, any of which can fail), and since bandwidth is limited, larger networks demand more layers of hubs. A star topology (every leaf connected to a single hub) isn't scalable, and furthermore the hub itself becomes a major point of failure (if it goes down, every leaf becomes a stand-alone server unable to communicate with the others).
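Here's a toy Python sketch of that fragility (all server names hypothetical): model the network as a graph, count connected components with a BFS, and watch what happens when a single link or a whole hub drops out of a tree.

```python
from collections import deque

def components(servers, links):
    """Count connected components of a server graph via BFS."""
    adj = {s: set() for s in servers}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, count = set(), 0
    for s in servers:
        if s in seen:
            continue
        count += 1
        queue = deque([s])
        seen.add(s)
        while queue:
            for n in adj[queue.popleft()] - seen:
                seen.add(n)
                queue.append(n)
    return count

# A small rooted tree: one hub at the top, two sub-hubs, four leaves.
servers = ["hub", "h1", "h2", "a", "b", "c", "d"]
links = [("hub", "h1"), ("hub", "h2"),
         ("h1", "a"), ("h1", "b"), ("h2", "c"), ("h2", "d")]

print(components(servers, links))   # 1: one connected network
# Drop a single link: instant netsplit.
print(components(servers, [l for l in links if l != ("h1", "a")]))   # 2
# Drop sub-hub h2 entirely: the network shatters into three pieces.
print(components([s for s in servers if s != "h2"],
                 [l for l in links if "h2" not in l]))   # 3
```

In a tree there is exactly one path between any two servers, so every single edge is a cut edge; that's the whole problem.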

One solution to this is to move from an acyclic tree topology to a less strict connected-graph topology that allows cycles. A naïve solution is to make every server connect to every other server, but this just extends the bandwidth problems of a star topology to every server. A more realistic solution would be a topology in which there are at least two internally disjoint paths between any two servers: in this case, at least two connections would have to break for two servers to lose contact with each other. The problems with generalized graph topologies are increased overhead (both bandwidth and processing) and greater message complexity (each message would need some sort of identifier, so a server can tell whether it has already received a copy).
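That identifier-based deduplication can be sketched in a few lines (a toy flooding model, not any real ircd's algorithm): each server remembers the message IDs it has handled, so a copy arriving over a second path gets dropped instead of forwarded again.

```python
def flood(adj, origin, msg_id, seen=None):
    """Forward a message through a server graph that may contain cycles.

    Each server keeps a per-server set of message IDs it has already
    handled; duplicates arriving over another path are dropped.
    Returns the list of servers that delivered the message.
    """
    if seen is None:
        seen = {}
    deliveries = []

    def handle(server, sender):
        if msg_id in seen.setdefault(server, set()):
            return  # already saw this ID via another path: drop it
        seen[server].add(msg_id)
        deliveries.append(server)
        for peer in adj[server]:
            if peer != sender:
                handle(peer, server)

    handle(origin, None)
    return deliveries

# A ring of four servers: two internally disjoint paths between any pair,
# so any single link can break without splitting the network.
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(sorted(flood(adj, "a", "msg-1")))   # each server delivers exactly once
```

Without the `seen` check, the message would chase itself around the ring forever; that bookkeeping is exactly the extra message complexity the paragraph above is talking about.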

However, there's another way: multicasting. Traditional IRC server-to-server communication (and, in fact, most Internet protocols) uses unicasting (in particular TCP/IP, which maintains connections and handles lost packets), which is communication between two peers. It's one-to-one. Multicasting is one-to-many or many-to-many: peers register with their local network as listeners on a shared multicast IP address, and any packet sent to that address is received by all listeners. The problem with multicasting has been that it has no concept of connections the way TCP/IP does: lost packets vanish into the ether, and listeners have no way of detecting that a packet has been lost. That makes it useful mainly for applications where some packet loss isn't a big deal (like audio and video, where a stream loses some quality if a few packets disappear but remains intelligible). More recently, though, reliable multicasting protocols have been developed. These build on top of multicasting by implementing ways of identifying lost packets and requesting that they be resent. For example, one tactic is for a server to number each packet it sends sequentially, so if a listener receives packets #12 and #14, it knows it has missed #13 and can send a "noack" (negative acknowledgement) requesting that #13 be resent.
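The sequence-number trick is simple enough to sketch (a toy receiver, not any particular protocol's state machine): track the next expected number, and any gap tells you exactly which packets to noack.

```python
class Listener:
    """Toy receiver for a sequence-numbered multicast stream."""

    def __init__(self):
        self.expected = None    # next sequence number we should see
        self.noacks = []        # resend requests we would send out
        self.received = {}

    def on_packet(self, seq, payload):
        if self.expected is not None and seq > self.expected:
            # Gap detected: request every sequence number we skipped.
            self.noacks.extend(range(self.expected, seq))
        self.received[seq] = payload
        self.expected = max(self.expected or 0, seq + 1)

listener = Listener()
listener.on_packet(12, "twelve")     # first packet we happen to see
listener.on_packet(14, "fourteen")   # ...but #13 never arrived
print(listener.noacks)               # [13]: ask for #13 to be resent
listener.on_packet(13, "thirteen")   # the retransmission fills the hole
```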

Basing server-to-server communications on reliable multicasting would allow the topology to be a hypergraph, in which each "edge" can connect more than two servers. On a network like this, all servers could be "leaf servers" sending and listening on a single multicast IP. In effect, the Internet itself would be acting as a "hub", without the bandwidth problems a single hub for several leaves would have. With a sufficiently sophisticated reliable multicasting protocol, increasing the number of servers would actually increase the robustness of the network (e.g. in the noack-based protocol, any server that has received a packet could respond to requests for it, not just the originating server).
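To make the hyperedge idea concrete, here's a toy model of one multicast group as a single "edge" joining every subscribed server (illustrative classes, not real sockets): one send reaches all members, and there are no per-peer links to break.

```python
class Server:
    """A leaf server with nothing but a name and an inbox."""
    def __init__(self, name):
        self.name = name
        self.inbox = []

class MulticastGroup:
    """Toy model of one multicast address acting as a hyperedge:
    a single 'edge' connecting every subscribed server at once."""

    def __init__(self):
        self.listeners = []

    def join(self, server):
        self.listeners.append(server)

    def send(self, sender, packet):
        # One transmission reaches every member of the group;
        # there are no pairwise links, so no single link to split.
        for server in self.listeners:
            if server is not sender:
                server.inbox.append(packet)

group = MulticastGroup()
servers = [Server(n) for n in "abcd"]
for s in servers:
    group.join(s)

group.send(servers[0], "PRIVMSG #chan :hi")
print([len(s.inbox) for s in servers])   # [0, 1, 1, 1]
```

Contrast this with the tree model earlier: adding a fifth server here is one `join`, not a new link that lengthens some path.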

The main disadvantages of the multicasting solution are unfamiliarity (most network programmers are used to TCP/IP and UDP/IP, but multicasting is new and rare) and obtaining a multicast IP (I don't know how you'd go about getting a transient multicast IP, although I know you can). Another problem is security: there is no way of controlling who is allowed to listen or send to a multicast IP the way you can allow or disallow connections to a hub you control. A snooper could conceivably register himself as a listener at the multicast address and packet-sniff from nearly anywhere on the 'Net. Also, an attacker could send spoofed packets to the multicast address and perform commands he shouldn't be allowed to (e.g. oper commands). However, implementing some simple encryption (we're not talking SSL or PGP here, more like XORing packets with a shared private key) and authentication, rather than sending everything in the clear, would remove that problem.
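For the record, the XOR scheme is about as short as crypto gets (the key here is a made-up example, and, as noted above, this is nowhere near SSL/PGP strength — anyone who recovers the key reads everything):

```python
from itertools import cycle

def xor_obscure(data: bytes, key: bytes) -> bytes:
    """XOR a packet against a repeating shared key.

    The same call both obscures and recovers, since (x ^ k) ^ k == x.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"sharedsecret"                      # hypothetical pre-shared key
packet = b":server1 PRIVMSG #chan :hello"

wire = xor_obscure(packet, key)
print(wire != packet)                      # True: not sent in the clear
print(xor_obscure(wire, key) == packet)    # True: key holders recover it
```

A snooper without the key sees only the XORed bytes; a spoofer without the key can't produce packets that decode to valid commands.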
 
 
Current Mood: geeky