What happened to Peer to Peer?
Peer to peer?

The peer to peer architecture is one with no centralised computer for the processing or storage of data; instead, the system is built around a distributed network of many machines, each providing processing and storage. Hailed as a super-solution during the Napster years, the peer to peer architecture has faded from the headlines, and is rarely written about or discussed today.
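
As a concrete illustration of the idea, here is a minimal sketch in Python; the Peer class and its method names are invented for illustration, not taken from any real peer to peer product. Each node holds its own files and answers requests from its neighbours, with no server anywhere.

    # Minimal sketch of the peer to peer idea: every node both stores data
    # and answers requests for it; there is no central machine.
    class Peer:
        def __init__(self, name):
            self.name = name
            self.files = {}          # local storage: filename -> content
            self.neighbours = []     # other peers this node knows about

        def store(self, filename, content):
            self.files[filename] = content

        def fetch(self, filename):
            """Look for a file locally, then ask each neighbour in turn."""
            if filename in self.files:
                return self.files[filename]
            for peer in self.neighbours:
                if filename in peer.files:
                    return peer.files[filename]
            return None              # no reachable peer holds the file

    # Three desktop PCs acting as equals -- no file server involved.
    bob, jim, susan = Peer("bob"), Peer("jim"), Peer("susan")
    for p in (bob, jim, susan):
        p.neighbours = [q for q in (bob, jim, susan) if q is not p]

    bob.store("report.doc", "quarterly figures")
    print(susan.fetch("report.doc"))   # found on Bob's PC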

The solution to all the problems?
In the heady days of the tech boom, we were told that peer to peer systems would be the only way forward. Articles about the wonderful new technology appeared in Computing, Computer Weekly, and Time Magazine. Peer to peer systems would solve all the problems, from the need for more computing power to the storage problems faced by the corporate IT department. So why aren't businesses all using peer to peer software now?
Quite simply, the peer to peer architecture solves very few problems; indeed, it solves problems that have already been solved, and replaces old problems with new ones.

Availability
Anyone who used Napster may be able to relate to one of the reasons why peer to peer is not a good answer to many corporate problems. Quite simply, while the file you want may be held somewhere in the peer network, it may not be available. It may be on Bob's PC and Jim's PC, but one of those may be turned off because he is on holiday, and the other may be being serviced or upgraded; the result is that the file is not accessible.
Corporate data is predominantly held on file servers which are maintained by the IT department and designed and built to be always available; in the main, they do not get turned off or shut down in the middle of the working day. Such servers often use technology such as redundant power supplies and redundant disks to ensure that faults do not stop the system working; these are not typically deployed to desktop machines.
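
To put a rough number on the difference, here is a small back-of-the-envelope calculation; the uptime figures below are assumptions chosen for illustration, not measurements. If each peer holding a copy is independently online with probability p, then at least one of k copies is reachable with probability 1 - (1 - p)^k.

    # Rough availability arithmetic (illustrative figures only).
    def availability(peer_uptime: float, copies: int) -> float:
        """Probability that at least one of `copies` independent peers is online."""
        return 1 - (1 - peer_uptime) ** copies

    # Desktop PCs switched on, say, 60% of the working day:
    for copies in (1, 2, 3, 5):
        print(f"{copies} copies on desktops: {availability(0.60, copies):.1%}")

    # A maintained file server with redundant power and disks:
    print(f"single managed server:  {availability(0.999, 1):.1%}")

Even five desktop copies, on these assumptions, fall short of one properly maintained server.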

Replication
The peer to peer fans tell us that because the data is replicated all over the place, we can't accidentally lose it, and it is always available (I have already suggested that the files are not always available). Never mind the fact that properly managed servers are backed up regularly and that backups are taken off-site to protect against fire and disaster; there is already a technique for keeping files highly available, called clustering. Clustering is different from peer to peer replication because it is controlled: the behaviour is defined. If Susan updates a document, the clustering software makes sure that the document is updated wherever it is duplicated, and it also locks access to the file so that no one else can update it at the same time. In general, a controlled system ensures that there is only one version of the file available and that a backup is always available; peer to peer appears random in this regard.
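
The controlled behaviour described above, one writer at a time and every copy updated together, can be sketched in a few lines of Python. This is only an illustration of the principle, not the interface of any real clustering product.

    import threading

    class ClusteredStore:
        """Sketch of controlled replication: defined behaviour, locked updates."""
        def __init__(self, replicas: int):
            self.replicas = [dict() for _ in range(replicas)]   # identical copies
            self.locks = {}                                      # document -> lock

        def update(self, name: str, content: str, author: str):
            lock = self.locks.setdefault(name, threading.Lock())
            with lock:                        # only one writer at a time
                for replica in self.replicas:
                    replica[name] = content   # every copy gets the same version
            print(f"{author} updated {name} on all {len(self.replicas)} replicas")

        def read(self, name: str):
            return self.replicas[0].get(name)   # any replica returns the same data

    store = ClusteredStore(replicas=3)
    store.update("annual-report.doc", "draft 1", author="Susan")
    print(store.read("annual-report.doc"))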

Security
Replicating confidential documents onto the office PCs is not really a sensible idea. Employees gaining access to each other's salary reviews is a recipe for disaster, let alone being able to modify them. How can an auditor prove that the security policy is implemented when the data is scattered all over the place? A server may be secured in a high security room, but the PCs used by staff are likely to sit in a low security office open to cleaners, tradespeople and other visitors, where physical security will be less stringent. This problem might be solved with encryption and peer to peer software that implements system-wide access control.
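
One possible shape of that fix, sketched with the third-party Python cryptography package and an invented access-control list, is to encrypt documents before they are replicated and only hand the plaintext to users the list allows.

    # Sketch only: the ACL layout and function names are illustrative,
    # and the key management is vastly simplified.
    from cryptography.fernet import Fernet   # third-party `cryptography` package

    acl = {"salary-review.doc": {"hr_manager", "payroll"}}   # who may read what

    key = Fernet.generate_key()      # held by an access-control service,
    fernet = Fernet(key)             # never stored alongside the replicas

    ciphertext = fernet.encrypt(b"Jim: 3% rise")   # this is what the peers replicate

    def read_document(name: str, user: str, token: bytes) -> bytes:
        if user not in acl.get(name, set()):
            raise PermissionError(f"{user} may not read {name}")
        return fernet.decrypt(token)

    print(read_document("salary-review.doc", "hr_manager", ciphertext))
    # read_document("salary-review.doc", "jim", ciphertext)  -> PermissionError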

Load balancing?
Peer to peer systems spread resources all over the place, so surely they are good for balancing load, either between machines or across the network? In practice, such systems make network traffic unpredictable, as well as the disk and processor load on any given PC. This means that it is not possible to design the network to meet the requirements, and it is not possible to monitor the systems to identify bottlenecks, because any change or measurement is invalidated by the randomness of the peer to peer distribution of resources. Overloaded servers can be relieved by spreading data over more disks, increasing disk cache, adding more or faster network interfaces, or clustering. Overloaded networks can be fixed by identifying where the load comes from and changing the topology, increasing the backbone capacity, or adding switches or routers.
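
A toy simulation makes the planning problem visible; the peer count, file popularity and request volume below are invented purely for illustration. When requests land on whichever PC happens to hold a popular file, the busiest machine changes from week to week, whereas a central server's load is known in advance.

    import random
    from collections import Counter

    random.seed(1)
    peers = [f"pc{i:02d}" for i in range(20)]
    files = [f"doc{i:03d}" for i in range(200)]
    # a handful of files attract most of the requests, as document traffic tends to
    weights = [1 / (rank + 1) for rank in range(len(files))]

    for week in range(3):
        placement = {f: random.choice(peers) for f in files}   # where copies ended up
        hits = Counter(placement[f]
                       for f in random.choices(files, weights=weights, k=1000))
        name, load = hits.most_common(1)[0]
        print(f"week {week}: busiest machine is {name} with {load} of 1000 requests")

    print("central server: all 1000 requests every week -- easy to size for")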

What is Peer to Peer good for?
By now it should be obvious that peer to peer systems are mainly useful where a centralised system is not possible at all. In the main, this is not the case in a corporate IT environment; the only place where the architecture really comes into its own is the illegal redistribution of copyright or prohibited material, where it becomes impossible to shut down. Yet Napster actually was shut down; ironically, this was possible because Napster relied on the availability of centralised servers to supply the indexing and directory service.

Was Napster innovative?
Napster was the peer to peer program that captured the headlines (despite not being 100% peer to peer). Its popularity stemmed mainly from the fact that people could suddenly find (illegal) digital copies of music online quite easily. There was one feature of Napster that was its major advantage and innovation, and it had little to do with peer to peer. The innovation was not the file transfer; Napster's was less reliable than other methods developed years before. Nor was there anything particularly clever in sharing files from PCs; that has been done in one form or another for some time. The real innovation, which is yet to be developed further, was the peer-based index updating. When the Napster application started, it would synchronise the list of files available on your PC with a list held on the central index servers, so the index was always accurate. Imagine how much more accurate Altavista and Google would be if web servers scanned and indexed their own content and sent updates to the search engine servers; imagine saving all the internet traffic used by search robots, and sparing users from requesting pages that turn up as the ubiquitous 404 error.
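
A rough sketch of that push-based indexing idea, with invented class and function names rather than Napster's actual protocol, might look like this: each peer scans its own shared folder and pushes the listing to the central index, so nothing ever has to be crawled.

    import os

    class CentralIndex:
        def __init__(self):
            self.entries = {}                    # peer name -> set of filenames

        def update(self, peer: str, filenames: set):
            self.entries[peer] = filenames       # replace the old listing wholesale

        def search(self, term: str):
            return [(peer, f) for peer, files in self.entries.items()
                    for f in files if term.lower() in f.lower()]

    def announce_shared_folder(index: CentralIndex, peer: str, folder: str):
        """What the client did on start-up: scan locally, push the list upstream."""
        shared = {f for f in os.listdir(folder)
                  if os.path.isfile(os.path.join(folder, f))}
        index.update(peer, shared)

    index = CentralIndex()
    index.update("bobs-pc", {"holiday-photos.zip", "quarterly-report.doc"})
    announce_shared_folder(index, "my-pc", ".")   # push this machine's files too
    print(index.search("report"))                 # [('bobs-pc', 'quarterly-report.doc'), ...]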



