13 Aug 2013

9 algorithms that changed the future - John MacCormick: Book Review

The Book

9 algorithms that changed the future (the ingenious ideas that drive today’s computers) by John MacCormick

My Thoughts

I came across this book while idly browsing a book store and since I’ve found most introduction to algorithms books very dry I thought it’d be interesting to see what one aimed at the general public would be like.

Overall it was an enjoyable read and I quite like the pattern that the author used for each algorithm, which was:

Describe the problem that it’s needed for.
Explain a simplified version of the algorithm or use a metaphor to give the general outline.
Explain which bits were simplified and how the real version addresses those simplifications.

The first step is often missed out in algorithms books which is a mistake for people like me who become more interested in a subject once a practical use case is explained.

Although the title claims 9 algorithms I counted the following 8 which made the cut:

Search Engine Indexing - this chapter covers how you’d go about writing the part of a search engine which works out which pages are applicable for certain search terms. It effectively describes Lucene.
Page Rank - this chapter described Google’s PageRank which was particularly timely for me as I’ve recently been playing around with some similar algorithms. The author works step by step through the algorithm using a simple example covering it very throughly. The best thing I’ve seen on the web is PageRank explained with Javascript although that is slightly more technical than the book. I didn’t realise that Sergey Brin and Larry Page had published a paper back in 1998 titled 'The Anatomy of a Large-Scale Hypertextual Web Search Engine' which explains the initial PageRank algorithm in more detail.
Public Key Cryptography - this chapter mostly covers Diffie-Hellman key exchange which I realise is quite well explained on wikipedia as well.
Error-Correcting Codes - I took the checksums included in somewhat for granted but in this chapter MacCormick goes through the problem of data getting lost or corrupted in transfer and iterates through potential solutions. The further reading from this chapter is 'A Mathematical Theory of Communication', Hamming code and Reed-Solomon error correction.
Pattern Recognition - this chapter covered a variety of machine learning algorithms initially focusing on digit recognition - something that Jen Smith and I spent a chunk of time working on last year for the Kaggle problem. The three algorithms covered are nearest neighbours, decision trees and neural networks, all of which we attempted! I recently came across the concept of convolutional neural networks and deep learning which I’ve yet to try out but are apparently even more accurate than plain neural networks.
Data Compression - I imagine data compression would be one of the more familiar algorithms on this list since everybody knows how to 'zip up a file' and send it around. The author covers lossy algorithms such as 'JPEG Leave it Out' which reduces image quality as well as size, as well as lossless algorithms such as LZ77, Shannon-Fano coding and Huffman coding. The latter is covered in Stanford’s Algorithms II and I think the explanation there is actually easier to understand than the book’s.
Databases - this section mostly focused on how ACID compliant relational databases work and covered things like the write ahead log and index lookups. The recommended reading from this chapter is Transaction Processing: Concepts and Techniques. Given the fact that many of the popular websites that people use tend to use NoSQL stores these days I thought there might be some mention of that but it was left out.
Digital Signatures - this chapter ties in quite closely with the one on public key cryptography. It focused on signing of software with a digital signature rather than the signing of emails which is what I expected the chapter to be about. The RSA algorithm is described and the link between the difficultly of factoring large numbers and the security of the algorithm is explained.

I enjoyed the book and I’ve got some interesting articles/papers to add to my reading list. Even if you already know all the algorithms I think it’s interesting to hear them described from a completely different angle to see if you learn something new.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.