Novel Data Representations
Does a two-dimensional picture really contain as much information as a thousand words?
What about a four-dimensional hyperimage from a sophisticated sensor? Asking such questions led ThinkTank Maths to re-examine what we mean by information. The word information brings to mind bytes or megabytes — counting the bits that computers use to represent data. In fact, such ideas go back to the work of Claude Shannon in the 1940s, whose key mathematical idea, Shannon entropy, underpins nearly all of the communication technologies we use every day.
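To make the idea concrete, here is a minimal sketch of Shannon entropy, which measures the average information content per symbol in bits; the function name and the example messages are illustrative, not taken from the original.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Average information per symbol, in bits: H = -sum(p * log2(p))."""
    counts = Counter(data)
    total = len(data)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# A perfectly repetitive message carries no information per symbol...
print(shannon_entropy(b"aaaaaaaa"))  # 0.0
# ...while a message of eight distinct symbols needs 3 bits per symbol.
print(shannon_entropy(b"abcdefgh"))  # 3.0
```

The second result matches the intuition behind compression: eight equally likely symbols require log2(8) = 3 bits each, which is why entropy sets the limit on how far data can be losslessly compressed.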
However, Shannon’s approach is far from perfect when applied to the challenges faced by modern science. The volume of data created by sensors and instruments such as the Large Hadron Collider can be so enormous that most of it has to be thrown away, so how should one decide what to keep without first analysing everything? And what should a Mars rover transmit back to Earth during its narrow communication window? To address such questions, it is crucial to measure the interestingness of data, not just its raw information content.