The end of Newtonian thinking, please !
January 17, 2010
On July 20th 1969 Apollo 11 landed men on the moon using substantially less computing power than is available in today’s mobile telephone. It’s not a direct comparison, since the Apollo Guidance Computer wasn’t capable of floating point operations, and that’s kind of the point of this post.
In 1969 reaching the peak of technology, flying three men to the moon, was possible using Newtonian mechanics and the calculations necessary to do that were available using simple integer-only computers. Cause and effect were still the gold standard of science and the weird world of Quantum Mechanics had not penetrated the public psyche even though it had been out in the open for 40 years. We’re 40 years still further down the line and probability-based interpretations of reality still haven’t gained widespread acceptance. Only now, where this wasn’t an issue for Buzz & his buddies, it’s starting to cause real problems for science and its wider understanding and acceptance.

This is the famous Solvay Conference where the early developers of Quantum Mechanics discussed their new ideas with Einstein and others. Einstein hated the idea that probability functions and not deterministic processes could be the prime movers in the universe. But by any reasonable measure of science he and his theory of General Relativity are wrong. Perhaps it would be more generous to say that the theory is incomplete, but when compared to QM it pales in predictive power and accuracy. Yet Einstein still holds the public heart with his shock of white hair and sticky-out tongue. E=mc². There is a reason why it is so easy to remember. As a theory to completely describe the relationship between mass and energy, it’s wrong. This is the Standard Model Lagrangian Expansion that physicists currently believe is the best explanation of that relationship. Bit more complicated isn’t it ?
It’s also the reason for the Large Hadron Collider (LHC), since the term dealing with the Higgs boson has not been tested conclusively.
Now I know that, strictly speaking, Einsteinian physics and Newtonian physics are not the same thing but they do share a common philosophical thread: the idea that a single cause will have a single effect. In Newtonian mechanics this is exemplified by ‘every force has an equal and opposite reactive force’. In the Einsteinian universe gravitation is an expression of warps in Space-Time.
Quantum Mechanics has a fundamentally different philosophical standpoint and does not experience or express the universe as cause and effect. Instead probability describes the likelihood of something happening. If you don’t know about this stuff already I’d recommend The Elegant Universe as a starter. It’s a bit effects-heavy, but has an open and accessible style. The recent BBC show The Secret Life of Chaos shows the parallel development of complexity and non-deterministic mathematics.
The reason why I say Newtonian mechanics is holding back public understanding of science is that it is so easily testable. Table-top experiments show cause and effect at work, and our real-world experience backs that up. Most belief systems go further still with cause and effect being the structural basis for many moral codes – ‘thieves will go to hell’ – that sort of thing. So our social norms AND our experience of the physical world are predicated on cause and effect. But cause and effect stopped being a good explanation of the observable universe almost 80 years ago and still the public psyche is firmly rooted to that way of seeing the universe.
What we need to start doing is educating kids on roulette, betting odds, and all other manner of statistical analysis because quite simply we are not telling them the truth when we teach them that 2+2=4. What they should know is that there is a high probability that 2+2=4 in most situations, but don’t blinker yourself to other possibilities available under the Bell Curve. Until certainty is left behind humanity is going to have a hell of a philosophical challenge on its hands in living with the duality of macro and micro descriptions of the universe.
The Ethics of Data Mining – Mediated Memory
January 9, 2010
I’m going to pose myself questions here rather than answer them. Self-indulgent I know, but hell, it’s my party and I’ll cry if I want to.
Digital data has some properties that could or should impact on ethics. I’m going to take a look at three of them this time:
It is non-corporeal, so possibly not as susceptible to the ravages of time as paper would be
It is transmissible, so probably not subject to physical location
It is a record of events that may be edited or erased leaving little or no evidence of those actions
The first two points are similar, in that they relate to inadvertent data loss, but relate to ethics in very different ways. The third is a very different quality.
Non-corporeality – The onwards march of time and technology makes specific media obsolete. That is as true for spoken language as it is for other media formats, but the loss of spoken languages is a large enough topic for a post on its own, so I’ll stick to physical media.
Ask yourself – ‘When did I last buy a 35mm photographic film or a 90 minute audio cassette ?’
For myself, it’d have to be a decade or more, and I owned a 35mm SLR camera until 3 years ago ! I just kept a stock of old film in a box in the fridge up to its use-by date, and past it in some cases. Now as I climb the technology ladder I have my music CDs as reference, but don’t need to touch them as my music is transferred from device to device with no apparent loss of quality. So long as I make those steps up the ladder while technology exists to bridge the gaps, no data is lost. So here comes the first question.
Do we have an obligation to keep data in its original form and format, respecting the media that it was originally hosted in, do we have an obligation to retain the information contained in the data, or is the data disposable and only the effect of the data relevant ?
In many ways the existence of the institutions of ‘the museum’ and ‘the library’ answers this question from our ancestors’ point of view. Certainly in the UK, philanthropists saw the advancement of science and the education of the masses as a moral obligation, but what about the curation of data for historical rather than scientific purposes ?
I think that we can say that retained samples are a valuable weapon in the arsenal of scientific endeavour, without much doubt. Whether it be new species of animal in today’s world, samples of pathogens lost to science or the ability to re-examine old specimens with new techniques, a library of original, physical sample material is an essential part of science, but what about non-corporeal data ?
Audiophiles still see the crackle and hiss embodied in vinyl recordings as adding character and being more authentic than ‘clean’ digital renditions, but to me this would imply that the recording artist, as the author of their own material, would want a degraded recording. I really don’t think that’s true. If I were a musician I would want my recordings to be heard as played, not as recorded. But then the experience of listening to music is not the same as the experience of playing it, so a direct equivalence between the data as recorded and the data as experienced is going to be a tricky one and probably something to consider another day.
Can you imagine the curator of a museum in 1,000 years’ time carefully handling the mix tape that you made for wassername in ’88 ? But why not ? It’s an excellent piece of social history communicating universal feelings and allowing later generations to connect with past generations on an emotional level. No different from a birthday party invitation sent 2,000 years ago or a love letter written 4,000 years ago. But that assumes that the data can be read in hundreds of years’ time.
The non-corporeality of digital data does hide degradation introduced by copying and by natural stochastic processes, such as cosmic rays hitting storage media or radioactive decay. We should not consider data stored on digital media such as magnetic tapes, CD-ROMs or laserdiscs as immune to degradation; indeed, they are more prone to damage than paper in many circumstances. That’s not as surprising as it may at first seem since we have 4,000 years of paper technology under our collective belts, but less than 100 of electronic recording and experience using plastics. In my own lifetime I have seen data storage formats become obsolete (and so have you), but that’s not even considering MIME types.
Every new start-up seems to define its files in a new way. This is 100% understandable in the context of intellectual property rights and the advance of technology, but it also means that the failure of each of these companies will consign its file type to the rubbish bin. Effectively we are spawning and killing a new ‘language’ each time this happens and any data recorded in this language will need either translation into more common languages or the preservation of a Rosetta Stone for as long as that data might be preserved. In this context the decadal lifespan of CD-ROMs and magnetic tapes starts to seem like a long-term issue and the churn of data formats the overwhelming problem. Since we cannot ethically restrict the proliferation of new languages, the best that we can hope for is that file types are translatable. Unfortunately translations almost always result in loss of data fidelity.
So the physical nature of storage media for digital data is less of a defining factor than we might at first believe, but before we reject its potential impacts there is the question of whether the original physical items need to be curated in the same way as any museum piece. Should we collect one of everything just so that the next generation has access to it in something close to context ? This question I will leave for the historians.
The non-corporeality of data may impact more through the potential for high-fidelity copying to multiple locations and this feeds neatly into our next topic:
Transmissibility – Not a new feature of data, after all the telegraph was transmitting non-corporeal data around the world in the 19th century, but perhaps I mean the ability to provide photo-realistic reproduction to anywhere almost instantly.
If we assume that we are not going to get significant data degradation through each copy (i.e. that the fidelity is retained) or that the original reference copy is still available, data becomes a commodity available on-demand. Our data networks make that possibility a reality, whether it be via broadcast media, sharing or sale through websites or via ‘direct’ network connections such as FTP or PPTP.
Generally we do not even think about whether we are receiving what is being sent, unless there is some obvious fault. Our technologies are reliable enough that, at the point of consumption, we consider the data as a good representation of the original. That is not to say that checks on data fidelity don’t happen in the background with most transmission mechanisms. They do, and a significant portion of bandwidth used on the internet is devoted to comparing sent to received.
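To make the idea of comparing sent to received a little more concrete, here is a minimal sketch in Python. It is my own illustration rather than a description of how any particular network protocol works: real transmission layers tend to use lighter-weight checksums, and the function names and sample data here are made up for the example. The sender publishes a SHA-256 digest alongside the data, and the receiver recomputes it on arrival.

```python
import hashlib


def sha256_digest(payload: bytes) -> str:
    """Return a hex fingerprint of the payload (hypothetical helper name)."""
    return hashlib.sha256(payload).hexdigest()


def arrived_intact(received: bytes, expected_digest: str) -> bool:
    """Recompute the fingerprint on the receiving side and compare it."""
    return sha256_digest(received) == expected_digest


# The 'original' and the fingerprint that travels with it.
sent = b"a mix tape, digitised"
sent_digest = sha256_digest(sent)

# One faithful copy, and one copy with a single bit flipped in transit.
good_copy = bytes(sent)
damaged = bytearray(sent)
damaged[0] ^= 0x01
bad_copy = bytes(damaged)

print(arrived_intact(good_copy, sent_digest))  # True
print(arrived_intact(bad_copy, sent_digest))   # False
```

Note that a check like this only tells you that a copy differs from the original. It cannot tell you which one is right unless the reference fingerprint itself is trusted.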
The question then becomes why are we happy to have copies of original data many times removed from the original when the original (or a much closer removal) is available ?
It is often said that we live in a media age, but most would consider that to refer to the number and availability of media channels, when in fact the most pervasive mediated experience that we have is with the data that makes everyday life possible.
Just to give an example; even 10 years ago I wouldn’t have dreamed of money as a mediated experience, yet in most senses it is just that. Very little cash actually flows through my wallet these days. I rarely visit a bank. I trust all those data transmissions that are running in the background to provide a very real outcome like food on the table. But why should I ? Well the honest truth in this case is that I don’t trust digital money any more than I trust hard cash. Both are copyable, both are stealable. Both are mediated experiences of wealth. To me they are no different, so I have no philosophical problem in a cashless economy. But that’s not the same as the ethics of error checking.
Engineers strive for high fidelity data transmission. It is a matter of honour and professional pride. Depending on the application it can be a matter of life and death.
Bankers (should) strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the customer it can be a matter of life and death.
Journalists should strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the story it could be a matter of life and death.
But past those three professions, high fidelity data transmission is mostly an aesthetic choice but not universally an ethical one. That’s why we have security certificates, passwords, ID checks and all that apparatus that at first sight looks Orwellian but, when you understand what it is compensating for, is much more depressing. It’s plugging the ethics gap between personal and professional.
Is it right that we load our communications technologies by sending multiple copies of data rather than consigning a single authoritative item of data to a secure store and simply referencing it from there ? Computer programmers working in groups do this already using applications collectively known as Version Control Software. Could we create a Version Controlled repository for all human knowledge ?
This brings us to digital memories as editable and erasable records:
In the Version Controlled world nothing is ever deleted without consensus. Edits are recorded so that if some piece of data were to be found to lead to a dead-end you can back-track to the last useful data and try a different approach.
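As a rough sketch of what recording edits and back-tracking means in practice, here is a toy append-only record in Python. It is only an illustration of the principle, not how any real version control system is built, and the class and method names are invented for the example. The point to notice is that back-tracking does not delete the dead-end; it adds a new revision that restores the earlier content.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """Toy append-only history: every edit is kept, nothing is overwritten."""

    history: list[str] = field(default_factory=list)

    def edit(self, content: str) -> int:
        """Record a new revision and return its revision number."""
        self.history.append(content)
        return len(self.history) - 1

    def current(self) -> str:
        """Return the latest revision (assumes at least one edit exists)."""
        return self.history[-1]

    def back_track(self, revision: int) -> int:
        """Restore an earlier revision by re-recording it as the newest one."""
        return self.edit(self.history[revision])


record = VersionedRecord()
useful = record.edit("last useful data")
record.edit("approach that led to a dead-end")
record.back_track(useful)

print(record.current())      # last useful data
print(len(record.history))   # 3, the dead-end is still on file
```

That last line is exactly the property that makes the following questions awkward: the record of the dead-end never goes away.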
But what happens if a particular thread of knowledge leads to disaster/evil/daytime TV or whatever, what then ?
Without infinitely reproducible data a society can choose to forget. If data is infinitely reproducible we have to assume that a copy exists somewhere, even if it is only in a router cache that is not immediately accessible. If the potential exists for a copy to resurface, then ethically we have to consider that it will. Forgetting is not a choice that is practically open to us in the massively networked digital world.
Forget is the wrong word in some of these cases; ‘consign to the past and move on whilst retaining full knowledge’ could be a better way of putting it. For example the Truth and Reconciliation commissions in Rwanda and SA are a way of accommodating unpalatable facts until they fade a little before merging with the background of history. Anti-Nazi laws in Germany are there to provide several generations space from the shared horror of WW2 and they will not be seen to be successful until at least 2050, when the children of Nazis have died, so providing a removal from the first person experience.
But there are problems with the version controlled world where threads of knowledge are prioritised and the flow of history consciously re-routed.
At a personal level we lose our sense of ourselves and our innate ability to put things behind us, and even the essential personal liberty of simply growing up. Let’s take a few examples to illustrate what I’m getting at:
Criminal offenses committed by persons under a certain age are usually dealt with differently to those committed by adults. The dividing line between child and adult is different under different legal systems, but it is present in the vast majority of cases. What is also present in most cases is a statute of limitations which means that offenses are deemed to have no relevance under law after a certain amount of time. The convergence of these two legal principles usually means that offenses committed as a child will be wiped ‘off the record’ after a relatively short period and the child allowed to go on with life as a reformed character having learned its lesson.
I see no reason why data should be treated any differently, yet if a copy of an old web page surfaces that contains embarrassing, or even harmful, personal information any person can act on it. The trope about ‘nothing to hide’ is idiotic. Everyone has something that they would rather wasn’t repeated ad nauseam in public whether it be bad fashion choices, awkward breakups, financial embarrassment or even a physical blemish. There is no small amount of debate on this, but one of the most interesting recent stories is this one from BBC News. How can Facebook ban you from deleting yourself from their platform ? To me this is a great piece of social commentary art.
Anyway. Too long. Move on.
Veracity Values Redux
September 5, 2009
The redoubtable Wikipedia has started the ball rolling on visualising veracity. It’s not in the wild yet, but they have started testing.
Reported in Wired, NowPublic, and with tongue in cheek here.
Here is a search result from Wikipedia all about just how complicated it actually is to do this and a discussion on the different approaches that you might make at this kind of thing.
I’m glad that Wikipedia has made a move on this. Looking at the way that they are approaching it the algorithm looks relatively simple, at least nothing like as complex as it could be, but I suppose the issue for a non-profit is external cost more than anything. The reason that I say it’s relatively simple is because it only looks at internal data to come to its assessment of how much you can trust the information on a page. I don’t like that by the way. Trust is not what you want in data. Verifiability is what you want in data. Everything else is faith.
Reading the Wired article I think that I agree with the researcher from Palo Alto who said that normal readership would probably find it distracting. My idea would see veracity as much more like a security certificate that you could investigate if you wanted, but would otherwise be just an icon somewhere on the browser (or whatever is being used at the time).
So take it to the next level, where external data sources are checked rather than just counted as this implementation seems to do, and you need to run it under a different business model. Until all data is open access, the scientific publishing hegemony is broken and no bad people exist, checking that data is true, or even verifiable, will cost money. Last time I looked a single scientific paper cost about $30. At that price it’s no wonder a professional class exists to exclude mass access to information. When I do research I might look at 100 papers, properly read 20 or so and need to read more than the abstract of 50 to see if they are relevant. That’s a hell of a commercial barrier to entry. $3000 to properly research a note that no-one may read. That’s the sort of return that most speculators would balk at, hence academia. Or at least old model academia, where research topics were chosen by the researcher just because they want to know more about something.
So the moral to the story is – if you want to really kick off the knowledge economy using the new economics of data proliferation, break up the scientific publishing houses, or at least force them to open their vaults a chink.
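For anyone who wants to check the sums, the $3000 corresponds to paying for every one of the 100 papers looked at. The little Python sketch below spells that out, along with a lower figure that rests on an assumption of mine, not something stated above: that abstracts are free to read and only the 50 papers needing more than the abstract are actually bought. The variable names are just for illustration.

```python
PRICE_PER_PAPER_USD = 30        # rough per-paper price quoted above

papers_looked_at = 100          # everything that gets at least a glance
papers_beyond_abstract = 50     # need more than the abstract to judge relevance

# Upper figure: every paper looked at is paid for.
cost_every_paper = papers_looked_at * PRICE_PER_PAPER_USD

# Lower figure (assumption): abstracts are free, only the 50 are bought.
cost_beyond_abstract = papers_beyond_abstract * PRICE_PER_PAPER_USD

print(cost_every_paper)       # 3000
print(cost_beyond_abstract)   # 1500
```

Either way the barrier is four figures for a single piece of research.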
Ethical Data Mining
June 14, 2009
I’m starting to think about where the line should be drawn when data mining web content, specifically content provided by individuals. I almost said private individuals there, but if you are posting on a publicly visible blog, comments board or whatever, then by definition private is no longer applicable. Or is it ?
What is the difference between government agencies putting together a profile on me from my electronic footprint and me doing it on someone else as part of a scientific research project, or indeed a third party doing it for commercial reasons (thinking of Phorm) ? What are the methodological and ethical differences ? Are there any ?
There are a couple that leap to mind with regards to government responsibilities and accountability, but I’m still thinking about this so no conclusions yet.
