The Ethics of Data Mining – Mediated Memory
January 9, 2010
I’m going to pose myself questions here rather than answer them. Self-indulgent I know, but hell, it’s my party and I’ll cry if I want to.
Digital data has some properties that could or should impact on ethics. I’m going to take a look at three of them this time:
It is non-corporeal, so possibly not as susceptible to the ravages of time as paper would be
It is transmissible, so probably not subject to physical location
It is a record of events that may be edited or erased leaving little or no evidence of those actions
The first two points are similar, in that they relate to inadvertent data loss, but relate to ethics in very different ways. The third is a very different quality.
Non-corporeality – The onwards march of time and technology makes specific media obsolete. That is as true for spoken language as it is for other media formats, but the loss of spoken languages is a large enough topic for a post on its own, so I’ll stick to physical media.
Ask yourself – ‘When did I last buy a 35mm photographic film or a 90 minute audio cassette ?’
For myself, it’d have to be a decade or more, and I owned a 35mm SLR camera until 3 years ago ! I just kept a stock of old film in a box in the fridge up to its use-by date, and past it in some cases. Now as I climb the technology ladder I have my music CDs as reference, but don’t need to touch them as my music is transferred from device to device with no apparent loss of quality. So long as I make those steps up the ladder while technology exists to bridge the gaps, no data is lost. So here comes the first question.
Do we have an obligation to keep data in its original form and format, respecting the media that it was originally hosted in, do we have an obligation to retain the information contained in the data, or is the data disposable and only the effect of the data relevant ?
In many ways the existence of the institutions of ‘the museum’ and ‘the library’ answers this question from our ancestors’ point of view. Certainly in the UK, philanthropists saw the advancement of science and the education of the masses as a moral obligation, but what about the curation of data for historical rather than scientific purposes ?
I think that we can say that retained samples are a valuable weapon in the arsenal of scientific endeavour, without much doubt. Whether it be new species of animal in today’s world, samples of pathogens lost to science or the ability to re-examine old specimens with new techniques, a library of original, physical sample material is an essential part of science, but what about non-corporeal data ?
Audiophiles still see the crackle and hiss embodied in vinyl recordings as adding character and being more authentic than ‘clean’ digital renditions, but to me this would imply that the recording artist, as the author of their own material, would want a degraded recording. I really don’t think that’s true. If I were a musician I would want my recordings to be heard as played, not as recorded. But then the experience of listening to music is not the same as the experience of playing it, so a direct equivalence between the data as recorded and the data as experienced is going to be a tricky one and probably something to consider another day.
Can you imagine the curator of a museum in 1,000 years’ time carefully handling the mix tape that you made for wassername in ’88 ? But why not ? It’s an excellent piece of social history, communicating universal feelings and allowing later generations to connect with past generations on an emotional level. No different from a birthday party invitation sent 2,000 years ago or a love letter written 4,000 years ago. But that assumes that the data can be read in hundreds of years’ time.
The non-corporeality of digital data does hide degradation introduced by copying and by natural stochastic processes, such as cosmic rays hitting storage media or radioactive decay. We should not consider data stored on digital media such as magnetic tapes, CD-ROMs or laserdiscs as immune to degradation; indeed they are more prone to damage than paper in many circumstances. That’s not as surprising as it may at first seem, since we have 4,000 years of paper technology under our collective belts, but less than 100 of electronic recording and experience using plastics. In my own lifetime I have seen data storage formats become obsolete (and so have you), but that’s not even considering MIME types.
Every new start-up seems to define its files in a new way. This is 100% understandable in the context of intellectual property rights and the advance of technology, but it also means that the failure of each of these companies will consign its file type to the rubbish bin. Effectively we are spawning and killing a new ‘language’ each time this happens, and any data recorded in this language will need either translation into more common languages or the preservation of a Rosetta Stone for as long as that data might be preserved. In this context the decadal lifespan of CD-ROMs and magnetic tapes starts to seem like a long-term issue and the churn of data formats the overwhelming problem. Since we cannot ethically restrict the proliferation of new languages, the best that we can hope for is that file types are translatable. Unfortunately translations almost always result in loss of data fidelity.
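To make the point about translation a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the record, its field names and the imagined ‘common’ format are assumptions, not any real file type. It simply shows that once a translation drops or rounds something, translating back cannot recover it.

```python
# A hypothetical record in a start-up's own format: microsecond timestamps,
# a free-text note and a fractional rating. All field names are invented.
original = {
    "taken_at": "2010-01-09T14:03:27.518204",
    "note": "mix tape, side A",
    "rating": 4.2,
}


def to_common_format(record):
    """Translate into an imagined 'common' format that stores only whole
    seconds and whole-number ratings, and has no note field at all."""
    return {
        "taken_at": record["taken_at"].split(".")[0],
        "rating": round(record["rating"]),
    }


def from_common_format(record):
    """Translate back. Whatever the common format dropped has to be
    filled with placeholders, because the detail no longer exists."""
    return {
        "taken_at": record["taken_at"] + ".000000",
        "note": "",
        "rating": float(record["rating"]),
    }


round_trip = from_common_format(to_common_format(original))

# The fields whose values did not survive the round trip.
lost = {key for key in original if original[key] != round_trip[key]}
print(sorted(lost))
```

In this toy case every field comes back changed, even though both translations ‘worked’ without error, which is rather the problem.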
So the physical nature of storage media for digital data is less of a defining factor than we might at first believe, but before we reject its potential impacts there is the question of whether the original physical items need to be curated in the same way as any museum piece. Should we collect one of everything just so that the next generation has access to it in something close to context ? This question I will leave for the historians.
The non-corporeality of data may impact more through the potential for high-fidelity copying to multiple locations, and this feeds neatly into our next topic:
Transmissibility – Not a new feature of data, after all the telegraph was transmitting non-corporeal data around the world in the 19th century, but perhaps I mean the ability to provide photo-realistic reproduction to anywhere almost instantly.
If we assume that we are not going to get significant data degradation through each copy (i.e. that the fidelity is retained) or that the original reference copy is still available, data becomes a commodity available on-demand. Our data networks make that possibility a reality, whether it be via broadcast media, sharing or sale through websites or via ‘direct’ network connections such as FTP or PPTP.
Generally we do not even think about whether we are receiving what is being sent, unless there is some obvious fault. Our technologies are reliable enough that, at the point of consumption, we consider the data as a good representation of the original. That is not to say that checks on data fidelity don’t happen in the background with most transmission mechanisms, they do, and a significant portion of bandwidth used on the internet is devoted to comparing sent to received.
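For the curious, the background check of sent against received can be sketched in a few lines of Python. This is only an illustration of the idea: I have used a SHA-256 digest from the standard library as a stand-in, whereas real network protocols generally use much cheaper checksums, and the `send` and `received_intact` helpers are names I have invented for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a fingerprint of the data (SHA-256, as a stand-in checksum)."""
    return hashlib.sha256(data).hexdigest()


def send(data: bytes):
    """Hypothetical sender: ships the data together with its fingerprint."""
    return data, digest(data)


def received_intact(data: bytes, claimed_digest: str) -> bool:
    """Hypothetical receiver: recompute the fingerprint and compare."""
    return digest(data) == claimed_digest


message = b"Pay the grocer 20 pounds"
payload, checksum = send(message)

# Simulate damage in transit by flipping a single bit of the first byte.
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]

print(received_intact(payload, checksum))    # the clean copy passes
print(received_intact(corrupted, checksum))  # the damaged copy does not
```

Note that a check like this only tells you the copy matches what was sent. It says nothing about whether what was sent was a faithful copy of the original in the first place.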
The question then becomes why are we happy to have copies of original data many times removed from the original when the original (or a much closer removal) is available ?
It is often said that we live in a media age, but most would consider that to refer to the number and availability of media channels, when in fact the most pervasive mediated experience that we have is with the data that makes everyday life possible.
Just to give an example; even 10 years ago I wouldn’t have dreamed of money as a mediated experience, yet in most senses it is just that. Very little cash actually flows through my wallet these days. I rarely visit a bank. I trust all those data transmissions that are running in the background to provide a very real outcome like food on the table. But why should I ? Well the honest truth in this case is that I don’t trust digital money any more than I trust hard cash. Both are copyable, both are stealable. Both are mediated experiences of wealth. To me they are no different, so I have no philosophical problem in a cashless economy. But that’s not the same as the ethics of error checking.
Engineers strive for high fidelity data transmission. It is a matter of honour and professional pride. Depending on the application it can be a matter of life and death.
Bankers (should) strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the customer it can be a matter of life and death.
Journalists should strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the story it could be a matter of life and death.
But past those three professions, high fidelity data transmission is mostly an aesthetic choice, not universally an ethical one. That’s why we have security certificates, passwords, ID checks and all that apparatus that at first sight looks Orwellian but, when you understand what it is compensating for, is much more depressing. It’s plugging the ethics gap between personal and professional.
Is it right that we load our communications technologies sending multiple copies of data rather than consigning a single authoritative item of data to a secure store and simply reference it from there ? Computer programmers working in groups do this already using applications collectively known as Version Control Software. Could we create a Version Controlled repository for all human knowledge ?
This brings us to digital memories as editable and erasable records:
In the Version Controlled world nothing is ever deleted without consensus. Edits are recorded so that if some piece of data were found to lead to a dead-end you can back-track to the last useful data and try a different approach.
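As a rough illustration of ‘nothing is ever deleted’, here is a toy sketch in Python. It is not how any real version control system works internally; the `VersionedRecord` class and its method names are my own invention for this post. The point it shows is that back-tracking adds to the history rather than erasing the dead-end.

```python
class VersionedRecord:
    """A toy append-only record: edits are added, never overwritten."""

    def __init__(self, initial):
        self._history = [initial]

    def edit(self, new_value):
        """Record a new version on top of the existing ones."""
        self._history.append(new_value)

    def current(self):
        """Return the most recent version."""
        return self._history[-1]

    def back_track(self, steps):
        """Return to the version `steps` edits ago by appending a copy of
        it, so the dead-end stays on the record instead of vanishing."""
        restored = self._history[-1 - steps]
        self._history.append(restored)
        return restored

    def history(self):
        """Return every version ever recorded, oldest first."""
        return tuple(self._history)


record = VersionedRecord("last useful data")
record.edit("promising idea that turned out to be a dead-end")
record.back_track(1)

print(record.current())
print(record.history())
```

After the back-track the current version is the last useful data again, but the history still holds all three entries, dead-end included.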
But what happens if a particular thread of knowledge leads to disaster/evil/daytime TV or whatever, what then ?
Without infinitely reproducible data a society can choose to forget. If data is infinitely reproducible we have to assume that a copy exists somewhere, even if it is only in a router cache that is not immediately accessible. If the potential exists for a copy to resurface, then ethically we have to consider that it will. Forgetting is not a choice that is practically open to us in the massively networked digital world.
Forget is the wrong word in some of these cases; ‘consign to the past and move on whilst retaining full knowledge’ could be a better way of putting it. For example the Truth and Reconciliation commissions in Rwanda and South Africa are a way of accommodating unpalatable facts until they fade a little before merging with the background of history. Anti-Nazi laws in Germany are there to provide several generations’ space from the shared horror of WW2, and they will not be seen to be successful until at least 2050, when the children of Nazis have died, so providing a removal from the first-person experience.
But there are problems with the version controlled world where threads of knowledge are prioritised and the flow of history consciously re-routed.
At a personal level we lose our sense of ourselves and our innate ability to put things behind us, and even the essential personal liberty of simply growing up. Let’s take a few examples to illustrate what I’m getting at:
Criminal offenses committed by persons under a certain age are usually dealt with differently to those committed by adults. The dividing line between child and adult is different under different legal systems, but it is present in the vast majority of cases. What is also present in most cases is a statute of limitations which means that offenses are deemed to have no relevance under law after a certain amount of time. The convergence of these two legal principles usually means that offenses committed as a child will be wiped ‘off the record’ after a relatively short period and the child allowed to go on with life as a reformed character having learned its lesson.
I see no reason why data should be treated any differently, yet if a copy of an old web page surfaces that contains embarrassing, or even harmful, personal information, any person can act on it. The trope about ‘nothing to hide’ is idiotic. Everyone has something that they would rather wasn’t repeated ad nauseam in public, whether it be bad fashion choices, awkward breakups, financial embarrassment or even a physical blemish. There is no small amount of debate on this, but one of the most interesting recent stories is this one from BBC News. How can Facebook ban you from deleting yourself from their platform ? To me this is a great piece of social commentary art.
Anyway. Too long. Move on.