I’ve written about Newtonian science and the simple cause and effect interpretation of the physical universe that it embodies, and how the mathematics of complexity and statistical interpretations of the physical universe, such as quantum mechanics, have superseded that mechanistic view. What I would like to suggest is that neuroscience is treading the same path in its interpretation of the mind as a mechanistic system demonstrable by a physical understanding of brain function. This is not a new idea as far as I know, but I need to ‘say it out loud’ to show myself that I know where the limits lie.

I’ve been thinking about whether I believe that the mind could be replaced by the internet, and I think ‘no, but there could be functions that could be farmed out such as memory’. Here I’m going to explore that idea with specific reference to the blossoming fields of neuroscience, neuromarketing, neuroethics and neuro-anything-else-they-can-think-of.

The mathematics of complexity and uncertainty, chaos theory, complex systems, or however else we wish to express the concept, all share the same fundamental tenet: that simple mathematical relationships can give unpredictable results. This is shown by Lorenz’s Butterfly Effect, but it is also embodied in Gödel’s incompleteness theorem and the Schrödinger’s Cat thought experiment. To me these are all similar aspects of the same idea: that we can never measure all variables sufficiently well to be able to have a 100% reliable model.
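To make the first of those concrete, here is a minimal sketch in Python using the logistic map, a standard textbook example of chaos rather than Lorenz’s own weather equations. The function name, the parameter r = 4 and the starting values are illustrative assumptions. Two runs start one part in ten billion apart and are compared step by step.

```python
def logistic_orbit(x0, r=4.0, steps=100):
    """Iterate the one-line rule x -> r * x * (1 - x) and return every value."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


# Two starting points that differ by far less than any realistic measurement error.
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)

# Absolute difference between the two runs at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

# First step at which the two runs disagree by more than 0.1 (None if they never do).
first_big = next((i for i, g in enumerate(gaps) if g > 0.1), None)
print("initial gap:", gaps[0])
print("largest gap:", max(gaps))
print("first step with gap > 0.1:", first_big)
```

The rule itself is a single line of arithmetic, yet the tiny initial difference grows until the two runs have nothing to do with each other, which is the sense of ‘unpredictable’ that I mean here.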

If we transpose this notion to the pursuit of neuroscience, where claims are being made almost to the point of being able to read the minds of experimental subjects, we should at least consider the nature of the systems involved before we accept the validity or even applicability of these kinds of claims.

Brain structure is primarily a function of the expression of DNA of the individual and DNA as a replicator is a very mechanistic and ultimately predictable system. I say this because genes are quite simple. Their complexity is in their size, not their building blocks. That the Human Genome Project was able to sequence our genes using automated techniques suggests that a mechanistic approach to reading that material is appropriate. However once the brain has started to develop neuronal connectivity in response to stimuli (memories start to be formed in response to experience), that mechanistic interpretation is no longer applicable without having a set of meta-data that shows the context under which those connections are made.

What neuroscientists are doing using fMRI is establishing a work-book of that contextual meta-data under experimental conditions, and it’s very impressive that they are managing to exclude enough of the outside world to be able to see human responses such as lying and trust and jealousy in the data that they collect. I’m sure that, on average, they are seeing some functions of mind being expressed physically. BUT, what cannot, and indeed must not, be inferred from this work is that the responses from one individual’s brain can be directly equated to the responses of another individual’s brain.

We could go through a significant proportion of the human race, taking subjects from all walks of life and every corner of the globe and find average response curves for each chunk of the brain, but we would never be able to replicate the contextual meta-data to a fine enough resolution to be able to counter Gödel’s incompleteness theorem as it applies to basic information or the individual’s brain development in response to experiences from its own unique viewpoint. The mechanistic interpretation of mind, that equates brain activity to mind function, breaks down under existing mathematical interpretations of the physical universe. We would need a whole new mathematics to be able to do what is currently being claimed for neuroscience. To be fair to the neuroscientists, many of them shy away from the grand claims, but enough do not that we see fMRI being cited in legal cases. Far from free will being dead and neuroscience proving a deterministic worldview, it is showing just how poor our quantitative understanding of mind really is.

This is not a new experience. Psychoanalysis promised an understanding of mind and motivation at the beginning and middle of the 20th century and arguably was the basis for the construction of the consumerist global economy. I wonder how far neuroscience will be pushed outside the lab.

Proponents see recent fMRI science as analogous to genetic fingerprinting; as a quantitative diagnostic tool. I would argue that it is more analogous to a form of psychoanalysis where interpretation is automated. In many of the new institutes and companies working with fMRI we see the objections to the wider application of fMRI-centered neuroscience being characterised as philosophical and relating to ideas of free will and determinism. I don’t see that as a valid or even relevant conflation. My counter-claim is that what is being claimed for neuroscience is not mathematically possible and that, in ignoring the role of mathematical complexity, scientists, lawmakers, economists and others are acting unethically. What is being seen is the brain and not the mind. That the brain’s responses are linked to the mind shouldn’t be a surprise but the simple Newtonian idea of cause and effect is not applicable where 100 billion neurons each have around 7,000 synapses, many of which have been influenced by memory formation or physical conditions since, or even before, birth. Simply put, just because a specific cubic centimeter of grey matter demands extra blood flow in response to the same stimuli, it doesn’t mean it’s for the same reason.

If it is possible to mathematically model the mind, then it should be considered as a complex system inhabiting another complex system (the brain) and informed by a set of contextual meta-data (memories and experiences) as well as environmental stimuli. Divining motivation from brain activity is a step too far mathematically, but an approximation could be possible with a sufficiently large database to populate response curves with experimental data. Whether those response curves could provide useful predictive data can’t be known at this point, but what we can say with a good degree of certainty is that you’d need a large n-value to compensate for the free variables in two complex systems and the contextual meta-data.

We’ve been hearing a great deal about science in the media in the context of climate change and new energy sources lately, and the quality of some scientific work has been called into doubt, and there have been calls for an increased understanding of science to try and stop misrepresentation by the media, blah, blah, blah. This call for dialogue between the fields of arts and sciences is happening on more and more occasions as science gets more difficult and mass media becomes less patient. Anyone still remember CP Snow ? So why don’t we look at things a slightly different way ?

Science is media

That sounds a bit odd, but philosophically science is a mechanism by which we try to understand the physical reality that we inhabit and mass media (especially news journalism) is also a mechanism to help understand the world around us. Their methods are different but their core goals are the same – enhanced understanding of reality.

So let’s look at some recent science through a media lens. In fact let’s get PoMo on its ass !

Marshall McLuhan in ‘The Medium is the Massage’ posits that you perform the message that you wish to communicate. It doesn’t matter if that is verbally, ethically, artistically, mathematically or physically, what you do and how you do it IS what you say. On the other side of the coin if your performance does not tie in with your message the audience undergoes cognitive dissonance and the message is garbled, contradictory and ineffective.
Strasberg’s Method Acting technique is a great example of this. The actor does everything in his power to become the character in order that his whole performance reflects the experience of that being; in so doing the words and the physical body perform as one and, hopefully, the role is played well. The actor doesn’t actually become the character, that would be impossible, but he will take on or construct every aspect of that character that he can discover.

So if we take the recent CRU email scandal (yes, scandal), we have a set of scientists who perform their science under the scientific method which involves openness, respect for others’ results and views, self-criticism, peer-review and data validation. Over the years they have told us ‘trust us, we’re the best, we do good science’, in effect ‘we’re following the scientific method’, and now we find out that their performance is not backed up by their method. We thought that we were seeing the real thing, or at least a good approximation of the real thing with the scientists suffering for their art, but we were sold a poor performance. A shallow frontage. It’s like finding out that a character that Al Pacino plays never actually liked coffee but Pacino forced a re-write because he couldn’t go without his morning joe.

For the record and as a former scientist I find the actions of the CRU scientists abhorrent, but human (I never lived up to my own view of what a scientist is, which is why I no longer call myself one, though I still perform the role of scientific critic). For me the affair doesn’t detract from the credibility of climate science as a whole, but it’s disturbing that their performance was more Lee Majors than Lee Strasberg.
They need to get their method back.

I’m going to pose myself questions here rather than answer them. Self-indulgent I know, but hell it’s my party and I’ll cry if I want to ;)

Digital data has some properties that could or should impact on ethics. I’m going to take a look at three of them this time;
It is non-corporeal, so possibly not as susceptible to the ravages of time as paper would be
It is transmissible, so probably not subject to physical location
It is a record of events that may be edited or erased leaving little or no evidence of those actions

The first two points are similar, in that they relate to inadvertent data loss, but relate to ethics in very different ways. The third is a very different quality.

Non-corporeality – The onwards march of time and technology makes specific media obsolete. That is as true for spoken language as it is for other media formats, but the loss of spoken languages is a large enough topic for a post on its own, so I’ll stick to physical media.
Ask yourself – ‘When did I last buy a 35mm photographic film or a 90 minute audio cassette ?’
For myself, it’d have to be a decade or more, and I owned a 35mm SLR camera until 3 years ago ! I just kept a stock of old film in a box in the fridge up to its use by date, and past in some cases. Now as I climb the technology ladder I have my music CDs as reference, but don’t need to touch them as my music is transferred from device to device with no apparent loss of quality. So long as I make those steps up the ladder while technology exists to bridge the gaps no data is lost. So here comes the first question.
Do we have an obligation to keep data in its original form and format, respecting the media that it was originally hosted in, do we have an obligation to retain the information contained in the data, or is the data disposable and only the effect of the data relevant ?

In many ways the existence of the institutions of ‘the museum’ and ‘the library’ answers this question from our ancestors’ point of view. Certainly in the UK, philanthropists saw the advancement of science and the education of the masses as a moral obligation, but what about the curation of data for historical rather than scientific purposes ?
I think that we can say that retained samples are a valuable weapon in the arsenal of scientific endeavour, without much doubt. Whether it be new species of animal in today’s world, samples of pathogens lost to science or the ability to re-examine old specimens with new techniques, a library of original, physical sample material is an essential part of science, but what about non-corporeal data ?

Audiophiles still see the crackle and hiss embodied in vinyl recordings as adding character and being more authentic than ‘clean’ digital renditions, but to me this would imply that the recording artist, as the author of their own material, would want a degraded recording. I really don’t think that’s true. If I were a musician I would want my recordings to be heard as played, not as recorded. But then the experience of listening to music is not the same as the experience of playing it, so a direct equivalence between the data as recorded and the data as experienced is going to be a tricky one and probably something to consider another day.

Can you imagine the curator of a museum in 1,000 years’ time carefully handling the mix tape that you made for wassername in ’88 ? But why not ? It’s an excellent piece of social history communicating universal feelings and allowing later generations to connect with past generations on an emotional level. No different from a birthday party invitation sent 2,000 years ago or a love letter written 4,000 years ago. But that assumes that the data can be read in hundreds of years’ time.

The non-corporeality of digital data does hide degradation introduced by copying and by natural stochastic processes, such as cosmic rays hitting storage media or radioactive decay. We should not consider data stored on digital media such as magnetic tapes, CDROMs or laserdisks as immune to degradation, indeed they are more prone to damage than paper in many circumstances. That’s not as surprising as it may at first seem since we have 4,000 years of paper technology under our collective belts, but less than 100 of electronic recording and experience using plastics. In my own lifetime I have seen data storage formats become obsolete (and so have you), but that’s not even considering MIME types.
Every new start-up seems to define their files in a new way. This is 100% understandable in the context of intellectual property rights and the advance of technology, but it also means that the failure of each of these companies will consign their file type to the rubbish bin. Effectively we are spawning and killing a new ‘language’ each time this happens and any data recorded in this language will need either translation into more common languages or the preservation of a Rosetta Stone for as long as that data might be preserved. In this context the decadal lifespan of CDROM and magnetic tapes starts to seem like a long-term issue and the churn of data formats the overwhelming problem. Since we cannot ethically restrict the proliferation of new languages, the best that we can hope for is that file types are translatable. Unfortunately translations almost always result in loss of data fidelity.

So the physical nature of storage media for digital data is less of a defining factor than we might at first believe, but before we reject its potential impacts there is the question of whether the original physical items need to be curated in the same way as any museum piece. Should we collect one of everything just so that the next generation has access to it in something close to context ? This question I will leave for the historians.

The non-corporeality of data may impact more through the potential for high-fidelity copying to multiple locations and this feeds neatly into our next topic;

Transmissibility – Not a new feature of data, after all the telegraph was transmitting non-corporeal data around the world in the 19th century, but perhaps I mean the ability to provide photo-realistic reproduction to anywhere almost instantly.

If we assume that we are not going to get significant data degradation through each copy (i.e. that the fidelity is retained) or that the original reference copy is still available, data becomes a commodity available on-demand. Our data networks make that possibility a reality, whether it be via broadcast media, sharing or sale through websites or via ‘direct’ network connections such as FTP or PPTP.
Generally we do not even think about whether we are receiving what is being sent, unless there is some obvious fault. Our technologies are reliable enough that, at the point of consumption, we consider the data as a good representation of the original. That is not to say that checks on data fidelity don’t happen in the background with most transmission mechanisms, they do, and a significant portion of bandwidth used on the internet is devoted to comparing sent to received.
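As a rough illustration of that background checking, here is a minimal sketch in Python: the sender computes a fingerprint of the data, and the receiver recomputes it and compares. The function names and the sample bytes are hypothetical, and SHA-256 is used only because it ships with Python; real network protocols typically rely on much lighter checksums.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def received_intact(sent_digest: str, received: bytes) -> bool:
    """Compare the sender's fingerprint with one computed from what arrived."""
    return digest(received) == sent_digest


# Hypothetical payload, fingerprinted before sending.
original = b"mix tape for wassername, 1988"
checksum = digest(original)

# Simulate a single flipped bit somewhere in transit.
corrupted = bytearray(original)
corrupted[0] ^= 0x01

print(received_intact(checksum, original))
print(received_intact(checksum, bytes(corrupted)))
```

The check only says that the copy matches what was sent. It says nothing about whether what was sent was a faithful record in the first place.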

The question then becomes why are we happy to have copies of original data many times removed from the original when the original (or a much closer removal) is available ?

It is often said that we live in a media age, but most would consider that to refer to the number and availability of media channels, when in fact the most pervasive mediated experience that we have is with the data that makes everyday life possible.
Just to give an example; even 10 years ago I wouldn’t have dreamed of money as a mediated experience, yet in most senses it is just that. Very little cash actually flows through my wallet these days. I rarely visit a bank. I trust all those data transmissions that are running in the background to provide a very real outcome like food on the table. But why should I ? Well the honest truth in this case is that I don’t trust digital money any more than I trust hard cash. Both are copyable, both are stealable. Both are mediated experiences of wealth. To me they are no different, so I have no philosophical problem in a cashless economy. But that’s not the same as the ethics of error checking.

Engineers strive for high fidelity data transmission. It is a matter of honour and professional pride. Depending on the application it can be a matter of life and death.
Bankers (should) strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the customer it can be a matter of life and death.
Journalists should strive for high fidelity transactions. It should be a matter of honour and professional pride. Depending on the story it could be a matter of life and death.
But past those three professions, high fidelity data transmission is mostly an aesthetic choice but not universally an ethical one. That’s why we have security certificates, passwords, ID checks and all that apparatus that at first sight looks Orwellian but, when you understand what it is compensating for, is much more depressing. It’s plugging the ethics gap between personal and professional.

Is it right that we load our communications technologies sending multiple copies of data rather than consigning a single authoritative item of data to a secure store and simply reference it from there ? Computer programmers working in groups do this already using applications collectively known as Version Control Software. Could we create a Version Controlled repository for all human knowledge ?

This brings us to digital memories as editable and erasable records;
In the Version Controlled world nothing is ever deleted without consensus. Edits are recorded so that if some piece of data was to be found to lead to a dead-end you can back-track to the last useful data and try a different approach.
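A toy sketch of that idea in Python, assuming nothing about how any real version control tool works internally: every edit is appended to a history, and back-tracking is itself recorded as a new entry rather than erasing the dead-end. The class and method names are hypothetical.

```python
class History:
    """Append-only record: edits add revisions, nothing is overwritten or deleted."""

    def __init__(self, initial):
        self._revisions = [initial]

    def edit(self, new_value):
        """Record a new revision on top of the existing ones."""
        self._revisions.append(new_value)

    def current(self):
        """Return the most recent revision."""
        return self._revisions[-1]

    def backtrack(self, revision):
        """Make an earlier revision current again by re-appending it."""
        self._revisions.append(self._revisions[revision])

    def log(self):
        """Return a copy of the full history, dead-ends included."""
        return list(self._revisions)


notes = History("first idea")
notes.edit("dead-end idea")
notes.backtrack(0)  # return to the last useful data and try again from there

print(notes.current())
print(notes.log())
```

Note that the dead-end is still sitting in the log after the back-track, which is exactly the property that makes the question of forgetting awkward.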
But what happens if a particular thread of knowledge leads to disaster/evil/daytime TV or whatever, what then ?

Without infinitely reproducible data a society can choose to forget. If data is infinitely reproducible we have to assume that a copy exists somewhere, even if it is only in a router cache that is not immediately accessible. If the potential exists for a copy to resurface, then ethically we have to consider that it will. Forgetting is not a choice that is practically open to us in the massively networked digital world.

Forget is the wrong word in some of these cases, ‘consign to the past and move on whilst retaining full knowledge’ could be a better way of putting it. For example the Truth and Reconciliation commissions in Rwanda and SA are a way of accommodating unpalatable facts until they fade a little before merging with the background of history. Anti-Nazi laws in Germany are there to provide several generations space from the shared horror of WW2 and they will not be seen to be successful until at least 2050, when the children of Nazis have died, so providing a removal from the first person experience.

But there are problems with the version controlled world where threads of knowledge are prioritised and the flow of history consciously re-routed.
At a personal level we lose our sense of ourselves and our innate ability to put things behind us, and even the essential personal liberty of simply growing up. Let’s take a few examples to illustrate what I’m getting at;
Criminal offenses committed by persons under a certain age are usually dealt with differently to those committed by adults. The dividing line between child and adult is different under different legal systems, but it is present in the vast majority of cases. What is also present in most cases is a statute of limitations which means that offenses are deemed to have no relevance under law after a certain amount of time. The convergence of these two legal principles usually means that offenses committed as a child will be wiped ‘off the record’ after a relatively short period and the child allowed to go on with life as a reformed character having learned its lesson.

I see no reason why data should be treated any differently, yet if a copy of an old web page surfaces that contains embarrassing, or even harmful, personal information any person can act on it. The trope about ‘nothing to hide’ is idiotic. Everyone has something that they would rather wasn’t repeated ad nauseam in public whether it be bad fashion choices, awkward breakups, financial embarrassment or even a physical blemish. There is no small amount of debate on this, but one of the most interesting recent stories is this one from BBC News. How can Facebook ban you from deleting yourself from their platform ? To me this is a great piece of social commentary art.

Anyway. Too long. Move on.

Ethical Data Mining

June 14, 2009

I’m starting to think about where the line should be drawn when data mining web content, specifically content provided by individuals. I almost said private individuals there, but if you are posting on a publicly visible blog, comments board or whatever, then by definition private is no longer applicable. Or is it ?

What is the difference between government agencies putting together a profile on me from my electronic footprint and me doing it on someone else as part of a scientific research project, or indeed a third party doing it for commercial reasons (thinking of Phorm) ? What are the methodological and ethical differences ? Are there any ?

There are a couple that leap to mind with regards to government responsibilities and accountability, but I’m still thinking about this so no conclusions yet.
