The ENCODE results: let the marvelous orgy begin?

Scientists have been chewing over the mass-released results of the ENCODE experiments for the past several days, and will doubtless continue to do so; while ID and creationist proponents hop up and down, and proponents of completely non-intentional biological development scramble defensively in various ways, and...

...and, frankly, at first I didn't see what the big deal was.

At first.


I'm going to assume, for the sake of relative brevity, that anyone reading this page on the internet knows how to search the internet for pages to read; so if you don't know what I'm talking about, search around for "ENCODE", or click on our sidebar links for Proslogion (which tends to slant toward young-earth creationism) and/or Evolution News and Views (which tends to slant toward old-earth creationism)--you won't have to page back far in either to find discussion of it.

Very briefly, and somewhat oversimplified: the project involved mapping the human genome for functionality. There are tons of technical results, but the big chewy points are (1) some scientists now claim that "genes" as classically understood turn out not to exist; and (2) a claim of biochemical functionality for roughly 80% of the genome.


In regard to (1): so what? Chromosomal sequences still exist, the structures still exist, the units of the structures still exist. I guess the implication is that the basic units are now larger than has previously been taught? Nucleotides are still nucleotides: a phosphate group and a sugar bonded to one of the bases adenine, guanine, thymine, or cytosine. A bunch of them used to make up a gene (which would then code for proteins or protein complexes or RNA chains, etc.); now a bunch of them make up a... "transcript"? Most of the transcripts are legible; some currently are not (for whatever reason or reasons). That used to be true about genes, too: some were genes, some were pseudogenes. Pseudogenes still exist, but I guess they won't be called "genes" anymore (and there are fewer of them than most scientists were expecting).

So what's the paradigm shift?

Previously, genetic information was expected to be provided in a linear string, much like reading this sentence: there are phrases and clauses and words and they convey meaning, but "etic informa" would be nonsense. In recent years scientists have been suspecting, and steadily confirming, that informational meaning in the genes isn't always linear, but is sometimes coded in multiple dimensions in the chromosome. A relatively simple example G would be O like reading the current sentence D and wondering D why there appear I to be D extra nonsense letters in IT. In fact there is a second interpretative protocol to the effect of "put together the capital letters found between phrases and clauses". But there are also further interpretative protocols allowing a reader to filter out the extra capital letters, and to disregard the capitalization of the final IT as irrelevant to the main sentence, without which protocols the main sentence couldn't be read without confusion.
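For the sake of concreteness, here is a minimal sketch in Python (using a made-up carrier sentence of my own, and having nothing to do with actual genomic parsing or the ENCODE pipelines) of how two different interpretative protocols can read the very same string: one filters out the stray all-caps tokens to recover the plain sentence, the other collects only those tokens to recover the second layer of meaning.

```python
# Toy illustration of two "interpretative protocols" reading one string.
# The carrier sentence and the filtering rule are invented for this post;
# real genomes are not read this way.

RAW = "Reading G this O sentence D you D might I wonder D why stray capitals IT keep appearing."

def tokens(text):
    # Strip trailing punctuation so "appearing." compares cleanly.
    return [w.rstrip(".,") for w in text.split()]

def read_plainly(text):
    """Protocol 1: read linearly, ignoring the all-caps insertions."""
    return " ".join(w for w in tokens(text) if not w.isupper())

def read_hidden(text):
    """Protocol 2: collect only the all-caps insertions, in order."""
    return " ".join(w for w in tokens(text) if w.isupper())

print(read_plainly(RAW))  # Reading this sentence you might wonder why stray capitals keep appearing
print(read_hidden(RAW))   # G O D D I D IT
```

Without both protocols in place, a reader who only knows the linear rule gets confusion; a reader who has both gets two messages from one string.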

That's just one example; there are other types of multi-valent information coding in the genome, and not only do interpretative protocols have to exist to make them useful, protocols also have to exist to keep them from messing up each other. The ENCODE study found that these multi-valent information levels were so numerous and so prevalent that the standard expectation of reading a "gene" just doesn't work anymore.

Which is also connected to the unexpected result of 80% genomic functionality. That isn't an upper limit, either; it's a lower limit based on conservative identification criteria. It's entirely possible, even likely, that functionality runs higher but simply wasn't observed.

When the practical informational complexity of a system skyrockets far beyond the already huge amount of practical informational complexity acknowledged to exist in the system, that doesn't look good for theories that the system developed by a long string of non-intentional accidents and nothing more than those non-directed accidents.


To put it another way, calling a sequence a "gene" doesn't immediately get across the idea that there ought to be information in the sequence. And if there isn't information in a "gene" then it's no big deal. The sequence could be junk DNA or a "pseudo-gene".

But calling the basic sequences "transcripts" instantly implies, by the very name, that we ought to be looking for information and that the original form of the sequence should be expected to contain information; so if we don't find information in a sequence, we should be looking for new ways to interpret it as information (especially if the sequence seems to be part of an active area, which would indicate there must be protocols for discounting the apparent gibberish)--

--or we should regard the sequence as being broken. Not originally random noise accidentally generated to begin with. Broken information (whether broken accidentally or not).


Putting it another way: you can make a transcript of random radio-signal noise from the sun--people actually do that for various purposes, mainly having to do with cryptographic coding of information. But it would be bizarre to call the random radio noise itself a "transcript".

"Transcript" implies some kind of intentionality, or at least some kind of useful information: random radio noise isn't itself information until it's useful for coding and decoding information. But information is a main scientific forensic evidence for detecting intentional causality.

A blood stain from a dead person that spells out a recipe for baking a cake may have been written by a crazy person, and it might technically be possible (if unspeakably improbable) for it to have come about by a sequence of accidents. But a forensic investigator who sees the message is going to rule out "accident" pretty quickly.

Especially if the apparent gibberish in the recipe turns out to be a code for baking a blood pie, too.

Plus a different code of apparent gibberish for the addresses to send the pie to.

Plus a different code for instructions on what blood to use (namely the dead person's).

And a different code, made out of sequences of letters in areas that aren't in other ways apparently gibberish at all, for who exactly should be the one to deliver the blood pie.

And completely different sets of all those codes for the cake, too!


Now, admittedly, if a crime-scene investigator brought such an orgy of information to his captain, the captain might start to suspect that the investigator was rather oversensitive, detecting information where it wasn't really there. But if the proposed solution resolved numerous problems with the apparent gibberish--gibberish that had been casting doubt on whether the cake recipe drawn in blood was real in the first place--that would start to tilt suspicion back toward intentionality.

If the CSI officer added, "Oh, did I mention little machines are reading the letters made of blood and are behaving in ways consonant with the instructions, including in regard to the apparently gibberish letters?--which is how we realized we ought to be looking for meanings in the gibberish?"...

...at that point (assuming the little machines demonstrably existed, of course), the debate about whether the blood pattern was an unintended series of accidents would be over.

Maybe the debate about whether the person died of unintentional causes, too.


Granted, that situation might be evidence that a Marvel Comics supervillain, not just a garden-variety sociopath, was on the loose! Or it might be evidence that a supervillain was tampering with a system set up by Mr. Fantastic or Dr. Strange (leading to results like a blood pie being sent to the victim's family, as well as a tasty healthy cake being made of reasonable ingredients elsewhere).

But that's a colorful illustration of what the ENCODE results amount to.


ID theorists, including creationists (of various sorts) who expect ID, can predict the existence of lots of information and even some broken information. The general expectation would be that apparent gibberish is actually information encoded in an unexpected way, or gibberish generated by accident after the original encoding. Even information added to a basic background of random gibberish would work; by the same principle in reverse, even a little information occurring in a basic background of random gibberish would be evidence for intentionality and thus for intelligent manipulation of the material. (Keeping in mind that structured patterns are not the same thing as information, although information may always require structural patterns of some kind. Seashell sorting from wave action on a beach is a structural pattern but not information per se, although a rational agent can produce information about those structural patterns, using those patterns as data.)

Non-ID theories aren't set up, and maybe can't be set up, to expect informational sequences, "transcripts", to be the fundamental expectation of what a string of particulate building blocks should be. Much less are they currently set up (and maybe can't be set up) to deal with multi-valent levels of informational encoding in the development of the supposedly non-intentionally developed systems.

The ENCODE results indicate that scientists should no longer regard the basic background of biological structure as random gibberish, with acknowledged but relatively small amounts of information to be explained somehow (whether by the same process of random gibberishing or by intentional design). Scientists instead should regard the basic background as ordered information, with a further expectation that any actually random gibberish is broken information--but also that apparently random gibberish is more likely to be coded information we haven't identified yet.

And that sure puts the scientific right of way in favor of ID theorists. It's a whoooooooole lot easier to explain the development of broken information by relatively small amounts of random accident (or even intentional tampering?!!?), than to explain the development of massive amounts of coherent multi-valent practically eventful information (information routinely put to practical use, or rarely used except in special case situations yet still for practical effect), by those same random accidents.


Whether that will still be the situation next week or month or year or decade or century, only more study can say.

Comments

Jason Pratt said…
Blogger now has a new dashboard and, presumably, updated internal code. Whether this means authors of posts will receive comment tracking without having to manually sign up for comment tracking (by posting a comment on their own posts...! {inhale!}) remains to be seen.

Rather than take a chance, though, I decided to register for comment tracking anyway. {g}

JRP
I hate it. I had gotten that a few months ago and had trouble and went back to the old one. They always make stuff harder to use, then call it "improvement."
What is it specifically that the naturalists are having so much trouble handling? I missed it, but then again, Jason, perhaps it is hidden in that long post :)

"(1) some scientists now claim that "genes" as classically understood turn out not to exist"

This just seems false, unless by "classically understood" you mean as described by Mendel, in which case we didn't need a new study to know this. (Just in case: we also know that the reference human genome is based on a composite sketch of just a few genomes, and there isn't really a single human genome but a highly variable family of genomes).

We are not about to stop talking about genes, start codons, stop codons, etc. This is madness.

The importance and prevalence of regulatory genes are not that surprising. In graduate school it is drilled into students how important such things are.

If quizzed before seeing these studies, I would have been completely agnostic about the percentage of genes that are regulatory in nature rather than protein-coding; if anything, I would have predicted that the majority are regulatory, because that is how biology creates seemingly infinite complexity given a finite, conserved alphabet: tweaking the genes that control other genes is at least as important as the genes that directly express phenotypes.

This is one of the elementary lessons of evolutionary developmental biology, so I might be forgiven here for balking at those who would say that the evolutionists have no idea how to think about this stuff. That strikes me as ridiculous and disingenuous (note, Jason, I'm not saying this about you; you seem to be somewhat innocently following the currents being generated by these people).

What is qualitatively new here, specifically, that merits this theological spin? I see some numbers being added to a story biologists have known is true:
a) 'junk' DNA is probably often DNA we don't understand yet
b) there exist non-protein-coding stretches of DNA that are functional and act as regulatory elements for other bits of DNA.
c) It is an extremely complicated regulatory network that we have just begun to crack with studies like the ENCODE project.
d) We have no idea whether these principles, gleaned from organisms after billions of years of evolution, apply to the first living things, or (most relevantly) their prebiotic ancestors.


Incidentally, I have never liked the focus some people put on "junk" DNA. E.g., Dawkins and the other skepto-bots were sort of mindless about it, displaying the kind of overconfidence that we find in the creationists we like to criticize.

Wikipedia's entry on noncoding DNA seems pretty good.
Good discussion here on the science.
Jason Pratt said…
Thanks, BDK. I was working on a follow-up article but I've gotten distracted by other projects.

{{We are not about to stop talking about genes, start codons, stop codons, etc.. This is madness.}}

It was madness being promoted by the people who had done the tests--though I gathered they weren't really talking about stopping all reference to genes. They were talking about changing the basic genomic unit to the transcript, and obviously over-reaching in their rhetoric; but the switch itself has conceptual implications, and those were what I was talking about.

Considering that the people who did the tests were regarded as typical professionals (and staunch neo-Darwinians), if they express surprise and say that no one was expecting even this much basic activity, then while they may be over-reacting, their over-reaction does say something about an idea prevalent to some extent in the field: namely, that structures built from random copy-errors will be mostly composed of gibberish. Natural selection may periodically remove some of the gibberish; but insofar as it removes the gibberish (relatively) immediately, then by default that not only means most mutations wouldn't get passed on (which then doesn't explain the density of DNA generally across species), it would also mean there would have to be proportionately that many more beneficial copy-errors worth keeping than the evidence indicates.

(This is why neutralism was proposed as an important factor: the mutation neither helps nor hurts and so is evolutionarily neutral, but then there wouldn't be much chemical activity for it either except during cell reproduction. Also, how neutral mass using scads of resources could be considered evolutionarily neutral is a problem.)


{{[T]hat is how biology creates seemingly infinite complexity given a finite conserved alphabet: tweaking the genes that control other genes is at least as important as the genes that directly express phenotypes.}}

I'm pretty sure that one of the elementary lessons of evolutionary developmental biology (at least of the non-directed and non-designed neo-Darwinian gradualistic type), is that mutations are random copy-errors. Consequently, it has to be those copy-errors which are "creating" the nearly infinite effective complexity you're talking about. It isn't the regulatory genes which create the near infinite complexity of the regulatory genes (although such genes once in place do provide astounding flexibility for adaptation). It isn't even micro- and macro-level natural selection processes: those weed out ineffective or harmful complexity once the complexity is established.

Random copy-errors which tweak the genes which control other genes, would tend to break control of those other genes; not always in a way fatal to the cell or to the organism, but the more that the result doesn't fit into the biomechanical processes involved in producing biological effects, the less the mutation provides a beneficial increase in complexity. (It may provide a beneficial and otherwise neutral decrease in complexity of course. Which is great, but isn't the kind of evolution neo-Darwinian theory needs.)

JRP
Jason Pratt said…
Please let me hasten to add that my reply in hindsight looks more peckish than I really intended. {g} It's very common for proponents of b.e.t to put the processes in language that kind of elides past what the theory basically and functionally says is actually happening, and while I understand the desire to do so for simplicity, I can't help but feel sometimes that what was being said just wouldn't look as plausible if it was spelled out.

And that has relevance to the article. The numbers being put to the topic involve so much of an extent of confirmation of (or at least strong implications of) operational functionality, in such multi-valent fashions, that I wish proponents of non-ID theories would speak more plainly about what such theories propose to be actually happening to reach such results. The junk-DNA proponents (some of whom are strongly theistic by the way) seem more willing (on this topic) to talk about the underlying basic mechanism of structural change: random copy-errors.

I guess it rubbed me the wrong way that you were talking about "one of the elementary lessons of evolutionary developmental biology", when the way you yourself phrased things ("biology creates") naturally sounds more plausible than what one of the most elementary factors of neo-Darwinian gradualism insists upon: random chemical-process errors are what generated start codons, stop codons, and regulatory genes that control other genes, which in turn create proteins by detailed and complex biomechanical coding processes, thus producing phenotypes that effectively interact with micro- and macro-environments.

Not "biology" "creating" these things.

The junk-DNA proponents no doubt feel comfortable expecting random copy-errors to produce mostly things that are not this--because that's what copy-errors would naturally and usually produce. If random copy-errors produced a little functionality here and there, so long as it was functional enough to be kept, that seems plausible as an output of the process.

The numbers being put to what scientists were already (if only relatively recently) learning push the level of functional information waaaay over into the far majority of the material. To the extent (and this was the main point of my article) that we ought now to be solidly expecting information as the basic unit of the genome.

But conceptually that fits a lot better with a theory of copy-errors producing noise and mistakes in an original informational structure, than with a theory that copy-errors are the origin of the informational structure. Copy-errors become the problem for an original informational structure, not the solution for an original informational structure.

JRP
