Quizzing the Anonymous
Ignoramus et ignorabimus
Why aren’t we Teletubbies? Part 1 
3rd-Feb-2013 03:46 pm
In the late 1990s and early 2000s, the BBC aired a TV series called Teletubbies. Its characters were toddler-like creatures sporting TV antennas on their heads and screens in their bellies. The aliens communicated by producing unintelligible cooing sounds (“Eh-oh”) and pressing fingers on their bellies to display videos to each other.

Teletubbies were the exact opposite of humans. Our visual cues are as primitive as their infantile babbling, while speech serves as our main means of communication. Why aren’t we Teletubbies? Wouldn’t flashing images be a superior way to communicate?

Judging by our art, this is not the case. The least expressive artistic medium is 3D sculpture, despite being the most realistic. Art galleries and museums are full of people admiring 2D objects, but this is not what people do every hour of every day, and such images can only be understood as part of a broader narrative telling how they need to be perceived. The most expressive media are 1D narratives. One can point to movies and TV as examples of 2D progression in time, but these are illustrated stories that began as scripts.

There is another telling example: life. Our traits are written in 1D code. This code is translated into elaborate 3D objects, like a script is translated into a theatrical production. In this 3D production, part of the show is writing a 1D script telling how the next show is to be staged. There is no a priori reason for the genetic carrier to be 1D. One can argue that 3D scripts would be physically difficult to handle in 3D space, but a 2D matrix is readily accessible at any point.

Even our own 2D storage devices (hard drives, CDs, barcodes) are only nominally 2D; they retrieve data as 1D bit streams. The reason is the necessity of error correction. There is little art to such correction if you are willing to proofread and keep multiple copies of the data, but that sacrifices transmission speed and storage space. If those are of little concern, you get a Jewish scribe slowly and faithfully reproducing the word of the Bible on parchment. The problem emerges when speed and space ARE the concern. The art begins when one needs to minimize the space dedicated to check bits while maximizing the transmission rate, without sacrificing fidelity.
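To make the cost of the scribe's approach concrete, here is a minimal Python sketch of correction by keeping several full copies and taking a majority vote at each position. The function names, the three-copy choice, and the sample message are assumptions made for illustration only; they do not describe any particular device.

```python
from collections import Counter

def store_copies(message: str, copies: int = 3) -> list[str]:
    """Keep several full copies of the message (the 'scribe' approach)."""
    return [message] * copies

def majority_vote(copies: list[str]) -> str:
    """Recover each character as the most common one at that position."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )

original = "IN THE BEGINNING"
copies = store_copies(original)

# Damage one character in one copy (a hypothetical random error).
copies[1] = copies[1][:3] + "X" + copies[1][4:]

recovered = majority_vote(copies)

# Extra storage spent on redundancy, as a fraction of the message size.
overhead = (len("".join(copies)) - len(original)) / len(original)

print(recovered == original)  # True
print(overhead)               # 2.0, i.e. 200% extra space
```

The correction logic is trivial, but the price is two extra copies for every message, which is the trade-off between simplicity and space described above.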

A typical CD corrects random errors of up to 2 bytes per 32-byte block; more importantly, it corrects burst errors of up to roughly 4,000 bits (about 500 bytes) in length; such clustered errors are caused by scratches on the disk. Because such errors are so common, there is a strong incentive to interleave the data, so that a burst error appears as random errors spread over many blocks (each of which can be corrected individually). A lot of buffering and processing memory goes into this error correction operation. 2D data (say, in bar codes) are transformed into 1D bit streams and then error corrected using Reed-Solomon coding.
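The interleaving idea can be sketched in a few lines of Python. This is a simple row/column block interleaver, not the cross-interleaved scheme actually used on CDs; the block count, block length, and burst position below are assumptions chosen for illustration.

```python
def interleave(data: bytes, n_blocks: int) -> bytes:
    """Treat the blocks as rows and read them out column by column.

    Assumes len(data) is a multiple of n_blocks.
    """
    block_len = len(data) // n_blocks
    blocks = [data[i * block_len:(i + 1) * block_len] for i in range(n_blocks)]
    return bytes(blocks[r][c] for c in range(block_len) for r in range(n_blocks))

def deinterleave(data: bytes, n_blocks: int) -> bytes:
    """Inverse of interleave: restore the original block order."""
    block_len = len(data) // n_blocks
    return bytes(data[c * n_blocks + r]
                 for r in range(n_blocks) for c in range(block_len))

N_BLOCKS, BLOCK_LEN = 8, 4
original = bytes(range(N_BLOCKS * BLOCK_LEN))

# A "scratch": 8 consecutive bytes of the interleaved stream are destroyed.
stream = bytearray(interleave(original, N_BLOCKS))
for i in range(5, 5 + 8):
    stream[i] ^= 0xFF

received = deinterleave(bytes(stream), N_BLOCKS)

# Count the damaged bytes in each block after de-interleaving.
errors_per_block = [
    sum(a != b for a, b in zip(original[k * BLOCK_LEN:(k + 1) * BLOCK_LEN],
                               received[k * BLOCK_LEN:(k + 1) * BLOCK_LEN]))
    for k in range(N_BLOCKS)
]
print(errors_per_block)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

A burst no longer than the number of blocks touches each block at most once, so a per-block code that fixes one or two bytes can absorb it. Note that the whole set of blocks has to be held in memory before anything can be read out, which is the buffering cost mentioned above.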

Message symbols become coefficients of a polynomial, which is multiplied by a specially constructed generator polynomial; Galois field properties of such polynomials are then exploited in a clever way, so that about 6-10% extra bits are sufficient to detect and correct random errors. Most of our digital devices rely on this general approach, invented in the 1960s. 2D data need to be processed as 1D streams, as the error correcting methods are inherently 1D; it does not matter how the data are stored or retrieved. This is what you need to do if you are seriously concerned about the accuracy and speed of communication.
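A toy Python version of the polynomial trick might look as follows. It is a sketch under loud assumptions: it works over the prime field of integers modulo 929 rather than the GF(2^8) field used on CDs, it uses the simplest (non-systematic) encoding, and it only detects an error rather than locating and correcting it. The constants P, ALPHA, and N_CHECK and all function names are illustrative choices, not part of any standard.

```python
P = 929       # prime modulus defining the field (assumption for illustration)
ALPHA = 3     # field element whose powers are the roots of the generator
N_CHECK = 4   # number of check symbols added to each message

def poly_mul(a, b):
    """Multiply two polynomials (lowest-degree coefficient first) modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def generator_poly(n_check):
    """Build g(x) = (x - ALPHA^1)(x - ALPHA^2)...(x - ALPHA^n_check)."""
    g = [1]
    for i in range(1, n_check + 1):
        g = poly_mul(g, [(-pow(ALPHA, i, P)) % P, 1])
    return g

def encode(message):
    """Codeword = message polynomial times generator polynomial."""
    return poly_mul(message, generator_poly(N_CHECK))

def poly_eval(poly, x):
    """Evaluate a polynomial at x modulo P (Horner's rule)."""
    result = 0
    for coeff in reversed(poly):
        result = (result * x + coeff) % P
    return result

def syndromes(codeword):
    """Evaluate the codeword at the generator's roots; all zero means valid."""
    return [poly_eval(codeword, pow(ALPHA, i, P)) for i in range(1, N_CHECK + 1)]

message = [72, 101, 108, 108, 111]
codeword = encode(message)
clean = syndromes(codeword)

# Corrupt a single symbol of the codeword.
damaged = list(codeword)
damaged[2] = (damaged[2] + 1) % P
dirty = syndromes(damaged)

print(clean)  # [0, 0, 0, 0]
print(dirty)  # [9, 81, 729, 58]
```

Every valid codeword is a multiple of the generator and therefore vanishes at its roots; a damaged one does not, and the pattern of nonzero values is what a full decoder would use to find and fix the error. With a message this short the check symbols are a large share of the codeword; the overhead shrinks as blocks get longer.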

Replication of DNA does not work like that at all, although it shares many of the same concerns. The approach is not a clever error correction method applied to a chunk of data; it is the Jewish scribe approach of careful proofreading and mismatch repair based on the complementarity of the two strands in the double helix (but done rapidly). If there is a double strand break, the damage cannot be repaired unless there are multiple copies. Maintaining such copies is expensive. Only desperados living in harsh environments go to such extremes.
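The complementarity check can be caricatured in Python as follows. This is a cartoon, not a model of the real enzymes: the strands are written position by position (ignoring their antiparallel orientation), the template strand is simply assumed to be the trusted one, and the sequences and function names are invented for illustration.

```python
# Watson-Crick pairing rules.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mismatches(template: str, new_strand: str) -> list[int]:
    """Return positions where the new strand does not pair with the template."""
    return [i for i, (t, n) in enumerate(zip(template, new_strand))
            if COMPLEMENT[t] != n]

def repair(template: str, new_strand: str) -> str:
    """Fix each mismatch by trusting the template strand (an assumption)."""
    fixed = list(new_strand)
    for i in find_mismatches(template, new_strand):
        fixed[i] = COMPLEMENT[template[i]]
    return "".join(fixed)

template = "ATGCGTAC"
new_strand = "TACTCATG"   # position 3 should be G, not T

print(find_mismatches(template, new_strand))  # [3]
print(repair(template, new_strand))           # TACGCATG
```

The check is purely local: each position is compared with its partner, and no block of data has to be buffered. It also shows why a double strand break is so damaging: with both strands lost at the same spot there is nothing left to compare against.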

Cellular machinery is designed to prevent such irreparable damage from occurring rather than to repair it after it has occurred. There is little redundancy and there are no check bits. A bacterium cannot spare even 10% of its DNA for error correction, as that would squander vital resources that can be used for (almost accurate) replication. A certain fraction of mutations is tolerated. Nor can the bacterium implement anything like Reed-Solomon coding to bring the number of check bits down to this comfortably low percentage, so the extra space dedicated to no productive use would be much larger. It is not worth it. The life of a single bacterium is worthless anyway.

We are not bacteria; we can afford to be wasteful, but our code is inherited from creatures that could not afford such luxuries. Nature’s way of dealing with burst errors is not to deal with them at all: the organism is not viable, and such errors do not propagate. The results of the two approaches are comparable: the error rates of DNA replication and of digital transmission are about the same, around 1e-9. There are errors in DNA replication, but they come in well-defined categories, such as nucleotide substitutions, insertions/deletions, frame shifts, slippages, and duplications. Only so many types of errors can occur in a 1D system with built-in complementarity, while many more possibilities exist in a 2D system (faults, dislocations, etc.). My feeling is that 2D coding would be impossible without buffered memory and block error-correcting codes, which are, in turn, impossible to evolve in a step-by-step fashion, while 1D replication of the observed type can be.

There is another, even more important concern: viral invaders. If you only have 1D storage, they can only incorporate themselves in a few ways (they cannot extensively disperse themselves) and so can potentially be recognized and dealt with. Just think of how many ways a 1D sequence (without even cutting itself into smaller pieces) can incorporate itself into a 2D matrix! The only way to intercept such invaders would be before they insert, as otherwise their detection would require block error correction.

Even in 1D this is a major problem. One way of dealing with it (observed in Tetrahymena protozoa) is, once again, having two copies of the genome. An RNA copy migrates from one nucleus to the other; if it finds a perfect match, it self-destructs. If it does not, alien DNA is present and the RNA copy survives; during replication it destroys the DNA strand through an RNAi-like mechanism.

However, such tricks are generally not worth pursuing, and eukaryotic genomes are overrun with garbage: it is easier to copy all this stuff (and find uses for it) than to go through the Herculean ordeal of excising viral DNA. That is to say that at the molecular level Teletubbies are VERY unlikely.

I can generalize and claim that Teletubbies are unlikely in any situation where high-fidelity communication is required and errors and interference are likely to occur.

This explains why we are not Teletubbies in our general design, but this rationale does not quite explain why we are not communicating like Teletubbies. Indeed, our language has none of the error correction features that are the mark of high-fidelity systems. It is simply not designed for reliable transmission of information. In fact, it seems to be designed for the least reliable transmission of information.

In the next post I will argue that we are not Teletubbies specifically for this reason.

4th-Feb-2013 03:44 pm (UTC)
1. Reading scientific literature, one constantly comes across phrases such as "genes/DNA encode", "information from the sense organs", and so on. My question is: in what sense are these expressions meant? In my opinion, the terms "code" and "information" have a strict meaning only within information science, i.e. the formalism created to describe the creation/storage/transmission/reception of information, understood as a set (or sets) of symbols from a predefined alphabet. Outside information science, the words "encodes" and "information" can, I think, be understood only metaphorically, as figures of speech. In nature there are no "codes" and no "information" in the above sense; there are only complex, hierarchical bio/chem/physical reactions. I am, of course, a plain layman, but Ivanov_Petrov also seems to express doubt on this point: ivanov-petrov.livejournal.com/1801688.html?thread=91931864#t91931864. I understand that bio/chem/physics are also our theories/formalizations, but they sit "one floor down" and operate by different "laws". Why does this interest me at all? In quasi-philosophical discussions on LiveJournal I quite often run into theories that postulate an independent, substantial existence of information: supposedly it is neither matter nor the ideal, it is information. The latest (?) large project of this kind is the three-worlds concept of Popper and of Eccles (John Carew Eccles), who joined him, which postulates the independent existence of the "objective content of thought (this includes the content of scientific hypotheses, literary works, and other objects that do not depend on subjective perception)". Now, with the general boom around the philosophy of mind, interest in this concept has revived. It strongly resembles Platonism, and even more the problem of the ontological status of universals in medieval scholasticism.
My position is "materialist" ("material" being understood as the counterpart/opposite of "ideal" or, what amounts to the same thing, "objective reality"/"subjective reality"): information and codes are ideal objects that exist only in the ideal/mental realm, outside of which there are only bio/chem/physical processes and material objects created by humans as carriers of information. These objects, although they are carriers, do not contain information as such, but are able to generate it (as ideal/mental states) in another mind. We describe these objects and their behavior in the terms of information science purely for convenience, without granting information any independent existence outside the categories material/ideal. I would like to know in what sense you use these terms in your post.
2. Much the same applies to dimensionality: there are no 1D or 2D objects in nature (?); all objects are 3D (at least at the bio/chem/physical macro level?). Ergo, all descriptions in terms of 1D and 2D are our idealizations, formalisms convenient for modeling certain properties, in this case precisely the informational properties of a system, again understood in the terms of information science: creation/storage/transmission/reading, and so on. These formalisms contain algorithms for detecting and correcting errors, but they do not and cannot contain a mathematical description of how and why errors arise in the first place. That is no longer the domain of information science, which in general does not care whether it transmits "Hamlet" or a meaningless string of symbols of the same length; it is the domain of physics, which deals with "real" 3D objects and describes the failures in "reading the code". From this physical point of view, I think, a 3D organization of "information" is far more reliable than a 1D organization, just as a key with three bits (three codes) is more reliable than a key with one bit (one code). A 1D object would simply have no meaning for, and no way into, such an organization; it would have no chance to "get into the game". Whether such an organization is possible from a physical/biological point of view is another matter.
5th-Feb-2013 04:29 am (UTC)
1. It does not matter. Call it a blueprint if you prefer. DNA needs to be replicated with high fidelity.

2. That also does not matter. DNA is a polymer. It is an inherently linear arrangement of codons, it is translated by a machine sliding along the helix, and so on. Barcodes are 3D too, but that does not matter because the scanner ignores such detail. Of course, DNA "knows" the main errors because the typical ones are due to the thermodynamics of base pairing, which produces mismatches.