A reader, Eric, writes to ask about my post “Nature ‘Learns’ to Create Complexity? Think Again”:
Like Brendan, I work on developing computer models and software for scientific applications. His comments about models vs. reality were, of course, spot on. It did occur to me that there is one possible partial response to a point that was made. Brendan writes:
Worse, whatever is “learned” by that organism must be passed along to the children. So, unless the paper’s authors subscribe to some form of Lamarckism, which is exceedingly unlikely, without a clear inheritance path all the training value is lost.
A neo-Darwinist could respond with the same kind of switch that the synthesis used to rescue the original version of Darwinism. Darwinism originally did have a Lamarckian flavor to it (which is why it opposed and resisted Mendelian genetics). With the neo-Darwinian synthesis, they dropped the Lamarckian aspects, embraced genetics, and proposed that the gains were built in through genetic variations.
In a similar way, someone now could say that the “learning” isn’t Lamarckian. Rather, it is the genetic variations to GRNs that succeed in being passed on (in contrast to those that were weeded out). The correspondence with environmental conditions comes in the form of the natural selection filtering that comes after the fact of the genetic change.
Yet, if that is how it should be understood, the issues raised in the preceding paragraph about training “over time and space” are still relevant. The genetic “learning” would not be limited to the life of any single individual, but it still would be constrained to accumulating within a line of ancestry.
That would make the “learning” generational in nature. Instead of a single copy of a network collecting “millions of data sets,” one would have to look along an organism’s ancestral line to observe the inherited network receiving a multitude of tweaks down through the successful generations.
At first glance, it might seem that this way of viewing it could succeed at making it work for the neo-Darwinist. However, it does come back to the details, such as the limited amount of successful tweaking that can take place in any single generation and the limited number of generations down to the present organism.
Quite simply, there does still seem to be a serious problem about accumulating the extent of learning that would be needed.
The other main thought I had was this. You cannot have a second time without having a first time. You cannot learn to use building blocks more effectively apart from first having building blocks that can be used somewhat effectively.
In my software, I know the value of modularization and being able to recombine components, but that doesn’t happen by accident. I wouldn’t be able to reuse composable parts if I did not first have such parts that allowed for recombination. Flexibility comes by design, and before one could have a flexible, adjustable, “educated” GRN, one would need first to have a GRN that from the start flexibly allows for “learning.” Otherwise, one could not get off the ground.
Even granting for the sake of argument that GRNs can learn across generations, how do we expect that a learning GRN can first come to exist?
Darwinists are forever pushing their present problems aside by building on the idea that whatever they need already existed earlier (without explanation), whether it is functional information or GRNs that can learn.
By the way, if you haven’t seen the movie Next, starring Nicolas Cage, I would suggest giving it a look. (It’s available for streaming on Netflix.)
It’s not a “great” movie in any sense, but it does do a very clever job at cinematically portraying what is essentially a visual allegory for the evolutionary perspective of finding a winning path that has been filtered from among many other failed attempts. As with evolutionary thought, the attempts are temporally concurrent.
People who have seen this will more easily and more intuitively feel that such a strategy could work (so long as one does not delve into the messy details).
I appreciate Eric’s thoughts. I suppose the answer depends on how the paper’s authors envision the GRN learning. If it is no different than modification + selection that preserves those GRNs producing “better” phenotypes, then Eric’s point is sound. And he is correct in that case to point out that an entire ancestral chain matters. If the GRN learns dynamically (which I don’t believe was clear in the original paper), then inheritance remains a significant problem.
However, if Eric’s perspective is correct and a GRN “learns” through descent with modification, the analogy with Neural Networks breaks down. Neural Networks succeed by changing the entire network in response to massive data sets (along with some notion of correctness). For example, imagine two critters: The GRN in one “learns” to adapt part of the network while the other adapts the other part. They cannot share that information (unless you can specify a mechanism by which they can mate and successfully blend what both “learned”). And adaptation in one is made without respect for the adaptation in the other. NNs succeed because the entire network adapts as a unit; that is, each node learns how to respond to inputs based on how the entire network has fared. You cannot teach one node at a time or even part of the network at a time; the entire network responds. Training an NN one node at a time will, by definition, fail.
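To make “the entire network adapts as a unit” concrete, here is a minimal sketch of one standard backpropagation step on a toy network. Everything in it is an assumption chosen for illustration: the 2-input, 2-hidden-node, 1-output layout, the starting weights, the input, the target, and the learning rate. It is not a model of a GRN or of AlphaGo. It only shows that a single training step computes one error for the whole network and then adjusts every weight in light of it.

```python
import math

def forward(w_hidden, w_out, x):
    """Forward pass: tanh hidden layer followed by a linear output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output

def gradient_step(w_hidden, w_out, x, target, lr=0.1):
    """One backpropagation step on squared error; returns updated weights."""
    hidden, output = forward(w_hidden, w_out, x)
    err = output - target  # derivative of 0.5 * (output - target)**2
    # Output weights move according to the shared error and each hidden activation.
    new_w_out = [w - lr * err * h for w, h in zip(w_out, hidden)]
    # Hidden weights receive the same error, passed back through the output weights.
    new_w_hidden = [
        [w - lr * err * w_o * (1 - h * h) * xi for w, xi in zip(row, x)]
        for row, w_o, h in zip(w_hidden, w_out, hidden)
    ]
    return new_w_hidden, new_w_out

# Hypothetical starting weights, input, and target.
w_hidden = [[0.5, -0.3], [0.2, 0.8]]
w_out = [0.7, -0.4]
x, target = [1.0, 2.0], 1.0

new_hidden, new_out = gradient_step(w_hidden, w_out, x, target)

old_flat = sum(w_hidden, []) + w_out
new_flat = sum(new_hidden, []) + new_out
changed = sum(1 for old, new in zip(old_flat, new_flat) if old != new)
print(f"{changed} of {len(old_flat)} weights changed in one step")
```

In this sketch no weight is tuned in isolation: each update depends on an error measured at the output of the whole network.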
And, as Eric agrees, the quantity of data to succeed is massive. AlphaGo contained “millions” of connections across 12 (or 13?) layers (networks) in two different NNs. AlphaGo was originally fed roughly 30 million moves from 160,000 games, yet that was far from sufficient. It required more training by playing itself (I failed to find the number of moves evaluated during this phase, but I suspect it completed many more than 160,000 games). And then, to allow the search tree to run deep, they distributed its execution across over 2,000 processors (some of which were GPUs, the specialized, high-performance processors used in video games). Even then, a small number of “hand crafted” rules lurked within (see here).
Comparing GRNs to NNs remains uninteresting given what we’ve learned about training NNs. If anything, what we’ve learned says that GRNs could not have been self-taught since they just don’t have access to the necessary data or a means by which to ensure the entire GRN benefits from the training.
The quantity of data needed to train an NN is proportional (though not linearly) to the number of connections within the network. The more connections a network contains, the more training data necessary.
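As a rough illustration of how quickly the connection count itself grows, the sketch below counts connections in a fully connected feed-forward network. The function name `count_connections` and both sets of layer sizes are hypothetical, picked only to show the arithmetic; they describe neither a real GRN nor AlphaGo, and the sketch says nothing about how much data any particular network actually needs.

```python
def count_connections(layer_sizes):
    """Count connections in a fully connected feed-forward network.

    Each node in one layer connects to every node in the next layer,
    so each adjacent pair of layers contributes (size_a * size_b) connections.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layer sizes, chosen only for illustration.
small = [4, 3, 2]
large = [400, 300, 200]

print(count_connections(small))  # 4*3 + 3*2 = 18
print(count_connections(large))  # 400*300 + 300*200 = 180000
```

Under these assumptions, making every layer 100 times wider multiplies the connection count by 10,000, and the training data needed rises along with it.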
I do not know how many nodes or connections a GRN contains, but, in a way, it is immaterial: If the number of connections is small, the network cannot “recognize” anything significant. If it is large, then the data requirements will rapidly exceed that available. A small, trainable network will be uninteresting and a large, interesting network will be untrainable. (I like that statement. I believe it captures the challenge succinctly with a bit of rhythm.)
If someone were to respond by saying, “Ah, but we have thousands of GRNs,” then they have a large network (where each node is itself a GRN, something akin to a deep learning problem) that will require massive quantities of data.