Genetic improvement of vegetable crops is a long process, and polyploid species pose an extra challenge to breeders. A recent article led by Miguel Pérez-Enciso, ICREA Research Professor at the Centre for Research in Agricultural Genomics (CRAG), in collaboration with the University of Florida, shows that the class of machine learning algorithms known as “Deep Learning” can be a useful tool to accelerate the breeding of “complex” traits in some important agronomic species. The researchers have demonstrated this in two of the most important soft fruit commodities, strawberries and blueberries, and have provided open access to a Python-based package for its implementation.
Many traits of interest in plants are complex
The development of new crop varieties typically consists of a trial-and-error process whereby ‘elite’ consolidated lines are crossed and the performance of their descendants is evaluated. Eventually, some of these descendants replace the elite lines when they outperform them in at least some of the traits of interest, for example, disease resistance or flavour. This is a continuous process but, unfortunately, a slow one. For instance, it takes more than eight years to develop a new strawberry variety.
For this reason, breeders have strived to accelerate this process using modern genomic technologies. One possibility is to carry out genetic tests to identify the most favourable crosses and individuals. This technique, known as marker-assisted selection (MAS), requires that the genes behind the trait, and their causative mutations, are known. Unfortunately, and contrary to what is commonly thought, the genetic basis of traits of economic interest is only partially known. In fact, relatively few causative mutations have been discovered so far. Besides, the expression of many traits, such as flavour, depends not only on the genes but also on the environmental conditions in which the plants are grown. Breeders define such traits as “complex”, because they depend on the environment and on many genes which are only partially characterized.
What to do, then? Again, molecular methods can help, this time through a complementary approach called “Genomic Prediction”. This procedure consists of using all available genetic markers, including untested candidates, to “predict” the future performance of plant varieties. This is typically done using variants of the well-known linear regression method.
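To make the idea concrete, here is a minimal sketch of Genomic Prediction with ridge regression, one commonly used variant of linear regression. It is an illustration only and not the software released with the study: the genotypes, phenotypes, the 0 to 4 allele-dosage coding and the function names (`solve`, `ridge_fit`, `predict`) are all assumptions made for this example, and the phenotypes are assumed to be already centred so that no intercept is needed.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Estimate one effect per marker by solving (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)


def predict(X, beta):
    """Predicted performance: sum over markers of dosage times estimated effect."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Hypothetical training set: 6 plants x 3 markers, coded as allele dosage (0 to 4).
TRAIN_GENOTYPES = [[0, 1, 2], [1, 0, 4], [2, 3, 0], [4, 1, 1], [3, 2, 2], [1, 4, 3]]
# Hypothetical centred phenotypes (for example, a flavour score) for the same plants.
TRAIN_PHENOTYPES = [0.1, 0.8, 0.5, 1.9, 1.2, 0.1]
# Hypothetical candidates that have been genotyped but not yet evaluated in the field.
CANDIDATES = [[4, 0, 3], [0, 4, 0], [2, 2, 2]]

beta = ridge_fit(TRAIN_GENOTYPES, TRAIN_PHENOTYPES, lam=1.0)
predictions = predict(CANDIDATES, beta)
ranking = sorted(range(len(CANDIDATES)), key=lambda i: predictions[i], reverse=True)
print("Estimated marker effects:", beta)
print("Candidates ranked by predicted performance:", ranking)
```

The penalty `lam` shrinks all marker effects towards zero, which is what allows such models to cope with many more markers than plants. Note that the prediction is a plain sum of marker effects, so interactions between markers are not modelled.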
How Deep Learning can help
Most of the Genomic Prediction methods assume a relatively simple relation between genetic markers and traits of interest.
“Instead, we have used Deep Learning algorithms, which are extremely flexible in the relation they assume between the markers and the traits of interest,” explains Laura Zingaretti, PhD student and first author of the study.
Deep Learning comprises a class of algorithms that were inspired by how the human brain works and that break the whole computation into small units called “neurons”. These methods are extremely popular today and have found numerous applications, ranging from automatic translation to video and sound analysis.
In the study, now published in the journal Frontiers in Plant Science, the authors have applied Deep Learning methods to the Genomic Prediction of two important horticultural species: strawberry and blueberry.
“Both crop species have highly complex genetic structures because they are polyploids, that is, they have more than two sets of chromosomes,” explains Amparo Monfort, IRTA researcher at CRAG and an expert in strawberry genetics, who also contributed to this study.
It is precisely in this type of species that interactions between genes become more important than usual. Deep Learning can be a promising tool for Genomic Prediction in this scenario.
“Our study is one of the first applications of Deep Learning to Genomic Prediction. We have demonstrated that Deep Learning can be very effective in the presence of interactions between genes, that is, when the whole trait cannot be predicted simply by considering the genes individually. We also provide software that can be used by researchers to apply Deep Learning to Genomic Prediction,” says lead researcher Pérez-Enciso.
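As a toy illustration of what an interaction between genes means, the sketch below builds a network of three “neurons” whose output depends on two markers jointly. It is not the authors' software and it is not trained on any data: the weights are picked by hand, the markers are coded simply as absent (0) or present (1), and the names `neuron`, `tiny_network` and `additive_expectation` are hypothetical.

```python
def relu(z):
    """Rectified linear activation: negative values are clipped to zero."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """One 'neuron': a weighted sum of its inputs passed through an activation."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_network(marker_a, marker_b):
    """Two hidden neurons and a linear output, with hand-picked (untrained) weights.

    The trait value is 1 when exactly one of the two markers is present and 0
    otherwise, so the effect of each marker depends on the other one.
    """
    inputs = [marker_a, marker_b]
    h1 = neuron(inputs, [1.0, 1.0], 0.0)
    h2 = neuron(inputs, [1.0, 1.0], -1.0)
    return 1.0 * h1 - 2.0 * h2


def additive_expectation(trait):
    """Value for (1, 1) if each marker simply added its own separate effect."""
    return trait(1, 0) + trait(0, 1) - trait(0, 0)


for a in (0, 1):
    for b in (0, 1):
        print(f"marker A={a}, marker B={b} -> trait {tiny_network(a, b)}")
print("Additive expectation for A=1, B=1:", additive_expectation(tiny_network))
```

With both markers present, the network returns a value that differs from the sum of the two single-marker effects, so no model that considers the genes individually can reproduce it. In practice the weights are learned from genotype and phenotype data rather than set by hand.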
“We believe this work is of particular relevance for the Spanish industry, since Spain is currently the leading strawberry producer in Europe and the sixth worldwide,” adds Amparo Monfort.
Reference Article: L.M. Zingaretti, S.A. Gezan, L.F. Ferrão, L.F. Osorio, A. Monfort, P.R. Muñoz, V.M. Whitaker, M. Pérez-Enciso. (2020) Exploring deep learning for complex trait genomic prediction in polyploid outcrossing species. Frontiers in Plant Science doi.org/10.3389/fpls.2020.00025.
Other recent related articles: Pérez-Enciso M, Zingaretti LM. (2019). A Guide on Deep Learning for Complex Trait Genomic Prediction. Genes doi.org/10.3390/genes10070553.