Mutation Prediction of Infectious Viruses Based on Different Machine Learning Approaches

Predicting virus evolution helps control, prevent, and treat diseases. Mutations that evade the host immune system persist and spread through generations, so it is crucial to anticipate and combat them. The 1918 H1N1 pandemic, for example, was particularly severe. By predicting mutations ahead of time, we can identify potential pandemics before they happen so that effective preventive measures can be taken. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are two powerful machine learning techniques for generating new samples from existing data. Sequence-to-Sequence (Seq2seq) networks, primarily employed in translation tasks, generate a sample based on the previous one. Our method combines GANs and Seq2seq networks to leverage the strengths of both approaches. We built a Seq2seq network based on Long Short-Term Memory (LSTM) units as the generator of our model and used a discriminator to distinguish between fake and real sequences. The main challenges were that GANs are not optimized for sequential data and that Seq2seq networks have difficulty with longer sequences. Finally, we evaluate the generated sequences against our test data using BLOSUM scores and Levenshtein distance; the generated sequences can reach a BLOSUM score of 980 and a Levenshtein distance of 5.
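The two evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' evaluation code, and it rests on several assumptions: the substitution score is taken as a position-wise sum over two equal-length, ungapped sequences; the abstract does not say which BLOSUM matrix was used; and `TOY_MATRIX` holds placeholder values for three residues only, not a real BLOSUM table. The function names `levenshtein` and `substitution_score` are hypothetical.

```python
from typing import Dict, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def substitution_score(a: str, b: str,
                       matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum of substitution-matrix entries over aligned positions.

    Assumption: both sequences have equal length and are compared
    position by position without gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        # The matrix is symmetric, so accept either key order.
        total += matrix[(x, y)] if (x, y) in matrix else matrix[(y, x)]
    return total


# Placeholder values for illustration only; NOT a real BLOSUM table.
TOY_MATRIX = {
    ("A", "A"): 4, ("R", "R"): 5, ("W", "W"): 11,
    ("A", "R"): -1, ("A", "W"): -3, ("R", "W"): -3,
}

if __name__ == "__main__":
    generated, reference = "ARW", "AWW"
    print(levenshtein(generated, reference))
    print(substitution_score(generated, reference, TOY_MATRIX))
```

In practice a full matrix would be loaded from a library rather than typed by hand; Biopython, for instance, provides `Bio.Align.substitution_matrices.load("BLOSUM62")`. A higher substitution score and a lower Levenshtein distance both indicate that a generated sequence is closer to the reference.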