yield with Dictionary

Protein Translation in C#
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProteinTranslation
{
    private static readonly IDictionary<string, string> proteins = new Dictionary<string, string>();

    static ProteinTranslation()
    {
        proteins.Add("AUG", "Methionine");
        proteins.Add("UUU", "Phenylalanine");
        proteins.Add("UUC", "Phenylalanine");
        proteins.Add("UUA", "Leucine");
        proteins.Add("UUG", "Leucine");
        proteins.Add("UCU", "Serine");
        proteins.Add("UCC", "Serine");
        proteins.Add("UCA", "Serine");
        proteins.Add("UCG", "Serine");
        proteins.Add("UAU", "Tyrosine");
        proteins.Add("UAC", "Tyrosine");
        proteins.Add("UGU", "Cysteine");
        proteins.Add("UGC", "Cysteine");
        proteins.Add("UGG", "Tryptophan");
        proteins.Add("UAA", "STOP");
        proteins.Add("UAG", "STOP");
        proteins.Add("UGA", "STOP");
    }

    public static string[] Proteins(string strand) =>
        strand.Chunked(3).Select(codon => proteins[codon]).TakeWhile(protein => protein != "STOP").ToArray();

    private static IEnumerable<string> Chunked(this string input, int size)
    {
        for (var i = 0; i < input.Length; i += size)
            yield return input[i..(i + size)];
    }

}

The approach begins by defining a private, static, readonly Dictionary for translating codons to proteins. It is private because it isn't needed outside the class. It is static because ProteinTranslation is a static class, so every member must be static, and one shared lookup table serves every call. It is readonly because the field is never reassigned to another Dictionary, although the Dictionary's contents can still change (entries can be added or removed).
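The difference between a readonly field and read-only contents can be seen in a minimal sketch. The class name ReadonlyDemo and the method AddAndCount() are hypothetical and exist only for illustration; they are not part of the solution above.

```csharp
using System.Collections.Generic;

public static class ReadonlyDemo
{
    // readonly: the field can only be assigned here or in a static constructor.
    private static readonly Dictionary<string, string> table = new Dictionary<string, string>();

    // Hypothetical helper: adds or overwrites an entry and reports the entry count.
    public static int AddAndCount(string codon, string protein)
    {
        // Allowed: changing the contents of the dictionary the field refers to.
        table[codon] = protein;

        // Not allowed (compile error if uncommented): reassigning the field itself.
        // table = new Dictionary<string, string>();

        return table.Count;
    }
}
```

Calling AddAndCount() twice with the same codon leaves the count at 1, because the indexer overwrites the existing entry rather than adding a second one.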

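The static constructor, which runs once before the class is first used, loads the Dictionary with each codon and its matching protein. The three stop codons map to the marker string "STOP".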
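As an alternative to a static constructor, the same table could be built with an index initializer directly on the field. This is a sketch of that variation, not the approach used above; the class name CodonTable and the choice to expose the table as IReadOnlyDictionary are assumptions made for the example.

```csharp
using System.Collections.Generic;

public static class CodonTable
{
    // The index initializer fills the dictionary as part of the field initializer,
    // so no static constructor is needed. Exposing it as IReadOnlyDictionary
    // keeps callers from adding or removing entries through this reference.
    public static readonly IReadOnlyDictionary<string, string> Proteins =
        new Dictionary<string, string>
        {
            ["AUG"] = "Methionine",
            ["UUU"] = "Phenylalanine", ["UUC"] = "Phenylalanine",
            ["UUA"] = "Leucine",       ["UUG"] = "Leucine",
            ["UCU"] = "Serine",        ["UCC"] = "Serine",
            ["UCA"] = "Serine",        ["UCG"] = "Serine",
            ["UAU"] = "Tyrosine",      ["UAC"] = "Tyrosine",
            ["UGU"] = "Cysteine",      ["UGC"] = "Cysteine",
            ["UGG"] = "Tryptophan",
            ["UAA"] = "STOP",          ["UAG"] = "STOP",   ["UGA"] = "STOP",
        };
}
```

Both forms produce the same 17 entries; the initializer simply keeps the data next to the declaration.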

The Proteins() method starts by calling the private static extension method Chunked(), which is also an iterator method. Chunked() uses yield return to produce the input string's chunks one at a time as an IEnumerable<string>. On each pass, the for loop advances the index i by size and yields the substring selected by the range i..(i + size), which starts at index i and ends just before index i + size.
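The range i..(i + size) assumes the strand length is a multiple of size; with a strand such as "AUGUU" the final range would run past the end of the string and throw. If shorter trailing chunks need to be tolerated, one possible variation clamps the end of the range. This is a sketch under that assumption: the class name StringChunks is hypothetical, and size is assumed to be positive.

```csharp
using System;
using System.Collections.Generic;

public static class StringChunks
{
    // Yields consecutive chunks of at most `size` characters.
    // Assumes size > 0; a size of 0 would never advance the loop.
    public static IEnumerable<string> Chunked(this string input, int size)
    {
        for (var i = 0; i < input.Length; i += size)
        {
            // Clamp the end so a short final chunk does not run past the string.
            var end = Math.Min(i + size, input.Length);
            yield return input[i..end];
        }
    }
}
```

With this version, "AUGUUUU".Chunked(3) yields "AUG", "UUU" and then the one-character remainder "U". Whether a partial codon should be yielded, skipped, or rejected is a design decision the exercise leaves to the caller.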

The output of Chunked() is chained to the input of the LINQ Select() method. Select() takes a lambda function that receives each codon chunk as an argument and looks up its matching protein in the Dictionary.
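The Dictionary indexer throws a KeyNotFoundException when a codon is not in the table. If a more descriptive failure is wanted, the lookup could go through TryGetValue() instead. The following is only a sketch: the class name CodonLookup, the method ToProtein(), and the two-entry table are hypothetical stand-ins for the full table above.

```csharp
using System;
using System.Collections.Generic;

public static class CodonLookup
{
    // Abbreviated table, for illustration only.
    private static readonly Dictionary<string, string> table = new Dictionary<string, string>
    {
        ["AUG"] = "Methionine",
        ["UAA"] = "STOP",
    };

    // Returns the protein for a codon, or throws with a message naming the bad codon.
    public static string ToProtein(string codon) =>
        table.TryGetValue(codon, out var protein)
            ? protein
            : throw new ArgumentException($"Unknown codon: {codon}", nameof(codon));
}
```

A lambda such as codon => CodonLookup.ToProtein(codon) could then be passed to Select() in place of the indexer lookup.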

Each protein produced by Select() is passed to the TakeWhile() method, whose lambda tests whether the protein is anything other than "STOP". As soon as the lambda returns false, TakeWhile() stops iterating, so no codon after the first STOP codon is looked up.
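Because Select() and TakeWhile() are evaluated lazily, the early stop can be observed directly by counting lookups. This is a minimal sketch: the class LazyDemo, its Lookups counter, and the abbreviated table are hypothetical and exist only to make the behavior visible.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Hypothetical counter, used only to observe how many lookups actually run.
    public static int Lookups;

    // Abbreviated table, for illustration only.
    private static readonly Dictionary<string, string> table = new Dictionary<string, string>
    {
        ["AUG"] = "Methionine",
        ["UGG"] = "Tryptophan",
        ["UAA"] = "STOP",
    };

    private static string Lookup(string codon)
    {
        Lookups++;
        return table[codon];
    }

    // Same pipeline shape as Proteins(): look up, stop at "STOP", collect.
    public static string[] Translate(IEnumerable<string> codons) =>
        codons.Select(codon => Lookup(codon))
              .TakeWhile(protein => protein != "STOP")
              .ToArray();
}
```

Translating the codons "AUG", "UAA", "UGG", "???" returns only "Methionine" and performs two lookups. The invalid codon "???" after the stop codon is never looked up, so it does not throw.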

The proteins that pass TakeWhile() are chained into the ToArray() method, which collects them into the array that Proteins() returns.

6th Nov 2024