object ProteinTranslation {
def proteins(input: String): Seq[String] = {
input.grouped(3).map(codonsToProteins).takeWhile(_ != "STOP").toSeq
}
private def codonsToProteins(codon: String): String = {
codon match {
case "AUG" => "Methionine"
case "UUU" | "UUC" => "Phenylalanine"
case "UUA" | "UUG" => "Leucine"
case "UCU" | "UCC" | "UCA" | "UCG" => "Serine"
case "UAU" | "UAC" => "Tyrosine"
case "UGU" | "UGC" => "Cysteine"
case "UGG" => "Tryptophan"
case "UAA" | "UAG" | "UGA" => "STOP"
}
}
}
This approach starts by calling the grouped()
method on the input String
.
It will take a sequence of Char
s up to the amount passed in.
The grouped()
treats a String
as a plain sequence of Char
code units and makes no attempt to keep surrogate pairs or codepoint sequences together.
The user is responsible for making sure such cases are handled correctly.
Failing to do so may result in an invalid Unicode String
.
The grouped()
method works here because all of the characters are ASCII.
If the remaining number of Chars
s is less than the argument, grouped()
will return those,
and will be an empty Iterator
if no Char
s are left.
The Char
s are chained to the map()
method which passes them to the codonsToProteins()
method,
which accepts them as a String
.
The codonsToProteins()
method uses a match
to look up the protein for the codon and returns the protein from the method.
The proteins are chained to the takeWhile()
method which keeps requesting codons until they run out or
a STOP
codon is received.
Once all of the valid codons have been translated, the proteins()
method returns a sequence of the proteins.