Hamming

Hamming

Easy

Instructions

Calculate the Hamming Distance between two DNA strands.

Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

We read DNA using the letters C,A,G and T. Two strands might look like this:

GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
^ ^ ^  ^ ^    ^^

They have 7 differences, and therefore the Hamming Distance is 7.

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with :)

Implementation notes

The Hamming distance is only defined for sequences of equal length, so an attempt to calculate it between sequences of different lengths should not work.

Option is used to indicate a computation that may possibly have no useful result (for example due to an error or invalid input). If you are unfamiliar with Option you may read this tutorial. Option is a so-called Monad which covers a "computational aspect", in this case possible absence of a value. Proper use of Monads can result in very concise yet elegant and readable code. Improper use can easily result in the contrary. Watch this video to learn more.

Common pitfalls that you should avoid

There are a few rules of thumbs for Option:

  1. If you don't need it don't use it. Instead of
def add1(x: Int): Option[Int] = Some(x + 1)

better have

def add1(x: Int): Int = x + 1

(there is Option.map to apply such simple functions, so you don't have to clutter them with Option). 2. Don't "unwrap" if you don't really need to. Often there are built-in functions for your purpose. Indicators of premature unwrapping are isDefined/isEmpty or pattern matching. Instead of

val x: Option[Int] = ...

if (x.isDefined) x.get + 1 else 0
// or
x match {
  case Some(n) => n + 1
  case None => 0
}

better have

x map (_ + 1) getOrElse 0
  1. Monads can be used inside a for-comprehension FTW. This is advisable when you want to "compose" several Option instances. Instead of
val xo: Option[Int] = ...
val yo: Option[Int] = ...
val zo: Option[Int] = ...

xo.flatMap(x =>
  yo.flatMap(y =>
    zo.map(z =>
	  x + y + z)))

better have

for {
  x <- xo
  y <- yo
  z <- zo
} yield x + y + z
Edit via GitHub The link opens in a new window or tab
Scala Exercism

Ready to start Hamming?

Sign up to Exercism to learn and master Scala with 94 exercises, and real human mentoring, all for free.