<p>jonathan chow // blog (http://blog.echo.sh/), feed updated 2017-03-06</p>
<h2 id="closures-vs-structs-in-go">closures vs. structs in Go</h2>
<p>Jonathan Chow, 2016-10-25, http://blog.echo.sh/2016/10/25/closures-vs-structs-in-go</p>
<p>Today, I was reminded of this classic story about <a href="http://people.csail.mit.edu/gregs/ll1-discuss-archive-html/msg03277.html">closures vs.
objects</a>:</p>
<blockquote>
<p>The venerable master Qc Na was walking with his student, Anton. Hoping to
prompt the master into a discussion, Anton said “Master, I have heard that
objects are a very good thing - is this true?” Qc Na looked pityingly at his
student and replied, “Foolish pupil - objects are merely a poor man’s
closures.”</p>
</blockquote>
<blockquote>
<p>Chastised, Anton took his leave from his master and returned to his cell,
intent on studying closures. He carefully read the entire “Lambda: The
Ultimate…” series of papers and its cousins, and implemented a small Scheme
interpreter with a closure-based object system. He learned much, and looked
forward to informing his master of his progress.</p>
</blockquote>
<blockquote>
<p>On his next walk with Qc Na, Anton attempted to impress his master by saying
“Master, I have diligently studied the matter, and now understand that objects
are truly a poor man’s closures.” Qc Na responded by hitting Anton with his
stick, saying “When will you learn? Closures are a poor man’s object.” At that
moment, Anton became enlightened.</p>
</blockquote>
<p>For me, the last couple of years were a deep dive into a very Haskell-influenced
style of Scala and JavaScript programming. Between trying to grok
<a href="http://typelevel.org/projects/">typelevel.scala</a> projects and
<a href="http://haskell-servant.readthedocs.io/en/stable/tutorial/ApiType.html">advanced</a>
<a href="https://www.youtube.com/watch?v=fVBck2Zngjo">dependent-typing</a> concepts, my
thinking naturally became biased towards modelling problems in terms of types
and functions instead of objects.</p>
<p>(To give a general idea of how deep down the rabbit hole I ended up, I wrote
both <a href="https://github.com/echojc/peano-arithmetic/blob/master/src/main/scala/Main.scala">type-level arithmetic
operators</a>
(including a primality tester!) and <a href="https://gist.github.com/echojc/ef509f407850265a2201">type-level SK
combinators</a> in Scala.)</p>
<p>Recently, I’ve turned my attention to Go. It’s been a very refreshing
experience, having the procedural-functional pendulum swing back the other way.
The specific problem that brought this story back to me was in trying to
abstract HTTP handlers in Go.</p>
<p>The standard library provides this function type to handle requests:</p>
<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">type</span><span class="x"> </span><span class="n">HandlerFunc</span><span class="x"> </span><span class="k">func</span><span class="p">(</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="o">*</span><span class="n">Request</span><span class="p">)</span><span class="x">
</span></code></pre>
</div>
<p>and you’d use it like this, for example:</p>
<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">func</span><span class="x"> </span><span class="n">MyHandler</span><span class="p">(</span><span class="n">w</span><span class="x"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="n">r</span><span class="x"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="n">http</span><span class="o">.</span><span class="n">NotFound</span><span class="p">(</span><span class="n">w</span><span class="p">,</span><span class="x"> </span><span class="n">r</span><span class="p">)</span><span class="x">
</span><span class="p">}</span><span class="x">
</span><span class="n">http</span><span class="o">.</span><span class="n">HandleFunc</span><span class="p">(</span><span class="s">"/"</span><span class="p">,</span><span class="x"> </span><span class="n">MyHandler</span><span class="p">)</span><span class="x">
</span><span class="n">http</span><span class="o">.</span><span class="n">ListenAndServe</span><span class="p">(</span><span class="s">":9000"</span><span class="p">,</span><span class="x"> </span><span class="no">nil</span><span class="p">)</span><span class="x">
</span></code></pre>
</div>
<p>This API is very barebones, but there’s nothing wrong with it. I really enjoy
how straightforward and pragmatic standard Go APIs tend to be.</p>
<p>One problem that creeps in, though, is when your handler needs other
dependencies in order to do whatever it needs to do. For me, that was a database
connection. Go provides generic database functionality via the <code class="highlighter-rouge">sql</code> package,
which exposes the <code class="highlighter-rouge">*sql.DB</code> type as a database handle of sorts. So how do I get
an instance of this struct into my handler?</p>
<p>A basic solution is to share what is essentially a global <code class="highlighter-rouge">*sql.DB</code>
instance, like so:</p>
<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">func</span><span class="x"> </span><span class="n">main</span><span class="p">()</span><span class="x"> </span><span class="p">{</span><span class="x">
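</span><span class="n">db</span><span class="p">,</span><span class="x"> </span><span class="n">err</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">sql</span><span class="o">.</span><span class="n">Open</span><span class="p">(</span><span class="s">"postgres"</span><span class="p">,</span><span class="x"> </span><span class="s">"postgres://..."</span><span class="p">)</span><span class="x">
</span><span class="k">if</span><span class="x"> </span><span class="n">err</span><span class="x"> </span><span class="o">!=</span><span class="x"> </span><span class="no">nil</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="n">log</span><span class="o">.</span><span class="n">Fatal</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="x">
</span><span class="p">}</span><span class="x">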
</span><span class="n">myHandler</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="k">func</span><span class="p">(</span><span class="n">w</span><span class="x"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="n">r</span><span class="x"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="c">// use `db`</span><span class="x">
</span><span class="p">}</span><span class="x">
</span><span class="n">http</span><span class="o">.</span><span class="n">HandleFunc</span><span class="p">(</span><span class="s">"/"</span><span class="p">,</span><span class="x"> </span><span class="n">myHandler</span><span class="p">)</span><span class="x">
</span><span class="n">http</span><span class="o">.</span><span class="n">ListenAndServe</span><span class="p">(</span><span class="s">":9000"</span><span class="p">,</span><span class="x"> </span><span class="no">nil</span><span class="p">)</span><span class="x">
</span><span class="p">}</span><span class="x">
</span></code></pre>
</div>
<p>This might be fine for small projects, but there are some common issues with
this implementation: you can’t replace the <code class="highlighter-rouge">db</code> instance with something else for
testing purposes (whether it’s a connection to a different database or a mock),
and everything needs to be in the same scope, which doesn’t help with
modularization.</p>
<p>Knowing these problems, my immediate thought was to use partially applied
functions. In ES6, it’d be something like this:</p>
<div class="language-javascript highlighter-rouge"><pre class="highlight"><code><span class="nx">handler</span> <span class="o">=</span> <span class="p">(</span><span class="nx">db</span><span class="p">)</span> <span class="o">=></span> <span class="p">(</span><span class="nx">writer</span><span class="p">,</span> <span class="nx">request</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// use `db`</span>
<span class="p">};</span>
</code></pre>
</div>
<p>This way, the function would still close over the connection, but I’m now free
to dictate exactly which instance of <code class="highlighter-rouge">db</code> the handler should use in each
circumstance.</p>
<p>I did in fact end up writing the equivalent in Go, which looked something
like this:</p>
<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">func</span><span class="x"> </span><span class="n">CreateHandler</span><span class="p">(</span><span class="n">db</span><span class="x"> </span><span class="o">*</span><span class="n">sql</span><span class="o">.</span><span class="n">DB</span><span class="p">)</span><span class="x"> </span><span class="k">func</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="k">return</span><span class="x"> </span><span class="k">func</span><span class="p">(</span><span class="n">w</span><span class="x"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="n">r</span><span class="x"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="c">// use `db`</span><span class="x">
</span><span class="p">}</span><span class="x">
</span><span class="p">}</span><span class="x">
</span></code></pre>
</div>
<p>and to test:</p>
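<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">func</span><span class="x"> </span><span class="n">TestFoo</span><span class="p">(</span><span class="n">t</span><span class="x"> </span><span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">T</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">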
</span><span class="n">db</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">getTestConnection</span><span class="p">()</span><span class="x">
</span><span class="n">handler</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">CreateHandler</span><span class="p">(</span><span class="n">db</span><span class="p">)</span><span class="x">
</span><span class="n">req</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">httptest</span><span class="o">.</span><span class="n">NewRequest</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="x">
</span><span class="n">res</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">httptest</span><span class="o">.</span><span class="n">NewRecorder</span><span class="p">()</span><span class="x">
</span><span class="n">handler</span><span class="p">(</span><span class="n">res</span><span class="p">,</span><span class="x"> </span><span class="n">req</span><span class="p">)</span><span class="x">
</span><span class="c">// check `res`</span><span class="x">
</span><span class="p">}</span><span class="x">
</span></code></pre>
</div>
<p>But sitting back and looking at my handiwork, I couldn’t shake that niggly
feeling that this wasn’t quite right - it felt clumsy and not very idiomatic.</p>
<p>Eventually, I stumbled upon a different way to approach the problem, and it
(surprise!) involved using <code class="highlighter-rouge">struct</code>s. It wasn’t complicated, and, to be honest,
would very likely have been the first thing to spring to mind for anyone who
<em>hadn’t</em> been so heavily invested in modelling problems with functions in the
first place. Consider this:</p>
<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">type</span><span class="x"> </span><span class="n">MyHandler</span><span class="x"> </span><span class="k">struct</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="n">db</span><span class="x"> </span><span class="o">*</span><span class="n">sql</span><span class="o">.</span><span class="n">DB</span><span class="x">
</span><span class="p">}</span><span class="x">
</span><span class="k">func</span><span class="x"> </span><span class="p">(</span><span class="n">h</span><span class="x"> </span><span class="o">*</span><span class="n">MyHandler</span><span class="p">)</span><span class="x"> </span><span class="n">Handle</span><span class="p">(</span><span class="n">w</span><span class="x"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="x"> </span><span class="n">r</span><span class="x"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">
</span><span class="c">// use `h.db`</span><span class="x">
</span><span class="p">}</span><span class="x">
</span></code></pre>
</div>
<p>By using a method on a struct instead of a partially applied function, the
closing over of the <code class="highlighter-rouge">db</code> variable is now pushed into the instance of the struct
instead of an instance of a function. This is much more idiomatic Go. And if
there’s any doubt that these two implementations are semantically equivalent,
you can see that it really is the case by comparing the test for this
version with the previous test:</p>
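<div class="language-go highlighter-rouge"><pre class="highlight"><code><span class="k">func</span><span class="x"> </span><span class="n">TestBar</span><span class="p">(</span><span class="n">t</span><span class="x"> </span><span class="o">*</span><span class="n">testing</span><span class="o">.</span><span class="n">T</span><span class="p">)</span><span class="x"> </span><span class="p">{</span><span class="x">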
</span><span class="n">db</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">getTestConnection</span><span class="p">()</span><span class="x">
</span><span class="n">handler</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">MyHandler</span><span class="p">{</span><span class="n">db</span><span class="p">}</span><span class="x">
</span><span class="n">req</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">httptest</span><span class="o">.</span><span class="n">NewRequest</span><span class="p">(</span><span class="o">...</span><span class="p">)</span><span class="x">
</span><span class="n">res</span><span class="x"> </span><span class="o">:=</span><span class="x"> </span><span class="n">httptest</span><span class="o">.</span><span class="n">NewRecorder</span><span class="p">()</span><span class="x">
</span><span class="n">handler</span><span class="o">.</span><span class="n">Handle</span><span class="p">(</span><span class="n">res</span><span class="p">,</span><span class="x"> </span><span class="n">req</span><span class="p">)</span><span class="x">
</span><span class="c">// check `res`</span><span class="x">
</span><span class="p">}</span><span class="x">
</span></code></pre>
</div>
<p>This shows precisely what the opening story tries to convey: that objects and
closures are different ways to express the same thing. Some languages favour one
idiom over the other, but, at the end of the day, the result is the same.</p>
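<p>The equivalence can be made concrete with a small sketch. In Go, a method value such as <code class="highlighter-rouge">h.Handle</code> is itself a function bound to its receiver, which is to say a closure over the struct. The <code class="highlighter-rouge">Store</code> interface, <code class="highlighter-rouge">fakeStore</code> and <code class="highlighter-rouge">body</code> below are hypothetical names used in place of a real database.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Store is a hypothetical stand-in for *sql.DB.
type Store interface{ Greeting() string }

type fakeStore struct{ msg string }

func (f fakeStore) Greeting() string { return f.msg }

// Closure flavour: the store lives in the function's environment.
func CreateHandler(s Store) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, s.Greeting())
	}
}

// Struct flavour: the store lives in a field.
type MyHandler struct{ store Store }

func (h *MyHandler) Handle(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, h.store.Greeting())
}

// body (hypothetical helper) calls a handler in memory and returns
// what it wrote.
func body(h http.HandlerFunc) string {
	res := httptest.NewRecorder()
	h(res, httptest.NewRequest(http.MethodGet, "/", nil))
	return res.Body.String()
}

func main() {
	s := fakeStore{msg: "same response"}
	fromClosure := CreateHandler(s)
	h := &MyHandler{store: s}
	fromStruct := http.HandlerFunc(h.Handle) // method value bound to h
	fmt.Println(body(fromClosure) == body(fromStruct))
}
```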
<p>This was a good reminder to me that using the right tool for the right job
applies not only to languages and tooling, but to concepts and abstractions as well.</p>
<h2 id="programming-with-types">programming with types</h2>
<p>Jonathan Chow, 2015-01-23, http://blog.echo.sh/2015/01/23/programming-with-types</p>
<p>One of the things that really grows on you after programming with Haskell for a
while is the idea that the types alone actually reveal quite a lot about what a
program does. The implementation is almost a secondary concern.</p>
<p>To a programmer who has worked in side-effecting languages their whole life,
it’s very unnerving to learn that the most common way to search the Haskell
documentation is with a type signature. It’s arguably even more disconcerting to
see something as terse as this in the official documentation:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>const :: a -> b -> a
base Prelude, base Data.Function
Constant function.
</code></pre>
</div>
<p>Uh… cool. So what does <code class="highlighter-rouge">const</code> actually do?</p>
<h3 id="generics-are-very-generic">Generics are very generic</h3>
<p>To understand how to read type signatures, you first have to realise that
generic arguments are precisely that: they are generic. Haskell has no class
hierarchy because it’s not an OOP language. In Java, since all reference types inherit
from <code class="highlighter-rouge">Object</code>, at a minimum you can always call <code class="highlighter-rouge">toString()</code> on a generic type,
like so:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="o"><</span><span class="n">A</span><span class="o">></span> <span class="kt">void</span> <span class="nf">foo</span><span class="o">(</span><span class="n">A</span> <span class="n">a</span><span class="o">)</span> <span class="o">{</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">toString</span><span class="o">());</span>
<span class="o">}</span></code></pre></figure>
<p>In Haskell, absolutely <em>nothing</em> is known about a generic type other than the
fact that it is of that type. Trying to compile the Haskell equivalent of the
Java above:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">foo</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">IO</span> <span class="nb">()</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">print</span></code></pre></figure>
<p>results in a compile error:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>No instance for (Show a)
arising from a use of `print'
In the expression: print
In an equation for `foo': foo = print
</code></pre>
</div>
<p>You can’t even test for equality! This code snippet:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">bar</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Bool</span>
<span class="n">bar</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">==</span> <span class="n">x</span></code></pre></figure>
<p>also results in a compile error:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>No instance for (Eq a)
arising from a use of `=='
In the expression: x == x
In an equation for `bar': bar x = x == x
</code></pre>
</div>
<p><code class="highlighter-rouge">Show</code> and <code class="highlighter-rouge">Eq</code> are examples of what is called a typeclass. On a high level,
these are similar to interfaces in Java: for example, <code class="highlighter-rouge">Show</code> tells the Haskell
compiler that <code class="highlighter-rouge">show</code>, the equivalent of Java’s <code class="highlighter-rouge">toString</code>, is available for that
type. The only way to make these functions compile is to explicitly say that you
expect the type <code class="highlighter-rouge">a</code> to have instances declared for the appropriate typeclass:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">foo</span> <span class="o">::</span> <span class="p">(</span><span class="kt">Show</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="n">a</span> <span class="o">-></span> <span class="kt">IO</span> <span class="nb">()</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">print</span>
<span class="n">bar</span> <span class="o">::</span> <span class="p">(</span><span class="kt">Eq</span> <span class="n">a</span><span class="p">)</span> <span class="o">=></span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Bool</span>
<span class="n">bar</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">==</span> <span class="n">x</span></code></pre></figure>
<p>These compile. The closest Java equivalent would be something like this:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="o"><</span><span class="n">A</span> <span class="kd">extends</span> <span class="n">Show</span><span class="o">></span> <span class="kt">void</span> <span class="nf">foo</span><span class="o">(</span><span class="n">A</span> <span class="n">a</span><span class="o">)</span> <span class="o">{</span>
<span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">show</span><span class="o">());</span>
<span class="o">}</span></code></pre></figure>
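<p>For comparison, here is a rough sketch of the same idea in Go generics (Go 1.18 or later). It is only an analogy: the constraint <code class="highlighter-rouge">fmt.Stringer</code> plays the role of <code class="highlighter-rouge">Show</code>, and the built-in <code class="highlighter-rouge">comparable</code> plays the role of <code class="highlighter-rouge">Eq</code>. With the weaker constraint <code class="highlighter-rouge">any</code>, neither function body would compile. <code class="highlighter-rouge">Celsius</code> is a hypothetical type made up for the example.</p>

```go
package main

import "fmt"

// Celsius is a hypothetical type with a String method, so it
// satisfies fmt.Stringer.
type Celsius float64

func (c Celsius) String() string { return fmt.Sprintf("%.1fC", float64(c)) }

// Foo may call String only because the constraint promises it exists.
func Foo[A fmt.Stringer](a A) string { return a.String() }

// Bar may use == only because the constraint is comparable.
func Bar[A comparable](x A) bool { return x == x }

func main() {
	fmt.Println(Foo(Celsius(21.5)))
	fmt.Println(Bar("hello"))
}
```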
<p>The takeaway here is that Haskell is <em>that</em> strict about its types. If you don’t
explicitly say that a generic type can do something, then the only operation you
can perform on it is to return itself.</p>
<h3 id="arguments-as-the-sole-inputs">Arguments as the sole inputs</h3>
<p>Next, you have to remember that Haskell is a pure language. That means that
given the same inputs for a function, the output will always be the same, a
property known as <em>referential transparency</em> (it’s actually slightly more
complicated than that, but it’s close).</p>
<p>This means, for example, that global mutable state is out. This Java function returns
a different value every time it is called:</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">int</span> <span class="n">bar</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
<span class="kt">int</span> <span class="nf">foo</span><span class="o">(</span><span class="kt">int</span> <span class="n">a</span><span class="o">)</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">bar</span><span class="o">++;</span>
<span class="o">}</span></code></pre></figure>
<p>An equivalent can’t be implemented in Haskell, at least not with the same type
signature. It is simply impossible.</p>
<h3 id="examples">Examples</h3>
<p>That brings us directly to the most straightforward example, the <code class="highlighter-rouge">id</code> function:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">id</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="n">a</span></code></pre></figure>
<p>By now you should be able to see why there can only ever be one sensible
implementation of this type signature (setting aside ones that crash or never return). It is simply:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">id</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="n">a</span>
<span class="n">id</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span></code></pre></figure>
<p>This is because there is no way to find a value of type <code class="highlighter-rouge">a</code> within the context
of the function and because there are no operations that you can perform on a
value of the generic type <code class="highlighter-rouge">a</code>.</p>
<p>Let’s move onto a slightly more complex example. Here is the type signature:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">apply</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span></code></pre></figure>
<p>So <code class="highlighter-rouge">apply</code> is a function that takes two arguments: the first is a function that
takes an <code class="highlighter-rouge">a</code> and returns a <code class="highlighter-rouge">b</code>, and the second is a value of type <code class="highlighter-rouge">a</code>. The
<code class="highlighter-rouge">apply</code> function itself must return a value of type <code class="highlighter-rouge">b</code>. How can it get its
hands on this value?</p>
<p>The only possible way is to call the function passed to it with the value that
was also passed to it, leading to this implementation:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">apply</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="n">apply</span> <span class="n">f</span> <span class="n">x</span> <span class="o">=</span> <span class="n">f</span> <span class="n">x</span></code></pre></figure>
<p>Those of you who have used Haskell will recognise this as the <code class="highlighter-rouge">($)</code> operator.</p>
<p>By now the only implementation for the example I gave in the introduction should
be apparent:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">const</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">a</span>
<span class="n">const</span> <span class="n">x</span> <span class="kr">_</span> <span class="o">=</span> <span class="n">x</span></code></pre></figure>
<p>or, if you prefer:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">const</span> <span class="o">::</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">a</span>
<span class="n">const</span> <span class="n">x</span> <span class="o">=</span> <span class="nf">\</span><span class="kr">_</span> <span class="o">-></span> <span class="n">x</span></code></pre></figure>
<p>Through the magic of currying, <code class="highlighter-rouge">const x</code> returns a function that ignores its
input and always produces the predefined constant result <code class="highlighter-rouge">x</code>.</p>
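<p>The three signatures can also be transcribed into Go generics (Go 1.18 or later) as a loose sketch. The names <code class="highlighter-rouge">Id</code>, <code class="highlighter-rouge">Apply</code> and <code class="highlighter-rouge">Const</code> are chosen here for illustration. One caveat: Go does not enforce the "one implementation" property the way Haskell does, since reflection or a type switch on <code class="highlighter-rouge">any(x)</code> could inspect the value, so treat this as a reading aid for the types only.</p>

```go
package main

import "fmt"

// Id mirrors id :: a -> a.
func Id[A any](x A) A { return x }

// Apply mirrors apply :: (a -> b) -> a -> b, uncurried.
func Apply[A, B any](f func(A) B, x A) B { return f(x) }

// Const mirrors const :: a -> b -> a. It returns a function that
// ignores its argument and always yields x.
func Const[A, B any](x A) func(B) A {
	return func(B) A { return x }
}

func main() {
	fmt.Println(Id(42))
	fmt.Println(Apply(func(n int) string { return fmt.Sprint(n + 1) }, 41))
	// B cannot be inferred from the argument, so both type
	// parameters are given explicitly.
	always7 := Const[int, string](7)
	fmt.Println(always7("ignored"))
}
```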
<h3 id="closing-thoughts">Closing thoughts</h3>
<p>And so on. The epiphany for me was the realisation that, given the constraints
of the Haskell programming language, a type signature often all but determines its implementation.</p>
<p>I could look at a signature like this</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">-></span> <span class="p">[</span><span class="n">b</span><span class="p">]</span></code></pre></figure>
<p>and understand immediately that the only productive implementation for this
signature yields the <code class="highlighter-rouge">map</code> function. Or a signature like this:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">-></span> <span class="n">b</span></code></pre></figure>
<p>and immediately see a <code class="highlighter-rouge">fold</code> function (<code class="highlighter-rouge">foldr</code> specifically in Haskell).</p>
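<p>As a rough illustration of what those two signatures describe, here they are written out with Go generics (Go 1.18 or later). <code class="highlighter-rouge">Map</code> and <code class="highlighter-rouge">Foldr</code> are names picked for this sketch, the arguments are uncurried, and slices stand in for Haskell lists.</p>

```go
package main

import "fmt"

// Map mirrors (a -> b) -> [a] -> [b]: apply f to every element.
func Map[A, B any](f func(A) B, xs []A) []B {
	out := make([]B, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// Foldr mirrors (a -> b -> b) -> b -> [a] -> b: combine elements
// from the right, starting with the seed z.
func Foldr[A, B any](f func(A, B) B, z B, xs []A) B {
	acc := z
	for i := len(xs) - 1; i >= 0; i-- {
		acc = f(xs[i], acc)
	}
	return acc
}

func main() {
	fmt.Println(Map(func(n int) int { return n * 2 }, []int{1, 2, 3}))
	// Appending each element after the accumulator exposes the
	// right-to-left order of the fold.
	fmt.Println(Foldr(func(s, acc string) string { return acc + s }, "", []string{"a", "b", "c"}))
}
```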
<p>From experience I can say that it certainly takes some time to get used to
reading and understanding functions in this manner. It’s a fundamental change in
how you interpret code. But once you’ve made the connection and internalised
this concept you don’t ever think about code in quite the same way again.</p>
<h2 id="exploring-scala-macros-map-to-case-class-conversion">exploring scala macros: map to case class conversion</h2>
<p>Jonathan Chow, 2013-11-04, http://blog.echo.sh/2013/11/04/exploring-scala-macros-map-to-case-class-conversion</p>
<p>Recently I had a go at writing some <a href="http://docs.scala-lang.org/overviews/macros/overview.html">Scala
macros</a>. Scala
macros are essentially an advanced version of the traditional C <code class="highlighter-rouge">#define</code>s. To
someone like me who hasn’t had too much experience with C, its macros feel like
a sophisticated find-and-replace tool that gets run before each compile. On the
other hand, Scala macros can <a href="http://docs.scala-lang.org/overviews/macros/usecases.html">bring about
benefits</a> such as
automatic code generation (via implicits), static type safety within strings
(when interpolating), and even allow for the creation of very fluent DSL
interfaces.</p>
<p>The aspect of Scala macros I understood best was code generation, so
I decided to tackle a problem that has probably plagued every budding developer
who’s tried to roll their own ORM in a statically typed language: persisting a
case class to the database and reading it back without using reflection.</p>
<p>The crux of the problem is always the conversion between a type-safe case class
and the database layer. In more mainstream languages like Java, there is simply
no way to automatically call some function for each field based on its type
without using reflection. With macros, however, the code to do this can be
generated at compile time.</p>
<h3 id="the-problem">The Problem</h3>
<p>Let’s reduce the problem to a very specific one: taking any
arbitrary case class and producing converter functions to and from a
<code class="highlighter-rouge">Map[String, Any]</code> where the keys are the names of the case class’s constructor
parameters pointing to their respective values.</p>
<p>[Note: many of the problems I faced while writing this macro were solved by
looking at this <a href="http://stackoverflow.com/questions/19544756/scala-macros-accessing-members-with-quasiquotes">implementation on
StackOverflow</a>,
hence the similarity.]</p>
<p>To take advantage of <a href="http://docs.scala-lang.org/overviews/macros/implicits.html">implicit
macros</a> (we’ll get
back to them later), we’ll use a type class to provide the conversion:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">trait</span> <span class="nc">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">toMap</span><span class="o">(</span><span class="n">t</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">]</span>
<span class="k">def</span> <span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span><span class="k">:</span> <span class="kt">T</span>
<span class="o">}</span></code></pre></figure>
<p>Any implementation of the <code class="highlighter-rouge">Mappable[T]</code> trait can now be used to convert a type <code class="highlighter-rouge">T</code>
to and from a map. For example, we can define one manually:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">case</span> <span class="k">class</span> <span class="nc">Person</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">age</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span>
<span class="k">val</span> <span class="nc">PersonMapper</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Mappable</span><span class="o">[</span><span class="kt">Person</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">toMap</span><span class="o">(</span><span class="n">p</span><span class="k">:</span> <span class="kt">Person</span><span class="o">)</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">(</span>
<span class="s">"name"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="o">,</span>
<span class="s">"age"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">age</span><span class="o">)</span>
<span class="k">def</span> <span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span> <span class="k">=</span> <span class="nc">Person</span><span class="o">(</span>
<span class="n">map</span><span class="o">(</span><span class="s">"name"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">String</span><span class="o">],</span>
<span class="n">map</span><span class="o">(</span><span class="s">"age"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span>
<span class="o">}</span></code></pre></figure>
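<p>As a quick sanity check, here’s a minimal, self-contained sketch of the manual
mapper round-tripping a value. The <code class="highlighter-rouge">ManualMapperDemo</code> object and the sample
values are made up for illustration, and the mapper is called <code class="highlighter-rouge">personMapper</code>
here only to keep the sketch in one object.</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}

case class Person(name: String, age: Int)

// Hypothetical demo object, not part of the original code.
object ManualMapperDemo {
  val personMapper: Mappable[Person] = new Mappable[Person] {
    def toMap(p: Person): Map[String, Any] = Map(
      "name" -> p.name,
      "age" -> p.age)
    def fromMap(map: Map[String, Any]): Person = Person(
      map("name").asInstanceOf[String],
      map("age").asInstanceOf[Int])
  }

  val alice: Person = Person("Alice", 30)
  val asMap: Map[String, Any] = personMapper.toMap(alice)
  val roundTripped: Person = personMapper.fromMap(asMap)

  def main(args: Array[String]): Unit = {
    // The map carries one entry per constructor parameter...
    assert(asMap == Map("name" -> "Alice", "age" -> 30))
    // ...and converting back yields an equal case class instance.
    assert(roundTripped == alice)
  }
}</code></pre></figure>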
<p>There’s a big problem with defining the mapper explicitly though: any time the
case class changes the mapper must also be updated accordingly. Take, for
example, the case of adding a new parameter to the <code class="highlighter-rouge">Person</code> case class:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">case</span> <span class="k">class</span> <span class="nc">Person</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">age</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">height</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span>
<span class="k">val</span> <span class="nc">PersonMapper</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Mappable</span><span class="o">[</span><span class="kt">Person</span><span class="o">]</span> <span class="o">{</span>
<span class="c1">// toMap: compiles even though it's incorrect
</span> <span class="k">def</span> <span class="n">toMap</span><span class="o">(</span><span class="n">p</span><span class="k">:</span> <span class="kt">Person</span><span class="o">)</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">(</span>
<span class="s">"name"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="o">,</span>
<span class="s">"age"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">age</span><span class="o">)</span>
<span class="c1">// fromMap: fails to compile (as it should)
</span> <span class="k">def</span> <span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span> <span class="k">=</span> <span class="nc">Person</span><span class="o">(</span>
<span class="n">map</span><span class="o">(</span><span class="s">"name"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">String</span><span class="o">],</span>
<span class="n">map</span><span class="o">(</span><span class="s">"age"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span>
<span class="o">}</span></code></pre></figure>
<p>When only the case class has changed, the compiler can catch the error in the
<code class="highlighter-rouge">fromMap</code> method because it’s one parameter short, but the compiler can’t catch
the semantic error in the <code class="highlighter-rouge">toMap</code> method missing the new <code class="highlighter-rouge">height</code> parameter.</p>
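<p>To make the silent failure concrete, here’s a small sketch (the
<code class="highlighter-rouge">StaleMapperDemo</code> object and sample values are hypothetical) in which the stale
<code class="highlighter-rouge">toMap</code> compiles happily but drops the new field, so the problem only shows up
at runtime:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">case class Person(name: String, age: Int, height: Double)

// Hypothetical demo object, not part of the original code.
object StaleMapperDemo {
  // Same body as before `height` was added; it still type-checks.
  def toMap(p: Person): Map[String, Any] = Map(
    "name" -> p.name,
    "age" -> p.age)

  val asMap: Map[String, Any] = toMap(Person("Alice", 30, 1.7))

  def main(args: Array[String]): Unit = {
    // The new field never made it into the map.
    assert(!asMap.contains("height"))
    // Anything that later looks up "height" fails only at runtime.
    assert(scala.util.Try(asMap("height")).isFailure)
  }
}</code></pre></figure>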
<h3 id="using-macros">Using Macros</h3>
<p>The reason for this is that explicitly defining the mapper leads to code that’s
not very <a href="http://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY</a>. It
introduces multiple points in the code that have to change in order for some
changes to be semantically correct. Ideally, the mapper should be able to
figure out what fields are needed by looking directly at the class it’s defined
for rather than having each field explicitly listed in its methods.</p>
<p>It turns out that macros let you do this really easily. Let’s start by defining
a barebones macro in the companion object of the <code class="highlighter-rouge">Mappable</code> trait:</p>
<p>[Note: you can clone <a href="https://github.com/echojc/scala-macro-template">this template
repo</a> to follow along. With the
2.11.0-M5 compiler, macros must be compiled separately from the code that uses
them. With this template, the macro subproject can be used for this purpose.]</p>
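<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// enables macro definitions (alternatively, pass -language:experimental.macros to scalac)
</span><span class="k">import</span> <span class="nn">scala.language.experimental.macros</span>
<span class="k">import</span> <span class="nn">scala.reflect.macros.Context</span>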
<span class="k">object</span> <span class="nc">Mappable</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">materializeMappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span><span class="k">:</span> <span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span>
<span class="n">macro</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="k">def</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">c.WeakTypeTag</span><span class="o">](</span><span class="n">c</span><span class="k">:</span> <span class="kt">Context</span><span class="o">)</span><span class="k">:</span> <span class="kt">c.Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">c.universe._</span>
<span class="k">val</span> <span class="n">tpe</span> <span class="k">=</span> <span class="n">weakTypeOf</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="n">c</span><span class="o">.</span><span class="nc">Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="o">{</span> <span class="n">q</span><span class="s">"""
new Mappable[$tpe] {
def toMap(t: $tpe) = ???
def fromMap(map: Map[String, Any]) = ???
}
"""</span> <span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>Even to a seasoned Scala user, if you’ve never used macros before this probably
looks like gobbledegook! <a href="http://danielwestheide.com/blog/2013/02/13/the-neophytes-guide-to-scala-part-13-path-dependent-types.html">Dependent
types</a>,
<a href="http://docs.scala-lang.org/overviews/macros/quasiquotes.html">quasiquotes</a>,
even a <a href="http://stackoverflow.com/a/4467012">context bound</a> thrown into the mix.
Behind all the flashiness, however, it’s actually fairly straightforward. Let’s
go through this one part at a time.</p>
<h3 id="implicit-function-to-trigger-macro">Implicit Function to Trigger Macro</h3>
<p>We start off with the implicit method that triggers the macro:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">implicit</span> <span class="k">def</span> <span class="n">materializeMappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span><span class="k">:</span> <span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span>
<span class="n">macro</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span></code></pre></figure>
<p>It’s easy to see that this method returns a <code class="highlighter-rouge">Mappable</code> corresponding to
whatever type is passed when the function is called. This method doesn’t have
an implementation; the <code class="highlighter-rouge">macro</code> keyword instructs the compiler to expand the
corresponding macro implementation instead, in this case,
<code class="highlighter-rouge">materializeMappableImpl</code>.</p>
<p>The reason we make this method implicit is that this allows the compiler to
automatically create mappers for types as required (the aforementioned
<a href="http://docs.scala-lang.org/overviews/macros/implicits.html">implicit macros</a>).
Without it, one would need to explicitly create a mapper before using it:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">personToMap</span><span class="o">(</span><span class="n">p</span><span class="k">:</span> <span class="kt">Person</span><span class="o">)</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">mapper</span> <span class="k">=</span> <span class="n">materializeMappable</span><span class="o">[</span><span class="kt">Person</span><span class="o">]</span>
<span class="n">mapper</span><span class="o">.</span><span class="n">toMap</span><span class="o">(</span><span class="n">p</span><span class="o">)</span>
<span class="o">}</span></code></pre></figure>
<p>By marking the method implicit, we give the compiler the opportunity to
automatically insert this method call whenever an implicit parameter of type
<code class="highlighter-rouge">Mappable[T]</code> is required. For example,</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// the compiler will insert materializeMappable[T] as the implicit parameter
</span><span class="k">def</span> <span class="n">mapify</span><span class="o">[</span><span class="kt">T</span><span class="o">](</span><span class="n">t</span><span class="k">:</span> <span class="kt">T</span><span class="o">)(</span><span class="k">implicit</span> <span class="n">mapper</span><span class="k">:</span> <span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">])</span> <span class="k">=</span>
<span class="n">mapper</span><span class="o">.</span><span class="n">toMap</span><span class="o">(</span><span class="n">t</span><span class="o">)</span></code></pre></figure>
<p>We can even use context bounds to not explicitly specify the extra parameter:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">mapify</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">Mappable</span><span class="o">](</span><span class="n">t</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span> <span class="k">=</span>
<span class="n">implicitly</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]].</span><span class="n">toMap</span><span class="o">(</span><span class="n">t</span><span class="o">)</span></code></pre></figure>
<p>In this case, the compiler still passes the mapper as a hidden implicit
parameter. We don’t have a name for it, but it’s there, so we use the
<code class="highlighter-rouge">implicitly</code> function to summon it from the nether world.</p>
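<p>Implicit resolution itself can be tried without any macros. The sketch below
assumes a hand-written instance, <code class="highlighter-rouge">itemMappable</code>, placed in the companion object
of a hypothetical <code class="highlighter-rouge">Item</code> class as a stand-in for what the macro would
materialize; both flavours of <code class="highlighter-rouge">mapify</code> (renamed here so they can coexist)
pick it up automatically:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}

case class Item(name: String, price: Double)

object Item {
  // Hand-written stand-in for the instance the implicit macro would generate.
  implicit val itemMappable: Mappable[Item] = new Mappable[Item] {
    def toMap(i: Item): Map[String, Any] = Map(
      "name" -> i.name,
      "price" -> i.price)
    def fromMap(map: Map[String, Any]): Item = Item(
      map("name").asInstanceOf[String],
      map("price").asInstanceOf[Double])
  }
}

// Hypothetical demo object, not part of the original code.
object ImplicitDemo {
  // Explicit implicit parameter list.
  def mapifyExplicit[T](t: T)(implicit mapper: Mappable[T]): Map[String, Any] =
    mapper.toMap(t)

  // Context bound: same thing, but the parameter has no name.
  def mapifyBound[T: Mappable](t: T): Map[String, Any] =
    implicitly[Mappable[T]].toMap(t)

  val lunch: Item = Item("lunch", 15.5)
  val viaParam: Map[String, Any] = mapifyExplicit(lunch)
  val viaBound: Map[String, Any] = mapifyBound(lunch)

  def main(args: Array[String]): Unit = {
    // Both variants resolve the same instance from Item's companion object.
    assert(viaParam == viaBound)
    assert(viaBound("name") == "lunch")
  }
}</code></pre></figure>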
<h3 id="macro-boilerplate">Macro Boilerplate</h3>
<p>Let’s move on to the macro implementation. The structure of the macro function
looks at first sight to be some strange incantation:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">c.WeakTypeTag</span><span class="o">](</span><span class="n">c</span><span class="k">:</span> <span class="kt">Context</span><span class="o">)</span><span class="k">:</span> <span class="kt">c.Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">c.universe._</span>
<span class="c1">// _...
</span><span class="o">}</span></code></pre></figure>
<p>Again, however, it’s actually fairly straightforward. Macros work with code, so
we manipulate it as abstract syntax trees. The <code class="highlighter-rouge">Context</code> parameter contains the
information the compiler has about the current invocation of the macro (such as
the call site, parameters, etc.) and is passed as the first parameter to the
macro implementation. Everything else about the invocation mirrors the
signature of the original method, but expressed as types that depend on the
current <code class="highlighter-rouge">Context</code>: parameters and return types become <code class="highlighter-rouge">c.Expr</code>s (essentially
typed ASTs <a href="http://www.scala-lang.org/files/archive/api/2.11.0-M5/#scala.reflect.api.Exprs%24Expr">according to the
docs</a>),
and type parameters become <code class="highlighter-rouge">c.WeakTypeTag</code>s (see <a href="https://github.com/scala/scala/commit/788478d3ab">this
commit</a> for an explanation
of why it must be a <code class="highlighter-rouge">WeakTypeTag</code> and
<a href="http://stackoverflow.com/a/12232195">here</a> for more information about <code class="highlighter-rouge">TypeTag</code>s
in general).</p>
<p>Finally, we import everything inside the universe of the Context to bring all
the common utility functions into scope.</p>
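<p>The type-tag half of this can be explored outside of a macro. The sketch below
assumes Scala 2.11 or later with <code class="highlighter-rouge">scala-reflect</code> on the classpath, and uses the
runtime universe as a stand-in for <code class="highlighter-rouge">c.universe</code>; <code class="highlighter-rouge">TypeTagDemo</code> and
<code class="highlighter-rouge">typeOfArg</code> are made-up names:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import scala.reflect.runtime.universe._

case class Person(name: String, age: Int)

// Hypothetical demo object, not part of the original code.
object TypeTagDemo {
  // The context bound asks the compiler to pass a WeakTypeTag[T] implicitly,
  // just like the [T: c.WeakTypeTag] bound on the macro implementation.
  def typeOfArg[T: WeakTypeTag](value: T): Type = weakTypeOf[T]

  val personType: Type = typeOfArg(Person("Alice", 30))

  def main(args: Array[String]): Unit = {
    // The tag carries the full compile-time type of the argument.
    assert(personType =:= typeOf[Person])
    println(personType)
  }
}</code></pre></figure>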
<h3 id="macro-implementation">Macro Implementation</h3>
<p>Now we get into the nuts and bolts of the macro:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">tpe</span> <span class="k">=</span> <span class="n">weakTypeOf</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="n">c</span><span class="o">.</span><span class="nc">Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="o">{</span> <span class="n">q</span><span class="s">"""
new Mappable[$tpe] {
def toMap(t: $tpe) = ???
def fromMap(map: Map[String, Any]) = ???
}
"""</span> <span class="o">}</span></code></pre></figure>
<p>To start things off, we first get the type of the case class we’re creating a
mapper for out of the WeakTypeTag. This tpe variable can then be used directly
within quasiquotes.</p>
<p>[<s>Note: it looks like WeakTypeTags should also be directly usable within
quasiquotes since <a href="http://www.scala-lang.org/files/archive/api/2.11.0-M5/#scala.reflect.api.StandardLiftables">they also have a Liftable
implementation</a>
but I couldn’t get it to work. I didn’t look too closely at it though.</s>
<a href="#comment-1138314276">densh has pointed out</a> that you need a variable of type
<code class="highlighter-rouge">WeakTypeTag</code> and not a type of one for this to work.]</p>
<p>Now,
<a href="http://docs.scala-lang.org/overviews/macros/quasiquotes.html">quasiquotes</a>. I
found this part the most awesome part about writing Scala macros. They’re
somewhat of a replacement for the <a href="http://docs.scala-lang.org/overviews/macros/overview.html#a_complete_example">earlier <code class="highlighter-rouge">reify</code>/<code class="highlighter-rouge">splice</code>
style</a>
of writing macros. They work just like <a href="http://docs.scala-lang.org/overviews/core/string-interpolation.html">interpolated
strings</a>,
but instead of a string you write normal Scala code and instead of splicing
string versions of variables with <code class="highlighter-rouge">$variable</code>, you splice ASTs. The most obvious
distinction between the two is that <code class="highlighter-rouge">reify</code> returns an <code class="highlighter-rouge">Expr</code>, while quasiquotes
return an AST which must then be wrapped into an <code class="highlighter-rouge">Expr</code> explicitly.</p>
<p>With that understanding, the rest of this code snippet should be easy to
understand. We define an <code class="highlighter-rouge">Expr</code> of type <code class="highlighter-rouge">Mappable[T]</code> and use quasiquotes to create
the AST from normal code. Note the use of the <code class="highlighter-rouge">tpe</code> variable inside the
quasiquotes in place of <code class="highlighter-rouge">T</code>. We use <code class="highlighter-rouge">???</code> here because we’ve yet to discuss the
real implementation of the <code class="highlighter-rouge">Mappable</code> instance.</p>
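<p>Quasiquotes aren’t limited to macro bodies, which makes them easy to poke at.
Here’s a minimal sketch, assuming Scala 2.11 or later with <code class="highlighter-rouge">scala-reflect</code> on
the classpath and using the runtime universe instead of <code class="highlighter-rouge">c.universe</code>. The
<code class="highlighter-rouge">QuasiquoteDemo</code> object and the <code class="highlighter-rouge">inc</code> method are invented for illustration:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import scala.reflect.runtime.universe._

// Hypothetical demo object, not part of the original code.
object QuasiquoteDemo {
  // A quasiquote parses ordinary Scala code into a tree.
  val body: Tree = q"x + 1"

  // $body splices one tree into another, much like string interpolation.
  val method: Tree = q"def inc(x: Int): Int = $body"

  def main(args: Array[String]): Unit = {
    println(showCode(method)) // the tree rendered back as source code
    println(showRaw(body))    // the underlying AST constructors
  }
}</code></pre></figure>
<p>Nothing here is type-checked or executed; the trees are just data until a
macro hands them back to the compiler.</p>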
<h3 id="getting-fields">Getting Fields</h3>
<p>Our instance of <code class="highlighter-rouge">Mappable</code> needs to iterate over the fields of the case class it’s
used for. We don’t want <em>all</em> fields though, just the ones used in the
constructor.</p>
<p>There are many ways we can get at that information. Methods have an
<code class="highlighter-rouge">isCaseAccessor</code> flag that signifies whether they are used to access the
parameters in the constructor. We can also look at the primary implementation
of the copy function. However, because we’ll eventually need the exact order of
parameters in order to implement the <code class="highlighter-rouge">fromMap</code> method, we’ll use the <em>primary
constructor</em> to get the list of fields we need.</p>
<p>To do this, we’ll inspect the <code class="highlighter-rouge">tpe</code> variable describing our case class to get a
list of all its <em>declarations</em>. [Note: <code class="highlighter-rouge">declarations</code> are members declared directly
in this class, while <code class="highlighter-rouge">members</code> include inherited ones.] One of these will be the
primary constructor, so we use a pattern match with a guard to get it out. Once
we have the constructor, we can extract the list of parameters in the order
that we need.</p>
<p>This can be translated directly into code:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">declarations</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">declarations</span>
<span class="k">val</span> <span class="n">ctor</span> <span class="k">=</span> <span class="n">declarations</span><span class="o">.</span><span class="n">collectFirst</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">m</span><span class="k">:</span> <span class="kt">MethodSymbol</span> <span class="kt">if</span> <span class="kt">m.isPrimaryConstructor</span> <span class="o">=></span> <span class="n">m</span>
<span class="o">}.</span><span class="n">get</span>
<span class="k">val</span> <span class="n">fields</span> <span class="k">=</span> <span class="n">ctor</span><span class="o">.</span><span class="n">paramss</span><span class="o">.</span><span class="n">head</span></code></pre></figure>
<p><code class="highlighter-rouge">paramss</code> looks like a typo, but in fact it’s a list of lists (of parameters),
hence the double ‘s’. A method or constructor can declare more than one
parameter list (as curried methods do), which is why it’s there. We assume the
case class has a single parameter list, so in our case we’re fine taking the
<code class="highlighter-rouge">head</code> of that list.</p>
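<p>The same lookup can be tried with runtime reflection. This sketch assumes
Scala 2.11 or later with <code class="highlighter-rouge">scala-reflect</code> on the classpath, where
<code class="highlighter-rouge">decls</code>, <code class="highlighter-rouge">paramLists</code> and <code class="highlighter-rouge">decodedName</code> are the newer names for
<code class="highlighter-rouge">declarations</code>, <code class="highlighter-rouge">paramss</code> and <code class="highlighter-rouge">decoded</code>. The <code class="highlighter-rouge">Curried</code> class and the
<code class="highlighter-rouge">ctorParamLists</code> helper are hypothetical:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import scala.reflect.runtime.universe._

case class Person(name: String, age: Int)
class Curried(val a: Int)(val b: String)

// Hypothetical demo object, not part of the original code.
object CtorParamsDemo {
  // Names of the primary constructor's parameters, one inner list per parameter list.
  def ctorParamLists(tpe: Type): List[List[String]] = {
    val ctor = tpe.decls.collectFirst {
      case m: MethodSymbol if m.isPrimaryConstructor => m
    }.get
    ctor.paramLists.map(_.map(_.name.decodedName.toString))
  }

  val personParams: List[List[String]] = ctorParamLists(typeOf[Person])
  val curriedParams: List[List[String]] = ctorParamLists(typeOf[Curried])

  def main(args: Array[String]): Unit = {
    // A single parameter list, in declaration order.
    assert(personParams == List(List("name", "age")))
    // Two parameter lists, hence the list of lists.
    assert(curriedParams == List(List("a"), List("b")))
  }
}</code></pre></figure>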
<h3 id="writing-tomap">Writing toMap</h3>
<p>Now that we have the fields, let’s write the <code class="highlighter-rouge">toMap</code> method. Let’s refresh
ourselves with what this method should look like by taking a look at the manual
implementation from earlier:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">toMap</span><span class="o">(</span><span class="n">p</span><span class="k">:</span> <span class="kt">Person</span><span class="o">)</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">(</span>
<span class="s">"name"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="o">,</span>
<span class="s">"age"</span> <span class="o">-></span> <span class="n">p</span><span class="o">.</span><span class="n">age</span><span class="o">)</span></code></pre></figure>
<p>The implementation is just one statement! It’s just a call to <code class="highlighter-rouge">Map.apply</code> with
“stuff” in it. Let’s break down what that “stuff” includes:</p>
<ol>
<li>the name of the field as a <code class="highlighter-rouge">String</code></li>
<li>a call to the <code class="highlighter-rouge">-></code> method to create the tuple</li>
<li>a member access to the underlying field</li>
</ol>
<p>What we need, then, is an AST that represents this. What better way to generate that AST than to use quasiquotes?</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">toMapParams</span> <span class="k">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">field</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">name</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">name</span>
<span class="k">val</span> <span class="n">mapKey</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">decoded</span>
<span class="n">q</span><span class="s">"$mapKey -> t.$name"</span>
<span class="o">}</span></code></pre></figure>
<p>That’s all we need! The <code class="highlighter-rouge">mapKey</code> variable is annotated with its type <code class="highlighter-rouge">String</code> to
illustrate the fact that <code class="highlighter-rouge">String</code>s have a built-in <code class="highlighter-rouge">Liftable</code> implementation that
allows quasiquotes to convert it into the appropriate AST without us doing so
explicitly (the AST would be <code class="highlighter-rouge">Literal(Constant(mapKey))</code>).</p>
<p>There are probably two more things in here that stand out: what does it mean to
decode the name? And what’s this <code class="highlighter-rouge">t</code> variable that hasn’t been defined anywhere?
(Or has it…?)</p>
<p><a href="http://www.scala-lang.org/files/archive/api/2.11.0-M5/index.html#scala.reflect.api.Names%24NameApi@decoded:String">According to the
docs</a>,
decoding the name “replaces all occurrences of <code class="highlighter-rouge">$op_names</code> in this name by
corresponding operator symbols”. We want this because if a parameter has a name
like <code class="highlighter-rouge">content-type</code> (written in backticks in the source), we want the map to
have the key <code class="highlighter-rouge">content-type</code> and not <code class="highlighter-rouge">content$minustype</code>.</p>
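<p>The encoding can be seen directly with <code class="highlighter-rouge">scala.reflect.NameTransformer</code> from the
standard library, which, as far as I can tell, applies the same scheme that the
reflection API’s name decoding relies on. <code class="highlighter-rouge">NameEncodingDemo</code> is a made-up name
for this sketch:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import scala.reflect.NameTransformer

// Hypothetical demo object, not part of the original code.
object NameEncodingDemo {
  // Operator characters are encoded so the name is a legal JVM identifier...
  val encoded: String = NameTransformer.encode("content-type")
  // ...and decoding turns the $op_names back into operator symbols.
  val decoded: String = NameTransformer.decode(encoded)

  def main(args: Array[String]): Unit = {
    assert(encoded == "content$minustype")
    assert(decoded == "content-type")
  }
}</code></pre></figure>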
<p>The <code class="highlighter-rouge">t</code> variable is a bit more tricky. We must remember that all we’re
constructing here is an AST. It is merely some small portion of code. With no
context, this <code class="highlighter-rouge">t</code> variable makes no sense, but if we put it in some context where
some variable <code class="highlighter-rouge">t</code> is defined, then it does make sense. If you look back at the
original definition of the <code class="highlighter-rouge">toMap</code> method we used in the macro, you’ll see that
the name of the variable passed into the <code class="highlighter-rouge">toMap</code> method is, in fact, named <code class="highlighter-rouge">t</code>.
This is the <code class="highlighter-rouge">t</code> that we’re referring to.</p>
<p>Combining all this together, we can advance our macro implementation to include
the <code class="highlighter-rouge">toMap</code> method:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">c</span><span class="o">.</span><span class="nc">Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="o">{</span> <span class="n">q</span><span class="s">"""
new Mappable[$tpe] {
def toMap(t: $tpe) = Map(..$toMapParams)
def fromMap(map: Map[String, Any]) = ???
}
"""</span> <span class="o">}</span></code></pre></figure>
<p>The <code class="highlighter-rouge">toMap</code> method is implemented as described before. The <code class="highlighter-rouge">t</code> variable now has
sufficient context to give the code meaning. We use <code class="highlighter-rouge">..$toMapParams</code> to splice
a list of trees in as the arguments to <code class="highlighter-rouge">Map</code>. There is also a <code class="highlighter-rouge">...</code> variant for
a list of lists of trees (e.g., parameter lists for methods), which is shown
<a href="http://docs.scala-lang.org/overviews/macros/quasiquotes.html">on the quasiquotes doc
page</a>, but I
haven’t had a chance to try it out.</p>
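<p>Here’s a small sketch of the <code class="highlighter-rouge">..$</code> splice on its own, again assuming Scala 2.11
or later with <code class="highlighter-rouge">scala-reflect</code> on the classpath and the runtime universe standing
in for <code class="highlighter-rouge">c.universe</code>. The hard-coded <code class="highlighter-rouge">fieldNames</code> list replaces the constructor
parameters the macro would discover, and <code class="highlighter-rouge">SpliceDemo</code> is a made-up name:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">import scala.reflect.runtime.universe._

// Hypothetical demo object, not part of the original code.
object SpliceDemo {
  // Stand-in for the field names the macro reads off the constructor.
  val fieldNames: List[String] = List("name", "age")

  // One `"key" -> t.key` tree per field; the String is lifted to a literal.
  val entries: List[Tree] = fieldNames.map { n =>
    val field = TermName(n)
    q"$n -> t.$field"
  }

  // ..$entries splices the whole list in as the arguments of the call.
  val call: Tree = q"Map(..$entries)"

  // Number of arguments in the generated call.
  val argCount: Int = call match {
    case Apply(_, args) => args.size
    case _              => -1
  }

  def main(args: Array[String]): Unit = {
    assert(argCount == fieldNames.size)
    println(showCode(call))
  }
}</code></pre></figure>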
<p>If you want, you can comment out the <code class="highlighter-rouge">fromMap</code> method from the <code class="highlighter-rouge">Mappable</code> trait and
the macro implementation to give <code class="highlighter-rouge">toMap</code> a try:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">mapify</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">Mappable</span><span class="o">](</span><span class="n">t</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span> <span class="k">=</span>
<span class="n">implicitly</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]].</span><span class="n">toMap</span><span class="o">(</span><span class="n">t</span><span class="o">)</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Item</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">price</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span>
<span class="k">val</span> <span class="n">map</span> <span class="k">=</span> <span class="n">mapify</span><span class="o">(</span><span class="nc">Item</span><span class="o">(</span><span class="s">"lunch"</span><span class="o">,</span> <span class="mf">15.5</span><span class="o">))</span>
<span class="n">println</span><span class="o">(</span><span class="n">map</span><span class="o">(</span><span class="s">"name"</span><span class="o">))</span> <span class="c1">// "lunch"
</span><span class="n">println</span><span class="o">(</span><span class="n">map</span><span class="o">(</span><span class="s">"price"</span><span class="o">))</span> <span class="c1">// 15.5</span></code></pre></figure>
<p>Cool, huh?</p>
<h3 id="writing-frommap">Writing fromMap</h3>
<p>The <code class="highlighter-rouge">fromMap</code> method can be written in an analogous way. Let’s take a look at
what we need:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span> <span class="k">=</span> <span class="nc">Person</span><span class="o">(</span>
<span class="n">map</span><span class="o">(</span><span class="s">"name"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">String</span><span class="o">],</span>
<span class="n">map</span><span class="o">(</span><span class="s">"age"</span><span class="o">).</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span></code></pre></figure>
<p>There are two things we need here that we didn’t need for the implementation of
<code class="highlighter-rouge">toMap</code>: the companion object for the <code class="highlighter-rouge">apply</code> method, and the type of each
parameter for the cast. We can get both from the <code class="highlighter-rouge">tpe</code> variable:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">companion</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">typeSymbol</span><span class="o">.</span><span class="n">companionSymbol</span>
<span class="k">def</span> <span class="n">returnType</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">Name</span><span class="o">)</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">declaration</span><span class="o">(</span><span class="n">name</span><span class="o">).</span><span class="n">typeSignature</span></code></pre></figure>
<p>Using these and the same list of fields we had from the <code class="highlighter-rouge">toMap</code> implementation,
we can generate the <code class="highlighter-rouge">fromMap</code> implementation:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">fromMapParams</span> <span class="k">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">field</span> <span class="k">=></span>
<span class="k">val</span> <span class="n">name</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">name</span>
<span class="k">val</span> <span class="n">decoded</span> <span class="k">=</span> <span class="n">name</span><span class="o">.</span><span class="n">decoded</span>
<span class="k">val</span> <span class="n">returnType</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">declaration</span><span class="o">(</span><span class="n">name</span><span class="o">).</span><span class="n">typeSignature</span>
<span class="n">q</span><span class="s">"map($decoded).asInstanceOf[$returnType]"</span>
<span class="o">}</span>
<span class="n">c</span><span class="o">.</span><span class="nc">Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="o">{</span> <span class="n">q</span><span class="s">"""
new Mappable[$tpe] {
def toMap(t: $tpe) = Map(..$toMapParams)
def fromMap(map: Map[String, Any]) = $companion(..$fromMapParams)
}
"""</span> <span class="o">}</span></code></pre></figure>
<p>Remember that <code class="highlighter-rouge">decoded</code> is a <code class="highlighter-rouge">String</code> that gets lifted into an AST by the
quasiquotes. <code class="highlighter-rouge">map</code> will be the name of the variable that gets passed to the
<code class="highlighter-rouge">fromMap</code> method. The factory for the case class is the <code class="highlighter-rouge">apply</code> method of the
companion object, which we can call by doing a function application directly on
the companion object’s symbol, just like in standard Scala.</p>
<p>Note that the order of the parameters fed into the
<code class="highlighter-rouge">apply</code> method matters. This is why we chose at the beginning to retrieve
the list of parameters from the primary constructor: doing so guarantees
that the order is correct.</p>
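<p>To make the generated code concrete, here is a hand-written sketch of roughly what the macro should expand to for a case class like the <code class="highlighter-rouge">Item</code> used in the next example. It is a reading of the quasiquote template, not actual compiler output, and the <code class="highlighter-rouge">ExpansionSketch</code> and <code class="highlighter-rouge">itemMappable</code> names are made up for illustration:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">trait Mappable[T] {
  def toMap(t: T): Map[String, Any]
  def fromMap(map: Map[String, Any]): T
}

case class Item(name: String, price: Double)

// Hypothetical wrapper, used only for this illustration.
object ExpansionSketch {
  // Hand-written stand-in for what the macro should generate for Mappable[Item].
  val itemMappable: Mappable[Item] = new Mappable[Item] {
    // One "fieldName" -> t.fieldName pair per constructor parameter.
    def toMap(t: Item): Map[String, Any] =
      Map("name" -> t.name, "price" -> t.price)

    // Arguments appear in primary-constructor order, each cast to its field type.
    def fromMap(map: Map[String, Any]): Item =
      Item(map("name").asInstanceOf[String], map("price").asInstanceOf[Double])
  }

  def main(args: Array[String]): Unit = {
    val item = itemMappable.fromMap(Map("name" -> "dinner", "price" -> 25.8))
    println(item)                     // Item(dinner,25.8)
    println(itemMappable.toMap(item)) // Map(name -> dinner, price -> 25.8)
  }
}</code></pre></figure>
<p>Seen this way, the limits of the approach are easier to spot: a missing key makes <code class="highlighter-rouge">map(...)</code> throw a <code class="highlighter-rouge">NoSuchElementException</code>, and a value of the wrong type fails at the cast.</p>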
<p>And that’s it! You can try it out like this:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">materialize</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">Mappable</span><span class="o">](</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span> <span class="k">=</span>
<span class="n">implicitly</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]].</span><span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="o">)</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Item</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">price</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span>
<span class="k">val</span> <span class="n">item</span> <span class="k">=</span> <span class="n">materialize</span><span class="o">[</span><span class="kt">Item</span><span class="o">](</span><span class="nc">Map</span><span class="o">(</span><span class="s">"name"</span> <span class="o">-></span> <span class="s">"dinner"</span><span class="o">,</span> <span class="s">"price"</span> <span class="o">-></span> <span class="mf">25.8</span><span class="o">))</span>
<span class="n">println</span><span class="o">(</span><span class="n">item</span><span class="o">.</span><span class="n">name</span><span class="o">)</span> <span class="c1">// "dinner"
</span><span class="n">println</span><span class="o">(</span><span class="n">item</span><span class="o">.</span><span class="n">price</span><span class="o">)</span> <span class="c1">// 25.8</span></code></pre></figure>
<h3 id="wrapping-it-up">Wrapping It Up</h3>
<p>This is the complete implementation of the macro. You can also find it in the
<code class="highlighter-rouge">complete-example</code> branch of my <a href="https://github.com/echojc/scala-macro-template">macro template
repo</a>. I’ve taken the liberty
of simplifying the code where possible to keep it concise.</p>
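<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">scala.language.experimental.macros</span> <span class="c1">// enables macro definitions</span>
<span class="k">import</span> <span class="nn">scala.reflect.macros.Context</span>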
<span class="k">trait</span> <span class="nc">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="o">{</span>
<span class="k">def</span> <span class="n">toMap</span><span class="o">(</span><span class="n">t</span><span class="k">:</span> <span class="kt">T</span><span class="o">)</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">]</span>
<span class="k">def</span> <span class="n">fromMap</span><span class="o">(</span><span class="n">map</span><span class="k">:</span> <span class="kt">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Any</span><span class="o">])</span><span class="k">:</span> <span class="kt">T</span>
<span class="o">}</span>
<span class="k">object</span> <span class="nc">Mappable</span> <span class="o">{</span>
<span class="k">implicit</span> <span class="k">def</span> <span class="n">materializeMappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span><span class="k">:</span> <span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span> <span class="k">=</span>
<span class="n">macro</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="k">def</span> <span class="n">materializeMappableImpl</span><span class="o">[</span><span class="kt">T:</span> <span class="kt">c.WeakTypeTag</span><span class="o">](</span><span class="n">c</span><span class="k">:</span> <span class="kt">Context</span><span class="o">)</span><span class="k">:</span> <span class="kt">c.Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">c.universe._</span>
<span class="k">val</span> <span class="n">tpe</span> <span class="k">=</span> <span class="n">weakTypeOf</span><span class="o">[</span><span class="kt">T</span><span class="o">]</span>
<span class="k">val</span> <span class="n">companion</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">typeSymbol</span><span class="o">.</span><span class="n">companionSymbol</span>
<span class="k">val</span> <span class="n">fields</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">declarations</span><span class="o">.</span><span class="n">collectFirst</span> <span class="o">{</span>
<span class="k">case</span> <span class="n">m</span><span class="k">:</span> <span class="kt">MethodSymbol</span> <span class="kt">if</span> <span class="kt">m.isPrimaryConstructor</span> <span class="k">⇒</span> <span class="kt">m</span>
<span class="o">}.</span><span class="n">get</span><span class="o">.</span><span class="n">paramss</span><span class="o">.</span><span class="n">head</span>
<span class="k">val</span> <span class="o">(</span><span class="n">toMapParams</span><span class="o">,</span> <span class="n">fromMapParams</span><span class="o">)</span> <span class="k">=</span> <span class="n">fields</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">field</span> <span class="k">⇒</span>
<span class="k">val</span> <span class="n">name</span> <span class="k">=</span> <span class="n">field</span><span class="o">.</span><span class="n">name</span>
<span class="k">val</span> <span class="n">decoded</span> <span class="k">=</span> <span class="n">name</span><span class="o">.</span><span class="n">decoded</span>
<span class="k">val</span> <span class="n">returnType</span> <span class="k">=</span> <span class="n">tpe</span><span class="o">.</span><span class="n">declaration</span><span class="o">(</span><span class="n">name</span><span class="o">).</span><span class="n">typeSignature</span>
<span class="o">(</span><span class="n">q</span><span class="s">"$decoded → t.$name"</span><span class="o">,</span> <span class="n">q</span><span class="s">"map($decoded).asInstanceOf[$returnType]"</span><span class="o">)</span>
<span class="o">}.</span><span class="n">unzip</span>
<span class="n">c</span><span class="o">.</span><span class="nc">Expr</span><span class="o">[</span><span class="kt">Mappable</span><span class="o">[</span><span class="kt">T</span><span class="o">]]</span> <span class="o">{</span> <span class="n">q</span><span class="s">"""
new Mappable[$tpe] {
def toMap(t: $tpe): Map[String, Any] = Map(..$toMapParams)
def fromMap(map: Map[String, Any]): $tpe = $companion(..$fromMapParams)
}
"""</span> <span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>I hope this introduction to Scala macros has been helpful. I’m no expert in
them and most of what I’ve done here was the result of <a href="http://www.scala-lang.org/files/archive/api/2.11.0-M5/index.html#scala.reflect.api.Universe">scouring the Scala
docs</a>
and a lot of googling. Comments and suggestions are most welcome!</p>
Jonathan Chow
dynamically creating tests with ScalaTest
2013-05-12T00:00:00+00:00
http://blog.echo.sh/2013/05/12/dynamically-creating-tests-with-scalatest
<p>At the Code Retreat run at Movio this weekend we watched <a href="http://www.youtube.com/watch?v=vX-Yym7166Y">Corey Haines do the
Roman Numerals kata in Ruby</a>. One
interesting thing he did was to list all the conversions in a hash and iterate over
it to dynamically create a test for each one:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="n">describe</span> <span class="s2">"Converting arabic numbers to roman numerals"</span> <span class="k">do</span>
<span class="p">{</span>
<span class="mi">1</span> <span class="o">=></span> <span class="s2">"I"</span><span class="p">,</span>
<span class="mi">2</span> <span class="o">=></span> <span class="s2">"II"</span><span class="p">,</span>
<span class="mi">5</span> <span class="o">=></span> <span class="s2">"V"</span>
<span class="c1"># ...</span>
<span class="p">}.</span><span class="nf">each_pair</span> <span class="k">do</span> <span class="o">|</span><span class="n">arabic</span><span class="p">,</span> <span class="n">roman</span><span class="o">|</span>
<span class="n">it</span> <span class="s2">"converts </span><span class="si">#{</span><span class="n">arabic</span><span class="si">}</span><span class="s2"> to </span><span class="si">#{</span><span class="n">roman</span><span class="si">}</span><span class="s2">"</span> <span class="k">do</span>
<span class="n">expect</span><span class="p">(</span><span class="n">convert</span><span class="p">(</span><span class="n">arabic</span><span class="p">)).</span><span class="nf">to</span> <span class="n">eq</span><span class="p">(</span><span class="n">roman</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span></code></pre></figure>
<p>Later, while attempting to implement <a href="http://en.wikipedia.org/wiki/Conway's_Game_of_Life">Conway’s Game of
Life</a> using TDD, I came
across some tests that were repetitive in a similar way:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">describe</span> <span class="o">(</span><span class="s">"alive cells"</span><span class="o">)</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">cell</span> <span class="k">=</span> <span class="nc">Cell</span><span class="o">(</span><span class="nc">Alive</span><span class="o">)</span>
<span class="n">it</span> <span class="o">(</span><span class="s">"should become Dead when there are 0 live neighbours"</span><span class="o">)</span> <span class="o">{</span>
<span class="n">cell</span><span class="o">.</span><span class="n">next</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> <span class="n">should</span> <span class="n">be</span> <span class="o">(</span><span class="nc">Cell</span><span class="o">(</span><span class="nc">Dead</span><span class="o">))</span>
<span class="o">}</span>
<span class="n">it</span> <span class="o">(</span><span class="s">"should become Dead when there are 1 live neighbours"</span><span class="o">)</span> <span class="o">{</span>
<span class="n">cell</span><span class="o">.</span><span class="n">next</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span> <span class="n">should</span> <span class="n">be</span> <span class="o">(</span><span class="nc">Cell</span><span class="o">(</span><span class="nc">Dead</span><span class="o">))</span>
<span class="o">}</span>
<span class="n">it</span> <span class="o">(</span><span class="s">"should become Alive when there are 2 live neighbours"</span><span class="o">)</span> <span class="o">{</span>
<span class="n">cell</span><span class="o">.</span><span class="n">next</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span> <span class="n">should</span> <span class="n">be</span> <span class="o">(</span><span class="nc">Cell</span><span class="o">(</span><span class="nc">Alive</span><span class="o">))</span>
<span class="o">}</span>
<span class="n">it</span> <span class="o">(</span><span class="s">"should become Alive when there are 3 live neighbours"</span><span class="o">)</span> <span class="o">{</span>
<span class="n">cell</span><span class="o">.</span><span class="n">next</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span> <span class="n">should</span> <span class="n">be</span> <span class="o">(</span><span class="nc">Cell</span><span class="o">(</span><span class="nc">Alive</span><span class="o">))</span>
<span class="o">}</span>
<span class="c1">// ...
</span><span class="o">}</span></code></pre></figure>
<p>It turns out that ScalaTest also supports creating tests using the same style:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">describe</span> <span class="o">(</span><span class="s">"alive cells"</span><span class="o">)</span> <span class="o">{</span>
<span class="k">val</span> <span class="n">cell</span> <span class="k">=</span> <span class="nc">Cell</span><span class="o">(</span><span class="nc">Alive</span><span class="o">)</span>
<span class="nc">Seq</span><span class="o">(</span>
<span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="nc">Dead</span><span class="o">),</span>
<span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="nc">Dead</span><span class="o">),</span>
<span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="nc">Alive</span><span class="o">),</span>
<span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Alive</span><span class="o">)</span>
<span class="c1">// ...
</span> <span class="o">)</span> <span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">count</span><span class="o">,</span> <span class="n">state</span><span class="o">)</span> <span class="k">=></span>
<span class="n">it</span> <span class="o">(</span><span class="n">s</span><span class="s">"should become $state when there are $count live neighbours"</span><span class="o">)</span> <span class="o">{</span>
<span class="n">cell</span><span class="o">.</span><span class="n">next</span><span class="o">(</span><span class="n">count</span><span class="o">)</span> <span class="n">should</span> <span class="n">be</span> <span class="o">(</span><span class="nc">Cell</span><span class="o">(</span><span class="n">state</span><span class="o">))</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></figure>
<p>It might take some getting used to, but I think it’s a rather nice way to run
different inputs for the same test. It lets you isolate each input into its own
test case without any code duplication and concisely lists all your test cases
and expected outputs together.</p>
<p>You can see how I did Corey’s Roman Numerals kata in Scala (along with other
katas I have done/will do) <a href="https://github.com/echojc/scala-kata">on my GitHub</a>.</p>
Jonathan Chow
birthday problem
2013-03-22T00:00:00+00:00
http://blog.echo.sh/2013/03/22/birthday-problem
<p>Recently at work we came across a case where we needed to generate up to 10,000
unique random numbers. Each one had to fit into 23 bits, giving us roughly 8
million different numbers to choose from.</p>
<p>We’d all done combinatorics before, so we knew that if we were to randomly
generate these numbers, the chance of there being a collision <a href="http://en.wikipedia.org/wiki/Birthday_problem">isn’t going to be
as low as what our intuition tells
us</a>. But none of us were really
that fluent with our math, so when we plugged our formula into <a href="http://www.wolframalpha.com/">Wolfram
Alpha</a> and it spit out 99.8% chance of a
collision, we were sure that the problem was with our formula and not with the
scenario.</p>
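<p>For reference, the standard calculation is easy to reproduce without Wolfram Alpha. The following is a minimal sketch, assuming the draws are uniform and independent; the names <code class="highlighter-rouge">BirthdayMath</code> and <code class="highlighter-rouge">collisionProbability</code> are invented here. It multiplies together the chances that each successive draw avoids all the previous ones, working in log space so that the product stays numerically stable:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala">// Hypothetical helper, not part of the original REPL session.
object BirthdayMath {
  // Probability that `size` uniform draws from `limit` values contain a repeat.
  def collisionProbability(size: Int, limit: Int): Double =
    if (size > limit) 1.0 // pigeonhole: a repeat is unavoidable
    else {
      // P(all distinct) is the product over k of (1 - k / limit); sum the logs instead
      val logAllDistinct =
        (0 until size).map(k => math.log(1.0 - k.toDouble / limit)).sum
      1.0 - math.exp(logAllDistinct)
    }

  def main(args: Array[String]): Unit = {
    println(collisionProbability(23, 365))        // about 0.507, the classic result
    println(collisionProbability(10000, 8388608)) // about 0.997 for 2^23 values
  }
}</code></pre></figure>
<p>With exactly 2<sup>23</sup> values the result comes out at roughly 99.7%, in the same ballpark as the figure Wolfram Alpha reported.</p>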
<p>I ended up testing the situation empirically in the Scala REPL and it turns out
the math was right after all. Here’s the template for empirically testing the
classic birthday problem:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">scala.util.Random</span>
<span class="k">def</span> <span class="n">sample</span><span class="o">(</span><span class="n">size</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">limit</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span>
<span class="nc">Stream</span><span class="o">.</span><span class="n">continually</span><span class="o">(</span><span class="nc">Random</span><span class="o">.</span><span class="n">nextInt</span><span class="o">(</span><span class="n">limit</span><span class="o">)).</span><span class="n">take</span><span class="o">(</span><span class="n">size</span><span class="o">)</span>
<span class="k">def</span> <span class="n">isUnique</span><span class="o">(</span><span class="n">sample</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span>
<span class="n">sample</span><span class="o">.</span><span class="n">distinct</span><span class="o">.</span><span class="n">size</span> <span class="o">==</span> <span class="n">sample</span><span class="o">.</span><span class="n">size</span>
<span class="k">def</span> <span class="n">collisionChance</span><span class="o">(</span><span class="n">size</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">limit</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">times</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Double</span> <span class="o">=</span>
<span class="o">(</span><span class="nc">Stream</span>
<span class="o">.</span><span class="n">continually</span><span class="o">(</span><span class="n">sample</span><span class="o">(</span><span class="n">size</span><span class="o">,</span> <span class="n">limit</span><span class="o">))</span>
<span class="o">.</span><span class="n">take</span><span class="o">(</span><span class="n">times</span><span class="o">)</span>
<span class="o">.</span><span class="n">count</span><span class="o">(</span><span class="n">s</span> <span class="k">=></span> <span class="o">!</span><span class="n">isUnique</span><span class="o">(</span><span class="n">s</span><span class="o">))</span> <span class="c1">// count the samples that contain a collision</span>
<span class="o">.</span><span class="n">toDouble</span> <span class="o">/</span> <span class="n">times</span><span class="o">)</span>
<span class="c1">// the theoretical value for 23 people and 365 days is about 0.507</span>
<span class="n">assert</span><span class="o">(</span><span class="mf">0.03</span> <span class="o">></span> <span class="n">math</span><span class="o">.</span><span class="n">abs</span><span class="o">(</span><span class="n">collisionChance</span><span class="o">(</span><span class="mi">23</span><span class="o">,</span> <span class="mi">365</span><span class="o">,</span> <span class="mi">10000</span><span class="o">)</span> <span class="o">-</span> <span class="mf">0.507</span><span class="o">))</span></code></pre></figure>
Jonathan Chow