359040 (1)
#1
Note: Unicode characters are apparently not allowed in this forum, so imagine a Unicode smiley face in place of each question mark at the start of a string in the code, or view the code in this Gist: https://gist.github.com/padde/08b257f7aba523318156

In the section about matching binary strings you write:

Variables b1, b2, and b3 hold corresponding bytes from the string you matched on. This isn’t very useful, especially if you’re dealing with unicode strings. Extracting individual characters is better done using functions from the String module.


If I get this right, you mean to keep the reader from slipping into a situation where a multi-byte Unicode character is split up, resulting in invalid strings:

iex> <<first, rest::binary>> = "? how dare you, unicode!"
"? how dare you, unicode!"

iex> <<first>>
<<226>>

iex> rest
<<152, 186, 32, 104, 111, 119, 32, 100, 97, 114, 101, 32, 121, 111, 117, 44, 32, 117, 110, 105, 99, 111, 100, 101, 33>>
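To make the problem concrete, here is a minimal sketch that checks both halves of the split with String.valid?/1. It writes the smiley as the escape "\u263A" (my assumption about which character is meant, based on the bytes 226, 152, 186 shown above) so that it survives the forum's character restrictions:

```elixir
# "\u263A" is the smiley face U+263A; in UTF-8 it takes three bytes: 226, 152, 186.
string = "\u263A how dare you, unicode!"

# Matching a single byte cuts the three-byte character apart.
<<first, rest::binary>> = string

IO.inspect(String.valid?(<<first>>))  # false: a lone lead byte is not valid UTF-8
IO.inspect(String.valid?(rest))       # false: rest starts with a continuation byte
IO.inspect(byte_size(string))         # 26 bytes
IO.inspect(String.length(string))     # 24 characters
```

The difference between byte_size/1 and String.length/1 is the reason byte-wise matching is risky here.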


However, there is the ::utf8 type specifier for exactly this purpose, and I find it quite convenient to work with:

iex> <<first::utf8, rest::binary>> = "? how dare you, unicode!"
"? how dare you, unicode!"

iex> <<first::utf8>>
"?"

iex> rest
" how dare you, unicode!"
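Note that first::utf8 binds the integer code point rather than a one-character string, which is why I rebuilt the string with <<first::utf8>> above. As a rough sketch of how this could be used in practice, the following walks a whole binary one code point at a time. The module name Utf8Sketch and its codepoints/1 function are hypothetical, made up for this illustration, and the smiley is again written as "\u263A":

```elixir
defmodule Utf8Sketch do
  # Hypothetical helper: returns the integer code points of a UTF-8 binary.
  def codepoints(binary), do: codepoints(binary, [])

  # Peel off one complete code point per step, however many bytes it occupies.
  defp codepoints(<<cp::utf8, rest::binary>>, acc), do: codepoints(rest, [cp | acc])

  # Empty binary: done. Invalid UTF-8 matches neither clause and raises
  # a FunctionClauseError.
  defp codepoints(<<>>, acc), do: Enum.reverse(acc)
end

IO.inspect(Utf8Sketch.codepoints("\u263A hi"))
# prints [9786, 32, 104, 105]
```

The String module offers a comparable building block in String.next_codepoint/1, which returns the first code point as a string together with the rest.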


Please consider mentioning ::utf8 in the next release.

Best,
Patrick
sjuric (86)
#2
Very good point, thanks for noticing!

I should probably include a link to the full reference on binary matching (http://elixir-lang.org/docs/stable/elixir/Kernel.SpecialForms.html#<<>>/1).

I'll keep it in mind for the next edition.