Strings and Binaries
Understand Elixir UTF-8 strings, binary representation, string interpolation, the String module, charlists, and numeric parsing, with Python and JavaScript comparisons.
Strings in Elixir are UTF-8 encoded binaries. This means that every double-quoted string like "hello" is actually a sequence of bytes in memory, and Elixir provides full Unicode support out of the box. This lesson covers everything from basic string operations to the binary underpinnings, the String module, charlists, and parsing strings into numeric types.
String Basics
Strings in Elixir are delimited by double quotes and support interpolation, escape sequences, and multi-line content:
# Basic string
name = "Alice"
# String interpolation with #{}
greeting = "Hello, #{name}!"
# => "Hello, Alice!"
# Interpolation evaluates any expression
"2 + 2 = #{2 + 2}"
# => "2 + 2 = 4"
# Escape sequences
"Line one\nLine two"
"Tab\there"
"A backslash: \\"
"A quote: \""
# Multi-line strings
multi = "This is
a multi-line
string"
# Heredoc syntax for long multi-line strings
doc = """
This is a heredoc string.
It preserves newlines and
is great for documentation.
"""
Strings Are Binaries
In Elixir, a string is a binary – a contiguous sequence of bytes. The is_binary/1 function returns true for strings. Each character is encoded as one or more bytes using UTF-8:
- ASCII characters (a-z, 0-9, etc.) use 1 byte each
- Many accented characters use 2 bytes
- CJK characters, emoji, and others may use 3 or 4 bytes
This means byte_size/1 (the number of bytes) can differ from String.length/1 (the number of grapheme clusters, i.e., visible characters).
iex> is_binary("hello")
true
iex> byte_size("hello")
5
iex> String.length("hello")
5
iex> byte_size("héllo")
6
iex> String.length("héllo")
5
iex> byte_size("ひらがな")
12
iex> String.length("ひらがな")
4
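The gap between bytes, codepoints, and graphemes can be made concrete with a short sketch. It assumes a string built from an ASCII "e" followed by the combining acute accent U+0301; that string and the variable names are illustrative choices, not taken from the lesson.

```elixir
# "e" followed by U+0301 (combining acute accent) renders as one visible character.
decomposed = "e\u0301"

byte_size(decomposed)                  # => 3 (1 byte for "e", 2 for U+0301)
length(String.codepoints(decomposed))  # => 2 codepoints
String.length(decomposed)              # => 1 grapheme cluster

# The precomposed form U+00E9 looks identical but is a single codepoint.
precomposed = "\u00E9"

byte_size(precomposed)                 # => 2
decomposed == precomposed              # => false, the bytes differ

# Normalizing to NFC makes the two forms compare equal.
String.normalize(decomposed, :nfc) == precomposed
# => true
```

Comparing strings from different sources may therefore need String.normalize/2 first, since == compares raw bytes.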
String Concatenation and Operations
# Concatenation with <>
"Hello" <> " " <> "World"
# => "Hello World"
# Concatenation in pattern matching
"Hello, " <> name = "Hello, Alice"
name # => "Alice"
# String.contains?
String.contains?("Hello World", "World") # => true
String.contains?("Hello World", ["Foo", "World"]) # => true
# String.starts_with? and String.ends_with?
String.starts_with?("Elixir", "El") # => true
String.ends_with?("hello.ex", ".ex") # => true
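Prefix matching with <> also works in function heads, which is one way to dispatch on the start of a string. The sketch below uses a hypothetical CommandParser module and made-up command names. Note that in a pattern only the left side of <> may be a literal; the right side is the variable that captures the remainder.

```elixir
defmodule CommandParser do
  # Hypothetical module: each clause matches a literal prefix
  # and binds the rest of the string to a variable.
  def parse("say " <> message), do: {:say, message}
  def parse("go " <> direction), do: {:go, direction}

  # Fallback clause for anything without a known prefix.
  def parse(other), do: {:unknown, other}
end

CommandParser.parse("say hello")  # => {:say, "hello"}
CommandParser.parse("go north")   # => {:go, "north"}
CommandParser.parse("dance")      # => {:unknown, "dance"}
```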
The String Module
Elixir’s String module provides a comprehensive set of functions for working with UTF-8 strings:
# Case conversion
String.upcase("hello") # => "HELLO"
String.downcase("HELLO") # => "hello"
String.capitalize("hello") # => "Hello"
# Trimming whitespace
String.trim(" hello ") # => "hello"
String.trim_leading(" hello") # => "hello"
String.trim_trailing("hello ") # => "hello"
# Splitting
String.split("a,b,c", ",") # => ["a", "b", "c"]
String.split("hello world") # => ["hello", "world"]
String.split("a--b--c", "--") # => ["a", "b", "c"]
String.split("abc", "", trim: true) # => ["a", "b", "c"]
# Replacing
String.replace("Hello World", "World", "Elixir")
# => "Hello Elixir"
# Padding
String.pad_leading("42", 5, "0") # => "00042"
String.pad_trailing("hi", 10, ".") # => "hi........"
# Slicing
String.slice("Hello World", 0, 5) # => "Hello"
String.slice("Hello World", 6..-1//1) # => "World"
# Reversing (Unicode-aware)
String.reverse("hello") # => "olleh"
# Duplicate
String.duplicate("ha", 3) # => "hahaha"
# Length vs byte size
String.length("café") # => 4
byte_size("café") # => 5 ("é" takes 2 bytes)
# Elixir - String module functions
text = " Hello, World! "
text
|> String.trim()
|> String.downcase()
|> String.replace("world", "elixir")
|> String.split(", ")
# => ["hello", "elixir!"]
# Python - string methods
text = " Hello, World! "
text.strip().lower().replace("world", "elixir").split(", ")
# => ['hello', 'elixir!']
// JavaScript - string methods
const text = " Hello, World! ";
text.trim().toLowerCase().replace("world", "elixir").split(", ");
// => ["hello", "elixir!"]
Parsing Strings to Numbers
Elixir provides multiple ways to convert strings to numeric types, each with different behavior for invalid inputs.
String.to_float and Float.parse
# String.to_float/1 requires a proper float format with a decimal point
String.to_float("2.2017764e+0") # => 2.2017764
String.to_float("3.0") # => 3.0
# String.to_float raises on integers or invalid strings:
# String.to_float("34") # => ** (ArgumentError)
# String.to_float("abc") # => ** (ArgumentError)
# Float.parse/1 is more forgiving -- it returns a tuple
Float.parse("34") # => {34.0, ""}
Float.parse("34.25") # => {34.25, ""}
Float.parse("56.5xyz") # => {56.5, "xyz"}
Float.parse("xyz56.5") # => :error
Float.parse("") # => :error
String.to_integer and Integer.parse
# String.to_integer/1 requires a pure integer string
String.to_integer("4") # => 4
String.to_integer("042") # => 42
# Raises on non-integer strings:
# String.to_integer("4.1") # => ** (ArgumentError)
# String.to_integer("abc") # => ** (ArgumentError)
# Integer.parse/1 is more forgiving
Integer.parse("34") # => {34, ""}
Integer.parse("34.25") # => {34, ".25"}
Integer.parse("FF", 16) # => {255, ""}
Integer.parse("0xFF", 16) # => {0, "xFF"} (the "0x" prefix is not understood)
Integer.parse("abc") # => :error
Use String.to_integer/1 and String.to_float/1 when you are certain the input is valid (e.g., data you control). Use Integer.parse/1 and Float.parse/1 when handling external input, since they return :error instead of raising on invalid data. Combine them with pattern matching for clean error handling:
case Integer.parse(user_input) do
{number, ""} -> {:ok, number}
{_number, _rest} -> {:error, "trailing characters"}
:error -> {:error, "not a number"}
end
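The same three-way match applies to floats. Below is a minimal sketch that wraps Float.parse/1 in a function; the NumberInput module, the parse_float/1 name, and the error atoms are hypothetical choices for illustration. It trims surrounding whitespace before parsing, which is an added assumption about how the input should be treated.

```elixir
defmodule NumberInput do
  # Hypothetical helper: accepts the input only if Float.parse/1
  # consumes the whole (trimmed) string.
  def parse_float(input) when is_binary(input) do
    case Float.parse(String.trim(input)) do
      {number, ""} -> {:ok, number}
      {_number, _rest} -> {:error, :trailing_characters}
      :error -> {:error, :not_a_number}
    end
  end
end

NumberInput.parse_float(" 34.25 ")  # => {:ok, 34.25}
NumberInput.parse_float("56.5xyz")  # => {:error, :trailing_characters}
NumberInput.parse_float("xyz")      # => {:error, :not_a_number}
```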
Binaries
Under the hood, strings are binaries, and you can work with binary data directly using the <<>> syntax:
# A binary is a sequence of bytes
<<0, 1, 2, 3>>
# Strings are UTF-8 binaries
"hello" == <<104, 101, 108, 108, 111>>
# => true
# You can specify bit sizes
<<3::size(2), 1::size(3), 2::size(3)>>
# => <<202>> (binary: 11 001 010)
# Binary pattern matching
<<first_byte, rest::binary>> = "hello"
first_byte # => 104 (ASCII 'h')
rest # => "ello"
# Match a fixed-size prefix
<<head::binary-size(3), tail::binary>> = "hello world"
head # => "hel"
tail # => "lo world"
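Matching a single byte works for ASCII but cuts a multi-byte character in half. The sketch below contrasts a plain byte match with the utf8 segment modifier, using "été" (written with \u escapes so the bytes are unambiguous) as an illustrative input.

```elixir
# "\u00E9t\u00E9" is "été"; each "é" is 2 bytes in UTF-8 (195, 169).
word = "\u00E9t\u00E9"

# A plain segment matches exactly one byte.
<<first_byte, _rest::binary>> = word
first_byte  # => 195, only the first half of "é"

# The utf8 modifier matches one whole codepoint, however many bytes it needs.
<<codepoint::utf8, rest::binary>> = word
codepoint   # => 233 (U+00E9)
rest        # => "t\u00E9", i.e. "té"
```

The same caution applies to binary-size(n): it counts bytes, so a fixed-size prefix can end in the middle of a character when the input is not pure ASCII.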
Binary Pattern Matching
Binary pattern matching is a powerful feature for parsing binary protocols, file formats, and network data:
defmodule BinaryParser do
# Parse a simple binary header: 1 byte version, 2 bytes length, rest is payload
def parse(<<version::8, length::16, payload::binary-size(length), _rest::binary>>) do
%{version: version, length: length, payload: payload}
end
# Parse an RGB color from a hex string like "FF8800"
def parse_hex_color(<<r::binary-size(2), g::binary-size(2), b::binary-size(2)>>) do
{String.to_integer(r, 16), String.to_integer(g, 16), String.to_integer(b, 16)}
end
end
BinaryParser.parse_hex_color("FF8800")
# => {255, 136, 0}
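The <<>> syntax builds binaries as well as matching them, so a header layout like the one above can be produced and consumed symmetrically. The following is a sketch under assumptions: the Frame module, its function names, and the decision to return leftover bytes or :error are hypothetical and not part of BinaryParser.

```elixir
defmodule Frame do
  # Hypothetical framing: 1 byte version, 2 byte big-endian payload length, then the payload.
  def encode(version, payload) when is_binary(payload) do
    <<version::8, byte_size(payload)::16, payload::binary>>
  end

  # Returns the decoded frame plus any bytes that follow it.
  def decode(<<version::8, length::16, payload::binary-size(length), rest::binary>>) do
    {:ok, %{version: version, payload: payload}, rest}
  end

  # Any input too short for its own header or declared length ends up here.
  def decode(_incomplete), do: :error
end

frame = Frame.encode(1, "hi")
# => <<1, 0, 2, 104, 105>>

Frame.decode(frame <> "extra")
# => {:ok, %{version: 1, payload: "hi"}, "extra"}

Frame.decode(<<1, 0, 5, "hi">>)
# => :error (the header promises 5 bytes, only 2 are present)
```

Without a fallback clause, a short or malformed input raises FunctionClauseError, which is worth keeping in mind when parsing data from a network or file.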
Charlists
Charlists are lists of integer codepoints, written with single quotes or the ~c sigil. They exist primarily for compatibility with Erlang, which uses them as its native string type.
# Charlists
~c"hello" # => ~c"hello"
is_list(~c"hello") # => true
# They are just lists of integers
~c"hello" == [104, 101, 108, 108, 111]
# => true
# Converting between strings and charlists
String.to_charlist("hello") # => ~c"hello"
List.to_string(~c"hello") # => "hello"
to_string(~c"hello") # => "hello"
to_charlist("hello") # => ~c"hello"
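A typical place where charlists show up is at the boundary with Erlang's standard library. The sketch below uses :erlang.integer_to_list/2 and :erlang.list_to_integer/2 as examples of Erlang functions that return or expect charlists; the variable name is illustrative.

```elixir
# :erlang.integer_to_list/2 returns a charlist, not a string.
hex = :erlang.integer_to_list(255, 16)
is_list(hex)         # => true
is_binary(hex)       # => false

# Convert to a binary before using String functions on the result.
List.to_string(hex)  # => "FF"

# Going the other way: convert the string before calling into Erlang.
"ff" |> String.to_charlist() |> :erlang.list_to_integer(16)
# => 255
```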
Prefer double-quoted strings ("hello") unless you are specifically interfacing with Erlang code that requires charlists. The String module works only with binaries (double-quoted strings), not charlists.
Sigils
Sigils provide syntactic sugar for working with textual representations. The most common string-related sigils are:
# ~s creates a string (useful when the string contains double quotes)
~s(She said "hello")
# => "She said \"hello\""
# ~S creates a string without interpolation or escaping
~S(No #{interpolation} here\n)
# => "No \#{interpolation} here\\n"
# ~w creates a word list
~w(apple banana cherry)
# => ["apple", "banana", "cherry"]
# ~w with atom modifier
~w(apple banana cherry)a
# => [:apple, :banana, :cherry]
# ~r creates a regular expression
Regex.match?(~r/hello/, "hello world")
# => true
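The ~r sigil is useful for extraction as well as yes/no matching. As a small sketch, the pattern below uses named groups with Regex.named_captures/2; the date format and the date_pattern name are assumptions made for the example.

```elixir
# Named groups (?<name>...) label each part of the match.
date_pattern = ~r/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

Regex.named_captures(date_pattern, "released on 2024-05-17")
# => %{"day" => "17", "month" => "05", "year" => "2024"}

# No match returns nil rather than raising.
Regex.named_captures(date_pattern, "no date here")
# => nil
```

The captured values are strings, so they still need Integer.parse/1 or String.to_integer/1 if numbers are required.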
Practical Example: Text Processing Pipeline
Here is a complete example combining many string operations with the pipe operator:
defmodule TextProcessor do
@doc "Extracts and counts unique words from text, ignoring case and punctuation"
def word_frequencies(text) do
text
|> String.downcase()
|> String.replace(~r/[^\w\s]/u, "")
|> String.split()
|> Enum.frequencies()
|> Enum.sort_by(fn {_word, count} -> count end, :desc)
end
@doc "Truncates a string to max_length, adding ellipsis if truncated"
def truncate(text, max_length \\ 50) do
if String.length(text) <= max_length do
text
else
text
|> String.slice(0, max_length - 3)
|> Kernel.<>("...")
end
end
@doc "Converts a string to a URL-friendly slug"
def slugify(text) do
text
|> String.downcase()
|> String.replace(~r/[^\w\s-]/u, "")
|> String.trim()
|> String.replace(~r/\s+/, "-")
end
end
TextProcessor.word_frequencies("the cat sat on the mat the cat")
# => [{"the", 3}, {"cat", 2}, {"mat", 1}, {"on", 1}, {"sat", 1}]
# (the order among words with equal counts follows map iteration order; do not rely on it)
TextProcessor.truncate("This is a very long string that should be truncated", 30)
# => "This is a very long string ..."
TextProcessor.slugify("Hello World! This is Elixir.")
# => "hello-world-this-is-elixir"
String Parser Module
Create a Parser module with these functions:
- parse_csv_line(line) – splits a CSV line by commas and trims whitespace from each field. Returns a list of strings.
  Parser.parse_csv_line(" Alice , 30 , admin ") # => ["Alice", "30", "admin"]
- parse_key_value(string, separator \\ "=") – parses a "key=value" string into a {key, value} tuple with trimmed strings.
  Parser.parse_key_value("name = Alice") # => {"name", "Alice"}
- safe_to_integer(string) – converts a string to an integer, returning {:ok, integer} or {:error, :invalid}. Use Integer.parse/1 internally.
  Parser.safe_to_integer("42") # => {:ok, 42}
  Parser.safe_to_integer("42abc") # => {:error, :invalid}
  Parser.safe_to_integer("abc") # => {:error, :invalid}
- extract_emails(text) – extracts all email addresses from a block of text using Regex.scan/2.
Bonus: Write a parse_csv(text) function that splits the text by newlines and maps parse_csv_line/1 over each line, returning a list of lists.
Key Takeaways
- You can explain the core ideas in this lesson and when to apply them in Elixir projects
- You can use the primary APIs and patterns shown here to build working solutions
- You can spot common mistakes for this topic and choose more idiomatic approaches