UTF-8, ASCII, UTF-16 — Which Encoding Should Your .txt File Use?
Encoding is one of those topics that sits quietly in the background until it causes a problem: garbled characters, broken scripts, files that look fine on your machine but display garbage on someone else's. Here's what you need to know.
What encoding actually is
Every character in a text file is stored as a number. Encoding is the system that defines which number corresponds to which character. The letter "A" might be stored as the number 65. The character "é" needs a more complex representation. "中" (Chinese character for "middle") needs even more space.
Different encoding systems handle this mapping differently, and if a file is created with one encoding but read with another, the characters don't match up — you get garbled output.
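You can see this number-to-character mapping directly in Terminal. The snippet below is a quick sketch using printf and xxd (both preinstalled on macOS) to dump the raw bytes behind each character:

```shell
# Dump the raw bytes behind each character (assumes a UTF-8 terminal,
# the default on modern macOS and Linux).
printf 'A' | xxd -p    # 41     — a single byte: the number 65 in hex
printf 'é' | xxd -p    # c3a9   — two bytes in UTF-8
printf '中' | xxd -p   # e4b8ad — three bytes in UTF-8
```

The wider the character's place in Unicode, the more bytes UTF-8 spends on it; "A" really is just the number 65 on disk.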
ASCII — the original, smallest
ASCII (American Standard Code for Information Interchange) covers 128 characters: the 26 English letters in upper and lower case, digits 0–9, punctuation, and some control characters. It was designed in the 1960s for English-language computing.
If your file contains only standard English characters and basic punctuation, ASCII works fine. The problem is the moment you need an accent mark, a non-English character, an emoji, or anything outside basic English — ASCII can't represent it at all. Use ASCII only if you're certain your content will always be plain English and you have a specific reason to use it.
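You can watch that limitation happen with iconv, which is built into macOS: asking it to convert text containing a non-ASCII character into ASCII fails outright, because there is simply no byte to map the character to. A minimal demonstration:

```shell
# Plain English passes through ASCII untouched...
printf 'cafe' | iconv -f UTF-8 -t ASCII
# ...but an accented character has no ASCII representation, so the
# conversion stops with an error (the exact wording varies by system).
printf 'café' | iconv -f UTF-8 -t ASCII >/dev/null 2>&1 || echo "not representable"
```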
UTF-8 — the right choice almost always
UTF-8 can represent every character in Unicode: more than 150,000 assigned characters, with room for over a million code points, covering every language, symbol, and emoji. For characters that exist in ASCII, UTF-8 uses exactly the same byte values, so a pure ASCII file is already valid UTF-8.
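That backward compatibility is easy to verify yourself: re-encode a pure-ASCII file as UTF-8 and the bytes don't change at all. A quick check (the filenames here are just examples):

```shell
# A pure-ASCII file is already valid UTF-8 — byte for byte identical.
printf 'plain English text' > ascii.txt
iconv -f ASCII -t UTF-8 ascii.txt > utf8.txt
cmp ascii.txt utf8.txt && echo "identical"
```

cmp stays silent and exits successfully when the two files match, so "identical" prints.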
UTF-8 is the default encoding on macOS, Linux, and the modern web. When you create a .txt file with Terminal's touch command or download one from txtnote.online, it's UTF-8. Most software that reads a text file assumes UTF-8 unless told otherwise.
UTF-16 — for specific Windows/legacy situations
UTF-16 uses at least two bytes for every character, even ASCII ones. This means UTF-16 files are roughly twice the size of UTF-8 files for English text. Some older Windows applications and some Microsoft file formats use UTF-16 internally, which is why it shows up as an option.
On macOS you generally have no reason to use UTF-16 for .txt files. If you're receiving files from a Windows system and they look garbled, UTF-16 might be the cause; open the file in TextEdit and, if it displays wrong, reopen it with the correct encoding chosen from the Plain Text Encoding menu in TextEdit's Open dialog.
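The size difference is easy to measure yourself with iconv and wc (filenames here are illustrative):

```shell
# English text roughly doubles in size as UTF-16, plus a 2-byte
# byte-order mark that iconv prepends to the output.
printf 'Hello, world' > utf8.txt              # 12 bytes
iconv -f UTF-8 -t UTF-16 utf8.txt > utf16.txt
wc -c utf8.txt utf16.txt                      # 12 bytes vs roughly 26
```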
How to check a file's encoding on Mac
Open Terminal and run:
file -I yourfile.txt

This reports the file's detected encoding. A UTF-8 file will show something like text/plain; charset=utf-8. A file with an unknown or unexpected encoding will show that too.
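To see detection in action, create a file containing a known non-ASCII character and check what file reports. On macOS, -I is shorthand for the long --mime flag used below (GNU file on Linux spells the short form -i):

```shell
# Create a small UTF-8 file and confirm what `file` detects.
printf 'héllo\n' > sample.txt
file --mime sample.txt    # reports text/plain; charset=utf-8
```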
How to convert encoding on Mac
If you need to convert a file from one encoding to another, iconv is built into macOS:
iconv -f UTF-16 -t UTF-8 input.txt > output.txt

The -f flag is "from" and -t is "to". This creates a new file with the correct encoding without modifying the original.
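As a sanity check, you can simulate the whole situation end to end: generate a UTF-16 file, convert it back, and confirm nothing was lost. All filenames here are hypothetical:

```shell
# Simulate receiving a UTF-16 file, then recover UTF-8 losslessly.
printf 'Hello, encoding\n' > original.txt
iconv -f UTF-8 -t UTF-16 original.txt > windows.txt
iconv -f UTF-16 -t UTF-8 windows.txt > restored.txt
cmp original.txt restored.txt && echo "round-trip OK"
```

The conversion is lossless because both encodings cover the same Unicode characters; only the byte layout differs.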
The short answer: always use UTF-8. It handles every language and character, it's universally supported, and it's what everything on your Mac already expects. The encoding selectors in text editors exist for legacy situations and specific technical workflows — for normal use, UTF-8 is the answer every time.