Lesson 3 50 min

Data Representation: Text and Images


Why This Matters

This lesson explores how computers represent text and images digitally. We will cover character encoding schemes like ASCII and Unicode, and delve into the principles of representing images using pixels, colour depth, and resolution.

Key Words to Know

01 Character Set — A defined list of characters that a computer can recognise and process.
02 Character Encoding — The process of assigning a unique binary code to each character in a character set.
03 ASCII — American Standard Code for Information Interchange, an early 7-bit character encoding standard.
04 Unicode — A universal character set designed to represent text from all writing systems; its code points are stored in binary using encoding schemes such as UTF-8 and UTF-16.
05 Pixel — The smallest addressable element in a digital image, representing a single point of colour.
06 Resolution — The number of pixels per unit length (e.g., pixels per inch) or the total number of pixels in an image (width × height).
07 Colour Depth — The number of bits used to represent the colour of a single pixel, determining the range of colours available.
08 Metadata — Data that describes other data, such as image dimensions, colour depth, or creation date.

Introduction to Character Sets and Encoding

Computers process information in binary, so every character, including letters, numbers, and symbols, must be converted into a binary code. A character set is a defined list of characters that a computer can recognise and process. Each character in the set is assigned a unique binary code through a process called character encoding.

Early character sets were limited due to memory constraints. For instance, a 7-bit character set can represent 2^7 = 128 unique characters, while an 8-bit set can represent 2^8 = 256 characters. The choice of character set directly impacts the range of languages and symbols that can be displayed and processed by a computer system. Understanding character encoding is fundamental to comprehending how text is stored, transmitted, and displayed accurately across different systems and languages.
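The 2^n relationship above is easy to check directly. A minimal Python sketch (the function name `charset_size` is just an illustrative choice, not part of any standard):

```python
# With n bits, each bit doubles the number of distinct patterns,
# so an n-bit character set can hold 2**n unique characters.
def charset_size(bits: int) -> int:
    return 2 ** bits

print(charset_size(7))   # 128  -> standard ASCII
print(charset_size(8))   # 256  -> Extended ASCII
print(charset_size(16))  # 65536
```

Note how adding a single bit doubles the character count: that is why moving from 7-bit to 8-bit ASCII added exactly 128 extra characters.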

ASCII and Extended ASCII

ASCII (American Standard Code for Information Interchange) was one of the earliest and most widely adopted character encoding standards. It uses 7 bits to represent 128 characters, including uppercase and lowercase English letters, digits (0-9), punctuation marks, and control characters (e.g., newline, tab).

  • Advantages of ASCII: Simplicity, widespread adoption in early computing, efficient for English text.
  • Limitations of ASCII: Only supports English characters, insufficient for representing characters from other languages (e.g., French accents, German umlauts, Asian scripts).

To address these limitations, Extended ASCII was developed. This typically used 8 bits, allowing for 256 characters. The additional 128 characters often included accented letters, graphical symbols, and other characters specific to certain Western European languages. However, different Extended ASCII variants existed, leading to compatibility issues when exchanging text between systems using different extensions.
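You can inspect these codes yourself. In Python, `ord()` returns a character's code point and `chr()` does the reverse; for the basic Latin range these values match the original 7-bit ASCII codes:

```python
# Code points for basic Latin characters match 7-bit ASCII.
print(ord('A'))    # 65
print(ord('a'))    # 97 (lowercase letters sit 32 above uppercase)
print(chr(66))     # 'B'

# Control characters occupy code points below 32:
print(ord('\n'))   # 10 (newline)
print(ord('\t'))   # 9  (tab)
```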

Unicode and UTF-8

The need for a universal character encoding standard that could represent text from all writing systems led to the development of Unicode. Unicode aims to provide a unique number (code point) for every character, regardless of the platform, program, or language. It supports over a million characters, encompassing almost all known scripts, symbols, and emojis.

Unicode itself is a large character set, but it requires encoding schemes to represent these code points in binary. The most common Unicode encoding scheme is UTF-8 (Unicode Transformation Format - 8-bit).

  • UTF-8 is a variable-length encoding scheme. This means that characters are represented using 1 to 4 bytes, depending on their code point. Common ASCII characters are represented using a single byte, making it backward compatible with ASCII. Characters from other languages require more bytes. This efficiency makes UTF-8 the dominant encoding on the web and in many modern operating systems.
  • Other Unicode encodings include UTF-16 (variable-length, 2 or 4 bytes per character) and UTF-32 (fixed-length, 4 bytes per character).
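The variable-length behaviour described above can be observed by encoding a few sample characters and counting the resulting bytes (the characters chosen here are arbitrary examples):

```python
# UTF-8 uses 1 to 4 bytes per character, depending on the code point.
for ch in ['A', 'é', '€', '😀']:
    n = len(ch.encode('utf-8'))
    print(f"{ch!r}: code point U+{ord(ch):04X} -> {n} byte(s)")

# 'A' encodes to a single byte with the same value as its ASCII code,
# which is why UTF-8 is backward compatible with plain ASCII text.
print('A'.encode('utf-8'))  # b'A'
```

Plain English text therefore costs no more in UTF-8 than in ASCII, while scripts outside the ASCII range simply use more bytes per character.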

Representing Images: Pixels, Resolution, and Colour Depth

Digital images are represented as a grid of tiny coloured squares called pixels (picture elements). Each pixel is as...


Image File Size and Metadata

The file size of an image is directly influenced by its resolution and colour depth. A higher resolution or a greate...


Exam Tips

  1. Be able to calculate the storage requirements for text given the character set size and number of characters, and for images given resolution and colour depth.
  2. Clearly distinguish between ASCII, Extended ASCII, and Unicode/UTF-8, explaining their advantages and limitations.
  3. Understand the relationship between colour depth, the number of colours, and image file size. Practise calculations for different bit depths.
  4. Explain the terms 'pixel', 'resolution', and 'colour depth' accurately and describe how they impact image quality and storage.
  5. Know the purpose of metadata in image files and be able to give examples of common metadata attributes.
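The storage calculations in the first tip follow two standard formulas: text size is characters × bits per character, and uncompressed image size is width × height × colour depth. A worked sketch (the 500-character and 800 × 600 figures are example values, not from the lesson):

```python
# Text: number of characters × bits per character.
chars = 500
ascii_bits = chars * 7        # 7-bit ASCII: 3,500 bits
ascii_bytes = chars * 1       # stored as one byte per character: 500 bytes

# Image: width × height × colour depth gives size in bits; divide by 8 for bytes.
width, height, depth = 800, 600, 24   # 24-bit colour
image_bits = width * height * depth
image_bytes = image_bits // 8         # 1,440,000 bytes, before any compression or metadata

print(ascii_bits, ascii_bytes, image_bytes)
```

These figures exclude metadata and compression, which real file formats add on top of (or subtract from) the raw pixel data.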