Data Representation: Text and Images
Why This Matters
This lesson explores how computers represent text and images digitally. We will cover character encoding schemes like ASCII and Unicode, and delve into the principles of representing images using pixels, colour depth, and resolution.
Introduction to Character Sets and Encoding
Computers process information in binary, so every character, including letters, numbers, and symbols, must be converted into a binary code. A character set is a defined list of characters that a computer can recognise and process. Each character in the set is assigned a unique binary code through a process called character encoding.
Early character sets were limited due to memory constraints. For instance, a 7-bit character set can represent 2^7 = 128 unique characters, while an 8-bit set can represent 2^8 = 256 characters. The choice of character set directly impacts the range of languages and symbols that can be displayed and processed by a computer system. Understanding character encoding is fundamental to comprehending how text is stored, transmitted, and displayed accurately across different systems and languages.
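The relationship between bit width and character count can be checked directly. A minimal sketch (the bit widths shown are illustrative):

```python
# A character set encoded with n bits can represent 2**n unique characters.
for bits in (7, 8, 16):
    print(f"{bits}-bit character set: {2 ** bits} characters")
```

Running this confirms the figures above: 7 bits give 128 characters and 8 bits give 256.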
ASCII and Extended ASCII
ASCII (American Standard Code for Information Interchange) was one of the earliest and most widely adopted character encoding standards. It uses 7 bits to represent 128 characters, including uppercase and lowercase English letters, digits (0-9), punctuation marks, and control characters (e.g., newline, tab).
- Advantages of ASCII: Simplicity, widespread adoption in early computing, efficient for English text.
- Limitations of ASCII: Only supports English characters, insufficient for representing characters from other languages (e.g., French accents, German umlauts, Asian scripts).
To address these limitations, Extended ASCII was developed. This typically used 8 bits, allowing for 256 characters. The additional 128 characters often included accented letters, graphical symbols, and other characters specific to certain Western European languages. However, different Extended ASCII variants existed, leading to compatibility issues when exchanging text between systems using different extensions.
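Python's built-in `ord()` and `chr()` functions expose a character's code point, which for characters in the ASCII range matches its 7-bit ASCII code. A quick illustration:

```python
# ord() gives the code point of a character; chr() does the reverse.
print(ord('A'))                  # 65
print(chr(66))                   # B
# The 7-bit binary pattern actually stored for 'A':
print(format(ord('A'), '07b'))   # 1000001
```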
Unicode and UTF-8
The need for a universal character encoding standard that could represent text from all writing systems led to the development of Unicode. Unicode aims to provide a unique number (code point) for every character, regardless of the platform, program, or language. It supports over a million characters, encompassing almost all known scripts, symbols, and emojis.
Unicode itself is a large character set, but it requires encoding schemes to represent these code points in binary. The most common Unicode encoding scheme is UTF-8 (Unicode Transformation Format - 8-bit).
- UTF-8 is a variable-length encoding scheme. This means that characters are represented using 1 to 4 bytes, depending on their code point. Common ASCII characters are represented using a single byte, making it backward compatible with ASCII. Characters from other languages require more bytes. This efficiency makes UTF-8 the dominant encoding on the web and in many modern operating systems.
- Other Unicode encodings include UTF-16 (variable-length, 2 or 4 bytes per character) and UTF-32 (fixed-length, 4 bytes per character).
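The variable-length behaviour of UTF-8 is easy to observe in Python, since `str.encode()` returns the raw bytes. The sample characters below are chosen to cover each length from 1 to 4 bytes:

```python
# UTF-8 uses 1 to 4 bytes per character, depending on the code point.
for ch in ('A', 'é', '€', '😀'):
    encoded = ch.encode('utf-8')
    print(f"{ch!r}: {len(encoded)} byte(s) -> {encoded}")
```

Note that `'A'` encodes to a single byte identical to its ASCII code, which is exactly the backward compatibility described above.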
Representing Images: Pixels, Resolution, and Colour Depth
Digital images are represented as a grid of tiny coloured squares called pixels (picture elements). Each pixel is assigned a binary value that represents its colour. The resolution of an image is the number of pixels it contains (typically expressed as width × height), while the colour depth is the number of bits used to store each pixel's colour: a depth of n bits allows 2^n distinct colours.
Image File Size and Metadata
The file size of an image is directly influenced by its resolution and colour depth: a higher resolution means more pixels to store, and a greater colour depth means more bits per pixel, so both increase the file size. Image files also commonly include metadata, which is data about the image itself, such as its dimensions, colour depth, and the date it was created.
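For an uncompressed bitmap, this relationship gives a simple formula: file size in bits = width × height × colour depth. A minimal sketch (the helper name and example dimensions are illustrative, and metadata and compression are ignored):

```python
def image_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed bitmap size: pixels x bits per pixel, converted to bytes."""
    return width * height * colour_depth_bits // 8

# An 800 x 600 image at 24-bit colour depth:
print(image_size_bytes(800, 600, 24))  # 1440000 bytes
```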
Exam Tips
1. Be able to calculate the storage requirements for text given the character set size and number of characters, and for images given resolution and colour depth.
2. Clearly distinguish between ASCII, Extended ASCII, and Unicode/UTF-8, explaining their advantages and limitations.
3. Understand the relationship between colour depth, the number of colours, and image file size. Practice calculations for different bit depths.
4. Explain the terms 'pixel', 'resolution', and 'colour depth' accurately and describe how they impact image quality and storage.
5. Know the purpose of metadata in image files and be able to give examples of common metadata attributes.
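The storage calculations in tips 1 and 3 can be practised as a short worked example. The figures below (2,000 characters, a 1024 × 768 image at 16-bit depth) are illustrative, not taken from a specific exam question:

```python
# Text storage: number of characters x bits per character.
chars = 2000
bits_per_char = 8                          # 8-bit (Extended ASCII) encoding
text_bytes = chars * bits_per_char // 8    # 2000 bytes

# Image storage: width x height x colour depth, converted to kilobytes.
width, height, depth = 1024, 768, 16
image_bits = width * height * depth
image_kb = image_bits / 8 / 1024           # 1536.0 KB

print(text_bytes, image_kb)
```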