A glyph symbol (or simply a glyph) is the visual shape used to draw a character. The character is the abstract unit of text (a letter, digit, punctuation mark, or other symbol), and the glyph is one particular way of rendering it.

This distinction matters in the Unicode Standard, and in character encodings in general, because they encode characters rather than glyphs. As we will see later, glyphs exist separately from the character itself and are supplied by fonts, so the same character can be rendered with different shapes.

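One simple example is the letter ‘U’. The character is LATIN CAPITAL LETTER U (U+0055), and every font supplies its own glyph for it. The ‘U’ in the ‘U+’ prefix stands for Unicode; the word ‘Universal’ appears in UCS, the Universal Coded Character Set.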

UTF-8 itself says nothing about case or punctuation. It only defines how each code point is stored as one to four bytes. Case is a property of individual characters, and many characters have none: ‘C’ is an uppercase letter and ‘c’ a lowercase letter, but a no-break space (U+00A0) is a space separator, a non-breaking hyphen (U+2011) is dash punctuation, and an asterisk, an exclamation point, and a hyphen are punctuation marks with no uppercase or lowercase form.
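Character properties such as case and punctuation class can be inspected directly. The following is a minimal sketch, assuming only Python's standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (code point label, general category, UTF-8 byte count) for one character."""
    label = f"U+{ord(ch):04X}"
    category = unicodedata.category(ch)   # e.g. "Lu" = uppercase letter, "Po" = other punctuation
    utf8_length = len(ch.encode("utf-8"))
    return (label, category, utf8_length)

# Uppercase C, lowercase c, no-break space, non-breaking hyphen, asterisk,
# exclamation point, and hyphen-minus.
for ch in ["C", "c", "\u00a0", "\u2011", "*", "!", "-"]:
    print(describe(ch))
```

The category comes from the Unicode character database, while the byte count comes from the encoding; the two are independent of each other.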

The Latin letters in the Unicode Standard are characters. The shapes a font draws for them are their glyphs, and several different glyphs can represent the same letter.

When no glyph is available for a Unicode character, the character can still be identified by its code point, written as ‘U+’ followed by at least four hexadecimal digits. For example, ‘U+1000’ is the code point of MYANMAR LETTER KA.
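Converting between a character and its code point label takes only a few lines. This is a minimal sketch using Python's standard library; the function names `code_point_label` and `from_label` are hypothetical helpers, not part of any library.

```python
import unicodedata

def code_point_label(ch):
    """Format a single character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"

def from_label(label):
    """Turn a label such as 'U+1000' back into the character it names."""
    return chr(int(label[2:], 16))

print(code_point_label("U"))                    # U+0055
print(unicodedata.name(from_label("U+1000")))   # MYANMAR LETTER KA
```

The label identifies the character even on a system where no installed font can draw it.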

Example from the Unicode Standard ‘U+0000’

U+0000 is the NULL control character; it is not specific to Windows 95 or to any other system. The “U+” prefix marks a Unicode code point, and “0000” is its value in hexadecimal. ‘U+0000’ and ‘U+FFFF’ are, respectively, the first and last code points of the Basic Multilingual Plane.
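The code points U+0000 and U+FFFF can be examined with Python's standard library. This is only an illustrative sketch; the variable names are arbitrary.

```python
import unicodedata

nul = chr(0x0000)        # U+0000, the NULL control character
last_bmp = chr(0xFFFF)   # U+FFFF, the last code point of the Basic Multilingual Plane

print(unicodedata.category(nul))       # Cc (control character)
print(unicodedata.category(last_bmp))  # Cn (unassigned; U+FFFF is a noncharacter)
print(nul.encode("utf-8"))             # b'\x00'
print(last_bmp.encode("utf-8"))        # b'\xef\xbf\xbf'
```

Neither code point has a visible glyph of its own, which is why they are normally written in the U+ notation.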

What is a character encoding? Which character encoding should we use?

A character encoding, often loosely called a character set, defines how characters are stored as bytes. The encoding to use is one that can represent all of the text the user wants to read and that does so most efficiently.
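One way to compare encodings is to measure how many bytes each needs for the same text and to check whether it can represent that text at all. The sketch below assumes Python's built-in codecs; `encoded_sizes` and `can_encode` are hypothetical helper names, and the three encodings listed are just examples.

```python
def encoded_sizes(text, encodings=("utf-8", "utf-16-le", "utf-32-le")):
    """Return the number of bytes each encoding needs for the given text."""
    return {enc: len(text.encode(enc)) for enc in encodings}

def can_encode(text, encoding):
    """Report whether the encoding can represent every character in the text."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

english = "hello"
greek = "\u03b3\u03b5\u03b9\u03ac"   # four Greek letters (gamma, epsilon, iota, alpha with tonos)

print(encoded_sizes(english))        # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes(greek))          # {'utf-8': 8, 'utf-16-le': 8, 'utf-32-le': 16}
print(can_encode(english, "ascii"))  # True
print(can_encode(greek, "ascii"))    # False
```

Which encoding is most efficient therefore depends on the text: UTF-8 is the most compact of the three for the English sample and ties with UTF-16 for the Greek one.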

A character encoding is defined over a character set such as ASCII, WIC, ODC, or UCS. Each character set specifies which characters it covers and how each one is numbered. ASCII covers only 128 characters (basic Latin letters, the digits 0-9, punctuation, and control codes), while UCS also covers scripts such as Greek, Arabic, and Cyrillic. Text can be converted from one of these character sets to another only for the characters that both sets contain. In other words, for each encoding, there were characters that could be

