Electronics Projects and Articles: CHARACTERS AND THE ASCII CODE

Friday, February 5, 2010

CHARACTERS AND THE ASCII CODE

I created this blog to publish electronics projects and articles that can be used by anyone interested in electronic design. Your questions, doubts, comments and suggestions are welcome, provided that they are written in a courteous and respectful way.





Abstract: When we talk about information, messages and data in electronic design, we must talk about characters and their representation (encoding). This matters because information is sent and received over digital media. There are many languages around the world, so thousands of characters exist, and many code tables have been created to represent them. This variety confuses users. For instance, Microsoft users often confuse ASCII codes with ALT codes.
In this article I try to address the most common questions about character encoding. To illustrate, I briefly introduce the ASCII code, its representation in several numerical systems, and its use in electronic design.

Goals:
  • To introduce the concepts of information representation and encoding to the reader of this electronic design blog. I assume that the reader knows about numerical systems and has worked with them. The numerical systems are not explained in this article, but there are links to basic explanations.
  • To distinguish between the ASCII code, control characters, Extended ASCII, code pages, Unicode and ALT codes.
  • To develop an online tool that converts ASCII characters to common numerical representations: Decimal, Hexadecimal, Binary and BCD. This tool will be developed in Java.
Utility: Use the online tool in this article to find the equivalent codes of ASCII characters in the Decimal, Binary, Hexadecimal and BCD systems, and to identify the true name of each ASCII character. Finally, understand why ASCII characters are used in electronic design.

Fundamentals

In electronics, most information (data) is handled using digital media. This implies that standards and rules exist for encoding the data so that machines can interpret it and present it properly to a human user or to another machine.
The basic unit of a message is the character. The set of characters of a language is called an alphabet.
As we already know, a character travels from one machine or device to another over digital media. Each character is assigned a numeric value in the chosen numerical system; this assignment is known as coding.
For example: someone has the idea of assigning the numbers 1 to 26 to the letters of the English alphabet. They have special hardware and firmware and want to send the word “hello” from one electronic device to another. Using this encoding, the word becomes 8 5 12 12 15, or 85121215 when written together. This is an example of a simple coding.
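
The toy coding above can be tried out with a short Python sketch. The names `TABLE`, `encode` and `decode` are made up for this illustration; only the 1-to-26 assignment comes from the example.

```python
import string

# Toy code table from the example: a=1, b=2, ..., z=26.
TABLE = {letter: number
         for number, letter in enumerate(string.ascii_lowercase, start=1)}

def encode(word):
    """Return the list of numbers assigned to the letters of `word`."""
    return [TABLE[letter] for letter in word.lower()]

def decode(numbers):
    """Map the numbers back to letters using the same table."""
    reverse = {number: letter for letter, number in TABLE.items()}
    return "".join(reverse[n] for n in numbers)

codes = encode("hello")
print(codes)                            # [8, 5, 12, 12, 15]
print("".join(str(n) for n in codes))   # 85121215
print(decode(codes))                    # hello
```

Note that the joined form 85121215 is ambiguous without separators: "1212" could mean 12, 12 or 1, 2, 1, 2. Codes with a fixed width per character, such as ASCII, avoid this problem.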

In general, a coding does not depend on the transmission medium, which can be wired, wireless or even via satellite.

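Note: It is important that you understand the difference between the terms encode and encrypt. Encoding represents characters with a publicly known table so that any device can interpret them; encrypting hides the content so that only someone with the key can read it.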

Since character encoding is not a new topic, many standards and rules already exist for it. We also need to know the numerical systems and encoding systems most often used. In electronic design, the basic numerical systems used to encode characters are Decimal, Binary, Hexadecimal and BCD.
When we transmit information, the receiver (for instance Windows HyperTerminal) may show characters that are unknown to us, and we may not know their values in Hexadecimal or Binary. If we view the received message in Hexadecimal or Binary, we can see the exact code of each character that was sent. Frequently, these characters are encoded in ASCII.
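
As a rough illustration of viewing a received message in Hexadecimal and Binary, the Python sketch below dumps each byte of a message. The bytes in `received` are invented for this example and do not come from any real device; `dump` is a hypothetical helper name.

```python
# Hypothetical bytes as they might arrive over a serial link.
# The values are made up for this illustration.
received = b"OK\r\n\x02T=25\x03"

def dump(data):
    """Return one line per byte: hexadecimal, binary and a printable form."""
    lines = []
    for value in data:
        # Printable ASCII runs from 0x20 (space) to 0x7E (~);
        # everything else is shown as a dot.
        shown = chr(value) if 0x20 <= value <= 0x7E else "."
        lines.append(f"{value:02X}  {value:08b}  {shown}")
    return lines

for line in dump(received):
    print(line)
```

A terminal may draw odd symbols for bytes such as 0x02 or 0x0D, but the hexadecimal column always shows which code was actually received.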
The ASCII code, or US-ASCII, was created in 1963 to transfer information between electric and electronic equipment. It is based on the English alphabet. At first, the ASCII code included only some punctuation signs, the Arabic numerals and the capital letters of the English alphabet. Lowercase letters were added in 1967. If you want to know more about ASCII history you could click here.

Today the ASCII code consists of 128 codes, of which the first 32 (0-31) and code 127 are known as control characters or non-printable characters. Some systems show a symbol for them on screen, but they do not represent printable characters. As additional information, control characters were used to control printers and other peripheral devices through the parallel port and the COM ports of the computer.
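
The split between control codes and printable codes can be checked with a small Python sketch. The helper `is_control` and the `NAMES` table are names chosen for this illustration; `NAMES` lists only a few of the standard abbreviations.

```python
# Standard short names of a few common control characters.
NAMES = {0: "NUL", 7: "BEL", 8: "BS", 9: "HT",
         10: "LF", 13: "CR", 27: "ESC", 127: "DEL"}

def is_control(code):
    """True for the ASCII control codes: 0-31 and 127."""
    return 0 <= code <= 31 or code == 127

control = [c for c in range(128) if is_control(c)]
printable = [c for c in range(128) if not is_control(c)]

print(len(control), len(printable))   # 33 95
print(NAMES[13], NAMES[10])           # CR LF
print(repr(chr(printable[0])), chr(printable[-1]))   # ' ' ~
```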

With the online tool of this article you can see all ASCII characters and their values for different numeric systems.
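
The kind of conversion the tool performs can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code (which is written in Java); the function name `representations` and the three-digit BCD layout are assumptions of this sketch.

```python
def representations(char):
    """Return common numeric representations of one ASCII character."""
    code = ord(char)
    if code > 127:
        raise ValueError("not an ASCII character")
    digits = f"{code:03d}"   # decimal value padded to three digits
    return {
        "decimal": code,
        "hexadecimal": f"{code:02X}",
        "binary": f"{code:08b}",
        # BCD: each decimal digit is encoded separately in 4 bits.
        "bcd": " ".join(f"{int(d):04b}" for d in digits),
    }

print(representations("A"))
# {'decimal': 65, 'hexadecimal': '41', 'binary': '01000001', 'bcd': '0000 0110 0101'}
```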

In electronic design, the ASCII code is often used for communication between two devices, whether you are implementing a control link (master-slave) or a simple transmission (master-master).
The ASCII code is used because every character fits in one byte, that is, two nibbles, which makes handling and packaging easier when many bytes are transmitted. Each ASCII character needs only 7 bits, but all 8 bits of the byte are used and the MSB is always 0.
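
The two nibbles and the unused MSB can be made visible with a short Python sketch. The helper name `split_nibbles` is chosen for this illustration.

```python
def split_nibbles(char):
    """Return (high nibble, low nibble, MSB) of a one-byte character code."""
    code = ord(char)
    high = code >> 4      # upper 4 bits
    low = code & 0x0F     # lower 4 bits
    msb = code >> 7       # bit 7, the most significant bit of the byte
    return high, low, msb

high, low, msb = split_nibbles("k")   # 'k' is 107 decimal, 0x6B
print(f"{high:X} {low:X} {msb}")      # 6 B 0

# The MSB is 0 for every one of the 128 ASCII codes.
print(all((code >> 7) == 0 for code in range(128)))   # True
```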
Many people will think: if I use the MSB of the byte, the number of possible characters increases to 256 (2^8), and I can include new characters and symbols.
This extension of the ASCII code to 256 characters and symbols already exists and is called Extended ASCII.
Many people and companies had the same thought when they saw how limited the ASCII code was. As a result, each company created its own Extended ASCII. Later, these Extended ASCII tables were called code pages.

Character encoding is also used to communicate from a keyboard to a computer and to display characters on the screen. Every keyboard has its own layout of characters, and this layout depends on the language and the manufacturer. For example, if you want to see Microsoft’s keyboard layouts for several languages you could click here.

Not all characters appear on the keyboard. We often need to include special characters in our manuscripts or files in order to present them properly. Depending on the font used, some characters may be available and others not. For example, the fonts Arial and Symbol do not have the same characters. Together with the code pages, this can confuse the common user.
The characters on a keyboard and the character codes of commercial computers depend on the code page. All code pages share the first 128 ASCII codes (0-127), but the codes from 128 to 255 depend on the particular Extended ASCII chosen by the manufacturer and on the language. Because many code pages exist, this can confuse the common user. This is also a reason why electronic designers use the ASCII code to transmit data: the ASCII code does not depend on the manufacturer or the language, since all code pages include it. For the same reason, the ASCII code does not depend on the operating system.
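
To see this dependence on the code page, the Python sketch below decodes the same two bytes with several code pages that ship with Python's standard codecs. The choice of code pages and the helper name `decode_with` are assumptions made for this illustration.

```python
# A few code pages available as Python standard codecs.
PAGES = ("cp437", "cp850", "latin_1", "cp1252")

def decode_with(raw, pages=PAGES):
    """Decode the same bytes with each code page and return the results."""
    return {page: raw.decode(page) for page in pages}

# 0x41 lies in the ASCII range (0-127); 0xE9 lies in the extended range.
for page, text in decode_with(bytes([0x41, 0xE9])).items():
    print(page, text)
```

The first byte is shown as the same letter A by every code page, while the character shown for the second byte depends on the code page used. This is why a device that sends only codes 0 to 127 is read the same way everywhere.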

As character encoding has evolved, there has been a demand for a general standard (a single table) that includes all characters of all languages and, of course, all symbols and punctuation signs. This standard is Unicode. When we type ALT + DECIMAL NUMBER in Word, DECIMAL NUMBER can be greater than 255 and we still obtain a character. In that case not only a code page is involved but also the Unicode standard, whose most widely used encoding is UTF-8. Taken together, all of this can be confusing.
Figure 1 shows an example of a character distribution map for the MS DOS operating system, taking into account code page 850 (PC850: a multilingual code page that includes the characters of most European, North American and South American languages) and the UTF-8 standard. I hope this figure clarifies things.
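
A short Python sketch can show how one universal table handles numbers beyond the one-byte range. It prints the Unicode number of three sample characters and the bytes that UTF-8 uses for each; the helper name `describe` and the sample characters are choices made for this illustration.

```python
def describe(char):
    """Return the Unicode code point of `char` and its UTF-8 bytes."""
    return ord(char), char.encode("utf-8")

# "A", e with acute accent, and the euro sign, written as escapes.
for char in ("A", "\u00e9", "\u20ac"):
    code_point, utf8 = describe(char)
    print(f"U+{code_point:04X}  decimal {code_point}  "
          f"UTF-8: {utf8.hex(' ').upper()}")
# U+0041  decimal 65  UTF-8: 41
# U+00E9  decimal 233  UTF-8: C3 A9
# U+20AC  decimal 8364  UTF-8: E2 82 AC
```

An ASCII character keeps its one-byte code in UTF-8, while characters outside ASCII take two or more bytes.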



Figure 1. Distribution of Character Codes for MS DOS.

We should take into account that code pages vary according to the language, the font and even the operating system. This means that ALT codes can change from one computer to another, except for the ASCII codes.

If your operating system is Windows 2000, XP or Vista, you can follow these steps to see the default code page:

1. Open the Run dialog with the WINDOWS key + R.
2. Type CMD in the text box and press Enter.
3. In the new window, type CHCP.
4. Press Enter.

Using ASCII characters in other operating systems:
1. In Linux (for example Ubuntu): in the text processor, press Ctrl + Shift + U and then type the hexadecimal ASCII code. Caps Lock must be turned off. Linux and Ubuntu use the UTF-8 character table.
2. In Mac operating systems: to enter a special character that is not on the keyboard layout, follow these steps.

Using the online Tool

1. Click the ASCII characters button.
2. In the window that appears, called The ASCII Characters, select a character from the list. You can use the scroll bar to find a specific character.
3. Several representations of the selected character will appear in the same window.
4. The ALT + Decimal Number field is useful for Microsoft Office users.

Author's personal page here.

For more tutorials, projects and news, please visit:
Electrónica Plug and Play