05 May 2012

Data Representation


To understand how a computer processes data, you should know how a computer represents data. People communicate through speech by combining words into sentences. Human speech is analog because it uses continuous (waveform) signals that vary in strength and quality. Most computers are digital: they recognize only two discrete states, on and off. This is because computers are electronic devices powered by electricity, which also has only two states: on and off.
 The two digits 0 and 1 easily represent these two states. The digit 0 represents the electronic state of off (the absence of an electronic charge). The digit 1 represents the electronic state of on (the presence of an electronic charge).
 When people count, they use the digits of the decimal system (0 through 9). The computer, by contrast, uses a binary system because it recognizes only two states. The binary system is a number system that has just two unique digits, 0 and 1, called bits. A bit (short for binary digit) is the smallest unit of data the computer can process. By itself, a bit is not very informative.
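 To see the two number systems side by side, here is a minimal sketch in Python (my choice of language; the article itself names none) that converts between decimal and binary using the built-in bin() and int() functions:

    # Decimal 13 written in binary digits (bits)
    print(bin(13))         # 0b1101
    # Parse a string of bits back into a decimal number
    print(int("1101", 2))  # 13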
 When 8 bits are grouped together as a unit, they form a byte. A byte provides enough different combinations of 0s and 1s to represent 256 individual characters. These characters include numbers, uppercase and lowercase letters of the alphabet, punctuation marks, and others, such as the letters of the Greek alphabet.
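 A short Python sketch makes the arithmetic concrete: 8 bit positions, each holding a 0 or a 1, give 2 to the power of 8, or 256, distinct patterns:

    # Each of the 8 bit positions doubles the number of patterns
    print(2 ** 8)  # 256 possible byte values
    # A few 8-bit patterns, padded to full byte width
    for value in (0, 1, 65, 255):
        print(format(value, "08b"))  # 00000000, 00000001, 01000001, 11111111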
 The combinations of 0s and 1s that represent characters are defined by patterns called a coding scheme. In one coding scheme, the number 4 is represented as 00110100, the number 6 as 00110110, and the capital letter E as 01000101. ASCII, which stands for American Standard Code for Information Interchange, is the most widely used coding scheme for representing data.
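 You can verify those exact bit patterns in Python, where ord() returns a character's numeric code:

    # The three characters named above, shown as 8-bit ASCII patterns
    for ch in "46E":
        print(ch, format(ord(ch), "08b"))
    # 4 00110100
    # 6 00110110
    # E 01000101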
 The ASCII coding scheme is sufficient for English and Western European languages but is not large enough for Asian and other languages that use different alphabets. Unicode is a 16-bit coding scheme that can represent more than 65,000 characters and symbols. The Unicode coding scheme is capable of representing almost all of the world's current written languages, as well as classical and historical languages. To allow for expansion, Unicode reserves 30,000 codes for future use and 6,000 codes for private use. Unicode is implemented in several operating systems, including Windows, Mac OS, and Linux. Unicode-enabled programming languages and software include Java, XML, Microsoft Office, and Oracle.
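 As an illustration (the sample characters below are my own picks, not from the article), Python exposes a character's Unicode code point through ord(), and encoding to UTF-16 shows the 16-bit form:

    # Code points for characters from several scripts,
    # plus their two-byte UTF-16 (big-endian) encodings
    for ch in ("E", "é", "Ω", "中"):
        print(ch, hex(ord(ch)), ch.encode("utf-16-be"))
    # E 0x45 b'\x00E'
    # é 0xe9 b'\x00\xe9'
    # Ω 0x3a9 b'\x03\xa9'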
 Coding schemes make it possible for humans to interact with a digital computer that processes only bits. When you press a key on a keyboard, a chip in the keyboard converts the key's electronic signal into a special code that is sent to the system unit. The system unit then converts the code into a binary form the computer can process and stores it in memory. Every character is converted to its corresponding byte. The computer then processes the data as bytes, which are actually series of on/off electrical states. When processing is finished, software converts the bytes back into human-recognizable numbers, letters of the alphabet, or special characters to be displayed on a screen or printed. All of these conversions take place so quickly that you do not realize they are occurring.
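 The whole round trip can be sketched in a few lines of Python: a character is encoded to a byte for storage and processing, then decoded back for display.

    # Character -> byte (as stored in memory) -> character (as displayed)
    typed = "E"
    stored = typed.encode("ascii")       # b'E', the byte kept in memory
    print(format(stored[0], "08b"))      # 01000101
    displayed = stored.decode("ascii")   # back to a readable character
    print(displayed)                     # E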
 Standards, such as those defined by ASCII and Unicode, also make it possible for components in computers to communicate with each other successfully. By following these and other standards, manufacturers can produce a component and be assured that it will operate correctly in a computer.