4. Primitive Data Types


Primitive type, wikipedia (http://en.wikipedia.org/wiki/Primitive_type).

Bytes hold numbers 0-255₁₀ (00000000-11111111₂, 00-FFh). That's all the computer is ever going to have. We need to use these bytes to represent things more useful and familiar to us.

Using bytes of 0-255, languages implement a set of primitive data types (and provide operators to manipulate the primitive data types).

integers: e.g. 42, 1024, -100
characters: e.g. 'a','Z','0',' '
Note
This explanation of the difference between '0' and 0 was later in the lesson, but the students immediately protested that '0' was a number and not a character.
What's the difference between the integer 0 and the character '0'?
- the integer 0:
  If represented by a single byte, it will be 00000000. You can do arithmetic operations (e.g. multiply, add, subtract and divide) with the integer 0.
- the character/symbol '0':
  It has a particular shape. It's represented by the byte 30h. When the computer needs to draw or print this character, the byte 30h is sent to the screen or printer, where the hardware knows to draw a symbol of the right shape to be a zero. The computer is not allowed to do arithmetic operations (e.g. add, multiply, subtract or divide) on the character '0'. However the computer can test the variable holding the character '0' to see whether it represents a decimal digit, a hexadecimal digit, punctuation or a letter, and if a letter, whether it's upper or lower case.
In situations where the computer doesn't know whether 0 is a number or a character, you have to explicitly write '0' and/or "0" (depending on the language) for the character, while 0 is used for the number.
To add to the confusion, the word "number" is used to mean both a numerical quantity and the characters which represent it. Context will indicate which is meant.
I will be talking about the ASCII character set, ASCII, wikipedia (http://en.wikipedia.org/wiki/ASCII), which is useful for simple text in (US) English. For an attempt at a universal character set, see Unicode, wikipedia (http://en.wikipedia.org/wiki/Unicode).
Early in the days of computing, the US Govt decided to buy only computers that used the same character set, and it mandated ASCII. Until then, manufacturers all used different hexadecimal representations of characters. Because ASCII was required for computers bought by the US Govt, all manufacturers supported ASCII. ASCII is still the only guaranteed way of exchanging information between two computers. Usually if one computer wants to send the value 3.14159 to another computer, it is sent as a series of characters (a string) and transformed into a number at the receiving end. (There is no agreed upon convention for exchanging numbers.) Thus e-mail and webpages all use ASCII. Many computer peripherals (e.g. temperature sensors) send their data as a string of ASCII characters (terminated by a carriage return), which is then turned into a number within the computer.
See, big government does work.
Note
The US Govt could have set standards for exchange of numbers too, but it didn't, so numbers are exchanged between computers by ASCII.
real numbers: e.g. -43.0, 3.14159, 98.4
Floating point numbers, wikipedia (http://en.wikipedia.org/wiki/Floating_point).
boolean: e.g. true, false (these are the only two allowed values) (some languages don't have a boolean type; you have to fake it, e.g. with 0 and 1).
Boolean datatype, wikipedia (http://en.wikipedia.org/wiki/Boolean_datatype). Boolean logic in computer science, wikipedia (http://en.wikipedia.org/wiki/Boolean_logic_in_computer_science).
strings: e.g. "happy birthday", "my birthday is 1 Jan 2000".
String (computer science), wikipedia (http://en.wikipedia.org/wiki/String_%28computer_science%29).
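The ASCII exchange of numbers described above (send the digits as text, convert them back at the receiving end) can be sketched in Python 3; the Python sessions later in this lesson use the older Python 2. The sensor format (a reading ending in a carriage return) and the helper names `encode_reading` and `decode_reading` are hypothetical, chosen for illustration.

```python
# A sketch of exchanging a number as ASCII text, assuming a hypothetical
# sensor that sends each reading as digits followed by a carriage return.

def encode_reading(value):
    """Turn a number into ASCII bytes terminated by a carriage return."""
    return (repr(value) + "\r").encode("ascii")

def decode_reading(raw):
    """Turn ASCII bytes ending in a carriage return back into a real number."""
    text = raw.decode("ascii").rstrip("\r")   # bytes -> characters
    return float(text)                        # characters -> number

message = encode_reading(3.14159)
print(message)                  # b'3.14159\r'
print(decode_reading(message))  # 3.14159
```

Nothing but characters crosses the wire; each end decides for itself how to store the number internally.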


4.1. Primitive Data Type: Integer

Programs don't usually do much arithmetic with integers. Integers are used as counters in loops and to keep track of the position in an executing program. Integers do come from digital sensors: e.g. images from digital cameras and digital audio. However most data, by the time it arrives at the computer, is real numbers.

In a 32 bit computer, an unsigned integer has a range of 0-4294967295 (2³²-1). The number of different values, 2³², is referred to, somewhat inaccurately, as 4G, but we've all accepted what it means: it's the 32 bit barrier.

#in bash
#binary
declare -i result;result=2#11111111111111111111111111111111; echo $result
4294967295

#hexadecimal
declare -i result;result=16#ffffffff; echo $result
4294967295
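For comparison, here is a sketch of the same conversions in Python 3 (Python is introduced later in this lesson): the built-in `int(text, base)` reads a string of digits in the given base.

```python
# Python 3 equivalent of the bash base conversions above.
all_ones_binary = int("1" * 32, 2)   # 32 ones in base 2
all_ones_hex = int("ffffffff", 16)   # 8 Fs in base 16

print(all_ones_binary)  # 4294967295
print(all_ones_hex)     # 4294967295
print(2**32 - 1)        # 4294967295, the largest unsigned 32-bit value
```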

4.2. Arithmetic with Long Numbers

Numbers needing more bits than the machine's register size are called Long (or long) numbers, e.g. a 64 bit number on a 32 bit machine. Arithmetic on long numbers is done in two or more steps, each on a 32-bit piece, and requires an "add with carry" (ADC) instruction (found on all general purpose computers). Here's how addition of long numbers works. Let's assume a 2bit computer and we want to add two 4bit numbers.

0010
1011+
----
????

First split the problem into pieces manageable by the hardware (here 2 bits), giving us the right hand half (the least significant bits) and the left hand half (the most significant bits).


LH         RH       
00         10         
10+        11+
--         --
??         ??

Next a word about addition and carry: When doing addition by hand, there is never a carry for the rightmost digit, but a computer has a carry bit for the rightmost bit which is set to 0 at the start of addition.

RH
10
11+
----
?? sum
?0 carry

step 1: right column, add two digits + carry digit. The carry to the 2nd column is 0.
RH
10
11+
--
?1 sum
00 carry

step 2: left column, add two digits + carry digit (1+1+0=10₂). The sum digit is 0 and the carry out of the left column has nowhere to go: there is overflow.
RH
10
11+
--
01 sum
00 carry (with overflow)

The computer has a FLAGS register (32-bits in a 32 bit computer), which holds, in each bit, status information about the executing program, including whether the previous instruction overflowed, underflowed or set a carry.

The addition above overflowed, but the computer doesn't know whether the extra bit is needed for a Long addition, in which case the overflow is really a carry. Just in case, the computer stores the overflow bit in the carry bit of the FLAGS register. If the computer is doing a Long addition, the next step will ask for the carry bit. If the computer isn't doing a Long addition, then the carry bit will be ignored (and will be lost).

Here's what the calculation looks like now (only the state of the carry bit is shown in the FLAGS register). The computer will first add the rightmost digits in its 2bit registers, using the regular add (ADD) instruction, which adds only the two numbers, with the carry input to the adder set to 0.

before 1st addition

LH         RH        FLAGS
00         10         ?
10+        11+
--         --
??         ?? sum 
?0         ?0 carry

after 1st addition

LH         RH        FLAGS
00         10         1
10+        11+
--         --
??         01 sum 
?0         00 carry

Because of the overflow, the carry bit in the FLAGS register is now 1. The program tells the computer that this is the 2nd step of a Long addition by using the "add with carry" (ADC) instruction, which copies the carry bit from the FLAGS register to the carry input of the adder, and then does a normal addition.

2nd addition. first step, copy carry bit from FLAGS to carry input for LH

LH         RH        FLAGS
00         10         1
10+        11+
--         --
??         01 sum 
?1         00 carry

2nd step, add digits and carry digits for LH numbers

LH         RH        FLAGS
00         10         1
10+        11+
--         --
11         01 sum 
01         00 carry

we now read out the sum digits

11         01

giving the required answer of 1101
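The chained ADD/ADC steps above can be sketched in Python 3. This is a simulation, not machine code: `WORD_BITS = 2` mimics the pretend 2-bit computer (a real 32-bit machine would use 32), and the helper names `add_with_carry`, `long_add`, `split_words` and `join_words` are hypothetical.

```python
# A sketch of Long addition: numbers are stored as lists of small words,
# least significant word first. The first step is ADD (carry in forced to 0),
# every later step is ADC (carry in taken from the previous step).

WORD_BITS = 2
WORD_MASK = (1 << WORD_BITS) - 1   # 0b11 for 2-bit words

def add_with_carry(a, b, carry_in):
    """One ADC step: return (sum word, carry flag)."""
    total = a + b + carry_in
    return total & WORD_MASK, total >> WORD_BITS

def long_add(a_words, b_words):
    """Add two equal-length word lists (least significant word first)."""
    result = []
    carry = 0                      # ADD is ADC with the carry flag cleared
    for a, b in zip(a_words, b_words):
        word, carry = add_with_carry(a, b, carry)
        result.append(word)
    return result, carry           # a final carry of 1 means overflow

def split_words(n, count):
    """Split n into `count` words, least significant first."""
    return [(n >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]

def join_words(words):
    """Reassemble a word list (least significant first) into one number."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

# 0010 + 1011 from the example: RH words are 10 and 11, LH words 00 and 10
words, carry = long_add(split_words(0b0010, 2), split_words(0b1011, 2))
print([format(w, "02b") for w in reversed(words)], carry)  # ['11', '01'] 0
print(format(join_words(words), "04b"))                    # 1101
```

Changing `WORD_BITS` changes the size of each piece; the loop is the same however many pieces the Long number needs.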

You can chain addition to any precision (on a 32-bit computer, to 64, 96, 128-bits...). Standard calculations rarely need more than 64 bits, but some people want to calculate pi to billions of places and this is how they do it.

Long arithmetic is slower than regular arithmetic. You don't ask for Long operations unless you know you need them.

	Note
	End Lesson 4

4.3. Negative Integers

If we wanted negative integers, how would we do it? Pretend you're a 1 byte computer and you need to represent -1. You can do this by finding out the number which added to 1 gives 0.

00000001
????????+
--------
00000000

The answer is 11111111₂, 255₁₀ or FFh (this only works because the computer discards the overflow out of the top bit).

00000001
11111111+
--------
00000000

You've seen the computer version of -ve numbers before. They're called what ^[29]? They are the base minus the magnitude of the number (in a 1 byte computer the base is 256, so -1 is 256-1=255).

What is -2₁₀ in binary, hexadecimal? ^[30]

How do we know whether 255 should be interpreted as -1 or 255?

Primitive data types live one level above bytes. Your program keeps a record of which primitive data type each particular byte (or group of bytes) represents. When you write your program, your code will have descriptors stating whether an integer will have +ve only values (called an unsigned int) or both +ve and -ve values (called a signed int). Some programming languages will have already decided that you'll be using a signed int, and you won't have any choice.

If you have a signed int, then integers with high order bit=1 are -ve while those with high order bit=0 are +ve.

binary     hexadecimal decimal
00000000       00         0
00000001       01         1
.
.
01111111       7F       127
10000000       80      -128
10000001       81      -127
.
.
11111100       FC        -4
11111101       FD        -3
11111110       FE        -2
11111111       FF        -1
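Reading the table is mechanical: if the high order bit is 1, subtract the base (2⁸=256 for a byte). A sketch in Python 3 with hypothetical helpers `to_signed` and `to_pattern`; the `bits` parameter is an assumption of this sketch that lets the same code handle the 64-bit integers bash uses on Linux.

```python
# A sketch of reading a bit pattern as a signed (two's complement) integer.

def to_signed(pattern, bits=8):
    """Interpret an unsigned bit pattern as a two's complement signed value."""
    if pattern & (1 << (bits - 1)):    # high order bit set: negative
        return pattern - (1 << bits)   # subtract the base (256 for 8 bits)
    return pattern

def to_pattern(value, bits=8):
    """Store a (possibly negative) integer in `bits` bits, discarding overflow."""
    return value & ((1 << bits) - 1)

for p in (0x00, 0x01, 0x7F, 0x80, 0x81, 0xFE, 0xFF):
    print(format(p, "08b"), format(p, "02X"), to_signed(p))

print(format(to_pattern(-1), "02X"))            # FF
print(to_signed(0xFFFFFFFFFFFFFFFF, bits=64))   # -1
```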

Linux runs on 32 bit computers. What value is represented by 32 1's (FFFFFFFF) in bash on Linux? On a 32 bit machine we might expect this to be -1.

declare -i result;result=16#ffffffff; echo $result
4294967295

This is not a negative number. What's going on? Let's try a 64-bit number (just for reference, the biggest number that can be represented by 64 bits is 2⁶⁴-1=18,446,744,073,709,551,615, about 1.8*10¹⁹).

declare -i result;result=16#ffffffffffffffff; echo $result
-1
declare -i result;result=16#7fffffffffffffff; echo $result
9223372036854775807
declare -i result;result=16#8000000000000000; echo $result
-9223372036854775808

bash on Linux rolls over to -ve numbers half way through the range of 64-bit numbers: even on 32 bit machines, the integers in bash on Linux are 64-bit signed integers.

	Note
	see long_numbers for 64-bit math on 32 bit machines.

4.4. Range of Integers

Here is the range of values that various sized integers (int) can represent. To give an idea of the relative sizes, the table also shows the time represented by the largest value, if the integer counts seconds.

                8-bit          16-bit            32-bit                        64-bit
unsigned        0 to 255       0 to 65535        0 to 4294967295               0 to 18446744073709551615
signed          -128 to 127    -32768 to 32767   -2147483648 to 2147483647     -9223372036854775808 to 9223372036854775807

unsigned time   4mins          18hrs             136yrs                        584,942,417,355yrs
signed time     2mins          9hrs              68yrs                         292,471,208,677yrs
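The figures in the table can be recomputed. A sketch in Python 3 with hypothetical helpers `int_range` and `describe_seconds`; it assumes 365-day years and rounds times down, which matches the table's figures.

```python
# A sketch that recomputes the integer range table.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumption: 365-day years

def int_range(bits, signed):
    """Return (smallest, largest) value an integer of `bits` bits can hold."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def describe_seconds(s):
    """Express a count of seconds in mins, hrs or yrs, rounded down."""
    if s < 3600:
        return f"{s // 60} mins"
    if s < SECONDS_PER_YEAR:
        return f"{s // 3600} hrs"
    return f"{s // SECONDS_PER_YEAR:,} yrs"

for bits in (8, 16, 32, 64):
    for signed in (False, True):
        low, high = int_range(bits, signed)
        kind = "signed" if signed else "unsigned"
        print(f"{bits:2d}-bit {kind:8s} {low} to {high} ({describe_seconds(high)})")
```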

You'll see these numbers often enough that you'll start to remember them. For the moment be prepared to see numbers pop up that you've not previously seen much of.

The Y2.038K problem. In 1999 the computer world was in a flurry: programmers who'd represented years in their dates using 2 digits realised that the year 00 would represent the year 1900. Paychecks would not be processed, elevators would stop working at midnight, trapping thousands (if not millions) of innocent people, and planes would fall out of the sky (except Japanese planes). A bigger calamity could not be imagined. The world's bureaucrats heroically spent millions of dollars of taxpayers' and consumers' money to prevent certain disaster. On 01 Jan 2000, none of the predicted misfortunes occurred, for which we must thank the selfless and unacknowledged taxpayers and consumers of the world.

Unix represents time as a signed 32-bit integer counting seconds from 1 Jan 1970 (this date itself a major blunder ^[31]). If in Jan 2038 you're still using a 32 bit OS (not likely for desktop machines, but quite possible for embedded devices sitting in computer controlled machinery, which rarely need 32 bits, much less 64 bits), Unix time will overflow on 19 Jan 2038. If on that day your computer controlled refrigerator stops ordering food, it will be because the counter has wrapped around to a large negative number and the refrigerator is asking for food to be delivered in Dec 1901. Jan 2008 was a good time to take out a 30yr loan for 250,000$; your monthly payments would be -1,600$ (a comment from slashdot in Jan 2008, http://it.slashdot.org/article.pl?sid=08/01/15/1928213).
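A sketch in Python 3 of what happens at the limit, assuming a signed 32-bit seconds counter that wraps around on overflow. Python's own integers never overflow, so the wrap is emulated by the hypothetical helper `wrap_int32`; `unix_to_datetime` is also hypothetical. The sketch shows the last representable moment and the date the counter jumps back to.

```python
# A sketch of the 2038 problem: Unix time as a signed 32-bit count of
# seconds since 1 Jan 1970 UTC, with the 32-bit overflow emulated by hand.
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def wrap_int32(n):
    """Keep only the low 32 bits of n and read them as a signed integer."""
    return ((n + 2**31) % 2**32) - 2**31

def unix_to_datetime(seconds):
    """Convert seconds since the epoch to a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)

last = 2**31 - 1                               # largest signed 32-bit value
print(unix_to_datetime(last))                  # 2038-01-19 03:14:07+00:00
print(unix_to_datetime(wrap_int32(last + 1)))  # 1901-12-13 20:45:52+00:00
```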

A 32 bit computer can generate how many different integers ^[32]? If you use these integers as labels or addresses, you can label that many different items.

A 32 bit computer addresses its memory in bytes using address lines in the hardware. The computer has to read/write a byte as one unit; it can't address individual bits in a byte in memory - it has to read or write the whole byte. Once the computer has read the byte into a register, then each bit within the byte is separately addressable. What is the maximum number of bytes a 32 bit computer can address, and how much memory is this ^[33] ?

Not too long ago, microprocessors had 4kbytes of memory. Now for many applications, computers need more than 4Gbytes of memory. These applications all have to be run on 64-bit computers. But if you wanted to increase the amount of memory available to 32-bit computers, how could you do it ^[34] ?

In fact most data and instructions in a 32-bit computer are 32 bits and are fetched 32 bits at a time, so changing the unit of addressing from the byte to the 32-bit word would not be a big change. A char would then have to be represented by the byte of interest, followed (or preceded) by 3 empty bytes. The instructions that work on chars would have to mask off the 3 empty bytes. Since chars (in most programs) are a small fraction of the data held in memory, these unused bytes would not waste much memory.

I don't know why 32 bit computer manufacturers didn't go to 32 bit (word) addressing. Possible reasons might be:

Compatibility is a big factor in the commodity market; unless of course you're Apple, who brings out new hardware and software every 3yrs, discontinues support of the current hardware and tells everyone how fortunate they are to be able to buy the new hardware and software. This selects for grateful and well-heeled customers. For the rest of the commodity market, everything has to be backward compatible, back to the clay tablet. The new architecture would break old code. Most people using PCs are running applications on the desktop (business/office applications), and don't have the source code for the programs they run; they run proprietary binaries and would have to buy new versions of their programs if they upgraded to hardware with 32 bit (word) addressing.
Most of a desktop machine's activities are character oriented: typing into a wordprocessor, display on a screen, sending characters or mouse clicks to machines on the internet (all of which have to be sent one character at a time in an IP packet, and have to await a reply packet before being displayed on your screen). (In contrast, an HPC machine doesn't interact with the slow user, keyboard, or screen; it runs for hours, days or weeks and then dumps its output as files, to be pored over later by the programmer/user.)
64 bit addressing is available if you need it. 64 bit machines have been around for 30yrs or so. People needing large amounts of memory are crunching large amounts of data. They have more money and aren't running desktop applications. They're either using programs they wrote themselves (i.e. they have the source code, which they can recompile) or are prepared to pay for 64 bit versions of standard programs. They are also prepared to pay for 64 bit machines.
64 bit addressing is available for the desktop too. The hardware for the commodity market (PCs) is changing over to 64 bit about the same time as the arrival of desktop programs that handle large amounts of data.

Harddisks read/write data to/from independently addressable blocks (i.e. the computer cannot address individual bits or bytes within a block; it has to address and then read or write the block as one indivisible unit). Let's assume a blocksize of 512bytes. If the computer wants 1 bit off the harddisk, the computer has to know the address of the block, read the whole 512bytes into memory, manipulate the 1 bit and then write the 512byte block back to disk. What's the maximum number of blocks a 32 bit computer can address on a harddisk and how much can be stored on a disk with blocksize=512bytes ^[35]?

If you wanted to put a bigger disk on a 32 bit computer, how would a harddisk manufacturer do it ^[36] ?

Harddisk manufacturers originally started with a block size of 512 bytes and have increased the size of the blocks continuously over the years. I don't know why hard disk manufacturers can change the blocksize whenever they want and not have programs fail, while at the same time microprocessor manufacturers have not been able to change from 8bit (byte) addressing to 32 bit (word) addressing. Possible reasons might be:

While the 32 bit barrier for memory was always a long way off and programs never had to change this size, harddisk manufacturers set specs for their disks that were regularly exceeded almost the next year. You'd think the harddisk manufacturers would set a spec that would last 10yrs or so, but they didn't. OS writers were continually virtualising away new harddisk hardware, while not having to change the word size for fetching from memory.

My guess as to why memory addressing stayed at 1 byte granularity for decades, while disk block size (granularity) increased from 512 bytes to 8192 bytes, is this: going to a larger block size forgoes the chance of addressing a 512 byte block, which doesn't cause any problems (you don't ship disk blocks off the local machine), but going to 32-bit addressing forgoes the chance to address a byte, which you do want to be able to move around (including sending to other machines or peripherals).

IPv4 internet addressing uses a 32 bit unsigned integer (IP) to uniquely identify each network device (e.g. no two devices/computers can have the same IP). How many devices can be simultaneously on the internet using an IPv4 IP ^[37]?

The internet game "World of Warcraft" (reported in slashdot, http://games.slashdot.org/games/08/01/19/1321241.shtml) uses a 32 bit integer to store the amount of gold that players earn. Some players have reached that limit and can no longer receive gold from other players. If each unit of gold were worth 1c, what would their wealth be in $ ^[38]?

The world's telephone system carries voice data in 8bit form i.e. it converts your voice into bytes, each byte representing the amplitude of your voice at any instant. How many different amplitude levels can be expressed using 1byte ^[39] ? Since the phone system uses 8bits, it's simple to send bytes from computer data across phone lines. Hifi audio is usually 12bits (how many levels is this? ^[40] ) which has less noise than 8bit audio.

Digital cameras and computer monitors use 8 bits to represent the intensity of each of the 3 primary colors of light in a picture: red, green and blue. This turns out to be more levels than the eye can differentiate, but not much more (the eye doesn't see the edge between one intensity and the next and only sees a continuous change in color).

A color picture from an 8 bit digital camera can have how many different colors ^[41] ?

Matte print (e.g. books, newspapers) can only reproduce about 100 levels of light, but glossy print (e.g. in fine books and magazines) can reproduce about 180 levels, which is why expensive advertisements are run in glossy magazines.

The human eye can accommodate a range of light from nighttime to midday on a cloudless day, a range of 10⁷ in intensity (I think). The eye can see features in a face in shadow and in the face of a person standing next to them in full sun, but an 8 bit digital camera will, according to the exposure, only see the face of one, while the other will be washed out (either dark or light). To help in situations of high contrast, expensive digital cameras record 12bits, allowing range compression and expansion. These photos are post-processed to reduce the range to 8bits for display (or printing) while keeping contrast in both the light and dark areas (i.e. you can see the features of a face in shadow and a face in bright light in the same photo).

What other things could we use 32-bits for: How many -ve numbers could we have ^[42] ? How many prime numbers could we address ^[43] ?

	Note
	End Lesson 5. At the start of the next class, I went back over the number of items that a 32 bit computer could represent. The students had forgotten the number of integers that could be represented by 32 bits (not only the 4G value, but the concept of there being a limit associated with 32 bits). I went through the number of integers, computers on the internet, blocks on a hard disk etc again. They seemed to remember the concept after a few examples. My partner reminded me that you have to tell a student 3 times before you can expect them to start to remember a fact.

4.5. Integer Arithmetic in Python

Now that we're at the level of primitive data types, we can use a language like python.

Fire up python. You will be running python in immediate (interactive) mode, where it interprets each instruction one at a time and returns to the prompt. (In normal operation a program keeps executing until it has to wait, say for a keystroke from you.)

in a terminal (Linux, Mac, Cygwin on Windows) type python.
on Windows: Start > Programs > Python > Python (command line)

You will get the python ">>>" prompt

Python 2.4.3 (#1, Apr 22 2006, 01:50:16) 
[GCC 2.95.3 20010315 (release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

	Note
	The following examples are based on Chapter 1 of the LiveWires tutorial "Introducing Python".

Try a few subtractions, multiplications and divisions of your own.

>>> 12 + 13
25

>>> 123456 * 3
370368

How about this one?

>>> 7/3
2	#7,3 are integers. You're asking to do integer arithmetic. You get an integer answer.
>>> 

>>> 7%3
1	#'%' is modulo (remainder)
>>> 

>>> 7.0/3
2.3333333333333335 #as long as one of the numbers is real, the answer will be promoted to real

you'll learn about real numbers soon.
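The session above is Python 2. In Python 3, `/` always gives a real answer, and integer division has its own operator, `//`. A sketch of the same divisions in Python 3:

```python
# Python 3 versions of the divisions above.
real_quotient = 7 / 3    # 2.3333333333333335, "/" always gives a real
int_quotient = 7 // 3    # 2, integer (floor) division
remainder = 7 % 3        # 1, '%' is modulo (remainder)

print(real_quotient, int_quotient, remainder, divmod(7, 3))
# 2.3333333333333335 2 1 (2, 1)
```

`divmod` returns the integer quotient and the remainder together.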

4.6. Largest/Smallest Integer in Python

	Note
	This went over the kids' heads, so I skipped to the next section (they don't need to know this right now)

Most programs (including python) use the machine's native libraries (e.g. math, string) (which are usually written in C). (No-one writes a library when a well tested one is already available.) The size (number of bits) for various primitive types in python will then depend on the native libraries. The documentation for python says there are two types of integers (see Numeric Types http://docs.python.org/lib/typesnumeric.html).

plain integers: (called "int" in most languages)
the size depends on the native libraries. We would expect on a 32 bit PC for plain integers to be 32-bit (you don't always guess right: bash on Linux uses 64 bit integers).
long (or Long) integers (a number followed by "L"):
numbers which are bigger than plain integers and have unlimited precision (the machine will use enough bits to handle whatever you throw at it). (Most languages restrict the number of bits you can have).

The Python documentation doesn't tell you the sizes for these two types of integers for any particular platform: you're supposed to be able to work it out yourself. What's the largest plain integer that python can represent (for likely numbers, look at the table in integer_range), i.e. is it 32 or 64 bit? You won't have to remember the range of integers in python, but you'll need to understand enough about a computer to figure it out. You also should not be surprised if numbers become Long when they become big enough. (In the following, remember that 65536 is 10000h, one more than FFFFh, the largest number that fits in 2 bytes. For compactness, I'll use hexadecimal to illustrate what's happening.)

Python has no trouble representing any size integers. Here are some integers from 16-256bits (Long integers end with "L").

>>> 65536-1			#16bit FFFF
65535
>>> 65536*65536-1		#32-bit FFFFFFFF
4294967295L
>>> 65536*65536*65536*65536-1	#64-bit FFFFFFFFFFFFFFFF
18446744073709551615L
>>> 65536*65536*65536*65536*65536*65536*65536*65536-1	#128-bit  (32 Fs)
340282366920938463463374607431768211455L
>>> 65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536*65536-1 #256bit
115792089237316195423570985008687907853269984665640564039457584007913129639935L

From the above output, the 16bit number 65535 is a plain integer, but the 32 bit number is Long. To calculate 65536*65536-1, python first has to calculate the intermediate result 65536*65536, which is a Long (it needs 33 bits). If you subtract 1 from a Long, you still have a Long (even if the result could be represented as a plain integer), so FFFFFFFF might be a plain integer, but we wouldn't find out doing it this way. Let's look around for a 32 bit limit. Remember that in a signed integer only half of the range is available for +ve numbers, so let's look at half of a 32-bit number.

#here we would need 33 bits to handle the multiplication overflow 
#so we already know that the answer will be a L number.
>>> 65536*65536-1	# FFFFFFFF
4294967295L
#what's half of 65536?
>>> 65536/2 		# 8000       
32768
#The result will be 80000000H, which if a signed integer will be a -ve number
#It looks like python promotes the integer to a L
>>> 32768*65536		# 80000000
2147483648L
#Let's get below 80000000H to 7FFF0000H. Yes it's a plain integer
>>> 32767*65536
2147418112
#Let's try 7FFFFFFFH. Yes it's a plain integer.
>>> 32767*65536+65535
2147483647
#just checking 80000000H again, this time approaching from below.
#It's L
>>> 32767*65536+65536
2147483648L

The largest plain integer represented by python is 7FFFFFFFh or 2147483647.

This process above, of poking numbers (or different pieces of code) into a program to see what it does, is called noodling. It's a good way to learn.

What's the -ve most plain integer in python ^[44]?

On this 32 bit machine, Python uses a signed 32-bit integer to represent plain integers.
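In Python 3, plain and Long integers were merged (every int has unlimited precision), so the noodling above no longer finds a limit. You can still noodle for the 32-bit boundary by asking the standard `struct` module whether a value fits in a C int; the format `"=i"` requests a standard-size (4-byte) signed int. The helper name `fits_in_int32` is hypothetical.

```python
# A noodling sketch for Python 3: does a value fit in a signed 32-bit C int?
import struct

def fits_in_int32(n):
    """Return True if n can be packed as a standard-size signed 32-bit int."""
    try:
        struct.pack("=i", n)
        return True
    except struct.error:   # raised when n is out of range for the format
        return False

for n in (0x7FFF0000, 0x7FFFFFFF, 0x80000000):
    print(hex(n), fits_in_int32(n))
# 0x7fff0000 True
# 0x7fffffff True
# 0x80000000 False
```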

4.7. Primitive Data Type: Characters, ASCII table

We need to represent the characters in the alphabet, so the computer can type them on the screen and receive them from the keyboard. We need upper and lower case (52) + numbers (10), plus some punctuation and printer/screen control characters (move up/down/left/right, carriage return, line feed, end of file).

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
,. !@#$%^&*()-_[]{};:'"<>;/?`~

This is more than 64, but fewer than 128, so these characters require 7 bits. A regular 8 bit byte is used, with the top 128 values (80h-FFh) generally unused. The mapping between the 128 7-bit values and the symbols displayed above (plus the control characters), as mandated by the US Govt, is called ASCII.

A table of ASCII characters and their binary/decimal/hexadecimal equivalents is at wiki, ASCII (http://en.wikipedia.org/wiki/ASCII). The table of printable characters (http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters) shows that in ASCII, the letters are in alphabetical order.

	Note
	This is unlike some other character sets, e.g. EBCDIC http://en.wikipedia.org/wiki/EBCDIC, which was originally devised as an extension of Binary Coded Decimal (BCD) http://en.wikipedia.org/wiki/Binary-coded_decimal, a code needed to handle money. In EBCDIC the letters are split into several separate blocks.

A table which better illustrates the hexadecimal organisation of ASCII is ASCII Chart and Other Resources (http://www.jimprice.com/jim-asc.shtml#table). (A slightly fancier table ASCII Table and Unicode Characters http://ascii-table.com/).

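The digits '0'-'9' are 30h-39h: 3 in the left 4 bits and the value of the digit in the right 4 bits. This allows easy conversion of a string of characters representing a number into a number (for each character, mask off the left 4 bits, then multiply the output number so far by 10 and add the right 4 bits into it).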
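The masking trick can be sketched in Python 3, where `ord()` gives a character's code. The helper `digits_to_int` is hypothetical and handles decimal digits only (no sign, no decimal point); real programs would use the built-in `int()`.

```python
# A sketch of converting ASCII digits to a number by masking: each digit
# '0'-'9' is 30h-39h, so AND with 0Fh keeps just the digit's value.

def digits_to_int(text):
    """Convert a string of ASCII decimal digits, e.g. "1024", to an integer."""
    number = 0
    for ch in text:
        code = ord(ch)                        # e.g. '7' -> 0x37
        if not 0x30 <= code <= 0x39:
            raise ValueError(f"not a decimal digit: {ch!r}")
        number = number * 10 + (code & 0x0F)  # keep the right 4 bits
    return number

print(digits_to_int("1024"))   # 1024
```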

bash converts the hexadecimal representation of a character to its ascii symbol using this command

echo $'\x41'
A

	Note
	The "41" is hex and the 'A' output is a char (not a hex number). The rest of the command is obscure bash magic.

How many letters down the alphabet is the character 'B' (try it at the prompt) ^[45] ? How many letters down the alphabet is the character represented by hex '51' ^[46] ? Knowing that the hex for 'A' is 41h, figure out the hex for 'Z' and then try it ^[47]

To change between upper and lower case, the 6th bit from the right in the byte is flipped (counting the rightmost bit as the 1st). What change in value (as an integer) does flipping the 6th bit represent ^[48]?

echo $'\x5A'
Z

echo $'\x7A'
z
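Comparing 5Ah ('Z') with 7Ah ('z') shows the one bit that differs, 20h. A sketch in Python 3 that flips it with exclusive-or (`^`); the helper `toggle_case` is hypothetical and deliberately leaves non-letters alone, since flipping that bit in a digit or punctuation mark gives an unrelated character.

```python
# A sketch of changing case by flipping the bit with value 20h.

def toggle_case(ch):
    """Swap upper/lower case of an ASCII letter; leave other characters alone."""
    if ("A" <= ch <= "Z") or ("a" <= ch <= "z"):
        return chr(ord(ch) ^ 0x20)   # flip the 6th bit from the right
    return ch

print(toggle_case("Z"), toggle_case("z"), toggle_case("7"))  # z Z 7
```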

In a program, to tell a character from a number or a variable, do this:

'c'	char
 c      the variable named c #better to use a longer descriptive name, eg computer_name
 7      the number 7
'7'	the character 7

Computers can scan text to test which characters are letters (A-Z,a-z), which are numbers (0-9) and which are punctuation. The computer can match characters (e.g. is the character an 'A' or a '9'?).
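Python's standard `string` module provides ready-made sets of ASCII characters (`string.digits`, `string.hexdigits`, `string.ascii_letters`, `string.ascii_uppercase`, `string.ascii_lowercase`, `string.punctuation`). A sketch in Python 3 that uses them to run the tests listed at the start of this lesson for the character '0'; the helper `describe` is hypothetical.

```python
# A sketch of classifying single ASCII characters.
import string

def describe(ch):
    """List the properties of a single ASCII character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    checks = [
        ("decimal digit", string.digits),
        ("hex digit", string.hexdigits),
        ("letter", string.ascii_letters),
        ("upper case", string.ascii_uppercase),
        ("lower case", string.ascii_lowercase),
        ("punctuation", string.punctuation),
    ]
    return [name for name, members in checks if ch in members]

for ch in "0Az?":
    print(repr(ch), describe(ch))
# '0' ['decimal digit', 'hex digit']
# 'A' ['hex digit', 'letter', 'upper case']
# 'z' ['letter', 'lower case']
# '?' ['punctuation']
```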

	Note
Every keystroke on your keyboard is a character. If you type "3.14159" on the keyboard, the computer accepts it as a series of characters. If you want this to be a real, then you have to explicitly tell the computer to convert the string of characters into a number. If the computer asks you to input a number at the keyboard, your keystrokes will first be put into a string buffer and later your program will have to convert the string to a number. If you have "3.14159" displayed on the screen and swipe it with your mouse and put it into a buffer, it will be in your buffer as a string of characters. All normal input and output on a computer is characters and strings e.g. keyboard, screen, printer. (Some programs exchange data as binary, but you have to set that up.)


4.8. Primitive Type: Real Numbers

	Note
	The representation of real numbers takes a bit of explaining. You don't need to understand how they are represented to use them (we'll do that later, in Real Numbers)

"real" numbers (also called "floating point" numbers) in computing are numbers like 3.14159, i.e. anything with a decimal point. You do arithmetic on them.

>>> 3.0*6.0
18.0

You can mix integers and reals - the computer handles it for you, promoting the integer to a real.

>>> 3.0*6
18.0

Be careful how you mix integers and reals. The computer evaluates from left to right, so it first evaluates 7/2 as integer division (giving 3), not knowing that a real is coming up next.

>>> 7/2*5.0
15.0

A minor rearrangement of this code gives

>>> 5.0*7/2
17.5

Minor editing of this code makes a big difference in the output (one answer is correct and one is not). Code in which a minor edit (like rearranging the order of a multiplication, which should not change the result) gives a different answer is called fragile code. Someone (maybe you) could be working on your code years later, not see the code bomb, and rearrange the line, triggering the bomb.

You should practice safe programming. When mixing integers and reals, explicitly promote the integers to reals; don't expect the computer to do it for you. Don't rely too much on the rules of precedence; use brackets to make the code clear to the reader. This is how the code should be written:

>>> (5.0*7.0)/2.0
17.5
>>> (7.0/2.0)*5.0
17.5
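The same lesson written as two functions, as a sketch assuming Python 3. In Python 3 the / operator always returns a real (7/2 gives 3.5, so the first session above would print 17.5 both times); the // operator does the integer division that Python 2's / did with two integers, so it is used here to reproduce the trap. The names fragile and safe are made up for illustration.

```python
def fragile(a, b, c):
    # a // b is integer division: 7 // 2 is 3 and the remainder is thrown
    # away; only then is 3 promoted to a real to be multiplied by c
    return a // b * c

def safe(a, b, c):
    # every value is explicitly promoted to a real before any arithmetic,
    # and brackets show the order of evaluation
    return (float(a) / float(b)) * float(c)

print(fragile(7, 2, 5.0))  # 15.0, wrong
print(fragile(5, 2, 7.0))  # 14.0, same maths, a different wrong answer
print(safe(7, 2, 5.0))     # 17.5
print(safe(5, 2, 7.0))     # 17.5, rearranging doesn't change the answer
```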

4.9. Primitive Type: Strings

A string is a series of characters delimited by a pair of double or single quote characters, e.g. "happy birthday!", "1600 Pennsylvania Ave", "temperature=32F, pressure=29.90in", 'I am "late"'.

In principle it's possible to operate on strings as arrays of characters, but strings are the dominant form of input and output on a computer (both computers and people can read them), so all languages have instructions to search, match, read and write strings.
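As an example, here is one of the strings above being searched, matched, read and written. A sketch in Python 3 using the str methods find() and split() and re.match() from the standard re module; the "name=valueUNIT" layout is an assumption taken from the example string, not a real data format.

```python
import re

reading = "temperature=32F, pressure=29.90in"

# search: find() returns the index where a substring starts (-1 if absent)
print(reading.find("pressure"))   # 17

# read: split into "name=value" fields, then split each field at "="
readings = {}
for field in reading.split(", "):
    name, text = field.split("=")
    # match: digits and '.' for the number, then letters for the unit
    m = re.match(r"([0-9.]+)([A-Za-z]+)$", text)
    if m:
        readings[name] = (float(m.group(1)), m.group(2))

print(readings["pressure"])       # (29.9, 'in')

# write: build a new string from the numbers
print("pressure is %.2f %s" % readings["pressure"])   # pressure is 29.90 in
```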

Where enormous amounts of data are involved that will never be read by a human, only by another computer (mapping, photos, MRI data), the data is written in the more compact binary form. You'll still need a team of programmers to write the code that lets each new generation of computer read and write that format.
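To get a feel for the size difference, here are the same ten integers written as text and as binary. A sketch in Python 3 using the standard struct module; the layout "<10i" (ten little-endian 4-byte signed integers) is an arbitrary choice for illustration, and real formats for photos or MRI data are far more elaborate.

```python
import struct

values = list(range(100000, 100010))   # ten six-digit integers

# as text: digits separated by commas, readable by people
as_text = ",".join(str(v) for v in values).encode("ascii")

# as binary: each integer packed into exactly 4 bytes, readable only by a
# program that knows the layout
as_binary = struct.pack("<10i", *values)

print(len(as_text))    # 69 bytes
print(len(as_binary))  # 40 bytes

# reading the binary back requires code that knows the exact layout
print(list(struct.unpack("<10i", as_binary)) == values)   # True
```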

	Note
	End Lesson 6

4.10. Is it a string or number?

Until you get used to the rules and are familiar with the error messages, you will have to check each time what you have. Some conversions are done without asking and others have to be invoked explicitly.

Here are some operations on numbers:

>>> variable=3.14159
>>> 3*variable		#since you can multiply, it must be a number, the integer is automatically promoted to real
9.4247699999999988

			#how does python know 3.14159 is a number and not a string?
			#see below for variable=3.14159q	

>>> print variable	#you can print numbers
3.14159

			#the + operator joins strings
	                #but you can't print a string and number at the same time using a +
			#the error message is helpful
>>> print "the number is " + variable
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'float' objects

			#you have to turn the number into a string
>>> print "the number is " + repr(variable)
the number is 3.1415899999999999

			#or print the string and the number, separated by a ','
>>> print "the number is", variable
the number is 3.14159
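The session above is Python 2, where print is a statement. In Python 3, print is a function, and (from Python 2.7 and 3.1 on) repr() of a real gives the shortest form that reads back to the same value, so 3.14159 prints as 3.14159 rather than 3.1415899999999999. A sketch of the same three steps in Python 3:

```python
variable = 3.14159

# "+" joins two strings; a str and a float can't be joined this way
try:
    "the number is " + variable
except TypeError:
    print("TypeError: + can't join a str and a float")

# turn the number into a string first, then join
message = "the number is " + str(variable)
print(message)                    # the number is 3.14159

# or pass both to print(), which separates them with a space
print("the number is", variable)  # the number is 3.14159
```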

Here are some operations on strings:

>>> variable="3.14159"
>>> 3*variable 
'3.141593.141593.14159'	#same string 3 times. 
	                #not really useful, and probably not what you want.
	                #few other languages have this.
	                #if you really need to do this, 
	                #use a construction common to all languages
	                #or no-one will be able to maintain your code.
			#
			#what would have happened if you'd done the following?
			#variable="3.14159 " 
			#or
			#variable="3.14159q"
			#or
			#variable=3.14159q	

>>> 3.0*variable	#the error message is not helpful
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can't multiply sequence by non-int

>>> 3.0*float(variable) #float() converts a string to a real 
9.4247699999999988

>>> print variable      #you can print a string.
3.14159
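A program can ask the same question this section asks: is it a string or a number? A sketch in Python 3; type() and float() are standard, while is_number() is a hypothetical helper, not part of Python. Note that float() also accepts strings such as "nan" and "inf", so this check does too.

```python
def is_number(text):
    """Return True if float() can convert the string text to a real.

    Hypothetical helper for illustration. float() accepts strings like
    "2.71828", "-5", "1e3", "nan" and "inf", and raises ValueError otherwise.
    """
    try:
        float(text)
    except ValueError:
        return False
    return True

print(type(3.14159).__name__)    # float
print(type("3.14159").__name__)  # str

for candidate in ["2.71828", "2.71828x", "hello"]:
    print(repr(candidate), is_number(candidate))
# '2.71828' True
# '2.71828x' False
# 'hello' False
```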

Note the unhelpful error message resulting from the command 3.0*string. You should be prepared for error messages to be wrong, and to have to go to Google to find out the real problem (don't expect the right answer to be there either; it usually will be, but you'll still have to figure some out yourself). The interpreter knows that it has a string and not a number and could have told you. Unfortunately this is part of computing; it's always been this way. I wonder if the messages are designed to raise the cost of entry to being a programmer. The error messages from gcc, the GNU C compiler, have improved dramatically in the last 10 years.

	Note
	One of the students commented that it was like having lifeboats that don't work.
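You can't change the interpreter's messages, but your own code can check what it has been given and report the problem clearly. A sketch in Python 3; isinstance() is standard, while times() is a hypothetical helper written for illustration.

```python
def times(a, b):
    """Multiply a by b, failing with a clearer message if either is a string.

    Hypothetical helper for illustration, not a built-in.
    """
    for value in (a, b):
        if isinstance(value, str):
            raise TypeError("expected a number, got the string %r; "
                            "convert it first with float()" % value)
    return a * b

print(times(3.0, 4.0))  # 12.0

try:
    times(3.0, "3.14159")
except TypeError as err:
    # expected a number, got the string '3.14159'; convert it first with float()
    print(err)
```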

print out the product of the numbers 3.0 and 4.0 ^[49]
print out the product of the number 3 and the number represented by the string "4.0". ^[50]
print out the string "here is the result" followed by the + sign, followed by the product of the numbers 3.0 and 4.0. ^[51]

4.11. Other primitive data types

The data types described so far are found in most languages. Others are common too, but once you know these four, the new ones will be easy to use when we need them.

What primitive data types do we know now?^[52]
