Oh, Those Terrible Units

published on 06 May 2010 by Szymon Lipiński · tags: programming

There are numbers, and there are units. Numbers are numbers. Units are units. A bare number only tells you how many. A number with a unit tells you two things:

  • how many
  • of what

Unfortunately it seems like most programmers just don’t care about all that stuff.

The SI Base Units

There is the International System of Units (SI), which is normally used e.g. in physics. Its base units are:

Name       Unit symbol   Quantity
metre      m             length
kilogram   kg            mass
second     s             time
ampere     A             electric current
kelvin     K             thermodynamic temperature
candela    cd            luminous intensity
mole       mol           amount of substance

IT Base Units

In IT we have a couple more units.

Name              Unit symbol   Quantity
bit               b             number of bits
byte              B             number of bytes
bits per second   bps           number of bits per second

Have you noticed the difference? BYTES and BITS? This surprises many programmers, but these are two different things. What's more, as two different things, they have two different symbols: B for bytes, b for bits.

The Word Of Truth

A couple of basic definitions, which still surprise too many people (there is a small sketch in code after the list):

  • 1 byte = 8 bits
  • 1 B = 8 b
  • a bit represents a logical value (True or False; 0 or 1)
  • there is nothing smaller than 1 b
  • you cannot have half of a bit: what would half of False be?
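
A minimal sketch of these conversions in Python (the function names are mine, just for illustration):

    def bytes_to_bits(n_bytes: int) -> int:
        # 1 B = 8 b, so simply multiply by 8.
        return n_bytes * 8

    def bits_to_bytes(n_bits: int) -> int:
        # Only whole multiples of 8 bits form whole bytes.
        if n_bits % 8 != 0:
            raise ValueError("not a whole number of bytes")
        return n_bits // 8

    print(bytes_to_bits(1))   # 8  (1 B = 8 b)
    print(bits_to_bytes(16))  # 2  (16 b = 2 B)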

SI Prefixes

There are also prefixes used for multiples and submultiples of the basic units.

Name   Symbol   Factor
kilo   k        10³
mega   M        10⁶
giga   G        10⁹
tera   T        10¹²
peta   P        10¹⁵
exa    E        10¹⁸

These symbols are used to avoid writing too many zeros. So instead of 1000 m you can write 1 km, and instead of 1 000 000 B you can write 1 MB.
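
As a sketch, here is how a formatter using the decimal prefixes might look in Python (the helper name format_si_bytes is mine, not a standard function):

    def format_si_bytes(n: float) -> str:
        # Format n bytes using powers of 1000: 1 kB = 1000 B.
        for prefix in ["", "k", "M", "G", "T", "P"]:
            if abs(n) < 1000:
                return f"{n:g} {prefix}B"
            n /= 1000
        return f"{n:g} EB"

    print(format_si_bytes(1000))       # 1 kB
    print(format_si_bytes(1_000_000))  # 1 MB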

There is another problem. In IT, 1 kilobyte has traditionally meant 1024 bytes. Yeah, the IT world is a little bit different. To distinguish between 10³ = 1000 and 2¹⁰ = 1024, there is another set of symbols, the binary prefixes:

Name   Symbol   Factor
kibi   Ki       1024¹ = 2¹⁰
mebi   Mi       1024² = 2²⁰
gibi   Gi       1024³ = 2³⁰
tebi   Ti       1024⁴ = 2⁴⁰
pebi   Pi       1024⁵ = 2⁵⁰
exbi   Ei       1024⁶ = 2⁶⁰

Now we've got 1024 B = 1 KiB, 1024 KiB = 1 MiB, and so on.
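
The same kind of sketch with the binary prefixes shows how the two systems drift apart (again, the helper name is mine):

    def format_binary_bytes(n: float) -> str:
        # Format n bytes using powers of 1024: 1 KiB = 1024 B.
        for prefix in ["", "Ki", "Mi", "Gi", "Ti", "Pi"]:
            if abs(n) < 1024:
                return f"{n:.4g} {prefix}B"
            n /= 1024
        return f"{n:.4g} EiB"

    print(format_binary_bytes(1024))       # 1 KiB
    print(format_binary_bytes(1_000_000))  # 976.6 KiB, not 1 MiB!
    print(format_binary_bytes(1024**3))    # 1 GiB

Note that 1 000 000 B is exactly 1 MB but only about 976.6 KiB; the gap grows with each step up the prefix ladder.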

The Problem

Why am I writing all this? Because I don't understand why many people in the IT world, including many programmers, still don't realize how important it is to write units according to the standards. It is like a language: you have to use correct grammar so that others have a chance to understand what you say.

Just imagine that you switch to a new ISP and pay for an internet connection with a speed of 1 MB per second. Then you get something eight times slower, all because they treated B as the symbol for bit. I'm sure you would be angry. So why is nobody angry at programmers who mix bytes with bits all the time? Even Elasticsearch reports the size of an index in mb. Unfortunately, that unit doesn't even exist. How much is a millibit?
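
The arithmetic behind that factor of eight, as a small sketch (the file size is just an example number):

    # Download times for a 100 MB file.
    file_size_B = 100 * 1000 * 1000

    speed_MBps = 1_000_000       # 1 MB/s = 1,000,000 bytes per second
    speed_Mbps = 1_000_000 / 8   # 1 Mb/s =   125,000 bytes per second

    print(file_size_B / speed_MBps)  # 100.0 seconds at 1 MB/s
    print(file_size_B / speed_Mbps)  # 800.0 seconds at 1 Mb/s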

Some Examples From the Web

  • 1KB – taken literally this means 1 kelvin-byte (K is the symbol for kelvin), and I have no idea what that would be. The author probably meant 1 kB = 1000 B or 1 KiB = 1024 B.
  • 1mb – funny… 1 millibit, that is, 0.001 of a bit. A bit is a logical value, basically True or False, 0 or 1. How can you have 0.001 of True or False? Perhaps the author meant 1 MB = 1,000,000 B?
  • 1mB – another funny one, not seen so often. This means 1 millibyte = 0.001 B. But 1 byte is 8 bits, so this would be 0.008 of a bit, which makes even less sense.
  • 1gb – sorry, I have no idea what the g is; there is no SI prefix g (lowercase g is the symbol for gram).
  • 1gB – the same as above.
  • 1kb – often the author means 1 kB, because taken literally 1 kb = 1000 b = 125 B – quite an unusual number of bytes.
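
One way to avoid guessing is to be strict in code. Here is a sketch of a size parser that accepts only unambiguous, correctly written symbols and rejects everything else (the unit table and the function name are mine):

    import re

    # Bytes per unit; the lookup is case-sensitive on purpose,
    # so "mb" or "KB" are rejected instead of guessed at.
    UNITS = {
        "b": 1 / 8, "B": 1,
        "kB": 10**3, "MB": 10**6, "GB": 10**9,
        "KiB": 2**10, "MiB": 2**20, "GiB": 2**30,
    }

    def parse_size(text: str) -> float:
        # Return a size in bytes, or raise on input like "1mb".
        match = re.fullmatch(r"\s*([\d.]+)\s*(\S+)\s*", text)
        if not match or match.group(2) not in UNITS:
            raise ValueError(f"ambiguous or unknown unit in {text!r}")
        return float(match.group(1)) * UNITS[match.group(2)]

    print(parse_size("1 kB"))   # 1000.0
    print(parse_size("1 KiB"))  # 1024.0
    try:
        parse_size("1mb")
    except ValueError as exc:
        print(exc)              # ambiguous or unknown unit in '1mb'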

Why Does Nobody Care?

Why does nobody seem to care about being exact? The whole of programming is about being exact, about writing good programs that work reliably. Why can't we be just as exact about something as simple as units?