__                                             __ 
  ___ / /____ _  _____ ______ _____ ____    ___  ___ / /_
 (_-</ __/ -_) |/ / -_) __/ // / _ `/ _ \_ / _ \/ -_) __/
/___/\__/\__/|___/\__/_/  \_, /\_,_/_//_(_)_//_/\__/\__/ 
                         /___/              
      

Sample Pairs Analysis Explained

January 2, 2022

Sample Pairs Analysis (SPA) attempts to calculate the lengths of secret messages hidden in natural images by detecting subtle changes to neighbouring bytes caused by LSB Replacement (LSBR). (Natural images are images taken with a camera, whereas artificial images are images created using software). SPA is a type of statistical analysis steganalysis.

For simplicity let us assume our stego-image is greyscale (one colour channel) rather than RGB (three colour channels). Each pixel is a single byte with a value ranging from 0 (black) to 255 (white). The image is 3 x 3 pixels and can be visualised as follows:

255 180  37
  4   5 202
  7  10 211

P is the set of all horizontal pairs:

{(255,180), (180,37), (4,5), (5,202), (7,10), (10,211)}

We create the sets X and Y using the following rules:

* X is the set of horizontal pairs where
  - the second byte is even and greater than the first byte, or
  - the second byte is odd and smaller than the first byte.

* Y is the set of horizontal pairs where
  - the second byte is even and smaller than the first byte, or
  - the second byte is odd and greater than the first byte.

Using these rules we calculate X and Y as follows:

X = {(180,37), (5,202), (7,10)} 
Y = {(255,180), (4,5), (10,211)}

The sizes of X and Y will be roughly the same for all natural images. However when a message is added to a natural image using LSBR, this balance changes and results in the creation of a new set, Z. This is because randomly changing bytes’ LSBs in the image does not affect X and Y in the same way.

There are three ways each byte pair in X and Y can be changed using LSBR: 10 refers to a change to the first byte only, 01 refers to a change to the second byte only, and 11 refers to changes to both bytes. A pair where neither byte has been changed is referred to as 00.

For example, a 10 change, a 01 change, and a 11 change to the pair (201,202) from X produces pairs which either stay in X ((200,202)) or move to Y ((201,203), (200,203)). Applying the same logic to the pair (4,5) from Y causes further imbalance as some of the new pairs do not belong to either X or Y: a 10 change produces (5,5), and a 01 change produces (4,4). Neither of these pairs belong to X or Y, so they join a new set, Z. The set Z always consist of identical byte pairs such as (4,4) and (5,5).

This movement between sets based on which LSBs are changed can be measured and is how SPA calculates the length of the hidden message.

The diagram below shows how LSB changes move pairs between sets. X is the set defined above, W is the set of pairs in Y which differ on the LSB only such as (4,5), V is the rest of Y such that VW = Y, and Z is the set of equal pairs.

We can derive an expression for the sizes of these sets and the percentage length of the message, p. If the percentage length is p, then the probability that any one byte is changed is p/2. For example, if the cover image has 100 bytes and the message is 25 bits, we will need to change roughly 12 LSBs as around half of the LSBs will already be set to the correct value. This means the probability of changing one byte is 12.5%, and the probability a byte will not be changed is 1 − p/2 or 87.5%. Expanding on this gives us the following probabilities:

* The probability of a byte being changed is p/2
* The probability of a byte not being changed is 1 − p/2
* The probability of one byte in a pair of bytes being changed is (p/2) * (1 − p/2)
* The probability of both bytes in a pair of bytes being changed is (p/2)²
* The probability of neither byte in a pair of bytes being changed is (1 − p/2)²

We can define four more sets, X′, Y′, W′, and Z′. These sets are defined in the same way as their corresponding sets (X, Y, W, and Z, respectively), however they represent the pixel values after LSB Replacement. We can calculate the sizes of X′ and Y′ (V′W′) as follows:

|X′| = |X|(1 − p/2)² + |X|p/2 * (1 − p/2) + |V|p/2 * (1 − p/2) + |V|(p/2)²
|V′| = |V| * (1 − p/2) + |X| * p/2
|W′| = |W|(1 − p + p²/2) + |Z|p(1 − p/2)

Similar logic can be used to calculate the size of Z′.

Finally, we can solve the following quadratic equation to determine the length of the hidden message, p:

0.5 * (|W′| + |Z′|) * p² + (2|X′| − |P|) * p + |Y′| − |X′| = 0

To calculate p for an RGB image, the stego-image is preprocessed so the bytes in each row of pixels are sorted by colour channel. For example, a 3 x 3 RGB image can be visualised as follows:

R1G1B1 R2G2B2 R3G3B3
R4G4B4 R5G5B5 R6G6B6
R7G7B7 R8G8B8 R9G9B9

After sorting, the image changes as follows:

R1R2R3 G1G2G3 B1B2B3
R4R5R6 G4G5G6 B4B5B6
R7R8R9 G7G8G9 B7B8B9

The pairs would be as follows:

{(R1,R2), (R2,R3), (R3,G1), (G1,G2), (G2,G3), (G3,B1), ...}

If LSB changes were made to R1 from the first pixel and R2 from the second pixel, these changes would be recognised as a 11 change to (R1, R2). However if LSB changes were made to R1 and G2, SPA would only see this as 10 changes to (R1, R2) and (G2, G3), and as a 01 change to (G1, G2).