In previous articles we have seen how LSB Replacement (LSBR), LSB Matching (LSBM), and Matrix Embedding can be used to hide secret messages.
In this article we discuss a new steganography technique, SYM (short for symmetrical), for hiding secret messages in artificial images.
If you zoom into a natural image, you will notice the colours tend to blend into each other. For example, in a photo of a person wearing a yellow t-shirt in front of a red wall, the area where the yellow and red meet will show a gradient going from yellow to orange to red. Artificial images do not have this property: there is no gradient where two colours meet. Therefore, if a secret message is embedded in an artificial image, we risk introducing statistical noise which can be detected by statistical steganalysis, especially Sample Pairs Analysis (SPA).
SYM has two general goals: (1) reduce the effectiveness of SPA by ensuring the secret message does not introduce statistical noise or reduce the symmetry of the image, and (2) improve the use of artificial images as carriers by embedding the message in such a way that it actually improves the symmetrical properties of the image.
SPA works by looking for LSB changes which (a) are random rather than sequential and (b) affect one or more of the LSBs in each pair of adjacent bytes. Therefore, if we use SYM with a sequential embedding spread, and embed the data in such a way that we add to or maintain the symmetry of the image, we can avoid or reduce detection. We do this by embedding the secret message so that every LSB change has a corresponding change in its neighbouring byte. Because both bytes change, the symmetry is maintained and the message bit is much harder for SPA to detect.
The logic for SYM is quite straightforward, as it is an enhancement of the logic used by LSBM. LSBM works by randomly adding or subtracting one from an image byte if its LSB does not match the message bit. SYM's logic is similar; however, the image bytes are incremented and decremented in a fixed alternating order, and we alter image bytes even if the LSB and message bit match. This creates a smooth up-down curve across adjacent bytes.
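As a point of reference, the LSBM rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the encoder used in our tests: the function name `lsbm_embed`, the one-bit-per-byte sequential layout, and the handling of the values 0 and 255 (where only one direction is possible) are assumptions made for the example.

```python
import random
from typing import Optional, Sequence


def lsbm_embed(cover: bytes, bits: Sequence[int],
               rng: Optional[random.Random] = None) -> bytes:
    """Sketch of LSB Matching: randomly add or subtract one when the LSB is wrong."""
    if len(bits) > len(cover):
        raise ValueError("message does not fit in the cover")
    rng = rng or random.Random()
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        if (stego[i] & 1) == bit:
            continue  # LSB already matches: LSBM leaves the byte untouched
        if stego[i] == 0:
            step = 1   # assumption: a byte cannot go below 0
        elif stego[i] == 255:
            step = -1  # assumption: a byte cannot go above 255
        else:
            step = rng.choice((1, -1))
        stego[i] += step
    return bytes(stego)
```

SYM keeps the same plus-or-minus-one adjustment but replaces the random choice of direction with a fixed alternating order, and it also touches bytes whose LSB already matches.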
It works as follows:
* If the message bit differs from the LSB of the image byte, alternate between adding one to and subtracting one from the image byte. In other words:
– If this is the first message bit we are processing and the LSB of the image byte needs to be changed, add one to the image byte.
– If this is the second message bit we are processing and the LSB of the image byte needs to be changed, subtract one from the image byte.
– If this is the third message bit we are processing and the LSB of the image byte needs to be changed, add one to the image byte.
– And so on, maintaining the same order of increments and decrements.
* If the message bit and the LSB of the image byte already have the same value, alternate between adding two to and subtracting two from the image byte.
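The two rules above can be sketched as follows. This is an illustrative sketch, not our production encoder: `sym_embed` is a hypothetical name, it assumes a sequential spread with one message bit per image byte, and it assumes that a step which would leave the 0 to 255 range is reversed for that byte (the rules above do not say how the edges of the range are handled).

```python
from typing import Sequence


def sym_embed(cover: bytes, bits: Sequence[int]) -> bytes:
    """Sketch of SYM: alternate up/down, by 1 if the LSB must change, else by 2."""
    if len(bits) > len(cover):
        raise ValueError("message does not fit in the cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # First bit goes up, second goes down, third goes up, and so on.
        direction = 1 if i % 2 == 0 else -1
        # A step of 1 flips the LSB; a step of 2 preserves it.
        magnitude = 1 if (stego[i] & 1) != bit else 2
        candidate = stego[i] + direction * magnitude
        if not 0 <= candidate <= 255:
            # Assumption: reverse the direction at the edge of the byte range.
            candidate = stego[i] - direction * magnitude
        stego[i] = candidate
    return bytes(stego)


print(list(sym_embed(bytes([100, 100, 101, 101]), [1, 0, 1, 0])))
# [101, 98, 103, 100]
```

In the example, the first and fourth bytes need their LSB changed (steps of one), while the second and third already match (steps of two), and the direction alternates on every byte.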
Although the above logic appears simple, there are a few important details which may not be obvious:
* Adding or subtracting two from a byte does not change its LSB, therefore we are able to change the byte while maintaining its original LSB.
* Every byte pair will have an up-down (or down-up) change, regardless of whether the LSBs match the message bits. Writing 1 for an LSB that must change and 0 for an LSB that already matches, and taking the case where the first byte of the pair is increased:
– To make a 10 change, the first image byte is increased by one and the second is decreased by two.
– To make a 01 change, the first image byte is increased by two and the second is decreased by one.
– To make a 11 change, the first image byte is increased by one and the second is decreased by one.
– To make a 00 change, the first image byte is increased by two and the second is decreased by two.
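The four pair cases can be checked with a small helper. The name `pair_deltas` and the string notation (`"1"` for an LSB that must change, `"0"` for one that already matches) are introduced here purely for illustration.

```python
from typing import Tuple


def pair_deltas(pattern: str, first_up: bool = True) -> Tuple[int, ...]:
    """Return the signed change applied to each byte for a change pattern."""
    deltas = []
    for i, flag in enumerate(pattern):
        # Even positions follow the starting direction, odd positions oppose it.
        direction = 1 if (i % 2 == 0) == first_up else -1
        magnitude = 1 if flag == "1" else 2
        deltas.append(direction * magnitude)
    return tuple(deltas)


for pattern in ("10", "01", "11", "00"):
    print(pattern, pair_deltas(pattern))
# 10 (1, -2)
# 01 (2, -1)
# 11 (1, -1)
# 00 (2, -2)
```

Passing `first_up=False` gives the mirrored down-up version of each pair, for example `(-1, 2)` for the 10 case.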
What we are effectively doing is improving upon LSBM by including an extra artificial symmetry step to balance the changes in the bytes.
Decoding a message hidden using SYM follows the same process as decoding a message hidden using LSBR or LSBM: read the LSB of each image byte. This works because, after embedding, each LSB equals the corresponding bit of the hidden message.
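A decoder along these lines might look like the sketch below. It assumes a sequential spread, that the receiver already knows how many bits to read, and that bits are packed most significant bit first; the names `extract_bits` and `bits_to_bytes` are hypothetical.

```python
from typing import List, Sequence


def extract_bits(stego: bytes, n_bits: int) -> List[int]:
    """Read the LSB of the first n_bits image bytes."""
    return [byte & 1 for byte in stego[:n_bits]]


def bits_to_bytes(bits: Sequence[int]) -> bytes:
    """Pack bits into bytes, MSB first (assumed order); trailing bits are dropped."""
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


print(extract_bits(bytes([101, 98, 103, 100]), 4))  # [1, 0, 1, 0]
```

The same two functions would serve for LSBR and LSBM, since none of the three techniques changes how the message is read back.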
To thoroughly understand the performance of SYM, we have prepared the following test setup:
* A set of ten artificial images. Each image is 192 x 192 pixels and in RGB format.
* We create five messages for each image, altering 1%, 10%, 25%, 50%, and 100% of the image’s bytes respectively. As each image has a different number of bytes (different file size), we calculate the message lengths individually for each image. The messages are then dynamically generated by encrypting a series of “A”s with AES.
* We have created an encoder and decoder which can handle the various steganography algorithms and spreads.
* We have written a script which tests SYM against SPA, Regular and Singular Groups (RS), Weighted Stego-Image (WS), Triples Analysis (TA), and deep learning steganalysis using Aletheia.
The above requires 2,000 individual tests. The results are displayed in four tables: LSBR & Artificial Images, LSBM & Artificial Images, Matrix Embedding & Artificial Images, and SYM & Artificial Images. Instead of showing the individual results for each image, we have aggregated the results into a single score.
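The figure of 2,000 follows from the size of the test matrix, which can be enumerated as below. The identifiers are placeholders chosen for this sketch; only the counts come from the setup described above.

```python
from itertools import product

IMAGES = range(10)                       # ten artificial images
PAYLOADS = (1, 10, 25, 50, 100)          # percentage of image bytes altered
SPREADS = ("sequential", "random")
ALGORITHMS = ("LSBR", "LSBM", "Matrix Embedding", "SYM")
DETECTORS = ("SPA", "RS", "WS", "TA", "ML")

test_matrix = list(product(ALGORITHMS, SPREADS, PAYLOADS, IMAGES, DETECTORS))
print(len(test_matrix))  # 2000

# Each table has one row per (spread, payload) and one column per detector.
cells_per_table = len(SPREADS) * len(PAYLOADS) * len(DETECTORS)
print(cells_per_table)  # 50
```

Each of the 50 cells in a table therefore aggregates the results for the ten images.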
We have taken an attacker’s perspective, so the result in each cell is the percentage of tests which caused a false negative. By false negative we mean the steganalysis technique failed to detect a hidden message, and classified the stego-image as a cover image.
To make it easier to see the results at a glance, we have colour coded the cells as follows: green means 80%+ false negative rate, red means 80%+ detection rate, and colourless refers to everything else. As an attacker, we are most interested in rows which are entirely green (ideally with scores of 100%), as it means our stego-images have a high chance of being undetected.
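The scoring and colour coding for a single cell can be sketched as follows. The function names are hypothetical, and the input is assumed to be one boolean per stego-image, `True` meaning the steganalysis technique flagged it.

```python
from typing import Sequence


def false_negative_rate(flagged: Sequence[bool]) -> float:
    """Percentage of stego-images the detector failed to flag."""
    return 100 * sum(1 for hit in flagged if not hit) / len(flagged)


def cell_colour(fn_rate: float) -> str:
    """Green for 80%+ false negatives, red for 80%+ detections, else none."""
    if fn_rate >= 80:
        return "green"
    if 100 - fn_rate >= 80:  # detection rate is the complement of the rate
        return "red"
    return "none"


rate = false_negative_rate([True] * 4 + [False] * 6)
print(rate, cell_colour(rate))  # 60.0 none
```

With ten images per cell, every score is a multiple of 10%, which is why the tables only contain values such as 0%, 10%, and 90%.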
LSBR False Negative Rates (Artificial Images)
Spread, Message Size | SPA | RS | WS | TA | ML (LSBR Model) |
Sequential, 1% | 100% | 100% | 90% | 100% | 100% |
Sequential, 10% | 60% | 70% | 30% | 100% | 100% |
Sequential, 25% | 0% | 0% | 0% | 30% | 90% |
Sequential, 50% | 0% | 0% | 0% | 10% | 90% |
Sequential, 100% | 0% | 0% | 0% | 0% | 0% |
Random, 1% | 100% | 100% | 90% | 100% | 50% |
Random, 10% | 10% | 40% | 0% | 10% | 0% |
Random, 25% | 0% | 0% | 0% | 0% | 0% |
Random, 50% | 0% | 0% | 0% | 0% | 0% |
Random, 100% | 0% | 0% | 0% | 0% | 0% |
LSBM False Negative Rates (Artificial Images)
Spread, Message Size | SPA | RS | WS | TA | ML (LSBM Model) |
Sequential, 1% | 100% | 100% | 90% | 100% | 100% |
Sequential, 10% | 80% | 80% | 50% | 100% | 100% |
Sequential, 25% | 50% | 60% | 50% | 70% | 100% |
Sequential, 50% | 50% | 50% | 50% | 50% | 100% |
Sequential, 100% | 20% | 10% | 20% | 30% | 0% |
Random, 1% | 100% | 100% | 90% | 100% | 100% |
Random, 10% | 80% | 90% | 60% | 80% | 10% |
Random, 25% | 40% | 80% | 30% | 10% | 0% |
Random, 50% | 30% | 30% | 30% | 20% | 0% |
Random, 100% | 20% | 0% | 20% | 30% | 0% |
Matrix Embedding False Negative Rates (Artificial Images)
Spread, Message Size | SPA | RS | WS | TA | ML (LSBR Model) |
Sequential, 1% | 100% | 100% | 90% | 100% | 90% |
Sequential, 10% | 90% | 90% | 70% | 100% | 100% |
Sequential, 25% | 70% | 90% | 50% | 70% | 100% |
Sequential, 50% | 50% | 60% | 40% | 50% | 100% |
Sequential, 100% | 30% | 30% | 30% | 10% | 0% |
Random, 1% | 100% | 100% | 90% | 100% | 100% |
Random, 10% | 90% | 90% | 80% | 100% | 10% |
Random, 25% | 70% | 90% | 50% | 70% | 0% |
Random, 50% | 40% | 90% | 30% | 10% | 0% |
Random, 100% | 20% | 30% | 20% | 20% | 0% |
SYM False Negative Rates (Artificial Images)
Spread, Message Size | SPA | RS | WS | TA | ML (SYM Model) |
Sequential, 1% | 100% | 100% | 90% | 100% | 90% |
Sequential, 10% | 100% | 100% | 80% | 100% | 0% |
Sequential, 25% | 90% | 90% | 90% | 100% | 0% |
Sequential, 50% | 90% | 90% | 90% | 100% | 0% |
Sequential, 100% | 90% | 90% | 90% | 30% | 10% |
Random, 1% | 100% | 100% | 90% | 100% | 90% |
Random, 10% | 80% | 90% | 60% | 70% | 60% |
Random, 25% | 30% | 80% | 40% | 0% | 30% |
Random, 50% | 40% | 20% | 50% | 40% | 0% |
Random, 100% | 90% | 90% | 90% | 0% | 10% |
We can see SYM is by far the best performer at resisting SPA on artificial images, with a 14% improvement over Matrix Embedding, a 20% improvement over LSBM, and a 71% improvement over LSBR. In separate testing, not included here, SYM did not perform as well on natural images; however, it only differs from Matrix Embedding and LSBM by 3%.
Average False Negative Rate Per Steganography Algorithm (SPA only)
Algorithm | Average |
SYM (Artificial Images) | 96% |
Matrix Embedding (Artificial Images) | 78% |
LSBM (Artificial Images) | 74% |
LSBR (Artificial Images) | 43% |
We can see SYM has the best overall performance for artificial images, with a 67% improvement over LSBR, an 18% improvement over LSBM, and a 12% improvement over Matrix Embedding. In separate testing, not included here, SYM does less well with natural images, lagging 14% behind LSBM and Matrix Embedding; however, it has a 93% improvement over LSBR.
Looking at the performance of SYM against the various steganalysis techniques, we can see SYM performs well against statistical steganalysis, with higher than average false negative scores for all of the algorithms. Our deep learning model is significantly more effective than statistical analysis, being able to detect more than half of the stego-images.
Average False Negative Rate Per Steganalysis Technique (SYM only)
Image Type | SPA | RS | WS | TA | ML (SYM Model) |
Artificial Images | 90% | 91% | 84% | 80% | 45% |
In conclusion, SYM is a superior steganography technique when used with artificial images, and should be used in place of LSBR, LSBM, and Matrix Embedding.