Friday, August 17, 2007

Picture SOM

Howdy,

This article is about the use of SOMs (self-organizing maps). I've made an application that demonstrates how these self-organizing maps can be used.

This program loads pictures and groups them, so that matching or similar pictures end up together.
I saw this once in a Longhorn demo, and I wanted to know how it worked.

Step 1:

So the first step of this project was to download pictures of hot women (not nude, otherwise I'd get in trouble with my girlfriend ;)). Through another article I found a site, fobpro, where you can download thumbnails of women in similar poses, with similar backgrounds, etc., which is excellent for testing.

Step 2:

Step 2 was extracting data from the pictures.
I began by computing the histogram of an image (greyscale and HSL). A histogram is a series of buckets into which you sort values: you look at every pixel and calculate a value from it, say from 0 to 255, then you take, for example, 16 buckets, each covering a range of 256 / 16 = 16 values, and put every pixel value into the right bucket. At the end you have the number of values in each bucket, and that is the histogram. These buckets will be the inputs for a neural network later on.
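A minimal C++ sketch of the greyscale version, assuming the pixels are already available as a vector of 8-bit RGB values. The `Rgb` type and `greyHistogram` function are hypothetical names for this sketch. The grey value is the plain average of R, G and B, and the counts are divided by the pixel count so pictures of different sizes give comparable inputs:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One pixel with 8-bit red, green and blue channels (hypothetical type).
struct Rgb {
    std::uint8_t r, g, b;
};

// Normalized greyscale histogram. Grey values run from 0 to 255, i.e. 256
// possible values, so each of the `Buckets` buckets covers 256 / Buckets of
// them (16 values per bucket when Buckets == 16).
template <std::size_t Buckets = 16>
std::array<double, Buckets> greyHistogram(const std::vector<Rgb>& pixels) {
    static_assert(Buckets > 0 && 256 % Buckets == 0, "Buckets must divide 256");
    assert(!pixels.empty());
    std::array<double, Buckets> hist{};  // value-initialized: all counts start at 0
    for (const Rgb& p : pixels) {
        // Integer average of the three channels, always in 0..255.
        const std::size_t grey = (static_cast<std::size_t>(p.r) + p.g + p.b) / 3;
        // 255 / (256 / Buckets) is Buckets - 1, so no clamping is needed.
        hist[grey / (256 / Buckets)] += 1.0;
    }
    // Divide by the pixel count so pictures of any size give comparable inputs.
    for (double& h : hist) h /= static_cast<double>(pixels.size());
    return hist;
}
```

With `Buckets = 64` the same function gives 64-bucket histograms like the ones used in the test further down, since 256 / 64 = 4 values per bucket.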

The second thing I did was extracting color and texture areas from the image. In this case we divide the image (thumbnailed to 50x50) into 9 pieces (note that overlapping areas would be better). For each area we calculate the average color and a texture value (high value = noisy area, low value = smooth area).
This results in 9 values per picture for each of these features, i.e. 9 inputs in the input vector.
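The post doesn't say how the texture value is calculated, so the sketch below assumes one common choice: the standard deviation of the grey values in each area (noisy areas vary a lot, smooth areas hardly at all). It also assumes the "average color" is reduced to the mean grey value, so each area gives one number. `AreaFeatures` and `areaFeatures` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-area features for a 3x3 grid (hypothetical helper).
struct AreaFeatures {
    std::array<double, 9> color;    // mean grey value of each area (0..255)
    std::array<double, 9> texture;  // assumption: std deviation of grey in each area
};

// `grey` holds width * height grey values in row-major order. Area borders
// use integer division, so a 50-pixel side splits into 16, 17 and 17 pixels.
AreaFeatures areaFeatures(const std::vector<double>& grey, int width, int height) {
    assert(width >= 3 && height >= 3);
    assert(grey.size() == static_cast<std::size_t>(width) * height);
    AreaFeatures f{};
    for (int ay = 0; ay < 3; ++ay) {
        for (int ax = 0; ax < 3; ++ax) {
            const int x0 = ax * width / 3, x1 = (ax + 1) * width / 3;
            const int y0 = ay * height / 3, y1 = (ay + 1) * height / 3;
            double sum = 0.0, sumSq = 0.0;
            int n = 0;
            for (int y = y0; y < y1; ++y) {
                for (int x = x0; x < x1; ++x) {
                    const double v = grey[static_cast<std::size_t>(y) * width + x];
                    sum += v;
                    sumSq += v * v;
                    ++n;
                }
            }
            const double mean = sum / n;
            const double var = sumSq / n - mean * mean;
            const int i = ay * 3 + ax;  // areas numbered row by row
            f.color[i] = mean;
            // Clamp tiny negative rounding errors before taking the root.
            f.texture[i] = std::sqrt(var > 0.0 ? var : 0.0);
        }
    }
    return f;
}
```

A flat area gives a texture value of 0; a checkerboard of 0 and 200 gives a mean of 100 and a texture value of 100.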



In the picture, the histogram of the image is on the left, and the red lines mark the areas from which I take the texture or color values.

I made a library that can fetch these values, or any combination of them, and put them into one input vector. So if we take a 16-bucket histogram and one area feature, we have 16 + 9 = 25 inputs.
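Combining features is just concatenation. A sketch with a hypothetical `buildInputVector` helper; it assumes each feature group has already been scaled to a similar range (for example histogram fractions in 0..1 and area means divided by 255), because otherwise the large area values would dominate the distance calculation inside the SOM:

```cpp
#include <cassert>
#include <vector>

// Concatenates the selected feature groups into one SOM input vector
// (hypothetical helper). Assumption: every group is already scaled to a
// comparable range, e.g. 0..1.
std::vector<double> buildInputVector(const std::vector<std::vector<double>>& features) {
    std::vector<double> input;
    for (const auto& group : features) {
        input.insert(input.end(), group.begin(), group.end());
    }
    assert(!input.empty());
    return input;
}
```

A 16-bucket histogram plus one area feature gives 16 + 9 = 25 inputs; the four features used in the test further down give 64 + 64 + 9 + 9 = 146.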

Step 3:

Here I loaded the images into memory, thumbnailed them, and built an input vector for each image, so that in the end I have a list of input vectors to train the SOM with.
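The training itself can follow the classic Kohonen scheme: pick a random input vector, find the best-matching node, and pull that node and its grid neighbours towards the input, with a learning rate and neighbourhood radius that shrink over time. A sketch; the start values (learning rate 0.1, radius half the map size), the exponential decay and the Gaussian neighbourhood are assumptions, not settings from the post, and `Som`, `trainSom` and `bestMatchingUnit` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Rectangular Kohonen map: rows x cols nodes, each holding a weight vector
// with as many entries as an input vector.
struct Som {
    int rows, cols;
    std::vector<std::vector<double>> weights;  // node index = row * cols + col
};

double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Index of the best-matching ("winning") node: the one closest to x.
int bestMatchingUnit(const Som& som, const std::vector<double>& x) {
    int best = 0;
    double bestDist = squaredDistance(som.weights[0], x);
    for (int i = 1; i < som.rows * som.cols; ++i) {
        const double d = squaredDistance(som.weights[i], x);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}

// Online Kohonen training; decay schedule and start values are assumptions.
Som trainSom(const std::vector<std::vector<double>>& data, int rows, int cols,
             int iterations, unsigned seed = 42) {
    assert(!data.empty() && rows > 0 && cols > 0 && iterations > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);

    const std::size_t dim = data[0].size();
    Som som{rows, cols, {}};
    for (int i = 0; i < rows * cols; ++i) {
        std::vector<double> w(dim);
        for (double& v : w) v = unit(rng);  // random initial weights in [0, 1)
        som.weights.push_back(w);
    }

    const double startRate = 0.1;
    const double startRadius = std::max(rows, cols) / 2.0;
    // On small maps (startRadius below e) log(startRadius) is under 1, or even
    // zero or negative, so clamp it at 1.
    const double timeConstant = iterations / std::max(std::log(startRadius), 1.0);

    for (int t = 0; t < iterations; ++t) {
        const std::vector<double>& x = data[pick(rng)];
        const int bmu = bestMatchingUnit(som, x);
        const double radius = startRadius * std::exp(-t / timeConstant);
        const double rate = startRate * std::exp(-static_cast<double>(t) / iterations);
        for (int i = 0; i < rows * cols; ++i) {
            const double dr = i / cols - bmu / cols;  // row distance on the grid
            const double dc = i % cols - bmu % cols;  // column distance on the grid
            const double gridDist2 = dr * dr + dc * dc;
            if (gridDist2 > radius * radius) continue;  // outside the neighbourhood
            // Gaussian falloff: nodes close to the winner move the most.
            const double influence = std::exp(-gridDist2 / (2.0 * radius * radius));
            for (std::size_t k = 0; k < dim; ++k) {
                som.weights[i][k] += influence * rate * (x[k] - som.weights[i][k]);
            }
        }
    }
    return som;
}
```

For a 10 by 10 map this would be called as `trainSom(inputVectors, 10, 10, iterations)` with one input vector per picture; the iteration count is a tuning choice the post doesn't give.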

As the feature map I used a 10 by 10 matrix, and I trained the network with a set of 200 pictures from fobpro.
That gives an average of only 2 pictures per map node (200 pictures / 100 nodes).

Step 4:

In step 4 I made the GUI. It lets you choose which types of histograms or areas to train on and the map size, and it has a few buttons: one to select the input directory, one to train the network, and one to present an image that's not in the trained collection.

Step 5:

Testing the application :)

First I selected an input directory and trained the network.
With the thumbnails, training took a few minutes.



After training, the pictures are distributed over the map nodes, shown as the matrix of buttons on the right.
The number on each button tells me how many pictures are mapped to that node.
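Counts like these come from a second pass after training: each picture's input vector is presented once, and the picture is filed under its winning node (the node with the closest weight vector). A sketch with hypothetical names, where `weights` holds one trained weight vector per map node and `pictures` one input vector per picture:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Index of the node whose weight vector is closest (squared Euclidean
// distance) to x, i.e. the "winning" node.
std::size_t winningNode(const std::vector<Vec>& weights, const Vec& x) {
    assert(!weights.empty());
    std::size_t best = 0;
    double bestDist = -1.0;
    for (std::size_t n = 0; n < weights.size(); ++n) {
        double d = 0.0;
        for (std::size_t k = 0; k < x.size(); ++k) {
            const double diff = weights[n][k] - x[k];
            d += diff * diff;
        }
        if (bestDist < 0.0 || d < bestDist) {
            bestDist = d;
            best = n;
        }
    }
    return best;
}

// Files every picture index under its winning node; members[n].size() is
// the number that would appear on button n.
std::vector<std::vector<std::size_t>> mapPictures(const std::vector<Vec>& weights,
                                                  const std::vector<Vec>& pictures) {
    std::vector<std::vector<std::size_t>> members(weights.size());
    for (std::size_t p = 0; p < pictures.size(); ++p) {
        members[winningNode(weights, pictures[p])].push_back(p);
    }
    return members;
}
```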

I've selected the top-left node, and you see the similar pictures found under this node (3 pictures):



Now let's look at the top-right node, which has 4 nice pictures under it:



If we look at the bottom-left node with 4 pictures, we see that the pictures
are of the same model in the same clothes, but in different poses, so it works!



This is the bottom-right node, with 3 pictures:



Now let's test whether the network can really group pictures!
I fed the network an image that was not in the training data,
and look, it found the right picture set! In the first set there is another girl wearing the same clothes, but with different hair. The network looks at color and texture, using both histograms and areas; in this case each histogram had 64 buckets and the areas had 9 divisions. So the input vector for this network had 64 + 64 + 9 + 9 = 146 inputs.
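Looking up an unseen picture can work the same way: compute its input vector, find the winning node, and show the pictures already filed under that node. The sketch below additionally sorts that set by distance to the query, which is an extra assumption and not something the post describes; `findSimilar` and its parameters are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dist2(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) d += (a[k] - b[k]) * (a[k] - b[k]);
    return d;
}

// Finds the query's winning node and returns the training pictures filed
// under it, nearest to the query first. `members` is the node -> picture
// index table built after training; `pictures` holds their input vectors.
std::vector<std::size_t> findSimilar(const std::vector<Vec>& weights,
                                     const std::vector<std::vector<std::size_t>>& members,
                                     const std::vector<Vec>& pictures,
                                     const Vec& query) {
    assert(!weights.empty() && weights.size() == members.size());
    std::size_t win = 0;
    for (std::size_t n = 1; n < weights.size(); ++n) {
        if (dist2(weights[n], query) < dist2(weights[win], query)) win = n;
    }
    std::vector<std::size_t> result = members[win];
    std::sort(result.begin(), result.end(), [&](std::size_t a, std::size_t b) {
        return dist2(pictures[a], query) < dist2(pictures[b], query);
    });
    return result;
}
```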

The other girl closely matched the texture areas and the texture histogram, her color histogram is almost the same, and since the color areas are averages they are almost the same too, so it's correct that she ends up in this set.



Let's feed another one:



Conclusion:

A SOM can do amazing stuff, and there are tons of other applications you can use it for. Training did not take very long, because all thumbnails are resized before the data is extracted from them.

The next step would be face recognition and edge detection, so that I can actually filter people out of the collection. That would be really cool ;)

I also found some information about hierarchically growing SOM networks, in which a node can itself contain a finer-grained map, so the nodes are spread out at more than one level of granularity.
For example, if we take the color SOM and get different shades of red, those shades would all fall under one node, which is itself a SOM network. That way we have a top level with red, green, etc., and under red a sublevel with the shades of red, so we can group even better.
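A rough two-level version of that idea, assuming a fixed-size sub-map is trained for every top-level node that won enough inputs. This is a simplification: the actual growing hierarchical SOM (GHSOM) also grows each map and decides per node, based on its quantization error, whether to expand it into a child map. The compact training routine, its linear decay schedule and all names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

using Vec = std::vector<double>;

// Square side x side map; node n sits at row n / side, column n % side.
struct Som {
    int side;
    std::vector<Vec> w;
};

// Winning node: smallest squared distance between its weights and x.
std::size_t winner(const Som& s, const Vec& x) {
    std::size_t best = 0;
    double bestD = 0.0;
    for (std::size_t n = 0; n < s.w.size(); ++n) {
        double d = 0.0;
        for (std::size_t k = 0; k < x.size(); ++k) d += (s.w[n][k] - x[k]) * (s.w[n][k] - x[k]);
        if (n == 0 || d < bestD) {
            bestD = d;
            best = n;
        }
    }
    return best;
}

// Compact Kohonen training with linearly decaying rate and radius (assumed).
Som train(const std::vector<Vec>& data, int side, int iters, std::mt19937& rng) {
    assert(!data.empty() && side > 0 && iters > 0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<std::size_t> pick(0, data.size() - 1);
    Som s{side, std::vector<Vec>(static_cast<std::size_t>(side * side), Vec(data[0].size()))};
    for (Vec& w : s.w)
        for (double& v : w) v = unit(rng);
    const double r0 = std::max(side / 2.0, 1.0);
    const std::size_t cols = static_cast<std::size_t>(side);
    for (int t = 0; t < iters; ++t) {
        const Vec& x = data[pick(rng)];
        const std::size_t b = winner(s, x);
        const double frac = static_cast<double>(t) / iters;
        const double radius = r0 + (0.5 - r0) * frac;   // r0 down towards 0.5
        const double rate = 0.1 + (0.01 - 0.1) * frac;  // 0.1 down towards 0.01
        for (std::size_t n = 0; n < s.w.size(); ++n) {
            const double dr = double(n / cols) - double(b / cols);
            const double dc = double(n % cols) - double(b % cols);
            const double g2 = dr * dr + dc * dc;
            if (g2 > radius * radius) continue;  // outside the neighbourhood
            const double h = std::exp(-g2 / (2.0 * radius * radius));
            for (std::size_t k = 0; k < x.size(); ++k) s.w[n][k] += rate * h * (x[k] - s.w[n][k]);
        }
    }
    return s;
}

// Top map plus one optional sub-map per top-level node.
struct TwoLevelSom {
    Som top;
    std::vector<Som> sub;  // sub[n].w is empty if node n won fewer than minInputs
};

TwoLevelSom trainTwoLevel(const std::vector<Vec>& data, int topSide, int subSide,
                          std::size_t minInputs, int iters, unsigned seed) {
    assert(minInputs > 0);
    std::mt19937 rng(seed);
    TwoLevelSom h{train(data, topSide, iters, rng), {}};
    // Collect the inputs each top-level node wins...
    std::vector<std::vector<Vec>> won(h.top.w.size());
    for (const Vec& x : data) won[winner(h.top, x)].push_back(x);
    // ...and train a finer map on every group that is large enough.
    h.sub.resize(won.size());
    for (std::size_t n = 0; n < won.size(); ++n)
        if (won[n].size() >= minInputs) h.sub[n] = train(won[n], subSide, iters, rng);
    return h;
}
```

To look up a picture, find its winning node in `top`; if `sub` holds a trained map for that node, find the winning node again inside that sub-map.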

The next thing I'm going to try is to fine-tune this application, turn it into a WCF service (.NET 3.0) and feed it my entire picture collection, so that I can make a kind of Picasa-style picture grouping :)

For now, it's just therapy: a way to learn about AI and make some pretty cool things with it, in a pretty learning environment. Note that the pictures shown here are for educational purposes only :)

I hope you got somewhat wiser about neural networks and their possibilities... I will come back later with more fun stuff :)

Regards,

Whizzo

7 comments:

Anonymous said...

How do you know which pictures belong to a certain map node? Are the values of the node's input vector exactly the same as the pictures', or do you take an average?

Whizzo said...

Hi,

Well, the input vector is calculated from the pixels in the picture, like a histogram etc. (you can see the options in the screenshot). With a histogram, you sort the picture's pixel values into 'buckets', and the bucket counts are the input values. When you select more algorithms, the input vector expands...

There is a mapping between an input vector and a picture: when you present a picture's input pattern to the trained network, you get a 'winning' node (the node whose weights are closest to the input), and then you map the picture to that node. That's what you see in the grid on the right of the application.

I hope this answers your question ;)

Regards
F.

Anonymous said...

Thank you for your reply. I was looking at it the wrong way but now I understand how the pictures are spread over the map nodes.

I have made adjustments to my own program and it seems to work :)

Good fun

dj_enkidu said...

How did you calculate the texture? I couldn't find any information on the web about a practical algorithm or method to calculate texture. I would appreciate it if you could point me to any information about this.
Thanks for all the info.

Dyv said...

Have a look at http://www.mperfect.net/aiSomPic

Anonymous said...

Hello,

Can you explain the bucket part in more detail?
I don't know whether the approach below is the right one.

And what is the "texture value" of an image? I really cannot find anything that explains this clearly (even the linked site has nothing really useful)!

The SOM code I used is based on one of the linked sites (ai-junkie) and works quite well.
I can load a color table and the random weights get sorted to the defined colors, so I get a nicely ordered color picture as a result.

And I have a question about this quote of yours:
"when you present the input pattern, you'll get a 'winning' node, "
I know that once I have trained the SOM, each node has one or more final weights, and I assume those nodes (weights) are your 10x10 grid of buttons, correct?
If I understand your article correctly, each SOM node itself holds at most one weight (which, if displayed, is a grayscale map)?

But I don't understand the mapping at all. Let's say we have our 50x50 thumbnail as a search pattern.
How is it supposed to be transformed to match one single weight?!


Finally, here is the bucket approach. It'd be nice if you could tell me whether it's correct, or what is wrong:

double bucketDivisor = 255. / 16.;  // not used below
double* buckets = new double[ 16 ];  // one counter per bucket
for ( int b = 0; b < 16; ++b ) {
    buckets[b] = 0.;
}
double pixels = BM.bmHeight * BM.bmWidth;  // total pixel count, used to normalize

for( int y = 0; y < BM.bmHeight; ++y) {
    for( int x = 0; x < BM.bmWidth; ++x) {
        // GetPixel expects a device context, so bmpX is assumed to be an HDC
        // with the bitmap selected into it
        COLORREF col = GetPixel( bmpX, x, y );
        double r = (double) GetRValue( col );
        double g = (double) GetGValue( col );
        double b = (double) GetBValue( col );
        double avg = ( r + g + b ) / 3.; // get average color

        // scales 0..255 to 0..16; only avg == 255 lands on 16 and is moved back to 15
        int idx = (int) (( avg / 255. ) * 16. ) ;
        if ( idx > 15 ) {
            --idx;
        }
        ++buckets[ idx ];
    }
}
for( int b = 0; b < 16; ++b ) {
    // each bucket is pushed as its own one-value training vector
    std::vector< double > val;
    val.push_back( buckets[b] / pixels );
    m_TrainingSet.push_back( val );
}
delete [] buckets;


Many thanks!

Anonymous said...

Hey, forget my post above :).

I still don't know what a texture value is, but when I was doing my business on the toilet this morning I realized what I did wrong.
It was a chain reaction of misunderstandings [..].

And the bucket code was just plain wrong, too.

I finally got it to work, with an input vector of 94 values, a trained map, the understanding that the accuracy depends on the map size as well as on the initial weights (and of course on the amount of training data), and the understanding that I have to use the training data to map the nodes to the pictures.

Sorry for the previous stupid post. I just thought too hard about it.