A few things you should know about the social search network and how it is configured:
1) Data clustering is now running, once an hour. We chose this frequency because it makes testing easier. Several cron jobs run every hour and update the Sphinx data on the server. Later on, we will switch to clustering the data once a day, which will be sufficient.
2) Users are divided into groups when they fill in the questionnaire. If a user is not logged into Zabang, or has not completed the questionnaire, Search works for that user as before and no social search is applied, because we don't know who that user is.
3) Information about logged-in users is collected whether or not they belong to a group yet. In other words, what each logged-in user searches for and what interests him is recorded for every logged-in user. However, until that user fills in the questionnaire, this information is not used at all. As soon as he fills it in, he is assigned to SIGs, and the data gathered about him is automatically put to use and added to the data of the other users in the same social network. Together, the data gathered from all members of a social network helps each of them find information relevant to that network.
4) Suppose a new user has just registered and has not yet done any searches on Zabang, but has filled in the questionnaire. The system will place him in the appropriate social network and start showing him potentially interesting content, based only on his demographic data and on the data gathered from other users in the same social network.
5) The search results that members of a particular social network visit determine which results are considered more relevant for that network. Over time, the system learns that some results get far more hits than others, meaning that members of that network are mainly interested in those results and less in the rest. When members of that network search for the relevant keywords, the results of greater interest to the network are pushed higher for all of them.
6) The social network update and distribution run once an hour, triggered by one of the cron jobs described in point 1.
7) To start testing the social network properly, we should ideally get about 362 or more users to fill in our questionnaire and then actively use Zabang search. At that point, the social network and the machine learning system we developed will be working at full capacity.
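The behaviour described in points 2 to 6 can be pictured with the minimal Python sketch below. It is hypothetical: the class `SocialSearchSketch`, its methods and its data layout are illustrative only and are not Zabang's implementation. It assumes that "pushing up" results simply means sorting them by how many hits members of the user's network gave each result for the same keyword, and it models the hourly cron job as a `rebuild()` call over the full click log.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class SocialSearchSketch:
    """Hypothetical model of the behaviour described above; not Zabang's code."""

    def __init__(self) -> None:
        # user_id -> network id; filled in only when the questionnaire is completed
        self.network_of: Dict[str, str] = {}
        # raw click log for every logged-in user: (user_id, keyword, result_url)
        self.click_log: List[Tuple[str, str, str]] = []
        # (network, keyword) -> Counter of result_url -> hits; rebuilt periodically
        self.hits: Dict[Tuple[str, str], Counter] = defaultdict(Counter)

    def complete_questionnaire(self, user_id: str, network: str) -> None:
        # Assignment to a social network happens only through the questionnaire.
        self.network_of[user_id] = network

    def record_click(self, user_id: Optional[str], keyword: str, url: str) -> None:
        # Anonymous visitors are not tracked; logged-in users always are,
        # whether or not they belong to a network yet.
        if user_id is not None:
            self.click_log.append((user_id, keyword, url))

    def rebuild(self) -> None:
        # Stand-in for the hourly cron job: recompute per-network hit counts
        # from the whole log, so clicks made before a user joined a network
        # are counted once the user has one.
        self.hits = defaultdict(Counter)
        for user_id, keyword, url in self.click_log:
            network = self.network_of.get(user_id)
            if network is not None:
                self.hits[(network, keyword)][url] += 1

    def rank(self, user_id: Optional[str], keyword: str, results: List[str]) -> List[str]:
        network = self.network_of.get(user_id) if user_id is not None else None
        if network is None:
            # Not logged in, or no questionnaire yet: plain search order.
            return list(results)
        counts = self.hits.get((network, keyword), Counter())
        # Stable sort: more network hits first; ties keep the original order.
        return sorted(results, key=lambda url: -counts[url])


if __name__ == "__main__":
    s = SocialSearchSketch()
    s.record_click("alice", "hotels", "b.example")  # recorded before her questionnaire
    s.record_click("bob", "hotels", "b.example")
    s.record_click("bob", "hotels", "c.example")
    s.record_click("bob", "hotels", "b.example")
    s.complete_questionnaire("alice", "travellers")
    s.complete_questionnaire("bob", "travellers")
    s.complete_questionnaire("carol", "travellers")  # new user, no searches yet
    s.rebuild()
    results = ["a.example", "b.example", "c.example"]
    print(s.rank("carol", "hotels", results))  # ['b.example', 'c.example', 'a.example']
    print(s.rank(None, "hotels", results))     # ['a.example', 'b.example', 'c.example']
```

Because `rebuild()` recomputes the counts from the whole log, clicks a user made before completing the questionnaire start counting at the first rebuild after he is assigned to a network, as in point 3; with an hourly schedule that means a delay of up to an hour. Carol shows the case from point 4: she has no history of her own but still gets results ordered by her network's activity.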
It has been a lot of work, but it is coming together under the hood.