This week, we worked on integrating the Social Searcher module and we made good progress on it.
We are starting to make room in the search results for the social media posts that will come from Social Searcher. There is more work to do to get the Social Searcher API fully integrated, and we will continue working on it next week.
We are still busy bringing the new design live. There were a number of bugs to fix and code merges to do to make sure all the social search functionality is present and working. We have completed this successfully.
We are now studying the Social Search API and getting things ready on the Zabang side for this API integration. This will allow Zabang to search social media posts, which Google does not cover much.
This task is in progress; we are still working on it this week and will continue next week.
A few things that you should know about the social search network and how it is configured to work:
1) The data clustering process has now started, and it runs once an hour. We made it this frequent because that makes testing easier. Some cron jobs run every hour and update the Sphinx data on the server. Later, we will change the clustering to run once a day, which will be enough.
2) Users are divided into groups when they fill in the questionnaire. If a user is not logged into Zabang, or did not complete the questionnaire, then search will work for that user as before, and no social search will apply, because we don't know who that user is.
3) Information about logged-in users is collected whether or not they belong to a group yet. In other words, what a logged-in user searches for and what interests him is recorded for all logged-in users. However, as long as the user has not filled in the questionnaire, that information is not used at all. As soon as the user fills in the questionnaire, he is assigned to SIGs, and the data gathered about him starts being used by the system automatically and is added to the data of other users belonging to the same social network. All the data gathered about users in the same social network helps those users obtain information relevant to their social network.
4) Let's say a new user has just registered and has not had time to do any searches on Zabang, but did fill in the questionnaire. The system will then put him in the appropriate social network and start showing him content that is potentially interesting to him, based only on his demographic data and on the data gathered from other users in the same social network.
5) The search results that users from a specific social network visit determine which search results are considered more relevant for that social network. Over time, the system learns that certain search results get many more hits than others, which means that users from that social network are primarily interested in some results and less interested in others. The results of higher interest to that social network are pushed up for all of its members when they search for specific keywords.
6) The social network updates and distribution occur once per hour, triggered by one of the cron jobs described in point 1 above.
7) In order to start testing the social network, it is desirable to attract about 362 users or more to fill in our questionnaire. After that, they should actively use the Zabang search. At that point, the social network and the machine learning system that we developed will work at full capacity.
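Points 2 and 5 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code running on Zabang: the class name `GroupClickModel`, its methods, and the example group name and URLs are hypothetical, and ordering purely by visit count is a simplification of whatever scoring the real system applies.

```python
from collections import defaultdict


class GroupClickModel:
    """Hypothetical sketch: count visits per group and keyword, then re-rank."""

    def __init__(self):
        # Maps (group, keyword, url) to the number of visits by members of that group.
        self.clicks = defaultdict(int)

    def record_click(self, group, keyword, url):
        """Record that a member of `group` visited `url` after searching `keyword`."""
        self.clicks[(group, keyword, url)] += 1

    def rerank(self, group, keyword, results):
        """Return `results` with the group's most visited URLs pushed up."""
        # No group (not logged in, or questionnaire not completed):
        # search works as before and the default order is kept.
        if group is None:
            return list(results)
        # sorted() is stable, so URLs with equal visit counts keep their default order.
        return sorted(
            results,
            key=lambda url: -self.clicks.get((group, keyword, url), 0),
        )


model = GroupClickModel()
model.record_click("group_a", "hiking boots", "c.example")
model.record_click("group_a", "hiking boots", "c.example")
model.record_click("group_a", "hiking boots", "b.example")

default_order = ["a.example", "b.example", "c.example"]
for_member = model.rerank("group_a", "hiking boots", default_order)
for_anonymous = model.rerank(None, "hiking boots", default_order)
```

Here `for_member` puts `c.example` first because the group visited it most often for that keyword, while `for_anonymous` keeps the default order.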
It has been a lot of work, but it is getting there under the hood.
From this past weekend until today, things have been hectic for us, as we deployed code on the live site. We have been running tests and found a few areas that we had to update and improve. The initial tests, about two weeks ago, gave good results. The tests on the live site, where there are many more users and much more search data, showed some areas that would cause problems and give inaccurate results in the future.
Here are the things that we did during the last two weeks:
- Integrated code for machine learning, in addition to what we had until recently
- This new code is written in Python, a different back-end programming language from PHP, which Zabang is written in. So we had additional development to do to get the two to work together. The good part is that Python is very fast (faster than PHP), so it is a perfect fit for the machine learning part of the project.
- We had to update some of the PHP code so that it integrates and communicates with the Python code and the machine learning code
- Following these Python and PHP code updates, we had to change the search configurations.
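One common way for PHP to hand work to Python is to run a Python script as a subprocess and exchange JSON over standard input and output. The sketch below shows the Python side of such a bridge. It is an assumption for illustration only: the actual mechanism used between Zabang's PHP and Python code is not described here, and `handle_request`, `main`, and the request fields `group` and `results` are hypothetical names.

```python
import io
import json


def handle_request(request):
    """Hypothetical handler: a placeholder for the machine learning step.

    It returns the results unchanged; a real handler would reorder them.
    """
    return {
        "group": request.get("group"),
        "results": list(request.get("results", [])),
    }


def main(stdin, stdout):
    """Read one JSON request from `stdin` and write one JSON response to `stdout`."""
    request = json.load(stdin)
    json.dump(handle_request(request), stdout)


# Demonstration with in-memory streams. A real script would instead pass
# sys.stdin and sys.stdout, so that the calling process can pipe JSON in and out.
fake_in = io.StringIO('{"group": "group_a", "results": ["a.example", "b.example"]}')
fake_out = io.StringIO()
main(fake_in, fake_out)
response = json.loads(fake_out.getvalue())
```

Keeping the exchange to plain JSON means neither side needs to know anything about the other's language, only the agreed field names.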
We have a couple more tweaks and fixes to do, and we are continuing to work on this over the weekend to get everything completed, so that our first beta testers can fill in the questionnaire and start using the site and the search.