Freebase - Now Open For Searching

Phil Butler


Freebase is an openly accessible database that is editable like Wikia and Wikipedia. The startup has been in private alpha testing but has just opened its beta doors to the public. The database has been seeded with over 2 million topics from Wikipedia and other sources. This data is currently readable by everyone, with "write" capabilities reserved for registered users. In short, Freebase aims to deliver on Google's promise to organize the world's data.

The heart of Freebase is a searchable database that filters data down via keyword-narrowed queries: the user narrows subjects until the desired result is obtained. Freebase also allows users to edit incorrect data they find along the way and to submit new data. The similarity to Wikia or Wikipedia is readily evident in how users contribute to the database, but not in the look or feel. Freebase is tackling the vast data of the Web in an interesting, if not unique, way by seeking user collaboration within a new interface.
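To make the narrowing idea concrete, here is a minimal Python sketch. It is not Freebase's actual implementation: the sample topics, their tags, and the `narrow` helper are all made up for illustration. It only shows how each added keyword shrinks the set of candidate topics.

```python
# Toy illustration only: TOPICS and narrow() are hypothetical,
# not Freebase data or Freebase code.
TOPICS = {
    "Santorini": {"island", "greece", "thera", "volcano"},
    "Malta": {"island", "country", "mediterranean"},
    "Thera (moth)": {"thera", "moth", "insect"},
    "Crete": {"island", "greece", "minoan"},
}


def narrow(topics, keywords):
    """Keep only the topics whose tags contain every keyword given so far."""
    wanted = {k.lower() for k in keywords}
    return sorted(name for name, tags in topics.items() if wanted <= tags)


# A broad query first, then the same query narrowed by a second keyword.
print(narrow(TOPICS, ["island"]))
print(narrow(TOPICS, ["island", "thera"]))
```

With this toy data the second query leaves a single topic, which is the kind of "exact" answer a hand-curated database is good at.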

Freebase has added a great deal of data since I first visited, with tens and hundreds of thousands of entries on topics from sports to film. Freebase is also available to offsite applications via its API, in read-only format.

Testing Relevance

"Semantic" simply means "relating to meaning", and several innovative ventures are attacking improved relevance by combining advanced technology with natural language, human filtering, or both. I performed a simple test of my own to see just how effective Freebase's variant could be. I was sure that my favorite search engine, hakia, would be able to eclipse anything that Freebase could come up with at this stage of its development. In a keyword search for "Island and Thera", I was surprised to find that the Freebase result was strikingly relevant, while the hakia results provided more depth.

[Image: Freebase result]

The hakia result has captured these keywords contextually within sentences, while the Freebase top results have pinpointed exactly the island of Santorini. To a degree, this demonstrates the power of human-filtered search. The hakia results offered better choices for a broad search intended for narrowing, while the Freebase results were precisely relevant (however limited in scope) to the "thing" itself. This is by no means a clinical analysis of the two entities, but it is interesting to see the two philosophies in action.

Man vs. Machine

Obviously Freebase is in a rather embryonic stage of development, as is hakia to some extent. Increased meaning in search has gone from a novelty idea to an accepted eventuality for most people. Jimmy Wales has approached this from the human search angle with Search Wikia, while Riza Berkan and the great scientists of hakia are attacking the problem from an advanced technological standpoint, as are Powerset and others.

It is rather obvious that human search has two fundamental obstacles to overcome. First, the amount of data to be gleaned is massive, so the results for overall queries will remain limited by submissions - this is a function of time. Second, the sources of the information will be subjective unless further refinements are made to results - this is a function of quality. Human search could be the most relevant of all given time and scrutiny, but building a database of the necessary scope is daunting.

Hakia, Powerset and the others are confronted with one big problem in my view: sentences, and even whole pages, of seemingly relevant content occur at random (or are inserted, in the case of SEO) within a huge number of documents. In the example below, it is evident that a hotel site has crept into the mix. It is obvious that hakia is increasingly "trimming" down these kinds of results, but semantic matches within somewhat irrelevant documents remain a problem. I think this can be overcome (I have an idea, Melek), and we should also not lose sight of the fact that the human that is "us" individually needs to be the final filtering agent.

[Image: hakia search]

Conclusion - Alberts

Freebase has great potential for helping users find relevance (on some level) easily, and also for collaboration within its community. The power of user-generated content has been revealed via the conduits of Wikipedia, MySpace, Facebook and a host of other communities, but there are limitations. Freebase is dependent on Wikipedians and their cousins resident on the Web. In the end, Search Wikia will obviously aggregate the flock into a much larger human knowledge base over time.

I have always asserted that the perfect search engine is a collaboration between these two philosophies. Hakia, Powerset and Search Wikia combined could search out and narrow until what is presented is geometrically more "exact" than any singular method. I know my good friends at hakia are capable of producing AI, but that one "personality" will still be just one Einstein amidst a sea of Schweitzers. A sea of Alberts faces a daunting task when confronted with an unfiltered universe.
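For the curious, the collaboration I have in mind can be sketched in a few lines of Python. This is only my own illustration under stated assumptions, not how hakia, Powerset or Search Wikia actually work: the `hybrid_search` function, the URLs and both data sets are hypothetical. The machine supplies a broad ranked list, and the human-vetted set acts as the final filter.

```python
# Hypothetical sketch: hybrid_search() and all sample data are made up.
def hybrid_search(machine_hits, human_approved):
    """Keep the machine-ranked hits, in order, that a human has vouched for."""
    return [url for url in machine_hits if url in human_approved]


# Broad, machine-ranked results; one commercial page has crept into the mix.
machine_hits = [
    "example.org/santorini-history",
    "example.com/hotel-deals",
    "example.org/thera-eruption",
]

# Pages that human contributors have reviewed and accepted.
human_approved = {
    "example.org/santorini-history",
    "example.org/thera-eruption",
}

print(hybrid_search(machine_hits, human_approved))
```

The machine ranking is preserved; the human layer only removes what it has not vetted, which is the "filter upon filter" arrangement in miniature.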


4 Comments
  • Dear Phil

    If your query to hakia had been, say, “Malta” instead of “Island and Thera”, your discussion above might have been much different. hakia has not yet covered its analysis of the content on the WWW. Thus, I am not sure if the example above is valid for the comparison of man versus machine at this early stage. But your point is well taken.

    Your philosophy of the hybrid system is very reasonable; however, the question is the degree of contribution. In my personal opinion, a system with limitless user contribution encourages ambitious people, those who want to push their agenda, their view, their products, and services. So it can very easily become a rat race at the expense of credibility and fairness. In the end, nice and quiet people will suffer. More importantly, kids and students using these systems will be more vulnerable.

    As you suggested, we should have a round table discussion one day on this interesting topic.

    Best Regards

    Riza

  • What is the legality of pulling content from Wikipedia? And beyond the legality, Wikipedia is known for being easily abused.

  • Hello Riza and hi Steve!

    Yes Riza, you are quite correct - the discussion would in fact have been very different with any of several such queries. If I had the time, I would have tried to find how the results vary with the various forms of query, e.g. sentences, keywords, single words, and also less obscure questions.

    I am sure once hakia has begun deciphering massive amounts of web data that there will really be more comparison, but as you kindly and honestly suggest - a combination of human scrutiny, especially at a higher level would improve (narrow) relevance substantially.

    As for the round table - I know all of you are open to whatever brings out the best result. I too worry about objectivity and true answers as I am sure you know. In the precarious world in which we live - further ambiguity and blind ambition have less and less of a place.

    Thank you as always Riza for your time and caring.

    Steve, I believe that Wikipedia is the open source of open sources as far as grabbing data is concerned. I know Jimmy fits so prominently into the discussion here as well as into the one dealing with data and knowledge.

    You make a valid point in regard to tampering and abuse. I myself am often amazed at how “pristine” Wikipedia and even Wikia have been in this regard - with only a hiccup here and there.

    As Riza so aptly pointed out - any system with limitless contributory access has its problems. I think what we are looking at in the end (and it should be quite obvious I think) is a system of filter upon filter. The task at hand (besides the great work of hakia and others in semantic search) is a technology that accelerates a whole tier of filters.

    This is fascinating stuff - the most interesting in my time as a blogger for sure - and without a doubt the most important.

    Thanks always, Phil

  • Very interesting discussion so far but one thing that hasn’t been highlighted here is the very real difference between a wiki and something like freebase.

    With Freebase the “semantics” are applied at the point of data entry, which is a different universe from the likes of Wikipedia. This distinction was of course made years ago in offline applications, so I’m incredibly happy to see it finally arrive in the web knowledge space.

    Not to labor the point but if this is done right freebase.com vs wikipedia.com will look like sql server vs notepad :)

    I honestly believe (coming from a background making applications) that the key to any kind of semantic web is to establish the relationship and meaning of data “at point of entry” rather than processing it later. The only caveat is whether that is an option (as it may be with the next generation of wikis).

    As Phil said this is a very interesting discussion and the outcome of these kinds of sites will literally affect the very fabric of the web.
