Portals and personalization: The next generation

In last month's column, Curt discussed how developers and implementors require tools that let them meet the need for context-sensitivity without assuming a terrible new burden of complexity. Just as the old system analysis techniques -- data modeling, data normalization, object modeling and so on -- were invented to help reduce and manage the complexity of applications, new extensions to handle "context" are needed.

One kind of context-sensitivity that's implemented pretty well today is what I call "subject-sensitivity" or "concept-sensitivity" -- whole dynamically generated (or semi-dynamic) pages based on one key context parameter, almost like a master-detail form run amok. The most visible examples are investment and other company news sites, which have one or more pages of information for each company (more precisely, for each company traded on certain stock exchanges, typically the major U.S. ones). Usually included are stock quotes and charts, hit lists for news-feed document searches on the company, financial ratios (based on historical databases, but recalculated in part using current stock prices), and other company-related information (much of it typically more static).

Primitive personalization

More problematic is "personalization." One of the big themes of Web site competition is straightforward personalization: My Yahoo, My Excite, My AOL, My Many Many Others. Users fill in a form about their interests, location, etc., and the site -- based on that "context" -- gives them the information (and advertisements) it thinks they care about.

Up to a point, these capabilities are popular. Many users are happy to specify stock portfolios, "watch lists" of companies and a couple of other categories of breaking news (weather and personal interests such as sports, health, science, hobbies, etc.). Otherwise, personalization's track record is pretty spotty. Without filling in a lot of forms, the level of precision isn't great; but filling in a lot of information seems like an awful lot of trouble.

In my opinion, there are two basic problems with most implementations of personalization to date: They're too coarse-grained; and they're too static.

Personally, I'd like my basic news-headlines interface to take account of factors such as:

  • How busy I am at the moment.
  • Whether I'm traveling, and, if so, which companies I am about to visit.
  • The stocks in my portfolio.
  • The companies I track because I hope to do business with them.
  • The companies I track just for general industry background.
  • The kinds of information I like to track about companies I know well.
  • The kinds of information I like to see when researching software companies I don't know well.
  • The kinds of information I like to see when checking up on investment banking firms.
  • The kinds of information it would benefit me to see when researching companies in other industries in which I'm not a domain expert.

The key to most of this functionality would be the intelligent use of context -- much more context than is reflected in this kind of app today.

If there's one area where context-sensitivity is needed, it's text search. Notwithstanding ongoing improvements, a typical search engine gives a lot of irrelevant results. Wouldn't it be nice if the engine "understood" context, and gave much more accurate results?

The same plaint can be made about a more structured text-filtering application, such as a part of a salesman's briefing book. Let's assume we're trying to track events that might produce hitherto-unrecognized sales leads. For example, if a company is acquired, all its previous "strategic" software decisions are up for re-evaluation. The same goes for any other major kind of deal, organizational change, plant opening, layoff, new business strategy or other development that could herald a change in business priorities. Salespeople would do well to stay informed of such events. (If you prefer, you can think of this as an investment-research application example, since exactly the same needs are present. And similar issues arise in many different kinds of apps.)

Unfortunately, the required level of article-filtering technology does not exist in useful form today, because people wrongly believe that it's too hard to write and deliver the queries. For example, to jump headlong into some computational linguistics: It's probably impossible to construct a single text query that will do a great job of returning articles that are about plant openings and only about plant openings. However, it's more reasonable to construct a set of industry-specific queries to meet the same goal -- i.e., one query that does a great job of finding articles about semiconductor manufacturers' wafer fabs, another that finds articles on steel mills and smelters, another that finds articles on soft drink bottlers and so on.

Note: Full discussion of these issues -- including the reasons why developing a query for every industry is not as great a task as it seems -- is beyond the scope of this article. But the main computational-linguistics point is the prevalence of industry-specific synonyms for basic concepts; for example, "mill," "fabrication" and "bottling" are virtual synonyms for "plant" when -- and only when -- discussing the steel, semiconductor and soda industries, respectively.

In other words -- key point here! -- the query needs to vary substantially, according to which industry (or industries) the company in question belongs to.
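To make the point concrete, here is a minimal sketch of a query that varies by industry. The synonym table, the function name and the boolean query syntax are all my own illustrative assumptions; they don't correspond to any particular search engine's query language.

```python
# Hypothetical synonym table: terms that stand in for "plant" in a given
# industry. The entries are illustrative, not a real linguistic resource.
PLANT_SYNONYMS = {
    "steel": ["mill", "smelter"],
    "semiconductor": ["fab", "fabrication"],
    "soft drinks": ["bottler", "bottling"],
}


def plant_opening_query(industry: str) -> str:
    """Build a boolean text query for plant openings in one industry.

    The AND/OR syntax is a made-up generic form, not a real engine's grammar.
    """
    # Industry-specific synonyms apply only within that industry.
    terms = ["plant"] + PLANT_SYNONYMS.get(industry, [])
    facility = " OR ".join(terms)
    return f"({facility}) AND (open OR opening OR opens)"


print(plant_opening_query("steel"))
print(plant_opening_query("semiconductor"))
```

The same logical question ("did this company open a plant?") produces a different query string for a steel maker than for a chip maker, and an industry with no entry in the table falls back to the generic term alone.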

Another complicating factor is that text queries about companies in different industries or locations need to be executed against whole different collections of articles. This point should be pretty obvious in the case of industry-specific trade press, and it also can arise in other instances, such as online newsgroups, hometown newspapers and so on. What's more, these different collections might be stored in entirely different text-management systems on different computers -- or they might be in the same system, but with subtle differences in the metadata different publishers provide to help improve query accuracy. (Note: In the new age, this "variable-schema" problem can even arise in relational applications -- see the next section.)

In other words, the "same" query may actually be executed in a completely different way, depending on context! If it's a relational query, the FROM and WHERE clauses may be very different from one context to the next, and even the SELECT clause could show some variation.
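As a rough sketch of the relational case, assume a small catalog that records, for each industry, which table holds the articles and what the publisher calls its columns. Every table and column name below is hypothetical; the point is only that one logical request yields different SELECT, FROM and WHERE clauses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    table: str         # collection to query (drives the FROM clause)
    text_column: str   # column holding the article text (SELECT clause)
    date_column: str   # publishers name their date metadata differently


# Hypothetical catalog of article collections, keyed by industry.
SOURCES = {
    "steel": Source("steel_trade_press", "body", "pub_date"),
    "semiconductor": Source("chip_newswire", "article_text", "issued_on"),
}


def recent_articles_sql(industry: str) -> str:
    """Return a parameterized query for recent articles in one industry.

    Identifiers come from the trusted catalog above, never from user input;
    the cutoff date is left as a "?" placeholder for the database driver.
    """
    src = SOURCES[industry]
    return (
        f"SELECT {src.text_column} FROM {src.table} "
        f"WHERE {src.date_column} >= ?"
    )


print(recent_articles_sql("steel"))
print(recent_articles_sql("semiconductor"))
```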

Let me call out that point one more time: A component (in this case, a query) does different things, depending on context. Hang onto that thought. It's central to this whole discussion.

Quick digression on text-filtering: The relational assumption above was not idly chosen. The SQL/MM standard for multimedia data access has finally been adopted, and IBM and Oracle are implementing at least the text-handling portions. Non-SQL/MM-compliant text-query and relational-middleware vendors are therefore risking destruction. (The imminence of SQL/MM, of course, just further strengthens my belief that relational database application development and deployment paradigms are due for some serious extension.)

An extreme case: e-commerce

In line with the prior section, the most profound need for sophisticated information filtering arises in e-commerce. In most search engines and similar portal applications, you get unwanted information by accident. But in a marketing situation, people try to give you unwanted information on purpose. So a user's defenses in that case need to be particularly sophisticated.

E-commerce also raises the most extreme form of a problem hinted at briefly above -- variable schemas. Not only might every vendor have a different schema for their inventory databases, but different kinds of merchandise have different attributes that can be tracked! The XML EDI-replacement community is breaking its head on these issues -- but exactly the same ones arise from the online shopper's viewpoint. And the key to a solution is, again, context-sensitivity.
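One simple way to picture the variable-schema problem from the shopper's side is a per-vendor field map applied before any comparison is made. This is only a sketch under my own assumptions: the vendor names, field names and the `normalize` helper are invented for illustration and say nothing about how any XML or EDI standard actually handles the issue.

```python
# Hypothetical per-vendor maps from each vendor's field names to a shared
# vocabulary. Which map applies depends on the context (here, the vendor).
FIELD_MAPS = {
    "vendor_a": {"sku": "item_id", "cost": "price"},
    "vendor_b": {"part_no": "item_id", "unit_price": "price", "colour": "color"},
}


def normalize(vendor: str, record: dict) -> dict:
    """Rename a vendor record's fields into the shared vocabulary.

    Fields with no mapping pass through unchanged, so merchandise-specific
    attributes (say, "screen_size") are kept rather than dropped.
    """
    mapping = FIELD_MAPS[vendor]
    return {mapping.get(key, key): value for key, value in record.items()}


a = normalize("vendor_a", {"sku": "A-100", "cost": 19.95})
b = normalize("vendor_b", {"part_no": "X1", "unit_price": 21.50, "colour": "red"})
print(a)
print(b)
```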

The essence of context-based architectures

Rather than enumerate yet more examples of problems that need to be solved, let's consider the common element so far: The "same" component has to behave "differently" in different "contexts." That's the essence of "context-sensitivity," and hence the key to a new generation of killer apps.

In the text-filtering case, we saw that a component had to behave differently in different contexts, where in that case "context" was mainly the primary subject of inquiry.

Similarly, to generalize personalization, it's pretty obvious that what we need is for components to behave differently in different contexts, where a "context" now is some set of attributes by which you want to personalize (see my laundry list above).

Indeed, in the case of e-commerce, it's important to be context-sensitive in both ways at once; I want to get the information I want about the product categories I want; I don't want to be told what the "average" person is willing to listen to about the product category.

The "plumbing" needs some context-sensitivity too. Graphics are a bandwidth hog, and hence should maybe be cached near the most likely users. But who is likely to want some particular set of images? Well, that depends on the subject of the images, and the roles and goals of the individual users ... and we're right back into a context-sensitive scenario.

So what does it mean for a component to "behave differently" in different contexts? Clearly, it has to involve some kind of choice among candidate versions of that component. There are three conditions that have to be satisfied for this to make sense.
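A "choice among candidate versions" can be sketched as a registry keyed by component name and context, with a default version as the fallback. The registry, the decorator and the "headline" component are hypothetical names of my own, not part of any existing component model.

```python
from typing import Callable, Dict, Optional, Tuple

Component = Callable[[str], str]

# Hypothetical registry: (component name, context) -> candidate version.
# A context of None marks the default version of a component.
_registry: Dict[Tuple[str, Optional[str]], Component] = {}


def register(name: str, context: Optional[str] = None):
    """Decorator that files a function as one version of a named component."""
    def decorator(fn: Component) -> Component:
        _registry[(name, context)] = fn
        return fn
    return decorator


def resolve(name: str, context: str) -> Component:
    """Pick the version registered for this context, else the default."""
    if (name, context) in _registry:
        return _registry[(name, context)]
    return _registry[(name, None)]


@register("headline")
def default_headline(company: str) -> str:
    return f"News about {company}"


@register("headline", context="traveling")
def travel_headline(company: str) -> str:
    return f"Briefing before your visit to {company}"


print(resolve("headline", "traveling")("Acme"))
print(resolve("headline", "at desk")("Acme"))
```

The caller asks for "headline" in both cases; which code actually runs depends on the context it supplies.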

First of all, component boundaries have to be sensibly aligned with the functionality that needs to be context-sensitive -- which means, first and foremost, that they have to be sensibly drawn, period, so as to accommodate the natural structure of today's new apps. For that reason, many of the requirements for context-based architectures don't deal explicitly with context-sensitivity at all; rather, they're the kind of thing addressed in "The end of traditional architectures" [see "The age of context-based architectures," ADT, May 1999, p. 100].

Second, there has to be some sort of a naming and instantiation model that's consistent with the demands of the component architecture used. I think this is possible in any major or reasonable component model (anything Java-based, for instance, but also COM+), but a detailed discussion is beyond the scope of this article.

Finally, different components have to have a consistent set of possible "contexts" to align themselves by. In other words, there needs to be a "context model," much like a data model or object model. Since these things don't exist yet, we might as well make sure that they're all consistent with each other -- i.e., they should be partly standardized, albeit fully extensible.
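Since no such context model exists yet, the following is purely a guess at its shape: a few agreed-upon attributes that every component can rely on, plus an open-ended set of extensions. The attribute names (`subject`, `industry`, `user_role`) are assumptions chosen to echo the examples in this column.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed "standardized" attribute names, shared by all components.
STANDARD_KEYS = ("subject", "industry", "user_role")


@dataclass
class Context:
    # The partly standardized core of the model.
    subject: str
    industry: str
    user_role: str
    # The fully extensible part: anything an individual app wants to add.
    extensions: Dict[str, Any] = field(default_factory=dict)

    def get(self, key: str, default: Any = None) -> Any:
        """Look up a standard attribute first, then an extension."""
        if key in STANDARD_KEYS:
            return getattr(self, key)
        return self.extensions.get(key, default)


ctx = Context("Acme Steel", "steel", "salesperson", {"traveling": True})
print(ctx.get("industry"))
print(ctx.get("traveling"))
print(ctx.get("busy", False))
```

Two components written by different teams could both ask `ctx.get("industry")` and get the same answer, while each remains free to read extensions the other has never heard of.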