In-Depth
Nielsen Media
Innovator Award Submission
Sybase IQ and Nielsen Media
I. Project Information
a. Company and division name: Sybase
c. Project designation
d. Brief explanation of the goals of the project:
For many years, Nielsen mailed audience-rating data to its clients each month on CD-ROMs and in ASE summary databases. But when demand grew for more timely and detailed information, Nielsen decided to build an online data warehouse that would allow each client to access its vast data store on an ad hoc basis and to query that data rapidly. To help validate the architectural concepts for this new, Web-accessible data warehouse, Sun and Sybase® built a "reference architecture" in the Sun iForce Solution Center to test Nielsen's concept and workloads.
e. Brief description of the business risks involved:
Building and maintaining a very large data warehouse posed a number of formidable risks. For instance, Nielsen loads almost 300 million rows a day into the warehouse, and those loads must complete within a very narrow time frame. Nielsen has found Sybase IQ's load speed to be eight times faster than that of the solution it previously used.
f. Brief description of how the system helps users: Sybase IQ's data compression saves Nielsen Media Research hard dollars by reducing the company's storage requirements by about 70 percent: the raw data is compressed, rather than exploded by the indexes inherent in traditional DBMSs.
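As a rough back-of-the-envelope illustration of that difference (only the 70 percent reduction comes from the project; the raw volume and index-explosion factor below are hypothetical):

    # Back-of-the-envelope storage comparison. Only the ~70% compression
    # ratio comes from the submission; all other figures are hypothetical.
    raw_tb = 10.0                       # raw input data, in terabytes

    # Column-store compression: ~70% reduction of the raw data.
    iq_tb = raw_tb * (1 - 0.70)         # -> 3.0 TB on disk

    # Traditional row-store DBMS: raw data plus index "explosion",
    # often a multiple of the raw size (the factor here is an assumption).
    index_explosion_factor = 2.0
    rowstore_tb = raw_tb * (1 + index_explosion_factor)   # -> 30.0 TB

    print(f"Column store: {iq_tb:.1f} TB, row store: {rowstore_tb:.1f} TB")
    print(f"Difference: {rowstore_tb / iq_tb:.0f}x less storage")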
II. Organizational Objectives
a. What short- and long-term benefits did the organization achieve from the project? Did the solution meet the projected goals for saving time and money? How were benefits measured? Was the system mission-critical to the organization?
Key benefits:
- Provides overnight and historical audience rating data on demand
- Delivers responses 10 to 100 times faster even for the most complex queries against terabytes of data
- 70 percent compression ratio produces significant hard-cost storage savings
- 800 percent faster data loads increase productivity during the maintenance window
- New client products and applications added quickly and easily
- Eliminated monthly CD-ROM mailings to clients
- Reduced complexity, costs and risks of deploying a new data warehouse
b. Describe the business purpose of the new system:
Timely, accurate and detailed information provided by Nielsen Media Research governs the $65 billion TV advertising industry. Data is its lifeblood. Using Sybase® IQ, Nielsen developed an online audience data warehouse enabling its clients to access audience data through the Internet.
c. Describe the features of the new system:
Provides telecast and time-period television viewing information at the quarter-hour level for over 99 demographic categories (information about people) and 160 market breaks (information about households).
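A minimal sketch of what a quarter-hour fact table along these lines might look like; the table and column names are purely illustrative, not Nielsen's actual schema:

    # Hypothetical star-schema fact table for quarter-hour audience ratings.
    # Names and types are illustrative only, not Nielsen's actual design.
    QUARTER_HOUR_FACT_DDL = """
    CREATE TABLE qtr_hour_rating_fact (
        telecast_id      INTEGER      NOT NULL,  -- FK to telecast dimension
        qtr_hour_id      INTEGER      NOT NULL,  -- FK to quarter-hour time dimension
        demographic_id   INTEGER      NOT NULL,  -- one of ~99 demographic categories
        market_break_id  INTEGER      NOT NULL,  -- one of ~160 household market breaks
        audience_est     NUMERIC(12)  NOT NULL,  -- projected audience count
        rating_pct       NUMERIC(5,2) NOT NULL   -- rating as a percentage
    )
    """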
d. Explain the functions of the new system:
Sybase IQ enables Nielsen to address a number of formidable challenges inherent in building and maintaining a very large data warehouse. For instance, Nielsen loads almost 300 million rows a day into the warehouse, and this needs to be done in a very narrow time frame: an hour or two at the most. Nielsen has found Sybase IQ's load speed to be eight times faster than that of the solution it previously used.
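As a sketch of what such a nightly bulk load might look like, the script below drives IQ's LOAD TABLE statement over a generic ODBC connection. The DSN, table and column names, file path, and load options are hypothetical and illustrative, not the tuned settings a production job would use:

    import pyodbc

    # Hypothetical nightly bulk load into the warehouse fact table.
    # LOAD TABLE reads the staged flat file directly on the server,
    # which is what makes very large loads fit a narrow window.
    LOAD_SQL = """
    LOAD TABLE qtr_hour_rating_fact
        ( telecast_id, qtr_hour_id, demographic_id, market_break_id,
          audience_est, rating_pct )
    FROM '/stage/ratings_overnight.dat'
        FORMAT ascii
        DELIMITED BY '|'
        ESCAPES OFF
        QUOTES OFF
        NOTIFY 10000000
    """

    conn = pyodbc.connect("DSN=nielsen_iq", autocommit=False)
    try:
        conn.execute(LOAD_SQL)
        conn.commit()          # the whole load must fit the nightly window
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()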
e. Who were the internal sponsors of the project? Which officials or groups were opposed to developing the application? Why?
The key driver for the warehouse was actually IT. A key legacy customer reporting application was being redesigned to take advantage of Internet architecture, replacing a fat client/server approach. The legacy application utilized both a central server database and local databases on the desktop. When analyzing the database needs for the new application, it was determined that a data warehouse could be built that would not only contain all the data required for this application, but could also be leveraged for many other reporting applications, many of which had databases of their own.
One hesitation about building the warehouse was how it would affect the delivery date of the reporting application, but the concept of a central, common repository for the data was accepted as the right goal.
f. Were users of the system involved in the project during the planning and development phases? If so, how?
Our clients were represented by members of our Product Management and Marketing groups, who have responsibility for understanding our data and how our reporting applications and their associated clients use it in their business operations.
g. What were the greatest challenges in completing this project? How were they overcome?
Some of the large issues included:
- Designing a warehouse schema that satisfied the initial reporting application but would also satisfy the logical data needs and performance needs of subsequent applications that would port to the new design. In addition, the design scope had to cover a significant portion of the overall footprint of data reported to Nielsen’s National client base, meaning we did not start with just a single data mart.
- Corresponding rules for accessing the warehouse and making further reporting computations had to be developed and then interpreted in software. Software was modified for both simplicity and performance reasons, and a series of summary tables was also developed to support larger report requests (a sketch of such a table follows this list).
- Sourcing historical data and ensuring that it agreed with the other reporting applications that were considered the currency of the data for our client base, which meant creating data-quality checks as part of the deliverable.
- Creating an ETL process capable of loading such a massive amount of data each day.
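The summary tables mentioned above pre-aggregate the fact data so that large report requests avoid rescanning detail rows. A minimal, hypothetical example, continuing the illustrative schema used earlier:

    # Hypothetical summary-table refresh: aggregate quarter-hour detail
    # up to one row per telecast/demographic/market-break combination.
    DAILY_SUMMARY_SQL = """
    INSERT INTO daily_rating_summary
        ( telecast_id, demographic_id, market_break_id,
          avg_rating_pct, total_audience )
    SELECT telecast_id, demographic_id, market_break_id,
           AVG(rating_pct), SUM(audience_est)
    FROM   qtr_hour_rating_fact
    GROUP BY telecast_id, demographic_id, market_break_id
    """

Larger report requests would then read daily_rating_summary rather than the detail fact table.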
h. Were the goals changed as the project progressed? If so, what were the changes and why were they made?
III. Category
Data Warehousing
IV. Methodology/Process
a. Describe how productivity tools or techniques were used in the project.
PowerDesigner® was essential in our development process, easing the transition from the first-cut physical model to a representation in another RDBMS, and finally making conversion easy when we chose Sybase IQ.
b. Were testing tools used during development? If so, when were they used? Was the testing cost-effective?
Other than home-grown queries and quality checks, no.
c. Was a formal or informal software development life-cycle methodology employed? If yes, please describe it.
We followed Starsoft Solutions' Business Dimensional Modelling methodology.
d. What formal or informal project management methodologies and/or tools were used to manage the project?
The project management methodology was relatively informal but did follow principles of the Data Warehouse Lifecycle.
e. Were software quality metrics used? If so, what were they, and did using them significantly help the project?
No.
V. Technology
a. What were the major technical challenges that had to be overcome to complete the project successfully?
Overcoming some corporate resistance to data warehousing. Dealing with the enormous volumes of data we had to load, as well as the massive deletes and updates that are not common to most data warehouses.
How did the team respond to those challenges?
We worked closely with Sybase to overcome these challenges. Sometimes it took custom-built software rather than packaged tools to achieve the load-performance goals; other times it took working closely with Sybase IQ engineers to resolve issues. We also had to adopt hardware solutions for our backup process, i.e., SAN technology and mirrored copies of the databases.
b. What software tools, including databases, operating systems and all development tools, were selected for the project? Why were they selected over competing tools? What process was used to select development tools and software platforms?
We had originally slated another data-warehousing RDBMS as the product for this warehouse. We compared it with IQ on a 1 TB test system; Sybase IQ's overall superior performance (load and query times) and its ability to operate in a multiplex environment made IQ the best solution for our warehouse.
c. Describe the overall system architecture. Were elements of the technical infrastructure put in place to support the new system? Please describe.
We used an n-tier approach to the overall architecture, allowing the database to do what databases are good at: selecting data, grouping, sorting, and so on. In the end we made sacrifices, pushing more mathematical functions into the database than needed, which caused some performance concerns.
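As an illustration of that tiering principle, assuming the same hypothetical schema as above, a report request pushes set operations (filtering, aggregation, ordering) down to the database and keeps only presentation logic in the application tier:

    # Hypothetical report query: the database filters, aggregates, and
    # sorts; the application tier only trims and formats the result.
    TOP_MARKETS_SQL = """
    SELECT market_break_id, SUM(audience_est) AS total_audience
    FROM   qtr_hour_rating_fact
    WHERE  telecast_id = ?
    GROUP BY market_break_id
    ORDER BY total_audience DESC
    """

    def top_markets(conn, telecast_id, limit=10):
        """Return the top market breaks for one telecast, aggregated in-database."""
        rows = conn.execute(TOP_MARKETS_SQL, telecast_id).fetchall()
        return rows[:limit]    # presentation-only logic stays in the app tier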
d. What characteristics of the tools and technologies used were most important in achieving the business purposes of the system?
Robustness and stability. In the end, unfortunately, we were, and still are, on the bleeding edge with some of our tools and hardware.
VI. Project Team
a. What was the size of the development team?
10-12
b. Describe the software development experience of the team members:
Seasoned members with database knowledge and data knowledge; however, none had prior knowledge of Sybase IQ.
c. What was the composition and skill level of the team? Did development teams require training to work with the technology?
Some junior members were trained so they could provide support once the system was in production.
d. Please list team members and their titles:
At the risk of leaving someone out across all the areas involved, we won’t list team members. The acknowledged IT roles involved were Data Warehouse Architect, DBA, mainframe and server developers, and SQA. There was also significant involvement by Computer Operations, Product Management, Marketing, and Production Operations.
e. How many person-months/days did the project take, and over what calendar time frame? Was a formal schedule created at the start of the project? Did the project stay on schedule?
The project took over two years to complete, including all historical data and application ports. We are still porting new applications to the warehouse, and its portfolio of applications continues to grow.
f. Did management and the user community consider the project a success?
Yes; the evidence is that more and more applications continue to be ported to the warehouse.
g. If you had to do the project over again, would you do anything differently? If yes, please explain why.
Yes. We would add a tier to handle data requests, removing the burden of SQL writing and tuning from the individual applications and providing a single point for tuning. A sketch of that idea follows.
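A minimal sketch of such a data-request tier, again using the hypothetical schema from earlier: applications call a narrow API, and all SQL construction lives in one service that can be tuned once:

    # Hypothetical central data-access tier: applications submit logical
    # requests; SQL construction and tuning live in one place.
    class RatingsDataService:
        """Single point for SQL generation, so tuning happens once, not per app."""

        def __init__(self, conn):
            self.conn = conn        # e.g., a pyodbc connection to the warehouse

        def quarter_hour_ratings(self, telecast_id, demographic_id):
            sql = """
                SELECT qtr_hour_id, rating_pct
                FROM   qtr_hour_rating_fact
                WHERE  telecast_id = ? AND demographic_id = ?
                ORDER BY qtr_hour_id
            """
            return self.conn.execute(sql, telecast_id, demographic_id).fetchall()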