Talend Unveils Open Source Clean Data App
- By Will Kraft
- August 26, 2008
Talend last week announced an open source software package to help clean data records. The San Diego-based company's new Data Quality product identifies and helps to remedy so-called "dirty data" in database management systems.
Dirty data often takes the form of nicknames and abbreviated street addresses in record fields, which can lead to duplicate records. Talend's solution updates the data with standardized information gathered from the U.S. Postal Service and other sources.
The Data Quality product lets companies recognize when "Peggy," "Peg," "Marge" and "Meg" (all variations of "Margaret") refer to the same person. It can also match "William Smith at 15 Main Street" with "Billy Smith at 15 Main Str." Such inconsistencies have led to lost or redundant mailings in the past.
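The kind of matching described above can be illustrated with a minimal sketch. This is not Talend's method: it assumes a hypothetical, tiny nickname table and a hypothetical street-suffix table, normalizes each name and address, and treats records with the same normalized key as likely duplicates.

```python
import re

# Hypothetical, deliberately tiny nickname table (illustration only);
# a real product would rely on a much larger curated reference list.
NICKNAMES = {
    "peggy": "margaret", "peg": "margaret", "marge": "margaret", "meg": "margaret",
    "billy": "william", "bill": "william", "will": "william",
}

# Hypothetical street-suffix table mapping variants to one standard abbreviation.
STREET_SUFFIXES = {"street": "st", "str": "st", "st": "st", "avenue": "ave", "ave": "ave"}


def normalize_name(full_name: str) -> str:
    """Lowercase the name and replace a known nickname in the first position."""
    parts = full_name.lower().split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def normalize_address(address: str) -> str:
    """Lowercase, drop periods and commas, and standardize street suffixes."""
    tokens = re.sub(r"[.,]", "", address.lower()).split()
    return " ".join(STREET_SUFFIXES.get(t, t) for t in tokens)


def match_key(name: str, address: str) -> tuple:
    """Records that share a key are treated as likely duplicates."""
    return normalize_name(name), normalize_address(address)


a = match_key("William Smith", "15 Main Street")
b = match_key("Billy Smith", "15 Main Str.")
print(a == b)  # True
```

Exact-key matching is the simplest design choice here; commercial tools typically add fuzzy or probabilistic comparison on top of this kind of standardization.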
The software includes data profiling, cleansing and enrichment functions. Data profiling allows a company to track data degradation over time. With data cleansing, the software corrects "bad" data by cross-checking it against other databases and reference data.
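As a rough illustration of the profiling and cleansing ideas, here is a sketch that does not reflect Talend's implementation. The profiling step measures the share of records with a malformed ZIP code, so comparing that rate across snapshots shows degradation over time. The cleansing step overwrites a record's city with the value from a reference table. The field names `zip` and `city` and the two-entry `ZIP_TO_CITY` table are hypothetical stand-ins for real postal reference data.

```python
import re
from typing import Iterable

# Hypothetical two-entry reference table standing in for postal reference data.
ZIP_TO_CITY = {"92101": "SAN DIEGO", "10001": "NEW YORK"}

# A well-formed U.S. ZIP code here means exactly five digits (ZIP+4 is ignored).
ZIP_PATTERN = re.compile(r"\d{5}")


def invalid_zip_rate(records: Iterable[dict]) -> float:
    """Profiling: fraction of records whose 'zip' field is not a 5-digit code."""
    records = list(records)
    if not records:
        return 0.0
    bad = sum(1 for r in records if not ZIP_PATTERN.fullmatch(str(r.get("zip") or "")))
    return bad / len(records)


def cleanse_city(record: dict) -> dict:
    """Cleansing: replace 'city' with the reference value when the ZIP is known."""
    fixed = dict(record)  # leave the caller's record untouched
    city = ZIP_TO_CITY.get(str(fixed.get("zip") or ""))
    if city is not None:
        fixed["city"] = city
    return fixed


# Two hypothetical snapshots of the same table, taken months apart.
january = [{"zip": "92101", "city": "San Diego"}, {"zip": "10001", "city": "NYC"}]
june = january + [{"zip": "9210", "city": "San Diego"}, {"zip": None, "city": ""}]

print(invalid_zip_rate(january), invalid_zip_rate(june))  # 0.0 0.5
print(cleanse_city({"zip": "10001", "city": "NYC"}))  # {'zip': '10001', 'city': 'NEW YORK'}
```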
The data enrichment feature associates additional information with the data, which can be used to help target mailings to a specific demographic. The additional information might include latitude and longitude, census data, and credit scores.
The Data Quality product will be available in September as an individual product or as an extension to the Talend Integration Suite, which is the company's data integration service.
Earlier this summer, the company also announced a new open source data profiler called Talend Open Profiler.