September 17, 2008
Like death and taxes, the perplexities of data integration are unavoidable. The never-ending data explosion is just one factor. Mergers and acquisitions, globalization, outsourcing, partnerships, and regulations all contribute to the massive pile-up of data from different repositories, business systems and applications, in different formats and languages, structured and unstructured. Giving people a single, cohesive view into what would otherwise be a massive quagmire is not easy.
“Data integration is always a challenge and will remain that way because data grows exponentially and new types always have to be added to the mix,” says Krishna Roy, enterprise software analyst for The 451 Group. “Keeping up with the growth of data and enabling it to be not only integrated, but cleansed, as well, is the big challenge.”
According to Roy and other enterprise IT analysts, one of the leading companies meeting that challenge is Informatica. The company's flagship, PowerCenter, is a platform for accessing and integrating data from different business systems and repositories, then sharing that data throughout the enterprise. A grid version lets organizations distribute data integration tasks in a scalable, resilient, high-performance environment.
PowerCenter, like Informatica, has evolved over the years to handle all the different chores involved in an integration undertaking, including data migration and replication, synchronization, master data management, governance and standardization. "We started out to help with the automation of data warehouses, taking data from lots of different sources and providing a holistic view of it, whether it came from mainframes, packaged applications, databases, message queues, all the different feeds, and even data outside the enterprise," says Adam Wilson, senior vice president of product management and marketing. During the past eight or so years, the company has expanded beyond data warehousing into “broader data integration, including data from outside the enterprise, data that’s structured or unstructured,” Wilson says. “It’s really broader business intelligence we’re focusing on.”
The increasing need for better data integration is being driven, Wilson says, by companies trying to “get proactive about governing their data, and providing access to that data,” plus globalization, which brings not only new data systems to contend with, but also new sources, such as partners and providers, not to mention new formats, character sets and regulations.
After Virgin Media in the United Kingdom acquired ntl and Telewest as part of its expansion into online and telephony services, the firm had 20 new data sources and 5.6 million customers to consolidate into its operational systems. “They said they really needed a way to pull together all this data that's smeared across all these different systems,” Wilson says. “They wanted a single view of all their 10 million customers, and that information was kept in an Oracle data warehouse, various customer management systems, and running on a mix of hardware and operating systems.” Taking advantage of PowerCenter’s real-time capabilities, Virgin was able to build a consolidated data integration hub that delivers updated information, resulting in better customer service and more accurate market analysis.
Getting on the Grid
With PowerCenter 8 in 2006, Informatica extended its integration capabilities to the grid, enabling customers to distribute tasks across multiple processor nodes while taking advantage of commodity hardware, scalability and high availability. With the Enterprise Grid Option, Informatica says, it has developed a grid system that understands data integration tasks and the resources they need, and is able to adapt accordingly. PowerCenter 7 -- way back in 2003 -- included some basic grid support, but version 8 with the Enterprise Grid Option brings the platform's whole shebang to a grid environment: universal access to a wide range of applications and legacy systems; data delivery functions; and tools for capturing, cleansing, managing and migrating data.
In the PowerCenter grid, data integration services, repository services, and logging services run on one or more nodes (logical representations of physical machines). On each node, a service manager handles the services assigned to that node. Each manager keeps statistics on CPU usage, available memory and the number of running threads. What Informatica calls “gateway nodes” control the routing of service manager requests. The gateway keeps an eye on the availability of other nodes, manages application services, and makes sure services execute by dynamically redirecting to a secondary node if the primary node is down. Administrators can specify the resources available to run a task on each node, and also assign higher priority to the most important integration operations. A GUI Web console provides central control for adding nodes or services and managing resources across the grid.
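For readers who think in code, the gateway's failover behavior can be pictured roughly as follows. This is only an illustrative Python sketch, not Informatica's implementation or API: the `Node` and `Gateway` classes, the `route` method and the service name are all hypothetical, and the only assumption is that a request goes to the primary node unless it is down, in which case it is redirected to the secondary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a grid node (a logical view of one machine)."""
    name: str
    up: bool = True
    services: set = field(default_factory=set)


class Gateway:
    """Hypothetical gateway: tracks nodes and routes each service request."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}

    def route(self, service, primary, secondary):
        # Try the primary first; fall back to the secondary if it is down.
        for name in (primary, secondary):
            node = self.nodes[name]
            if node.up:
                node.services.add(service)
                return node.name
        raise RuntimeError(f"no live node available for {service}")


nodes = [Node("node1"), Node("node2")]
gateway = Gateway(nodes)

first = gateway.route("integration_service", "node1", "node2")
nodes[0].up = False  # simulate the primary node failing
second = gateway.route("integration_service", "node1", "node2")

print(first, second)  # prints "node1 node2"
```

A real gateway would also have to detect failures on its own (heartbeats, timeouts) rather than being told a node is down, which this sketch leaves out.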
Adaptive load balancing dynamically assigns and executes tasks on the basis of resource availability or according to the resource requirements of a particular data integration task. Dynamic partitioning adjusts the parallel execution plan when nodes are added or dropped. PowerCenter's load balancing technology is platform-agnostic and can interoperate across a heterogeneous grid environment (Linux, Windows, Unix; different CPU speeds; 32- and 64-bit software; varying memory capacities among nodes; etc.).
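The two ideas here, picking a node by available resources and re-cutting the work when the node count changes, can be sketched in a few lines of Python. Everything below is a hypothetical illustration under simple assumptions (memory is the only hard requirement, idle CPU breaks ties, rows are split evenly); the names `NodeStats`, `Task`, `pick_node` and `partition` are invented for this example and do not come from PowerCenter.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node statistics, like those a service manager keeps."""
    name: str
    cpu_free: float    # fraction of CPU idle, 0.0 to 1.0
    mem_free_mb: int
    threads: int


@dataclass
class Task:
    """Hypothetical integration task with a memory requirement."""
    name: str
    mem_needed_mb: int


def pick_node(task, stats):
    """Return the name of the least-loaded node that can fit the task, or None."""
    eligible = [s for s in stats if s.mem_free_mb >= task.mem_needed_mb]
    if not eligible:
        return None
    # Prefer the most idle CPU; break ties with the fewest running threads.
    return max(eligible, key=lambda s: (s.cpu_free, -s.threads)).name


def partition(row_count, node_names):
    """Split row_count rows into contiguous ranges, one per node."""
    base, extra = divmod(row_count, len(node_names))
    plan, start = {}, 0
    for i, name in enumerate(node_names):
        size = base + (1 if i < extra else 0)
        plan[name] = range(start, start + size)
        start += size
    return plan


stats = [NodeStats("a", 0.2, 4096, 10), NodeStats("b", 0.8, 1024, 3)]
print(pick_node(Task("big_join", 2048), stats))   # prints "a" (only a has the memory)
print(pick_node(Task("small_load", 512), stats))  # prints "b" (more idle CPU)

# When a node drops out, the same rows are simply re-partitioned.
print(partition(10, ["a", "b", "c"]))
print(partition(10, ["a", "b"]))
```

Recomputing the whole plan when a node is added or dropped is the simplest possible approach; a production system would presumably try to avoid redoing work already completed.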