Problem: Develop Hadoop infrastructure for one of the largest banks in the US to support enterprise asset aggregation analytics. The Hadoop ecosystem was to be used to track the bank’s credit risk and to analyze loan data in order to identify portfolio risks, reduce a portfolio’s maximum losses, and help decision makers take steps to improve portfolio performance.
Implementation: Our architects are setting up the Hadoop infrastructure. Multiple clusters of more than 100 nodes will be used to import data into HDFS, pre-process it, and export it to Netezza. The ETL layer is being developed using Hive and Pig, and Netezza integration is being achieved with Sqoop.
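
The import, pre-process, and export flow described above can be sketched as a sequence of command lines. The following Python snippet is only an illustration under stated assumptions, not the project's actual code: the JDBC URLs, table names, HDFS paths, and the Hive script name are all hypothetical, and the source system for the import step is not named in this write-up. The snippet only builds the commands; it does not run them.

```python
import shlex
from typing import List


def sqoop_import(jdbc_url: str, table: str, target_dir: str, mappers: int = 4) -> List[str]:
    """Build a `sqoop import` command that lands a source table in HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def hive_script(script_path: str) -> List[str]:
    """Build a `hive -f` command that runs a pre-processing script."""
    return ["hive", "-f", script_path]


def sqoop_export(jdbc_url: str, table: str, export_dir: str, delimiter: str = ",") -> List[str]:
    """Build a `sqoop export` command that pushes HDFS files into a Netezza table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", delimiter,
    ]


def build_pipeline() -> List[List[str]]:
    """Assemble the three stages in order. Every value below is a hypothetical placeholder."""
    source_url = "jdbc:oracle:thin:@loans-db.example:1521/LOANS"  # assumed source system
    netezza_url = "jdbc:netezza://nz-host.example:5480/RISK_DW"   # assumed Netezza database
    return [
        sqoop_import(source_url, "LOAN_FACTS", "/data/raw/loan_facts"),
        hive_script("preprocess_loans.hql"),  # assumed to write to /data/clean/loan_facts
        sqoop_export(netezza_url, "LOAN_RISK", "/data/clean/loan_facts"),
    ]


if __name__ == "__main__":
    # Print each stage as a shell-quoted command line (credentials omitted on purpose).
    for command in build_pipeline():
        print(shlex.join(command))
```

Keeping each stage as a plain argument list makes the ordering explicit and lets a scheduler or `subprocess.run` execute the stages later. Connection credentials (for example Sqoop's `--username` and `--password-file`) would have to be added before the commands could run against real systems.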