Development of a Rapid Data Quality and Data Integrity Validation Tool

CLIENT PROFILE

FirstEigen was founded in 2015 by Seth Rao. Seth has a Ph.D. in Engineering and significant experience in resolving Data Integrity issues. Leveraging this experience, FirstEigen created a tool, DataBuck, to validate Data Quality and Data Integrity for datasets of tens to hundreds of gigabytes in minutes. DataBuck can even handle petabytes of data without scaling issues. FirstEigen has received industry-wide recognition and was selected as a Gartner Cool Vendor in 2017.

PROJECT DESCRIPTION

I2I has been working with FirstEigen since June 2017 on the development of DataBuck, its automatic data quality and continuous data monitoring product. I2I is also an implementation partner for DataBuck, providing implementation services and support to clients across the globe. The following illustration depicts the workflow of quality testing.

TECHNOLOGY USED

• Java, Apache Spark
• AWS, Multiple SQL
• JavaScript/jQuery

CLIENT CHALLENGES

I2I encountered several challenges before the project began and during its execution. Thanks to established processes and customized strategies, I2I was able to execute the project with its customary precision. The challenges were as follows:

1. I2I developers had to acquire training in and knowledge of Big Data and data quality analysis.
2. We had to identify quality errors during the development of DataBuck.
3. DataBuck was to be configured in the cloud.
4. Technology was a major challenge for the developers, especially learning cloud computing and Apache Spark.
5. The client had no resource available for testing, which was another challenge.

OUR SOLUTION

We started with one developer and, as the project requirements grew, five I2I developers became part of the team.

1. We provided development expertise in Java, Spark, and Big Data for the DataBuck product, delivering major and minor enhancements and bug fixes in the UI module and Core Engine.
2. I2I developers undertook training and acquired knowledge as and when required during the project.
3. DataBuck was configured in the cloud (Amazon Web Services).
4. Developers with relatively little experience were able to execute the project successfully.
5. Developers undertook rigorous training in cloud computing and Apache Spark.
6. We also provide implementation and post-implementation support for DataBuck clients.
7. I2I took responsibility for functional testing and for fixing related issues.

CLIENT BENEFITS

• High-level expertise in specialized skills.
• By partnering with Ideas to Impacts, the client saved substantial costs on hardware, software, and resources.
• Major and minor enhancements and bug fixes in the UI module.