Alfred is a custom data ingestion engine that acts as a gatekeeper, preventing ungoverned data from being loaded into a data lake. It lets business users upload and analyze data themselves by defining and setting up files for ingestion. Through a simple, intuitive user interface, the user provides the file details and submits them directly.
On submission, Alfred automatically performs much of the technical setup and configuration.
This allows users to determine more quickly whether the data has value and should be promoted to a production process.

The Technology Behind Alfred

Alfred’s REST services are a Java 7 Spring Boot project; Java 7 was chosen for compatibility with the Java 7 installations on HDFS edge nodes. The UI is a React project, and the ingestion scripts are written in Python 2.7. Alfred has been tested with Hive and HDFS on a Unix-based system, including the Cloudera QuickStart VM, but it does not depend on Cloudera.

Alfred’s Data Flow

There are three types of datasets within Alfred: Sandbox, Production, and Refined.
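
As an illustration only, the three dataset types could be modeled as a small enumeration. This is a minimal sketch and not Alfred's actual code: the names DatasetType and parse_dataset_type are hypothetical, and the sketch uses the Python 3 enum module, whereas Alfred's own ingestion scripts target Python 2.7.

```python
# Hypothetical sketch: these names are illustrative and do not come from Alfred.
from enum import Enum


class DatasetType(Enum):
    """The three dataset types named in the text."""
    SANDBOX = "Sandbox"
    PRODUCTION = "Production"
    REFINED = "Refined"


def parse_dataset_type(label):
    """Map a user-supplied label (case-insensitive) to a DatasetType."""
    normalized = label.strip().lower()
    for dataset_type in DatasetType:
        if dataset_type.value.lower() == normalized:
            return dataset_type
    # Reject anything that is not one of the three known types.
    raise ValueError("Unknown dataset type: %r" % label)
```

A caller could use parse_dataset_type("sandbox") to validate a label before doing anything else with it; any label outside the three types raises ValueError.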