Data Processing Engine

Run and orchestrate production-grade ETL workflows.

  • Process

    Execute batch data processing jobs to extract, transform, and load data from any source to any destination.

  • Automate

    Create workflows with the visual builder and schedule them to run, automating any data-intensive task.

  • Develop

    Code and run any custom Python or PySpark script, leveraging a complete SDK with 40+ connectors.

  • Iterate

    Organize and version your code via the native versioning system or via Git integration.

Who is this for?

Data Engineers

Create pipelines to extract data from enterprise data sources and aggregate them into data warehouse tables at any frequency.

ML Engineers

Carry out all the data cleaning and feature engineering needed for the training of machine learning (ML) models.

Software Engineers

Deploy any data-intensive code, such as custom Python optimization solvers to compute equation optimums.

Data Processing Engine features

Create and customize data processing tasks

Connect to any data source and process data into any destination. A rich catalog of pre-built job templates lets you build actions for data extraction, loading, aggregation, cleaning, and metadata updates. Code and run any custom script in Python or PySpark to tackle specific use cases, leveraging a complete SDK with more than 40 connectors. If you already have data processing scripts in Python, simply import them to centralize and orchestrate them in ForePaaS.
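
As an illustration only (the ForePaaS SDK's actual interfaces and connector names are not shown here), a custom Python action often boils down to an extract-transform-load function you can test locally before importing it into the platform. The file and column names below are made up for the example:

```python
import pandas as pd

def run(source_csv: str, destination_csv: str) -> None:
    """A minimal custom action: extract, clean, and load order data."""
    # Extract: on the platform, an SDK connector would replace this local read.
    orders = pd.read_csv(source_csv, parse_dates=["order_date"])

    # Transform: drop incomplete rows, then aggregate revenue per day.
    orders = orders.dropna(subset=["amount"])
    daily = orders.groupby(orders["order_date"].dt.date)["amount"].sum().reset_index()
    daily.columns = ["day", "revenue"]

    # Load: likewise, a destination connector would replace this local write.
    daily.to_csv(destination_csv, index=False)

if __name__ == "__main__":
    run("orders.csv", "daily_revenue.csv")
```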

Custom actions let you manage packages and dependencies, including your own custom libraries, which you can reuse from one project to another. Data Processing Engine comes with two version control systems to ensure production-critical workloads are never impacted: ForePaaS version control lets you track version history on the platform, while developers can synchronize with any external Git repository.

Define and orchestrate workflows

Make the most of the Data Processing Engine orchestrator's drag-and-drop experience to define, sequence, and schedule jobs, and manage resources with workers you can scale as needed. A user-friendly visual builder lets you visualize and execute your plan on the cloud, without requiring deep technical knowledge or managing cloud infrastructure. Schedule triggers, including CRON-based triggers, to automate job executions.
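
The trigger configuration itself lives in the platform UI; purely to illustrate CRON semantics, the sketch below uses the third-party croniter package (not part of ForePaaS) to compute the next firing times of a typical nightly-ETL schedule:

```python
from datetime import datetime
from croniter import croniter  # pip install croniter

# "At 02:30 on every weekday" — a typical nightly-ETL schedule.
schedule = "30 2 * * 1-5"

it = croniter(schedule, datetime(2024, 1, 1))
for _ in range(3):
    # Print the next three times this trigger would fire.
    print(it.get_next(datetime))
```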

Run and scale data processing pipelines on the cloud

Execute single actions or whole workflows as jobs, in one API call. Data Processing Engine integrates two engines for you to choose from: a Pandas engine (in Python 3) optimized for smaller data processing tasks, and a Spark engine (in PySpark) for data-intensive workloads.
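
As a rough sketch of the difference between the two engines (the file path and column names are made up for the example), the same aggregation can be expressed against each:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Pandas engine: fine for data that fits in a single worker's memory.
events = pd.read_parquet("events.parquet")
daily_pd = events.groupby("day")["amount"].sum().reset_index()

# Spark engine: the same aggregation, distributed across workers.
spark = SparkSession.builder.appName("daily-aggregation").getOrCreate()
events_sdf = spark.read.parquet("events.parquet")
daily_spark = events_sdf.groupBy("day").agg(F.sum("amount").alias("amount"))
```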

Scale your jobs horizontally and vertically for faster execution, using ForePaaS Units (1 FPU-S = 0.5 CPU & 2 GB RAM). Leverage the power of segmentation to parallelize tasks and accelerate processing. Use the perimeter option to include or exclude data points beyond a given boundary.
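
Based only on the conversion stated above (1 FPU-S = 0.5 CPU and 2 GB RAM), a back-of-the-envelope sizing helper might look like this; the function name and sizing approach are just for illustration:

```python
def resources_for(fpu_s: int) -> tuple[float, int]:
    """Translate a number of FPU-S into (CPU cores, RAM in GB),
    using the published rate of 1 FPU-S = 0.5 CPU and 2 GB RAM."""
    return fpu_s * 0.5, fpu_s * 2

# e.g. a job sized at 8 FPU-S gets 4.0 CPUs and 16 GB of RAM.
cpus, ram_gb = resources_for(8)
print(f"{cpus} CPUs, {ram_gb} GB RAM")
```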

Monitor job performance and executions

View a complete report for each job execution, including workers' CPU and RAM usage over time, full execution logs, and more. Troubleshoot your jobs and optimize resource consumption by pinpointing bottlenecks in your workflows.

Get notified about job success or failure, duration, or RAM usage by integrating with ForePaaS Control Center and setting up alerts on job executions. Manage fine-grained access control with ForePaaS IAM.

47%

average weekly time perceived as wasted for data preparation

Source: IDC, February 2019

32%

of data experts feel dissatisfied with the way data is prepared for analytics

Source: Data Preparation CXP Group, 2017

33%

of data workers feel they spend too much time on data preparation

Source: IDC, February 2019

Ready to get started?

Contact us