Data Prep Runner (DPR)

DPR prepares data for automated business processes, data comparison, and data profiling in support of integration projects. It accepts a wide range of inputs, including text, CSV, Excel, XML, and JSON files, web services, and databases, and it handles large files and datasets well. It can also generate data profiles to help you understand your data. Java 10 is required to run DPR. It can be run in batch mode or used interactively through a web interface or the command line.
DPR uses a JSON or YAML config file to define the processing (see the Configuration documentation). For example:
    {
        run: {log: "${!taskname}-${!datetime}.log", params: {filename: "MyApp.2018-0320.1.log"}},
        project: {
            processes: [
                {name: ParseLogFile, steps: [
                    {type: importTextFile, file: "${!basedir}\\inputs\\${filename}", lineDelimiter: ">>>>"},
                    {type: parseField, field: Line, delimiter: "|", fieldDefn: [
                        {name: Severity}, {name: Thread}, {name: LogTime}, {name: Msg}
                    ]},
                    {type: filter, includes: [
                        {field: Severity, operator: in, value: [SEVERE, WARN, INFO]},
                        {conditions: [
                            {field: Msg, operator: contains, value: "Got Request:"},
                            {field: Severity, operator: equals, value: INFO}
                        ]}
                    ]}
                ]}
            ]
        }
    }
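
To make the expected input shape concrete, here is a small sketch that writes a made-up log file the configuration above could be pointed at. It assumes that `lineDelimiter` marks the boundary between records and that each record carries the four `|`-separated fields in the order Severity, Thread, LogTime, Msg. The record contents, and the choice to end every record with the delimiter, are invented for illustration and are not taken from a real MyApp log.

```shell
#!/bin/sh
# Hypothetical sample data: three records separated by ">>>>", each laid out
# as Severity|Thread|LogTime|Msg. All values are made up.
mkdir -p inputs
# printf reuses the format string for every argument, so each record is
# written followed by the ">>>>" delimiter and no newline.
printf '%s>>>>' \
  'INFO|main|2018-03-20 10:15:01|Got Request: /status' \
  'DEBUG|worker-1|2018-03-20 10:15:02|Cache miss' \
  'SEVERE|worker-2|2018-03-20 10:15:03|Connection refused' \
  > inputs/MyApp.2018-0320.1.log
```

The file name matches the `filename` parameter in the `run` section, and the `inputs` directory matches the path used by the `importTextFile` step.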

Getting Started

  1. Prerequisites - Java 10 is required to run DPR. Set JAVA_HOME to point to the Java 10 runtime.
  2. Extract it to a directory on your machine - for the purposes of our documentation, let's assume you have extracted it to
    `c:\users\dan\dpr-1.0.0` on Windows or to
    `/opt/dpr-1.0.0` on Mac OS / Linux. You should set up your directory structure as follows:
      bin\ -> this contains the program scripts dpr.cmd (for Windows) and dpr (for Unix) and can be added to your path
      libs\  -> this contains program library files
  3. Update your path - this will allow you to invoke the program with just `dpr` in your working directory.

    Windows

    Adding to PATH: Open the system properties (WinKey + Pause), select the "Advanced" tab, click the "Environment Variables" button, and then add or edit the PATH variable under the user variables so that it includes `c:\users\dan\dpr-1.0.0\bin`. The same dialog can be used to set JAVA_HOME to the location of your JDK, e.g. `C:\Program Files\Java\jdk-10.0.2`.
    Open a new command prompt (WinKey + R, then type `cmd`) and run `dpr -v` to verify the installation.

    Mac OS / UNIX

    Adding to PATH:
    export PATH=/opt/dpr-1.0.0/bin:$PATH
  4. Create a project directory for each set of related files and work that you will be using DPR for. At the base of that project directory, let's say `c:\users\dan\mywork\LogAnalysis`, create a `dpr-config.yml` file containing the settings for your project. This file is essentially the program that drives the processing.
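
On Mac OS / Linux, the prerequisites from steps 1 to 3 can be checked with a few lines of shell before the first run. This is a minimal sketch, not something shipped with DPR: `check_dpr_setup` is a hypothetical helper name, and it only confirms that JAVA_HOME contains an executable `bin/java` and that `dpr` resolves through PATH. It does not verify the Java version.

```shell
#!/bin/sh
# check_dpr_setup is a hypothetical helper, not part of DPR.
# It returns 0 when JAVA_HOME and PATH look usable, 1 otherwise.
check_dpr_setup() {
    rc=0
    # JAVA_HOME must be set and contain an executable bin/java.
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME is not set to a Java runtime" >&2
        rc=1
    fi
    # dpr must be resolvable through PATH.
    if ! command -v dpr >/dev/null 2>&1; then
        echo "dpr is not on PATH" >&2
        rc=1
    fi
    return "$rc"
}
```

A typical use would be `check_dpr_setup && dpr -v`, so the version check only runs once both settings are in place.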

Running DPR

Once you have created your dpr-config.yml file based on the documentation below, you can run the process by invoking the dpr program from the command line, assuming dpr is on your PATH. If you would like the logs written to a log file, create a `logs` directory in your project directory. Start in your project directory:
cd c:\users\dan\mywork\LogAnalysis
Next, you can run dpr in console mode, shown below with some typical commands:
> help
> run ParseLogFile
> quit
Or you can tell dpr what to do directly on the command line:
dpr run ParseLogFile
DPR can also be operated from a web UI by running it as follows:
dpr ui
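
For batch use on Mac OS / Linux, for example from a scheduler, the steps above can be wrapped in a small function. This is a sketch under stated assumptions: `run_dpr_process` is a hypothetical name, and the project path in the usage line is an invented Unix counterpart of the Windows path used earlier. It relies only on the `dpr run` command shown above.

```shell
#!/bin/sh
# run_dpr_process is a hypothetical wrapper, not part of DPR.
# Usage: run_dpr_process PROJECT_DIR PROCESS_NAME
run_dpr_process() {
    project_dir=$1
    process_name=$2
    # The project directory is expected to hold dpr-config.yml at its base.
    if [ ! -f "$project_dir/dpr-config.yml" ]; then
        echo "no dpr-config.yml in $project_dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (
        cd "$project_dir" || exit 1
        mkdir -p logs    # lets dpr write its log file, as described above
        dpr run "$process_name"
    )
}
```

For example, `run_dpr_process "$HOME/mywork/LogAnalysis" ParseLogFile` would run the ParseLogFile process from that project directory.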