Data Crunchers
{unlocking the value in your data}
A Case Study - Helping to illustrate how it works
Your Requirement -
You have a data set containing the list of sales for the last 3 months – some 1,000,000+ records. Each record in the text file represents a sale defining - date of sale, sale amount, product barcodes, sales person, customer salutation, customer surname, customer email address.
You also have 2 more supporting data sets:
-
a text file that contains the customer first name, surname, gender, email address and cell phone number;
-
a text file that contains the brand, model, barcode, cost price and sales price;
You decide you want to send an SMS – offering a 15% discount – to each customer that bought a Brand A product in the last month (April) where total sale amount was greater than R1500.00.
At the same time the manufacturer of Brand A has asked for a report detailing the number of units of each model sold during the month of April – and what percentage were sold to females.
Your data looks like this:
File 1
03-Apr-15,0000154322,9780201379624|9780201479642|9780201869635,JBloggs,Mr,Tmensah,mensa@gmail.com
File 2
Thomas,Mensah,male,mensa@gmail.com,+27793889990
File 3
Brand A ModelX 978020186963500000600000000089999
Consolidating & Extraction
The data highlighted below will need to be extracted:
File 1
03-Apr-15,0000154322,9780201379624|9780201479642|9780201869635,JBloggs,Mr,Tmensah,mensa@gmail.com
File 2
Thomas,Mensah,male,mensa@gmail.com,+27793889990
File 3
Brand A ModelX 978020186963500000600000000089999
Note:
-
In file 1 and 2 the fields are comma separated and are identified accordingly.
-
In file 3 each field occurs at a specific position in the record and adheres to a specific length. The fields are identified according to this pattern.
Merging & Transforming
The records that are common to all 3 data sets will be merged and transformed and output to a single temporary file.
File 1
03-Apr-15,0000154322,9780201379624|9780201479642|9780201869635,Mr,Tmensah,mensa@gmail.com
File 2
male,mensa@gmail.com,+27793889990
File 3
Brand A ModelX 9780201869635
1st Temporary Output File
03-Apr-15,0000154322,Brand A ,ModelX ,Mr,T,mensah,male,+27793889990
Validation & Formatting
Specific fields of each record in the 1st temporary file are validated against a defined format. If validation fails, then action will be taken to format the offending field as required.
1st Temporary Output File
03-Apr-15,0000154322,Brand A ,ModelX ,Mr,T,mensah,male,+27793889990
2nd Temporary Output File
April;R1543,22;Brand A ;ModelX;Mr;T;Mensah;male;0027793889990
Sorting & Filtering
The 2nd temporary file is checked for duplicate records. Any duplicate records are removed and the data then sorted on the date field - ascending order to ensure that all records with a “date of sale” of April will be grouped together and presented at the top of the final output file.
The data will then be filtered and the data output to 2 final files – one to meet requirement 1 and the other requirement 2.
2nd Temporary Output File
April;R1543,22;Brand A ;ModelX;Mr;T;Mensah;male;0027793889990
Requirement 1 – Final Output File
April;R1543,22;Brand A;Mr;T;Mensah;0027793889990
{Filter: Where "product barcodes" = "Brand A" and "date of sale" = "April" and "sale amount" > R1500,00}
Requirement 2 – Final Output File
April;Brand A;ModelX;male
{Filter: Where “model" != “null" and "date of sale" = "April"}
Reporting
As per Requirement 2 – the Final Output File is translated and output as a report.
Requirement 2 – Final Output File
April;Brand A;ModelX;male
Automation
As per Requirement 1 – the Final Output File and report file is then emailed to an email address and / or uploaded to an SFTP directory of your choice.
You may want to run the same campaign for 6 consecutive months – in which case the Data-Crunchers process can be scheduled to automatically execute on the 1st of each month – pull the input files from an email or SFTP directory and output the files to an email address / SFTP directory.