Analytics Hands-On

Activity

In this lab, we're going to manually construct a data warehouse and do some analysis of the data in the data warehouse. Our customer for this lab, American U Cruise Lines, would like us to craft a data warehouse that can easily answer the following three questions:

  1. Which port and ship brought in the most revenue (fares)?
  2. Which port in Italy has the highest fares?
  3. In 2003, which director was in charge of the highest number of cruises?

Description of Data

Our customer is a little behind the times and they do not have a database system. Instead, they do everything in Excel. One Excel workbook contains all of the information for the business. The workbook contains six worksheets: SHIP, CRUISE, PORT, VISIT, PASSENGER, and VOYAGE to keep track of ships, cruises, ports, and passengers. A “cruise” is a particular sailing of a ship on a particular date. For example, the seven-day journey of the ship Pride of Tampa that leaves on June 13, 2003, is a cruise.

Data Design

Open the workbook below to get started with this activity. Review the information in the individual worksheets to familiarize yourself with the data. See if you can figure out the connections between the data in the different worksheets. Next week, we'll work on the actual database design for this workbook. For this week, you should just start to think about the data design that is being used.

AUCruise.xls

Creating a Data Warehouse

The difference between "transactional data" and "analytical data" is not easy to explain. The best way for you to understand the difference is to create a data warehouse from a set of transactional data. To do so, follow these steps:

  1. Create a new worksheet in the existing workbook. Name it "DW". Save the workbook to your G drive or a USB drive.
  2. Start on the VOYAGE worksheet.
  3. Copy all the data (including the column headings) from VOYAGE to your new worksheet. VOYAGE will be the basis for the rest of the information in your warehouse. We'll use VOYAGE to build out the rest of the data.
  4. The data for VOYAGE includes a Passenger Number. For each row in VOYAGE (in your new DW worksheet), you are going to build out the corresponding data from PASSENGER. To do so, find the corresponding data in the PASSENGER worksheet. (For example, the first passenger is 13797. Find 13797 in the PASSENGER worksheet. Copy all the data for passenger Adkins to your new worksheet.) Be sure to separately copy the column headings for PASSENGER into the DW worksheet (e.g. PASSENGERNUM , PASSENGERNAME, SOCSECNUM, ADDRESS, PHONE).
  5. Now do the same for CRUISE. Copy the headings and the corresponding details for each CRUISE in the DW worksheet.
  6. Continue on with the same process for SHIP.
  7. Things get complicated as we bring in the corresponding VISIT data. Can you see why I say this? Can you figure out how to bring in all the corresponding VISIT data for each CRUISE?
  8. No need to do anything with PORT. We'll skip that for this activity.

HINT: You will have 22 rows in your worksheet when you are done. All 22 rows will have data in all columns (no empty cells).

Analyzing a Data Warehouse

Now that we have a "flat" data warehouse with all the related data in one place, we can start to do some analysis. We could work directly in Excel to do so, but for this course lets use an Analytics product. We're going to use Tableau. You can download a trial version of the software here. Once downloaded, launch Tableau and fill in the required information to start your trial.

Click on "Connect to Data" on the left side of the page
Select Microsoft Excel & Click OK
Browse and find your XLS or XLSX file
Leave on "single table" and choose DW & Click OK
Choose "no" for the question about creating an extract
Choose "no" for the question about remembering the connection

You are now presented with "dimensions" and "measures" for the data in your warehouse. The functionality is similar to Excel Pivot Tables (but more powerful). You can drag and drop "dimensions" or "measures" onto the "rows" and "columns" at the top of the page.

Say we want to know how many passengers visit each port.

First drag & drop PORTNAME from dimensions to COLUMNS.
Now drag PASSENGERNUM to ROWS. This chart is actually adding (SUM) the actual Passenger Number. This is meaningless. What we want is a COUNT of the Passengers.
Click on the right arrow next to PASSENGERNUM and find "Measure".
Change the measure to COUNT.

Say we wanted to dig deeper and look at passengers by port by ship. We can easily add this new dimension by dragging "SHIPNAME" to the right of "PORTNAME".

Answering the Questions

Can you figure out how to answer the questions posed at the start of this activity? They are repeated here for your convenience:

  1. Which port and ship brought in the most revenue (fares)?
  2. Which port in Italy has the highest fares?
  3. In 2003, which director was in charge of the highest number of cruises?

Turning in the Lab

To complete this lab, upload your workbook containing the DW worksheet and provide your responses to the three questions. If you have time, include the chart you created in Tableau in your submission.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License