SQL Lab w/Sam's Club

Activity 1: How much do you remember?

Let's start off with a little quiz! (Don't worry, it won't be graded by me.) Take the 20 question quiz below. Skip the questions that cover material outside of what you did in ITEC-200. Questions to skip are 3, 4, 5, 13, and 16 to 19.

http://www.w3schools.com/quiztest/quiztest.asp?qtest=SQL

Activity 2: The Sam's Club Database

This week and next, we're going to use a "snapshot" of the data from a production system used by Sam's Club. Following is a description of the database:

Sam's Club, a division of Wal-Mart Stores, Inc., is a warehouse club that specializes in selling to small businesses. A membership-based store, Sam's Club offers goods and services for consumers and business owners as well as affordable luxury merchandise. Sam's Club keeps prices low by selling merchandise in bulk and at very low profit margins.

The Sam's Club Database contains retail sales information gathered from sales at Sam's Club stores. The process used to gather this information begins with a Sam's Club member gathering all of the items they intend to purchase during the current visit to Sam's Club. The member then proceeds to a register to check out. A Sam's Club associate scans the member's Sam's Club card, at which point a visit number (visit_nbr) is generated and stored in the store_visits table. The associate proceeds by scanning each item with a barcode reader. When all of the items have been scanned, summary information about each individual type of product (i.e. 6 packages of soap) purchased during that visit is recorded in the item_scan table. When payment is tendered for items purchased on that visit, summary information for the total order (transaction time & date, amount spent, number of unique items purchased, etc) is recorded in the store_visits table. Other tables are used to store information about stores, products, and members.

This retail sales information in the UA_SAMSCLUB database was provided to the Walton College of Business by Wal-Mart Stores, Inc. The database contains sales transaction data for 18 Sam's Club stores between January 1 and January 31, 2000. The UA_SAMSCLUB database consists of 6 tables with more than 55 million rows populated and ready for use.

Print this document to start your exploration of the Sam's Club Database

Activity 3: Write SQL

We'll use "SQL Assistant" at the University of Arkansas to write SQL against the Sam's Club database. To login, go here:

http://enterprise.waltoncollege.uark.edu/sqlassistantweb12.0/

Click on "Walton College Teradata"
Enter the Username and Password from this spreadsheet
For "default database" use "UA_SAMSCLUB"
Click "ok"

I will provide a quick demonstration the SQL Assistant during the lab. Basically, you write the SQL in the box that says "query". Once you've completed your SQL, click on the third button that says "execute" when you mouse over it. For this lab, you'll be learning about this database by writing queries.

NOTE: Turn in this lab by cutting/pasting the SQL and the answers to each of the 15 questions into a Word document.

Let's start with the basics:

1. How many stores are there?
2. How many store visits?
3. How many members?
4. How many items?
5. How many item scans?

Hint: Use COUNT(*) for these queries

Now we'll dig a little deeper to under the data:

6. How many stores have store visits? In other words, out of the total number of stores, how many actually have STORE_VISITS? For this query, you will only work with the STORE_VISITS table. You'll still use COUNT but now you'll also use DISTINCT for uniqueness.
7. How many members have store visits? Use the same technique from the previous question.
8. How many different items were sold during this period? You'll use ITEM_SCAN for this query and use the same technique as the previous two questions.

Let's build a more complex query one step at a time:

9. First, let's find (count) only "Advantage" members in the MEMBER_INDEX. "Advantage" members have a code of "V" for member_type.
10. Now let's find (count) the number of STORE_VISITS by MEMBERS who are "Advantage" members. This is your first "join". You will be bringing together data from two tables, STORE_VISITS and MEMBERSHIP_INDEX, to answer this question.
11. Add more criteria to restrict your results. Use the same query from the previous question but now only look for purchases that took place at Store 20.
12. One more criteria and we are there. We only want transactions that took place early in January 2000. We are only interested in shoppers spending their holiday money between Jan 1st and Jan 10th. Hint: Use BETWEEN '20000101' and '20000110'
13. Instead of just using "COUNT" for your query, add up the amount of money spent by these members. Hint: Use SUM of Total_Visit_Amt
14. Now we'll build a separate query that we'll ultimately bring together with this one. Let's start by finding only items that are "Electronics". Start by counting the number of items in the "Electronics" category (category_nbr of 5).
15. Can you figure out how to bring these two queries together to find the total amount of money spent by 'Advantage' members between Jan 1 and Jan 10 of 2000 in Store 20 on Electronics?

Turn in this lab by cutting/pasting the SQL and the answers to each of the 15 questions into a Word document.

Important: Next week, we'll be using this same database and we'll be looking at what happens "behind the scenes" to execute these queries. If you are confused by any of these queries, please take time over the next week to understand the SQL. Next week will make no sense if you don't understand the SQL we are using.

The Sam's Club Data Warehouse

Before we leave, let's take a closer look at the Sam's Club database. Under "UA_SAMSCLUB" you can expand by clicking the "+". Expand the "tables" under UA_SAMSCLUB. Notice the last table called "WAREHOUSE". Expand "WAREHOUSE" and look at "columns". How does this "warehouse" compare to the data warehouse we created in the lab last week? Be prepared to discuss the Sam's Club data warehouse at the start of our lab next week.

Extra Credit Opportunity

Earlier in this class, we've talked about the importance of "good data" and the problems of "bad data". For an extra 50 points on this lab (in other words, the lab will be worth 150 instead of 100), write a paper providing an analysis of "bad data" in the Sam's Club database. Use the table titled "Attribute Details (Metadata)" in conjunction with writing queries to uncover inconsistencies, inaccuracies, and other issues.

For full extra credit, also look closely at the database design (entity-relationship diagram) to uncover possible integrity issues (e.g. data in one place doesn't match data in another place). For example, the table "STORE_VISITS" has a field called "TOT_SCAN_CNT". This field should contain a count of the items from ITEM_SCAN. In other words, it's a field that was calculated and placed in the STORE_VISITS table. Is it possible for this value to be wrong? What happens if a row is deleted from ITEM_SCAN? Who makes sure that STORE_VISITS is updated to reflect this deletion? What happens if the two don't match? Find other examples like this and provide your best guess as to the reason the Sam's Club database team included these fields in their database design.

This Extra Credit is due by the start of class next week.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License