Decision Modeling/Data Analysis Practice Exam

© 2004 Prof. Stephen Huxley

The following questions are designed to illustrate the types of questions and material that may be covered on the final exam.  Material from the first part of the course has not been included because you already have access to that on the practice midterm exam (scroll down to the practice midterm).  Pre-midterm material will account for 50 of the 350 points on the final.

You should not consider this practice exam as a definitive guide to the real exam because in a test situation, questions always seem more difficult even though they may have come from the same test bank. Furthermore, the questions below are "fill-in-the-blank" type, but questions on the exam may be multiple choice or essay.

Many professors do not give practice exams precisely because some students think the real exam will be essentially the same as the practice exam, then whine when it is not. Therefore, you should not stop your preparations for the exam just because you do well on this practice exercise. The real exam may seem more difficult – the operative concept is “travel at your own risk!”

1. Control chart limits may be made as narrow or wide as you want by simply selecting larger or smaller sample sizes.  True or False?  Why or why not?

 

2. Give two reasons why small samples of only 3, 4, or 5 are common in real-life industrial use of Xbar charts.

 

3. Control charts are usually based on plus or minus "3 sigma" (that is, 3 standard errors), but quality would be improved by using 2 or 1 "sigma" charts.  Why aren't such charts used?

 

4. "Lot plots" are constructed by grouping raw data observations into a histogram and may be used to check the quality of raw materials by  ____________________.

 

5. Once the special causes of variation in a process have been discovered and eliminated, the long-term goal of anyone managing a process will be to:  ________

 

6-8.  The Rustbuster Chemical Co. produces paint to prevent metal corrosion and uses statistical quality control to monitor the machine that fills the cans.  A new can size is being introduced and the first production run (when everything was double-checked for accuracy and adjustment) yielded an average net contents of 2 liters.  The quality control manager has specified that 5 cans be checked for weight after every 100 cans have been produced and that "3-sigma" X-bar and Range charts be set up accordingly.  When grouped into samples of 5 each, the average difference between the heaviest and lightest can was .156 liters.  Answer the following questions based on this information (use the Lecture Notes for the factors).

 

6. The upper control limit for the X-bar chart would be closest to:

A. 2.58     B. 2.47     C. 2.16     D. 2.09     E. 2.00

 

7. If a sample mean fell above the upper control limit, approximately what percentage of the cans would be defective in terms of weight?

A. 5%     B. 2%     C. 0.3%     D. cannot be calculated from data given     E. none of these

 

8. The upper control limit for the Range chart would be closest to:

A. .16      B. .33      C. .47      D. .50      E. .57

 

9. A total of twenty jobs are waiting to be done by a department. If a computer could do a billion calculations per second, then how long would it take it to figure all possible sequences in which to do the jobs?

 

10. If minimizing maximum tardiness of any one job in a batch of jobs waiting to be done by one worker is the goal of the scheduler, then the jobs should be done according to:

11. Most common measures of productivity are based on the concept of _______________ divided by _________________.  If multiple measures of both are involved, the appropriate analysis to determine efficiency is _________.

12. If scheduling is to be done scientifically, at least three essential pieces of information are needed.  Name them.

 

13. Name the three sets of parameters that all linear programming formulations typically contain:

 

14. The purpose of sensitivity analysis in linear programming is to:

15-18:  The Countchachange Department Store Corp. is planning to build a store in a new suburban shopping center. The manager assigned to the project wants to develop some rough guidelines for the architect who will attempt to design the building to fit specific needs as much as possible. The guidelines are to include the total number of square feet the store should contain overall, and how many square feet should be custom tailored to the requirements of each of the store's three major departments: Clothing, Jewelry, and Furniture (assume there are no other departments). The architect has indicated that construction of clothing space will cost $100 per square foot, jewelry space $300 per square foot, and furniture space $200 per square foot.

Since clothing will be the new store's merchandising specialty, the manager wants to make certain that the clothing department gets at least twice as much floor space as the other two departments combined. To maintain diversity, however, he wants each of the other departments to get at least 10 percent of the total store floor space. Total construction costs for the new store must be no more than $1 million. Past records indicate that profit contributions per square foot per day are $.80 for clothing departments, $.75 for jewelry departments, and $.60 for furniture departments.

Answer the following questions based on a linear programming formulation to determine how many square feet should be devoted to each department to maximize revenues. (Note: Xc = square feet devoted to clothing, Xj to jewelry, and Xf to furniture.)

15. What approximate form would the objective function take for this problem?   ____________________________________

16. How many constraints does this problem have (excluding non-negativity restrictions)? ___________

17. Indicate the missing coefficient in the following constraint for this problem (if it is a constraint):      ___?___Xc - 2Xj - 2Xf >= 0

18. Indicate the missing coefficient in the following constraint for this problem (if it is a constraint):      -.1Xc + ___?___Xj - .1Xf >= 0

19. Customers arriving at a food counter randomly order either meal A, B, or C with probabilities of .50, .30, and .20 respectively.  Assuming you are using a table of random integers between 00 and 99, set up a correspondence key by assigning the appropriate random numbers for each meal.

Type    Prob.
A       .50
B       .30
C       .20

 

20. What function could be used in Excel to generate the random numbers, and what function would convert the random numbers into customer type?

 

21. Using your correspondence key, use the following random numbers to simulate the first five customers:  85, 13, 54, 41, 67 (no warm up period)

Customer Number    RN    Type
1                  85
2                  13
3                  54
4                  41
5                  67

22. What is the difference between Monte Carlo simulation and “deterministic” simulation?

23.  If there are three independent random variables in a situation, then three separate random numbers would have to be used to use a Monte Carlo simulation approach to the problem.  True or false?

24. Consider the following outcomes of a DEA analysis of the branch offices of a company: 

X:  All branches receive an efficiency rating of 100 percent.

Y:  No branches receive an efficiency rating of 100 percent.

Z:  All branches receive an efficiency rating of 100 percent but the company is still losing out compared to competitors.

A. Only X is possible.

B. Only Y is possible

C. Only Z is possible.

D. both X and Z are possible

E. both Y and Z are possible

 

25. In personnel scheduling, the first set of data that must be specified is the:

A. number of personnel available

B. number of personnel required during each time period

C. employee names

D. days off pattern

E. none of these

26. In classic decision analysis, the null hypothesis is always set up so as to __________.  Give examples.

A firm is contemplating the strategy to pursue for next year.  Strategy 1 is to introduce a revolutionary new product at a much higher price.  Strategy 2 is to make moderate changes in its existing product.  Strategy 3 is to make no changes except the color of the package and adding the word “new.”  Payoffs for each strategy will depend on what happens to the national economy.  N1 denotes improvement due to a major recovery and expansion, N2 denotes little change, and N3 denotes decline due to a worsening economy.  The expected consequences of each combination are given in the following payoff table:

                   N1           N2           N3
Strategy 1         $500,000     $100,000     -$50,000
Strategy 2         $300,000     $250,000     0
Strategy 3         $100,000     $100,000     $100,000

 

27. Assuming the firm adopts a cautious approach, which strategy is best?

 

28.  Assuming the firm adopts the strategy that minimizes the maximum regret it will feel afterward, which strategy is best?

 

29. If the firm believes there is an equal chance of all three possibilities occurring, what strategy would they select?

 

30.  A consultant has offered to make a forecast of the economy and guarantee the forecast by paying the firm the difference between what it would have earned without the consultant’s services and what it actually earned.  What is the maximum the firm should be willing to pay for this information?

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Answers: 

1. True, because the limits are based on the standard error, which equals the population standard deviation divided by the square root of the sample size.  So a larger sample size will result in a smaller standard error and therefore narrower control limits.  The probability of getting a point outside the control limits remains the same, however.  Larger sample sizes simply scale things down.
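
The scaling effect described in answer 1 can be sketched numerically. The population standard deviation of 0.07 used here is a hypothetical value chosen only for illustration; it does not come from the exam.

```python
import math

def half_width(sigma, n, z=3):
    """Distance from the center line to a z-sigma control limit for samples of size n."""
    standard_error = sigma / math.sqrt(n)
    return z * standard_error

sigma = 0.07  # hypothetical population standard deviation (assumption)
for n in (4, 9, 16, 25):
    print(f"n = {n:2d}: limits at center line +/- {half_width(sigma, n):.4f}")
```

Each increase in the sample size narrows the limits, while the limits stay at 3 standard errors, so the chance of a false signal is unchanged.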

 

2. a) It is easier and cheaper to use small samples, and b) it is assumed that the population of all parts produced by a process follows a normal distribution.

 

3. Because more production time would be wasted looking for nonexistent problems due to false "out-of-control" signals.           

 

4. Looking for non-normal shapes, truncations, out-of-specification data, or other types of odd patterns.

 

5. Reduce common causes of variation in the process or system itself.  Until special causes are under control, it is difficult to attack the common causes because the special causes represent a confounding influence that hinders our ability to understand what is really needed to improve the process.  "Business process re-engineering" is the name given to changing processes.

 

6. D (Xbar + A2(Rbar) = 2 + .58(.156) = 2.09)

 

7. D (An item is defective if it falls outside the specification limits from the blue print.  Without knowledge of the specification limits, we cannot know if an item is defective or not.  The specification limits are based on the design of the product and  independent of the control limits, which are based on the inherent variability of the production process and how many pieces are inspected in each sample, i.e. the sample size, n.)

 

8. B (D4(Rbar) = 2.11(.156) = .33)
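
The arithmetic behind answers 6 and 8 can be checked with a short sketch. The factors .58 and 2.11 are the values quoted in those answers for samples of 5; the lower X-bar limit is added here only for completeness and was not asked for.

```python
# Control limits for 3-sigma X-bar and Range charts with samples of n = 5.
A2 = 0.58   # X-bar chart factor quoted in answer 6
D4 = 2.11   # Range chart factor quoted in answer 8

x_double_bar = 2.0   # average net contents from the first production run (liters)
r_bar = 0.156        # average range within samples of 5 (liters)

ucl_xbar = x_double_bar + A2 * r_bar   # upper limit for sample means
lcl_xbar = x_double_bar - A2 * r_bar   # lower limit for sample means (not asked)
ucl_range = D4 * r_bar                 # upper limit for sample ranges

print(f"X-bar chart: LCL = {lcl_xbar:.2f}, UCL = {ucl_xbar:.2f}")
print(f"Range chart: UCL = {ucl_range:.2f}")
```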

 

9. (20! calculations)/(no. of seconds in year*1 billion) = 2.4329x10^18/3.1536x10^16 = approximately 77 years
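
The calculation in answer 9 can be reproduced as follows. Like the answer, this sketch assumes that evaluating one sequence counts as one calculation and that a year has 365 days.

```python
import math

sequences = math.factorial(20)           # number of possible job orders
rate = 1_000_000_000                     # calculations per second (given)
seconds_per_year = 365 * 24 * 60 * 60    # 31,536,000, ignoring leap years

years = sequences / (rate * seconds_per_year)
print(f"{sequences:.4e} sequences would take about {years:.0f} years")
```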

 

10.  Due Date Rule

 

11. Output/input; multiple input/output situations require DEA (Data Envelopment Analysis)       

 

12. A list of jobs to do, how long each one will take, and the due date for each one.

 

13. The objective function coefficients, the left hand side constraint coefficients, and the right hand side constraint limits.

 

14. Indicate how sensitive the final solution is to changes in any of the three sets of original parameters, assuming all else equal.  Most attention is focused on the right hand side constraint limits because of the economic meaning of shadow prices, which are useful in evaluating the marginal value of adding resources.  Second most important is sensitivity to changes in the objective function coefficients.  Analysis of changes in the left hand side constraint coefficients is rare, primarily because such a change represents a fundamental change in the design of the product or the technology used to produce it, and because of the “all else equal” assumption.

 

15-18:

LP Formulation: Max .8Xc + .75Xj + .6Xf
subject to:
100Xc + 300Xj +200Xf <= 1,000,000
Xc >= 2(Xj +Xf) or Xc - 2Xj - 2Xf >= 0
Xj >= .1(Xc + Xj +Xf ) or -.1 Xc + .9Xj - .1Xf >= 0
Xf >= .1(Xc + Xj +Xf) or -.1 Xc - .1Xj + .9Xf >= 0

 
Note: The solution from Solver would be:
Sq ft each: Xc = 6154, Xj = 769, Xf = 769
Total sq ft: 7692
Max revenue: $5,962
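
As a rough cross-check on the Solver figures, the sketch below runs a coarse grid search over floor-space shares instead of solving the LP directly. It assumes the whole budget is spent, which is reasonable because every department earns a positive contribution per square foot. The 1 percent step is an arbitrary choice for illustration.

```python
BUDGET = 1_000_000
COST = {"c": 100, "j": 300, "f": 200}        # construction cost per sq ft
PROFIT = {"c": 0.80, "j": 0.75, "f": 0.60}   # contribution per sq ft per day

best = None
# Try jewelry and furniture shares from 10% upward in 1% steps.
for j_pct in range(10, 34):
    for f_pct in range(10, 34):
        c_pct = 100 - j_pct - f_pct
        # Clothing must get at least twice the other two combined.
        if c_pct < 2 * (j_pct + f_pct):
            continue
        share = {"c": c_pct / 100, "j": j_pct / 100, "f": f_pct / 100}
        cost_per_sqft = sum(COST[d] * share[d] for d in share)
        total_sqft = BUDGET / cost_per_sqft   # spend the entire budget
        revenue = total_sqft * sum(PROFIT[d] * share[d] for d in share)
        if best is None or revenue > best[0]:
            best = (revenue, total_sqft, share)

revenue, total_sqft, share = best
for d, name in (("c", "Xc"), ("j", "Xj"), ("f", "Xf")):
    print(f"{name} = {share[d] * total_sqft:.0f}")
print(f"Total sq ft = {total_sqft:.0f}, daily contribution = ${revenue:,.0f}")
```

The search settles on the 80/10/10 split, which matches the Solver result: clothing is the cheapest space to build and earns the most per square foot, so the other two departments are held at their 10 percent minimums.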

19.  Assuming we are using a table of RN:  A – 50% - 00-49, B – 30% - 50-79, C – 20% - 80-99.

 

20. The =RAND() function generates the random numbers, and the VLOOKUP function converts them into customer types.

 

21.

Customer Number    RN    Type
1                  85    C
2                  13    A
3                  54    B
4                  41    A
5                  67    B
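
The lookup that the correspondence key performs can be sketched in a few lines, using the key from answer 19 and the five random numbers given in question 21. The function name `meal_for` is a hypothetical helper introduced only for this illustration.

```python
# Correspondence key from answer 19: (lowest RN, highest RN, meal type).
KEY = [(0, 49, "A"), (50, 79, "B"), (80, 99, "C")]

def meal_for(rn):
    """Return the meal type whose range contains the two-digit random number."""
    for low, high, meal in KEY:
        if low <= rn <= high:
            return meal
    raise ValueError(f"random number {rn} is outside 00-99")

random_numbers = [85, 13, 54, 41, 67]   # given in question 21
for customer, rn in enumerate(random_numbers, start=1):
    print(customer, rn, meal_for(rn))
```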

 

22. Monte Carlo deals with random variables, deterministic simulation does not.

 

23. True.  Three separate correspondence tables would have to be set up – what a pain!

 

24. D (All branches could be efficient compared to each other but poor compared to outside competition.  DEA measures efficiency only within the system of branches included.  If the analyst could discover the equivalent data for the outside competitors, then a more inclusive analysis could be achieved.  But how to get the competitors to share that information truthfully is the big challenge.)

 

25. B (Without a target for what the schedule should match, you have no way to evaluate how good any schedule is.)

 

26. avoid the worst possible error.  Examples: 

a. In the American legal system, the worse mistake (“Type 1 Error”) is thought to be punishing an innocent person;  the lesser mistake (“Type 2 Error”) is freeing a guilty person.  To avoid making the worse mistake, everyone is assumed innocent and will go free unless “proven” guilty.

b. Most people carry a spare tire because the worse mistake would be to have a flat tire without a spare.  The lesser mistake is to use up trunk space to carry one when you never actually have to use it.  To avoid the worse mistake, most people carry a spare.

c. In the American pharmaceutical industry, the worse mistake is assumed to be releasing a medicine that does not really work.  The lesser mistake is to prevent good medicines from reaching the public.  So drug companies must “prove” their medicines work before they receive approval to make claims and sell them.  Other governments feel differently.

 

27. Caution implies using the maximin strategy.  The minimum payoff for each strategy is -$50,000 for Strategy 1, 0 for 2, $100,000 for 3, so to maximize the minimum, it should choose Strategy 3.

 

28. Afterward, the firm will know what it should have done.  If it chooses Strategy 1, the maximum regret it could feel is $150,000 (if either N2 or N3 happens).  If it chooses Strategy 2, the maximum regret it could feel is $200,000 (if N1 happens).  If it chooses Strategy 3, the maximum regret is $400,000 (if N1 happens).  So to minimize its maximum regret, it should choose Strategy 1.

 

29. If each forecast has an equal probability of 1/3 of occurring, calculate the expected value from each strategy by multiplying each payoff by 1/3, then pick the strategy with the highest expected value.  In this case, both Strategies 1 and 2 have expected payoffs of $183,333, so either one may be chosen (EMV for Strategy 3 is $100,000).

 

30.  The firm’s expected payoff without the advice is $183,333 based on equal probabilities.  If the consultant picks N1, then the firm will select Strategy 1 and get a payoff of $500,000.  If N2, the firm will select Strategy 2 and get a payoff of $250,000.  If N3, the firm will select Strategy 3 and get $100,000.  If the firm uses the same probabilities of each of these happening, then the expected value of this situation with perfect information is $283,333.  So the maximum the firm should be willing to pay is $283,333 - $183,333 = $100,000.  (The consultant would likely point out that the firm could lose up to $50,000 and so would want to negotiate a fee of $283,333 - (-$50,000), or $333,333.)
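
The four decision criteria used in answers 27 through 30 can be reproduced from the payoff table with a short sketch. The variable names are illustrative only.

```python
payoffs = {
    "Strategy 1": [500_000, 100_000, -50_000],   # payoffs under N1, N2, N3
    "Strategy 2": [300_000, 250_000, 0],
    "Strategy 3": [100_000, 100_000, 100_000],
}
n_states = 3

# Maximin (answer 27): pick the strategy whose worst payoff is largest.
maximin = max(payoffs, key=lambda s: min(payoffs[s]))

# Minimax regret (answer 28): regret = best payoff in a state minus payoff received.
best_by_state = [max(p[i] for p in payoffs.values()) for i in range(n_states)]
max_regret = {s: max(best_by_state[i] - p[i] for i in range(n_states))
              for s, p in payoffs.items()}
minimax_regret = min(max_regret, key=max_regret.get)

# Equal probabilities (answer 29): expected monetary value of each strategy.
emv = {s: sum(p) / n_states for s, p in payoffs.items()}

# Expected value of perfect information (answer 30).
ev_with_perfect_info = sum(best_by_state) / n_states
evpi = ev_with_perfect_info - max(emv.values())

print("Maximin choice:", maximin)
print("Minimax regret choice:", minimax_regret, max_regret)
print("EMV:", {s: round(v) for s, v in emv.items()})
print("EVPI:", round(evpi))
```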

 

 

 

_______________________________________________________________________________________________________________

Decision Modeling/Data Analysis Practice Midterm
The following questions are designed to illustrate the types of questions and material that may be covered on the midterm exam. You should not consider this practice midterm as a definitive guide to the real exam because in a test situation, questions always seem more difficult even though they may have come from the same test bank. Furthermore, the questions below are "fill-in-the-blank" type, but questions on the exam may be multiple choice or essay. Many professors do not give practice exams precisely because some students think the real exam will be essentially the same as the practice exam, then whine when it is not. Therefore, you should not stop your preparations for the exam just because you do well on this practice exercise. The real exam may seem more difficult!

 

1. The essence of statistical analysis is comparison between: _____________________________________

2. Which of the following statements have something wrong with them (the answer could be A, B, neither, or both):
A. "The correlation between a voter's religion and his or her political party is r = .45."
B. "Since the R Squared between grades and hours studied per week is .06, we can conclude that a one hour increase in study hours will result in a .06 increase in GPA."

3. The regression line for the relationship between X and Y is Y = 10 + 5X. A review of one actual data point for X = 9 was Y = 65. The mean of Y is 25. Compute the conceptual equivalent for the unadjusted R square for this single point.

4. The null hypothesis regarding the R square for any regression is ______________

5. Most medical experiments use the experimental design known as: ____________________________________

6. You read that Scholastic Aptitude Test scores in high school explain only 9% of the variation in students' later grades in college. Therefore, you could estimate the approximate values of the "Adjusted R Squared" = __________, the "Multiple R" = _________, and the standard error of estimate = ______________.

7. The most common form of multiple nonlinear regression equations found in economic or business research studies is ____________

8. "Multicollinearity" occurs whenever
A. any of the X variables are correlated strongly with each other
B. the Y variable is correlated strongly with only one of the X variables included
C. the Y variable is not correlated strongly with any of the X variables
D. the R squared value is not statistically significant
E. none of these

9. According to the Central Limit Theorem, what distribution must the population have before we can assume that the means of large samples selected from that population would follow a normal distribution?  What is a 'large' sample?

10. Whenever possible, the null hypothesis should be set up so as to:
A. minimize sample error
B. avoid making a Type I error
C. avoid making a Type II error
D. minimize sample size
E. none of these

Answer the next question based on the following printout:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.818128
R Square              0.669334
Adjusted R Square     RR
Standard Error        101.2928
Observations          10

ANOVA
              df    SS          MS          F           Significance F
Regression    2     145380.9    72690.47    7.084686    0.020790513
Residual      7     71821.57    10260.22
Total         9     217202.5

                Coefficients    Standard Error    t Stat    P-value     Lower 95%        Upper 95%
Intercept       -515.327        269.0008                    0.09694     -1151.412939     120.758
X Variable 1    33.7963         9.746901                    0.010443    10.74855502      56.84404
X Variable 2    73.29268        50.02493          TT        0.186307    -44.99739541     191.5828

11. From the above printout, we can estimate the value of RR to be ______________, the value of TT to be _______________

12. Does the following statement represent a logical fallacy? If so, what is the formal name of this fallacy?

"If Darwin's extrapolation is correct that random mutation and natural selection within one type of animal is the same process that would start with a single life form and lead to entirely new types of life forms, then I should see a variety of different plant and animal species. I do see a variety of species, therefore Darwin must be correct."

13. The most commonly used form of nonlinear multiple equations is the:

A. quadratic

B. power

C. cubic

D. exponential

14. Review the tables below, then answer the questions.

A recent newspaper headline reported that death rates of patients undergoing operations were higher in public hospitals than in private hospitals, namely 3 percent versus 2 percent, respectively. A statistician was hired to search for lurking variables. The first thing she did was to separate out the death rates for patients who were in good condition healthwise when they were admitted from those who were in poor condition. Do these figures suggest a "Simpson's Paradox" situation? Why or why not? Explain fully.

 

                Good Condition          Poor Condition
                Public     Private      Public     Private
Died            6          8            57         8
Survived        594        492          1443       192

15. Bill is a politician who wants to be reelected very much. His ratings are going down, however, and he is worried. His support is eroding quickly and his campaign director believes that unless he has over 60 percent of the vote right now, he will lose the election by the time the election takes place in three weeks. A TV campaign could help but is very expensive and his campaign funds are low. He plans to have a pollster estimate his current support to see how close he is to 60 percent.

a. What would be the two mistakes Bill could make regarding the TV campaign?

b. Which would be the worse mistake? Why?

c. Which of the following should be Bill's null hypothesis regarding the 60 percent?

A. Ho: P <= 60%

B. Ho: P >=60%

C. Ho: P = 60%

D. Ho: P not = 60%

E. None of these

Consider the regression printout from Excel shown below (fictitious data). Answer the following five questions based on your analysis of the printout. The variables were:

Y = quantity of units produced during month t by a manufacturing firm

X1 = Inventory at end of Period t - 1

X2 = Anticipated sales in period t + 1

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.970192167
R Square              0.94127284
Adjusted R Square     0.924493652
Standard Error        42.68771303
Observations          10

ANOVA
              df    SS             MS             F              Significance F
Regression    2     204446.8141    102223.407     56.09763792    4.90836E-05
Residual      7     12755.68591    1822.240844
Total         9     217202.5

                Coefficients     Standard Error    t Stat          P-value
Intercept       82.67643636      612.7538014       0.134926028     0.89646812
X Variable 1    482.9224421      46.47003698       10.39212519     1.65934E-05
X Variable 2    -680.6792932     102.0377531       -6.670857331    0.000284983

 

16. Does the regression equation make sense in terms of the coefficients? Why or why not?

17. Is the quantified impact of the variables significant?

18. Is the R squared value significantly different from zero?

19. Is the equation a good fit?

20. What non-linear equational form would make sense in this case?

*******************

21. If the X variable values in a simple linear regression are multiplied by a constant, what will be altered in the regression results?

22. In forecasting, the "time series analysis approach" is based on the idea of:
A. searching for repeating patterns
B. determining the independent variables that cause the Y to behave the way it does
C. eliminating non-significant explanatory variables
D. qualitative analysis
E. None of these

23. Name the four fundamental factors associated with the time series analysis approach to forecasting and indicate the two most easily estimated:

24. Name the two general categories of forecast error: a. ___________ b. _________

25. The most intuitive measure of forecast error in terms of magnitude is:


Answers:
1. What our theory leads us to expect the data to show and what the data actually show.
2. Both - A is wrong because correlations can be calculated only between variables that are numerical in nature, whereas religion and political party affiliation are attributes. B is wrong because the speaker is confusing R Square with the coefficient in the regression analysis.
3. The predicted value is 10 + 5(9) = 55, or 30 above the mean of 25, and the actual value is 65, or 40 above the mean. Thus, 30/40 = 75% of Y's variation is explained by the regression.
4. R Squared = 0
5. randomized double blind experiment.
6. R Squared = .09, Multiple R = .30, Standard Error of Estimate cannot be calculated from the information given.
7. The power form: Y = aX1^b1X2^b2X3^b3... where X1^b1 indicates X1 raised to the power of b1.
8. A
9. any distribution, regardless of its shape. A "large" sample is generally considered to be any randomly selected sample that has 30 or more observations (although some scientists prefer at least 125 - the bottom line is the larger the better).
10. B - It is a matter of logic to set up the null hypothesis so as to avoid the worst error because you will take action based on believing that the null is true unless the data indicate otherwise. The classic example is law, where we think it is worse to convict an innocent person than free a guilty one, so we believe everyone is innocent and will free them unless the evidence shows 'beyond the shadow of a doubt' that the person is guilty.

11. The value of the Adjusted R Square RR cannot be calculated directly, but we know its value will be a few percentage points less than the unadjusted R Square of .669334, so it is likely to be about 60 percent. The value of the "t stat" TT can be calculated directly as the coefficient divided by the Standard Error (of the Coefficient) = 73.29268/50.02493 = 1.465123.
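
For comparison with the estimate in answer 11, the sketch below applies the standard textbook adjustment, 1 - (1 - R Square)(n - 1)/(n - k - 1), to the figures in the printout. Using that formula is an assumption of this sketch, since the answer itself only estimates RR. The t stat is computed exactly as the answer describes.

```python
r_square = 0.669334
n = 10   # observations
k = 2    # X variables (the regression degrees of freedom)

# Standard adjustment for the number of X variables (assumed formula, see above).
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

coefficient = 73.29268      # X Variable 2
standard_error = 50.02493   # standard error of that coefficient
t_stat = coefficient / standard_error

print(f"RR is roughly {adjusted_r_square:.3f}")
print(f"TT = {t_stat:.6f}")
```

The formula gives a value near .57, which is in line with the answer's estimate of about 60 percent.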
12. Yes, it is fallacious reasoning. The formal name is "Affirming the Consequent." Its formal structure is "If A then B. B therefore A." It ignores the possibility of other factors, C or D or... that could also lead to B. In this case, there are other explanations for the diversity of life equally plausible to Darwin's theory.

13. B (all variables must be transformed to their logarithmic values first)

14. The news report says 3% died in public hospitals vs. 2% in private. Compare these to the percentages that died in each when the lurking variable of initial patient condition is separated out:

 

                Good Condition          Poor Condition
                Public     Private      Public     Private
Died            6          8            57         8
Survived        594        492          1443       192
Total:          600        500          1500       200
As Percent:     1%         1.6%         3.8%       4%

Note that when separated out, a smaller percentage of public patients die. Since this is a reversal of the total, Simpson's paradox is present.
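
The reversal can be confirmed directly from the counts in the table. This sketch computes the death rate within each condition group and then for each hospital type overall; the names used are illustrative.

```python
# (died, survived) counts from the table in question 14.
counts = {
    ("Good", "Public"): (6, 594),
    ("Good", "Private"): (8, 492),
    ("Poor", "Public"): (57, 1443),
    ("Poor", "Private"): (8, 192),
}

def death_rate(died, survived):
    return died / (died + survived)

# Rates within each condition group.
by_condition = {group: death_rate(*c) for group, c in counts.items()}

# Rates after pooling both condition groups for each hospital type.
overall = {}
for hospital in ("Public", "Private"):
    died = sum(c[0] for (cond, h), c in counts.items() if h == hospital)
    survived = sum(c[1] for (cond, h), c in counts.items() if h == hospital)
    overall[hospital] = death_rate(died, survived)

for group, rate in by_condition.items():
    print(group, f"{rate:.1%}")
for hospital, rate in overall.items():
    print("Overall", hospital, f"{rate:.1%}")
```

Public hospitals look worse overall only because they treat far more patients who arrive in poor condition (1,500 of 2,100, versus 200 of 700 for private hospitals).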

15.a.

1. To spend the money when he should not (because he will win anyway).

2. Not to spend the money when he should (because he will lose unless he spends it).

b. The worse mistake is not to spend it when he should, because he wants to be elected rather than save money.

c. A. Null hypothesis: P <= 60% (he will therefore spend the money)

16. No - theoretically, the coefficient should be negative for X1 (produce less when inventories are high) and positive for X2 (higher future sales should cause us to produce more).

17. Yes - very puzzling. Why should a variable that has the wrong sign theoretically be statistically significant?

18. Yes - ditto

19. Yes - Why should the fit be so good given that the variables are opposite what theory says they ought to be? Has someone punched in the wrong data?

20. Nothing obvious - there is no reason to choose one form over another, so use the simplest, namely the linear form.

21. The b coefficients will be altered, but not the Adjusted R Square, Significance F, or P-values.

22. A

23. Trend, Cycle, Seasonality, Randomness; Trend and Seasonality are most easily estimated.

24. magnitude and bias

25. MAPE (Mean Absolute Percentage Error, commonly expressed as "plus or minus 3 percent")
