Decision Modeling/Data Analysis Practice Exam
© 2004 Prof. Stephen Huxley
The following questions are designed to illustrate
the types of questions and material that may be covered on the final exam. Material from the first part of the
course has not been included because you already have access to that on the
practice midterm exam (scroll down to the practice midterm). Pre-midterm material will account for
50 of the 350 points on the final.
You should not consider this practice exam as a definitive guide to the
real exam because in a test situation, questions always seem more difficult
even though they may have come from the same test bank. Furthermore, the
questions below are "fill-in-the-blank" type, but questions on the
exam may be multiple choice or essay.
Many professors do not give practice exams precisely because some
students think the real exam will be essentially the same as the practice exam,
then whine when it is not. Therefore, you should not stop your preparations for
the exam just because you do well on this practice exercise. The real exam may
seem more difficult. The operative concept is “travel at your own risk!”
1. Control chart limits may be made as narrow or wide as you want by simply selecting larger or smaller sample sizes. True or False? Why or why not?
2. Give two reasons why small samples of only 3, 4, or 5 are common in real-life industrial use of X-bar charts.
3. Control charts are usually based on plus or minus "3 sigma" (that is, 3 standard errors), but quality would be improved by using 2 or 1 "sigma" charts. Why aren't such charts used?
4. "Lot plots" are constructed by grouping raw data observations into a histogram and may be used to check the quality of raw materials by ____________________.
5. Once the special causes of variation in a process have been discovered and eliminated, the long-term goal of anyone managing a process will be to: ________
6-8.
The Rustbuster Chemical Co. produces paint to prevent metal corrosion
and uses statistical quality control to monitor the machine which fills the
cans. A new can size is being
introduced and the first production run (when everything was double-checked for
accuracy and adjustment) yielded an average net contents of 2 liters. The quality control manager has
specified that 5 cans be checked for weight after every 100 cans have been
produced and that "3-sigma" X-bar and Range charts be set up
accordingly. When grouped into
samples of 5 each, the average difference between the heaviest and lightest can
was .156 liters. Answer the
following questions based on this information (use the Lecture Notes for the
factors).
6. The upper control limit for the X-bar chart
would be closest to:
A. 2.58 B. 2.47 C. 2.16 D. 2.09 E. 2.00
7. If a sample mean fell above the upper control
limit, approximately what percentage of the cans would be defective in terms of
weight?
A. 5% B. 2% C. 0.3% D. cannot be calculated from data given E. none of these
8. The upper control limit for the Range chart
would be closest to:
9. A total of twenty jobs are waiting to be done by a department. If a computer could do a billion calculations per second, how long would it take to evaluate all possible sequences in which the jobs could be done?
10. If minimizing maximum tardiness of any one job in a batch of jobs waiting to be done by one worker is the goal of the scheduler, then the jobs should be done according to:
11. Most common measures of productivity are based on the concept of _______________ divided by _________________. If multiple measures of both are involved, the appropriate analysis to determine efficiency is _________.
12.
If scheduling is to be done scientifically, at least three essential pieces of
information are needed.
Name them.
13.
Name the three sets of parameters that all linear programming
formulations typically contain:
14. The purpose of sensitivity analysis in
linear programming is to:
15-18: The Countchachange Department Store Corp. is planning to build a store in a new suburban shopping center. The manager assigned to the project wants to develop some rough guidelines for the architect who will attempt to design the building to fit specific needs as much as possible. The guidelines are to include the total number of square feet the store should contain overall, and how many square feet should be custom tailored to the requirements of each of the store's three major departments: Clothing, Jewelry, and Furniture (assume there are no other departments). The architect has indicated that construction of clothing space will cost $100 per square foot, jewelry space $300 per square foot, and furniture space $200 per square foot.
Since clothing will be the new store's merchandising specialty, the manager wants to make certain that the clothing department gets at least twice as much floor space as the other two departments combined. To maintain diversity, however, he wants each of the other departments to get at least 10 percent of the total store floor space. Total construction costs for the new store must be no more than $1 million. Past records indicate that profit contributions per square foot per day are $.80 for clothing departments, $.75 for jewelry departments, and $.60 for furniture departments.
Answer the following questions based on a linear programming formulation to determine how many square feet should be devoted to each department to maximize total daily profit contribution. (Note: Xc = square feet devoted to clothing, Xj to jewelry, and Xf to furniture.)
15. What approximate form would the objective function take for this problem? ____________________________________
16. How many constraints does this problem have (excluding non-negativity restrictions)? ___________
17. Indicate the missing coefficient in the following constraint for this
problem (if it is a constraint): ___?___Xc - 2Xj - 2Xf >= 0
18. Indicate the missing coefficient in the following constraint for this
problem (if it is a constraint): -.1Xc + ___?___Xj - .1Xf >= 0
19. Customers arriving at a food counter randomly order either meal A, B,
or C with probabilities of .50, .30, and .20 respectively. Assuming you are using a table of
random integers between 00 and 99, set up a correspondence key by assigning the
appropriate random numbers for each meal.
Type Prob.
A .50
B .30
C .20
20.
What function could be used on Excel to generate the random numbers, and what
function would convert the random numbers into customer type?
21.
Using your correspondence key, use the following random numbers to simulate the
first five customers: 85, 13, 54,
41, 67 (no warm up period)
Customer
Number RN Type
1 85
2 13
3 54
4 41
5 67
22. What is the difference between Monte Carlo simulation and “deterministic” simulation?
23. If there are three independent random variables in a situation, then three separate random numbers would have to be used to use a Monte Carlo simulation approach to the problem. True or false?
24. Consider the following outcomes of a DEA analysis of the branch offices of a company:
X: All branches receive an efficiency rating of 100 percent.
Y: No branches receive an efficiency rating of 100 percent.
Z: All branches receive an efficiency rating of 100 percent but the company is still losing out compared to competitors.
A. Only X is possible.
B. Only Y is possible.
C. Only Z is possible.
D. both X and Z are possible
E. both Y and Z are possible
25. In personnel scheduling, the first set of data that must be specified is the:
A. number of personnel available
B. number of personnel required during each time period
C. employee names
D. days off pattern
E. none of these
26. In classic decision analysis, the null hypothesis is always set up so as to __________. Give examples.
A firm is contemplating the strategy to pursue for next year. Strategy 1 is to introduce a revolutionary new product at a much higher price. Strategy 2 is to make moderate changes in its existing product. Strategy 3 is to make no changes except the color of the package and adding the word “new.” Payoffs for each strategy will depend on what happens to the national economy. N1 denotes improvement due to a major recovery and expansion, N2 denotes little change, and N3 denotes decline due to a worsening economy. The expected consequences of each combination are given in the following payoff table:
N1 N2 N3
Strategy 1 $500,000 $100,000 -$50,000
Strategy 2 $300,000 $250,000 0
Strategy 3 $100,000 $100,000 $100,000
27. Assuming the firm adopts a cautious approach, which strategy is best?
28. Assuming the firm adopts the strategy that minimizes the maximum regret it will feel afterward, which strategy is best?
29. If the firm believes there is an equal chance of all three possibilities occurring, what strategy would they select?
30. A consultant has offered to make a forecast of the economy and guarantee the forecast by paying the firm the difference between what it would have earned without the consultant’s services and what it actually earned. What is the maximum the firm should be willing to pay for this information?
Answers:
1.
True, because they are based on the standard error, which equals the population
standard deviation divided by the square root of the sample size. So a larger sample size will result in
a smaller standard error and therefore narrower control limits. The probability of getting a point
outside the control limits remains the same, however. Larger sample sizes simply scale things down.
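A minimal sketch of answer 1, assuming a hypothetical in-control process with mean 2.00 and standard deviation 0.06 (illustrative values, not from the exam). The limits are mean ± 3σ/√n, so quadrupling n halves the width, while the chance that an in-control point falls outside 3-sigma limits stays at about 0.27% for every n.

```python
import math


def xbar_limits(process_mean, process_sd, n, z=3.0):
    """Return (LCL, UCL) for an X-bar chart: mean +/- z * sigma / sqrt(n)."""
    standard_error = process_sd / math.sqrt(n)
    return process_mean - z * standard_error, process_mean + z * standard_error


# Hypothetical in-control process: mean 2.00 liters, standard deviation 0.06 liters.
for n in (1, 4, 9, 25):
    lcl, ucl = xbar_limits(2.00, 0.06, n)
    print(f"n={n:2d}: LCL={lcl:.3f}  UCL={ucl:.3f}  width={ucl - lcl:.3f}")

# Two-tailed normal probability beyond +/-3 standard errors; it does not depend on n.
false_alarm_rate = math.erfc(3 / math.sqrt(2))
print(f"P(point outside limits | in control) = {false_alarm_rate:.4f}")
```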
2. a) It is easier and cheaper to
use small samples & b) it is assumed that the population of all parts
produced by a process follows a normal distribution.
3. Because more production time
would be wasted looking for nonexistent problems due to false
"out-of-control" signals.
4. Looking for non-normal shapes,
truncations, out-of-specification data, or other types of odd patterns.
5. reduce
common causes of variation in the
process or system itself.
Until special causes are under control, it is difficult to attack the
common causes because the special causes represent a confounding influence that
hinders our ability to understand what is really needed to improve the
process. "Business process
re-engineering" is the name given to changing processes.
6.
D (UCL = Xbar + A2(Rbar) = 2 + .58(.156) ≈ 2.09)
7.
D (An item is defective if it falls outside the specification limits from the
blueprint. Without knowledge of
the specification limits, we cannot know if an item is defective or not. The specification limits are based on
the design of the product and
independent of the control limits, which are based on the inherent
variability of the production process and how many pieces are inspected in each
sample, i.e. the sample size, n.)
8.
B (UCL = D4(Rbar) = 2.11(.156) ≈ 0.33)
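The arithmetic in answers 6 and 8 can be sketched as below. This assumes the commonly tabulated factors for subgroups of n = 5 (A2 = 0.577, D3 = 0, D4 = 2.114); the course notes may round them, as the answer key does with .58 and 2.11.

```python
# Commonly tabulated 3-sigma control chart factors for subgroups of n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(grand_mean, r_bar):
    """Return ((LCL, UCL) for the X-bar chart, (LCL, UCL) for the Range chart)."""
    xbar_chart = (grand_mean - A2 * r_bar, grand_mean + A2 * r_bar)
    r_chart = (D3 * r_bar, D4 * r_bar)
    return xbar_chart, r_chart


# Rustbuster data: process mean 2 liters, average range 0.156 liters.
(x_lcl, x_ucl), (r_lcl, r_ucl) = xbar_r_limits(grand_mean=2.0, r_bar=0.156)
print(f"X-bar chart: LCL = {x_lcl:.3f}, UCL = {x_ucl:.3f}")
print(f"R chart:     LCL = {r_lcl:.3f}, UCL = {r_ucl:.3f}")
```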
9.
(20! calculations)/(no. of seconds in a year × 1 billion) = $2.4329\times10^{18} / (3.1536\times10^{16}) \approx 77$ years
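A quick check of answer 9, assuming (as the answer does) one calculation per sequence and a 365-day year:

```python
import math

JOBS = 20
CALCS_PER_SECOND = 1_000_000_000        # one billion, as in the question
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 (ignores leap years)

sequences = math.factorial(JOBS)        # number of different orders for 20 jobs
years = sequences / (CALCS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{sequences:,} sequences -> about {years:.1f} years")
```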
10. The Earliest Due Date (EDD) rule: do the jobs in order of increasing due date.
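A minimal sketch of the due-date rule for one worker. The four jobs and their times are hypothetical. The code sorts by due date, computes the maximum tardiness, and compares it with a brute-force search over every order (feasible only for a handful of jobs, which is the point of answer 9).

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Job:
    name: str
    processing_time: int  # days of work
    due_date: int         # days from now


def max_tardiness(sequence):
    """Largest amount by which any job finishes after its due date (0 if none are late)."""
    clock, worst = 0, 0
    for job in sequence:
        clock += job.processing_time
        worst = max(worst, clock - job.due_date)
    return worst


def earliest_due_date(jobs):
    """Sequence jobs in order of increasing due date."""
    return sorted(jobs, key=lambda job: job.due_date)


# Hypothetical jobs for illustration only.
jobs = [Job("A", 4, 10), Job("B", 2, 3), Job("C", 6, 8), Job("D", 3, 15)]
edd = earliest_due_date(jobs)
brute_force_best = min(max_tardiness(order) for order in permutations(jobs))
print("EDD order:", [job.name for job in edd])
print("EDD max tardiness:", max_tardiness(edd), "| best of all orders:", brute_force_best)
```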
11.
Output/input; multiple input/output situations require DEA (Data Envelopment
Analysis)
12. A list of jobs to do, how long each one will
take, and the due date for each one.
13. The objective
function coefficients, the left hand side constraint coefficients, and the
right hand side constraint limits.
14. Indicate
how sensitive the final solution is to changes in any of the three sets of
original parameters, assuming all else is equal. Most attention is focused on the right-hand-side constraint
limits because of the economic meaning of shadow prices, which are useful in evaluating the
marginal value of adding resources.
Second most important is sensitivity to changes in the objective
function coefficients. Analysis of
changes in the left-hand-side constraint coefficients is rare, primarily
because such a change represents a fundamental change in the design of the product or the
technology used to produce it, which is hard to reconcile with the “all else equal” assumption.
15-18:
LP Formulation: Max .8Xc + .75Xj + .6Xf
subject to:
100Xc + 300Xj + 200Xf <= 1,000,000
Xc >= 2(Xj + Xf) or Xc - 2Xj - 2Xf >= 0
Xj >= .1(Xc + Xj + Xf) or -.1Xc + .9Xj - .1Xf >= 0
Xf >= .1(Xc + Xj + Xf) or -.1Xc - .1Xj + .9Xf >= 0
So: 15. Max .8Xc + .75Xj + .6Xf; 16. four constraints; 17. 1; 18. .9.
Note: The solution from Solver would be:
Square feet: Xc = 6154, Xj = 769, Xf = 769
Total square feet: 7692
Maximum daily profit contribution: $5,962
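The Solver answer can be cross-checked with a small, self-contained sketch. It uses brute-force vertex enumeration (a swapped-in technique, not how Solver works internally): every choice of 3 constraints is solved as equalities, infeasible corner points are discarded, and the best remaining corner is kept. This works here because the budget and non-negativity constraints make the feasible region bounded.

```python
from itertools import combinations

# Each constraint is (coefficients, limit), meaning coefficients · (Xc, Xj, Xf) <= limit.
# The ">=" constraints from the formulation are multiplied by -1 to fit this form.
CONSTRAINTS = [
    ((100.0, 300.0, 200.0), 1_000_000.0),  # construction budget
    ((-1.0, 2.0, 2.0), 0.0),               # Xc - 2Xj - 2Xf >= 0
    ((0.1, -0.9, 0.1), 0.0),               # -.1Xc + .9Xj - .1Xf >= 0
    ((0.1, 0.1, -0.9), 0.0),               # -.1Xc - .1Xj + .9Xf >= 0
    ((-1.0, 0.0, 0.0), 0.0),               # Xc >= 0
    ((0.0, -1.0, 0.0), 0.0),               # Xj >= 0
    ((0.0, 0.0, -1.0), 0.0),               # Xf >= 0
]
PROFIT = (0.80, 0.75, 0.60)  # $ profit contribution per square foot per day


def det3(m):
    """Determinant of a 3x3 matrix given as a sequence of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; return None if it is singular."""
    d = det3(rows)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        m = [list(r) for r in rows]
        for i in range(3):
            m[i][col] = rhs[i]
        solution.append(det3(m) / d)
    return solution


def best_vertex(constraints, profit, tol=1e-6):
    """Try every corner point (3 constraints held as equalities) and keep the best feasible one."""
    best = None
    for trio in combinations(constraints, 3):
        x = solve3([c[0] for c in trio], [c[1] for c in trio])
        if x is None:
            continue
        feasible = all(
            sum(a_i * x_i for a_i, x_i in zip(a, x)) <= b + tol * max(1.0, abs(b))
            for a, b in constraints
        )
        if feasible:
            value = sum(p * x_i for p, x_i in zip(profit, x))
            if best is None or value > best[0]:
                best = (value, x)
    return best


value, (xc, xj, xf) = best_vertex(CONSTRAINTS, PROFIT)
print(f"Xc = {xc:.0f}, Xj = {xj:.0f}, Xf = {xf:.0f}, total = {xc + xj + xf:.0f} sq ft")
print(f"Maximum profit contribution = ${value:,.0f} per day")
```

Vertex enumeration grows combinatorially with problem size, so it is only a checking tool for tiny models like this one.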
19. Assuming we are using a table of two-digit random numbers: A (50%): 00-49, B (30%): 50-79, C (20%): 80-99.
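20. =RAND() generates the random numbers (=RANDBETWEEN(0,99) gives two-digit integers directly), and VLOOKUP with an approximate-match lookup table converts them into customer types.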
21.
Number RN Type
1 85 C
2 13 A
3 54 B
4 41 A
5 67 B
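A minimal sketch of the correspondence key from answer 19 applied to the five random numbers listed in question 21. The bisect lookup plays the role of an approximate-match VLOOKUP; the seed and the 10,000-draw run are illustrative assumptions.

```python
import bisect
import random

# Correspondence key from answer 19: lower end of each two-digit range and its meal.
LOWER_BOUNDS = [0, 50, 80]   # 00-49 -> A, 50-79 -> B, 80-99 -> C
MEALS = ["A", "B", "C"]


def meal_for(rn):
    """Map a two-digit random number (00-99) to a meal type, like an approximate-match lookup."""
    if not 0 <= rn <= 99:
        raise ValueError("random number must be between 00 and 99")
    return MEALS[bisect.bisect_right(LOWER_BOUNDS, rn) - 1]


given = [85, 13, 54, 41, 67]            # random numbers listed in question 21
print([meal_for(rn) for rn in given])   # ['C', 'A', 'B', 'A', 'B']

# A longer run with generated numbers; the shares should be near .50, .30, .20.
rng = random.Random(2004)
draws = [meal_for(rng.randint(0, 99)) for _ in range(10_000)]
print({meal: draws.count(meal) / len(draws) for meal in MEALS})
```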
22. Monte Carlo deals with random variables, deterministic simulation
does not.
23. True. Three separate correspondence tables would have to be set up
– what a pain!
24. D (All branches could be
efficient compared to each other but poor compared to outside competition. DEA measures efficiency only within the
system of branches included. If
the analyst could discover the equivalent data for the outside competitors,
then a more inclusive analysis could be achieved. But how to get the competitors to share that information
truthfully is the big challenge.)
25. B (Without a target for what
the schedule should match, you have no way to evaluate how good any schedule
is.)
26. avoid the worst possible
error. Examples:
a. In the American legal system,
the worse mistake (“Type 1 Error”) is thought to be punishing an innocent
person; the lesser mistake (“Type
2 Error”) is freeing a guilty person.
To avoid making the worse mistake, everyone is assumed innocent and will
go free unless “proven” guilty.
b. Most people carry a spare tire
because the worse mistake would be to have a flat tire without a spare. The lesser mistake is to use up trunk
space to carry one when you never actually have to use it. To avoid the worse mistake, most people
carry a spare.
c. In the American pharmaceutical
industry, the worse mistake is assumed to be releasing a medicine that does not
really work. The lesser mistake is to
prevent good medicines from reaching the public. So drug companies must “prove” their medicines work before
they receive approval to make claims and sell them. Other governments feel differently.
27. Caution implies using the
maximin strategy. The minimum
payoff for each strategy is -$50,000 for Strategy 1, 0 for 2, $100,000 for 3, so
to maximize the minimum, it should choose Strategy 3.
28. Afterward, the firm will know
what it should have done. If it
chooses Strategy 1, the maximum regret it could feel is $150,000 (if either N2
or N3 happen). If it chooses
Strategy 2, the maximum regret it could feel is $200,000. If it chooses Strategy 3, the maximum
regret is $400,000. So to minimize
its maximum regret, it should choose Strategy 1.
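The maximin and minimax-regret reasoning in answers 27 and 28 can be sketched as follows, using the payoff table from the question:

```python
# Payoff table from the question; columns are N1 (recovery), N2 (little change), N3 (decline).
PAYOFFS = {
    "Strategy 1": [500_000, 100_000, -50_000],
    "Strategy 2": [300_000, 250_000, 0],
    "Strategy 3": [100_000, 100_000, 100_000],
}


def maximin(payoffs):
    """Cautious choice: the strategy whose worst payoff is largest."""
    return max(payoffs, key=lambda s: min(payoffs[s]))


def max_regrets(payoffs):
    """For each strategy, the largest shortfall from the best payoff in any state."""
    n_states = len(next(iter(payoffs.values())))
    best_in_state = [max(row[j] for row in payoffs.values()) for j in range(n_states)]
    return {s: max(best - p for best, p in zip(best_in_state, row)) for s, row in payoffs.items()}


def minimax_regret(payoffs):
    regrets = max_regrets(payoffs)
    return min(regrets, key=regrets.get)


print("Maximin choice:", maximin(PAYOFFS))
print("Maximum regret by strategy:", max_regrets(PAYOFFS))
print("Minimax regret choice:", minimax_regret(PAYOFFS))
```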
29. If each state of the economy has an equal
probability of 1/3 of occurring, calculate the expected value from each
strategy by multiplying each payoff by 1/3, then pick the strategy with the
highest expected value. In this
case, both Strategies 1 and 2 have expected payoffs of $183,333, so either one
may be chosen (EMV for Strategy 3 is $100,000).
30. The firm’s expected payoff without the advice is $183,333
based on equal probabilities. If
the consultant picks N1, then the firm will select Strategy 1 and get a payoff
of $500,000. If N2, the firm will
select Strategy 2 and get a payoff of $250,000. If N3, the firm will select Strategy 3 and get
$100,000. If the firm uses the
same probabilities of each of these happening, then the expected value of this
situation with perfect information is $283,333. So the maximum the firm should be willing to pay is $283,333
- $183,333 = $100,000. (The
consultant would likely point out that the firm could lose up to $50,000 and so
would want to negotiate a fee of up to $283,333 - (-$50,000), or $333,333.)
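The expected-value calculations in answers 29 and 30 can be sketched with exact fractions, assuming equal probabilities of 1/3 for N1, N2, and N3 as those answers do:

```python
from fractions import Fraction

# Payoff table from the question; columns are N1 (recovery), N2 (little change), N3 (decline).
PAYOFFS = {
    "Strategy 1": [500_000, 100_000, -50_000],
    "Strategy 2": [300_000, 250_000, 0],
    "Strategy 3": [100_000, 100_000, 100_000],
}
PROBS = [Fraction(1, 3)] * 3  # equal chances, as assumed in answers 29 and 30


def expected_value(row, probs):
    return sum(p * x for p, x in zip(probs, row))


emv = {s: expected_value(row, PROBS) for s, row in PAYOFFS.items()}
best_emv = max(emv.values())

# With perfect information the firm picks the best payoff in each state before committing.
best_per_state = [max(row[j] for row in PAYOFFS.values()) for j in range(len(PROBS))]
ev_with_perfect_info = expected_value(best_per_state, PROBS)
evpi = ev_with_perfect_info - best_emv

for s, v in emv.items():
    print(f"EMV {s}: ${float(v):,.0f}")
print(f"EV with perfect information: ${float(ev_with_perfect_info):,.0f}")
print(f"EVPI: ${float(evpi):,.0f}")
```

Fractions keep the thirds exact, so the tie between Strategies 1 and 2 shows up as a true tie rather than a rounding coincidence.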
_______________________________________________________________________________________________________________
Decision Modeling/Data Analysis Practice Midterm
The following questions are designed to illustrate the types of
questions and material that may be covered on the midterm exam. You should not
consider this practice midterm as a definitive guide to the real exam because
in a test situation, questions always seem more difficult even though they may
have come from the same test bank. Furthermore, the questions below are
"fill-in-the-blank" type, but questions on the exam may be multiple
choice or essay. Many professors do not give practice exams precisely because
some students think the real exam will be essentially the same as the practice
exam, then whine when it is not. Therefore, you should not stop your preparations
for the exam just because you do well on this practice exercise. The real exam
may seem more difficult!
1. The essence of statistical analysis is comparison between: _____________________________________
2. Which of the following statements have something wrong with them (the
answer could be A, B, neither, or both):
A. "The correlation between a voter's religion and his or her political
party is r = .45."
B. "Since the R Squared between grades and hours studied per week is .06,
we can conclude that a one hour increase in study hours will result in a .06
increase in GPA."
3. The regression line for the relationship between X and Y is Y = 10 + 5X. A review of one actual data point for X = 9 was Y = 65. The mean of Y is 25. Compute the conceptual equivalent for the unadjusted R square for this single point.
4. The null hypothesis regarding the R square for any regression is ______________
5. Most medical experiments use the experimental design known as: ____________________________________
6. You read that Scholastic Aptitude Test scores in high school explain only 9% of the variation in students' later grades in college. Therefore, you could estimate the approximate values of the "Adjusted R Squared" = __________, the "Multiple R" = _________, and the standard error of estimate = ______________.
7. The most common form of multiple nonlinear regression equations found in economic or business research studies is ____________
8. "Multicollinearity" occurs whenever
A. any of the X variables are correlated strongly with each other
B. the Y variable is correlated strongly with only one of the X variables
included
C. the Y variable is not correlated strongly with any of the X variables
D. the R squared value is not statistically significant
E. none of these
9. According to the Central Limit Theorem, what distribution must the population have before we can assume that the means of large samples selected from that population would follow a normal distribution? What is a 'large' sample?
10. Whenever possible, the null hypothesis should be set up so as to:
A. minimize sample error
B. avoid making a Type I error
C. avoid making a Type II error
D. minimize sample size
E. none of these
Answer the next question based on the following printout:
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.818128
R Square            0.669334
Adjusted R Square   RR
Standard Error      101.2928
Observations        10

ANOVA
              df   SS         MS         F          Significance F
Regression    2    145380.9   72690.47   7.084686   0.020790513
Residual      7    71821.57   10260.22
Total         9    217202.5

               Coefficients   Standard Error   t Stat   P-value    Lower 95%      Upper 95%
Intercept      -515.327       269.0008                  0.09694    -1151.412939   120.758
X Variable 1   33.7963        9.746901                  0.010443   10.74855502    56.84404
X Variable 2   73.29268       50.02493         TT       0.186307   -44.99739541   191.5828
11. From the above printout, we can estimate the value of RR to be ______________, the value of TT to be _______________
12. Does the following statement represent a logical fallacy? If so, what is the formal name of this fallacy?
"If Darwin's extrapolation is correct that random mutation and natural selection within one type of animal is the same process that would start with a single life form and lead to entirely new types of life forms, then I should see a variety of different plant and animal species. I do see a variety of species, therefore Darwin must be correct."
13. The most commonly used form of nonlinear multiple equations is the:
A. quadratic
B. power
C. cubic
D. exponential
14. Review the tables below, then answer the questions.
A recent newspaper headline reported that death rates of patients undergoing operations were higher in public hospitals than in private hospitals, namely 3 percent versus 2 percent, respectively. A statistician was hired to search for lurking variables. The first thing she did was to separate out the death rates for patients who were in good condition healthwise when they were admitted from those who were in poor condition. Do these figures suggest a "Simpson's Paradox" situation? Why or why not? Explain fully.
            Good Condition       Poor Condition
            Public   Private     Public   Private
Died        6        8           57       8
Survived    594      492         1443     192
15. Bill is a politician who wants very much to be reelected. His ratings are going down, however, and he is worried. His support is eroding quickly and his campaign director believes that unless he has over 60 percent of the vote right now, he will lose the election by the time the election takes place in three weeks. A TV campaign could help but is very expensive and his campaign funds are low. He plans to have a pollster estimate his current support to see how close he is to 60 percent.
a. What would be the two mistakes Bill could make regarding the TV campaign?
b. Which would be the worse mistake? Why?
c. Which of the following should be Bill's null hypothesis regarding the 60 percent?
A. Ho: P <= 60%
B. Ho: P >=60%
C. Ho: P = 60%
D. Ho: P not = 60%
E. None of these
Consider the regression printout from Excel shown below (fictitious data). Answer the following five questions based on your analysis of the printout. The variables were:
Y = quantity of units produced during month t by a manufacturing firm
X1 = Inventory at end of Period t - 1
X2 = Anticipated sales in period t + 1
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.970192167
R Square            0.94127284
Adjusted R Square   0.924493652
Standard Error      42.68771303
Observations        10

ANOVA
              df   SS            MS            F             Significance F
Regression    2    204446.8141   102223.407    56.09763792   4.90836E-05
Residual      7    12755.68591   1822.240844
Total         9    217202.5

               Coefficients    Standard Error   t Stat         P-value
Intercept      82.67643636     612.7538014      0.134926028    0.89646812
X Variable 1   482.9224421     46.47003698      10.39212519    1.65934E-05
X Variable 2   -680.6792932    102.0377531      -6.670857331   0.000284983
16. Does the regression equation make sense in terms of the coefficients? Why or why not?
17. Is the quantified impact of the variables significant?
18. Is the R squared value significantly different from zero?
19. Is the equation a good fit?
20. What non-linear equational form would make sense in this case?
*******************
21. If the X variable values in a simple linear regression are multiplied by a constant, what will be altered in the regression results?
22. In forecasting, the "time series analysis approach" is
based on the idea of:
A. searching for repeating patterns
B. determining the independent variables that cause the Y to behave the way it
does
C. eliminating non-significant explanatory variables
D. qualitative analysis
E. None of these
23. Name the four fundamental factors associated with the time series analysis approach to forecasting and indicate the two most easily estimated:
24. Name the two general categories of forecast error: a. ___________ b. _________
25. The most intuitive measure of forecast error in terms of magnitude is:
Answers:
1. What our theory leads us to expect the data to show and what the
data actually show.
2. Both - A is wrong because correlations can be calculated only between
variables that are numerical in nature, whereas political party affiliation is
an attribute. B is wrong because the speaker is confusing R Square with the
coefficient (slope) of study hours in the regression equation.
3. The predicted value is 10 + 5(9) = 55, which is 30 above the mean of 25, and the
actual value is 65, which is 40 above the mean. Thus 30/40 = 75% of this point's
deviation from the mean of Y is explained by the regression.
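The single-point calculation in answer 3 can be written as a tiny function, a sketch of the "explained over total deviation" idea that R Square aggregates over all data points:

```python
def explained_share(a, b, x, y_actual, y_mean):
    """Share of one point's deviation from the mean of Y accounted for by the line a + b*x."""
    y_pred = a + b * x
    return (y_pred - y_mean) / (y_actual - y_mean)


print(explained_share(a=10, b=5, x=9, y_actual=65, y_mean=25))  # 0.75
```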
4. R Squared = 0
5. A randomized, double-blind experiment.
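6. Adjusted R Squared ≈ .09 (in a study with many students it is essentially equal to the reported R Squared of .09),
Multiple R = √.09 = .30; the standard error of estimate cannot be calculated from the information given.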
7. The power form: Y = aX1^b1X2^b2X3^b3... where X1^b1 indicates X1 raised to
the power of b1.
8. A
9. any distribution, regardless of its shape. A "large" sample is
generally considered to be any randomly selected sample that has 30 or more
observations (although some scientists prefer at least 125 - the bottom line is
the larger the better).
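A small simulation sketch of answer 9, assuming a strongly right-skewed exponential population with mean 1 and standard deviation 1 (the theorem does require the population to have a finite variance). If the means of samples of 30 are close to normal, about 95% of them should fall within 1.96 standard errors of the population mean.

```python
import random
import statistics

rng = random.Random(42)          # fixed seed so the run is repeatable
n, reps = 30, 10_000             # sample size and number of simulated samples
# Population: exponential with mean 1 and standard deviation 1 (strongly right-skewed).
means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

se = 1.0 / n ** 0.5              # theoretical standard error of the mean, sigma / sqrt(n)
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps
print(f"mean of sample means = {statistics.fmean(means):.3f} (population mean 1)")
print(f"sd of sample means   = {statistics.stdev(means):.3f} (theory {se:.3f})")
print(f"share within 1.96 SE = {inside:.3f} (normal theory 0.950)")
```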
10. B - It is a matter of logic to set up the null hypothesis so as to avoid
the worst error because you will take action based on believing that the null
is true unless the data indicate otherwise. The classic example is law, where
we think it is worse to convict an innocent person than free a guilty one, so
we believe everyone is innocent and will free them unless the evidence shows
'beyond a reasonable doubt' that the person is guilty.
11. The Adjusted R Square RR can be calculated from the unadjusted R Square, the number of
observations (n = 10), and the number of X variables (k = 2):
RR = 1 - (1 - .669334)(10 - 1)/(10 - 2 - 1) ≈ .575, a few percentage points less than the unadjusted
R Square, as expected. The "t Stat" TT can be calculated directly as the coefficient divided by
the Standard Error (of the Coefficient) = 73.29268/50.02493 = 1.465123
12. Yes, it is fallacious reasoning. The formal name is "Affirming the
Consequent." Its formal structure is "If A then B. B therefore
A." It ignores the possibility of other factors, C or D or... that could
also lead to B. In this case, there are other explanations for the diversity of
life equally plausible to Darwin's theory.
13. B (all variables must be transformed to their logarithmic values first)
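A minimal sketch of the log transformation in answers 7 and 13, shown for a single X for brevity: taking logs turns Y = aX^b into ln Y = ln a + b ln X, which ordinary least squares can fit. The data are illustrative, generated exactly from Y = 3X^0.5, so the fit should recover a = 3 and b = 0.5.

```python
import math


def fit_power(xs, ys):
    """Fit Y = a * X**b by least squares on ln Y = ln a + b ln X (all values must be > 0)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b


# Illustrative data generated exactly from Y = 3 * X**0.5.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 0.5 for x in xs]
a, b = fit_power(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With several X variables the same idea applies: regress ln Y on ln X1, ln X2, ..., and each slope is the corresponding exponent.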
14. The news report says 3% died in public hospitals vs. 2% in private. Compare these to the percentages that died in each when the lurking variable of initial patient condition is separated out:
              Good Condition       Poor Condition
              Public   Private     Public   Private
Died          6        8           57       8
Survived      594      492         1443     192
Total         600      500         1500     200
Percent died  1%       1.6%        3.8%     4%
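Note that when separated out, a smaller percentage of public-hospital patients die in both groups (1% vs. 1.6% for patients in good condition, 3.8% vs. 4% for patients in poor condition). Since this reverses the direction of the overall comparison (3% vs. about 2%), Simpson's paradox is present: public hospitals treat a much larger share of patients in poor condition (1,500 of 2,100, versus 200 of 700 for private hospitals), which inflates their overall death rate.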
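The reversal in answer 14 can be reproduced from the counts in the table: within each condition the public rate is lower, but pooled over conditions it is higher.

```python
# (condition, hospital) -> (patients who died, total patients), from the answer 14 table
GROUPS = {
    ("Good", "Public"): (6, 600),
    ("Good", "Private"): (8, 500),
    ("Poor", "Public"): (57, 1500),
    ("Poor", "Private"): (8, 200),
}


def death_rate(condition, hospital):
    died, total = GROUPS[(condition, hospital)]
    return died / total


def aggregate_death_rate(hospital):
    """Death rate for one hospital type with patient condition ignored."""
    died = sum(d for (cond, h), (d, t) in GROUPS.items() if h == hospital)
    total = sum(t for (cond, h), (d, t) in GROUPS.items() if h == hospital)
    return died / total


for condition in ("Good", "Poor"):
    print(f"{condition} condition: public {death_rate(condition, 'Public'):.1%}, "
          f"private {death_rate(condition, 'Private'):.1%}")
print(f"All patients: public {aggregate_death_rate('Public'):.1%}, "
      f"private {aggregate_death_rate('Private'):.1%}")
```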
15.a.
1. To spend the money when he should not (because he will win anyway).
2. Not to spend the money when he should (because he will lose unless he spends it).
b. The worse mistake is not to spend it when he should, because he would rather be elected than save money.
c. A. Null hypothesis: P <= 60% (he will therefore spend the money)
16. No - theoretically, the coefficient should be negative for X1 (produce less when inventories are high) and positive for X2 (higher future sales should cause us to produce more).
17. Yes - very puzzling. Why should a variable that has the wrong sign theoretically be statistically significant?
18. Yes - ditto
19. Yes - Why should the fit be so good given that the variables are opposite what theory says they ought to be? Has someone punched in the wrong data?
20. Nothing obvious - there is no reason to choose one form over another, so use the simplest, namely the linear form.
21. The slope coefficient and its standard error will be divided by the constant, but the intercept, R Square, Adjusted R Square, Significance F, t Stats, and P-values will not change.
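A sketch of answer 21 with illustrative data and an arbitrary constant c = 12 (for example, converting years to months): rescaling X divides the slope by c and leaves the intercept and R Square unchanged.

```python
def simple_regression(xs, ys):
    """Return (intercept, slope, r_squared) for ordinary least squares with one X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, sxy ** 2 / (sxx * syy)


# Illustrative data only; c is an arbitrary rescaling constant.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
c = 12.0

original = simple_regression(xs, ys)
rescaled = simple_regression([c * x for x in xs], ys)
print("original: intercept=%.4f slope=%.4f R^2=%.4f" % original)
print("X * 12:   intercept=%.4f slope=%.4f R^2=%.4f" % rescaled)
```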
22. A
23. Trend, Cycle, Seasonality, Randomness; Trend and Seasonality are most easily estimated.
24. magnitude and bias
25. MAPE (Mean Absolute Percentage Error, commonly expressed as "plus or minus 3 percent")
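A minimal sketch of the two error categories in answers 24 and 25, using hypothetical actual and forecast values: bias is the mean signed error, while MAD and MAPE measure magnitude (MAPE needs nonzero actual values).

```python
def forecast_errors(actuals, forecasts):
    """Return (bias, MAD, MAPE %) for paired actual and forecast values (actuals must be nonzero)."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    bias = sum(errors) / n                                            # mean signed error
    mad = sum(abs(e) for e in errors) / n                             # mean absolute deviation
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actuals)) / n  # mean absolute percentage error
    return bias, mad, mape


# Hypothetical demand history and forecasts, for illustration only.
actuals = [100, 120, 110, 130]
forecasts = [98, 125, 105, 128]
bias, mad, mape = forecast_errors(actuals, forecasts)
print(f"bias = {bias:+.2f}, MAD = {mad:.2f}, MAPE = {mape:.2f}%")
```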