Terms in this set (120)
In an agile approach to analytics, what is the first step of the process?
Perform business discovery
In an ETL process, data is loaded into a final target database such as:
Data warehouse
What are the four types of data analytical methods?
Descriptive, explanatory, predictive and prescriptive
Which of the following is an example of secondary data?
Firm's proprietary data
Which of the following data analysis models use optimization techniques?
Prescriptive analytics
Predictive analytics may be applied to __________, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance.
Prescriptive analytics
Target is examining their online sales data during the pandemic to understand what happened. Which kind of analytical technique are they using?
Descriptive analytics
Costco wants to know how to stock their warehouses for a future pandemic and are using current sales data to help them project the needs. Which kind of analytical technique are they using?
Predictive analytics
Your professor is considering purchasing a self-driving car that can figure out the best route and the optimum safe way to drive there without human intervention. What kind of analytics is the car using to do this?
Prescriptive analytics
Which of the following question(s) can be better answered using data in order to reach an evidence-based conclusion?
All of the answer selections are correct. Who will win the NBA championship? What is the purchase pattern(s) of our customers? How many students will enroll for an online class in the Spring?
In order for a chart to minimize graphical complexity, the data-ink ratio must be:
close to 1
Which of the following violates the principle of data visualization?
The data-ink ratio should be higher than 1
When the lie-factor of a graphical chart is more than 1,
the size of the effect shown in the graph is bigger than the actual effect in the data.
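As a minimal sketch of the lie factor from the card above, assuming Tufte's definition (size of the effect shown in the graphic divided by the size of the effect in the data). The function names and sample numbers are illustrative and do not come from the course material:

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(data_first: float, data_second: float,
               drawn_first: float, drawn_second: float) -> float:
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Hypothetical chart: the data doubles (10 -> 20) but the bar is drawn 4x taller (10 -> 40).
factor = lie_factor(10, 20, 10, 40)
print(factor)  # 3.0, so the graphic overstates the effect
```

A value close to 1 means the drawn effect matches the data; values well above 1 exaggerate it.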
What best describes the nature of a rose diagram?
Plots data using a circular histogram plot
Which of the following statements is a reason not to use a table for data visualization?
Tables cannot easily show trends
Deleting the grid lines in a chart
Increases the data-ink ratio
Which of the following statement(s) about charts is false?
None of the other answers are false. A chart should minimize graphical complexity. A chart should tell a story. A chart should have graphical integrity.
In order for a chart to have graphical integrity, the lie factor must be:
close to 1
Which are useful principles for data visualization?
The graph suggests a possible true effect
Which of the following statement(s) about charts is true?
Data ink can sometimes help tell a richer story
According to statistical notation, what does ∑ stand for?
to act as a summation operator
The difference between the first and third quartiles is referred to as the ____
interquartile range
Which of the following describes the standard deviation?
It is the square root of the variance.
Which of the following describes a positively skewed histogram?
a histogram that tails off towards the right
Which of the following is true for a median?
For an even number of observations, the median is the mean of the two middle numbers
Which of the following is an example of a measure of dispersion?
variance
Standard deviation of a normal data distribution is a ______
measure of data dispersion
What are the three principles of describing data?
Center, spread and shape
The ________ is the observation that occurs most frequently.
mode
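The measures of center and spread in the cards above can be computed with Python's standard `statistics` module. This is only a sketch: the sample values are made up, and the quartiles depend on the interpolation method (the module's default is assumed here).

```python
import statistics

# Hypothetical sample with an even number of observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

median = statistics.median(data)        # mean of the two middle values (4 and 5)
mode = statistics.mode(data)            # most frequent observation
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # square root of the variance

q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                # interquartile range

print(median, mode, variance, std_dev, iqr)
```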
Which of the following is an example of a sample?
The number of IT employees out of all employees working in an office of Google
Which of the following is a difference between the t-distribution and the standard normal (z) distribution?
The t-distribution has a larger variance than the standard normal distribution
The central limit theorem states that even if the population is not normally distributed, the
distribution of the sample mean will still be normal when the sample size is large
Which of the following is a continuous random variable?
The time to complete a specific task
Which of the following propositions describes an existing theory or belief?
Null hypothesis
When sample size increases
The width of the confidence interval decreases
The WPC Sports Company has noted that the size of individual "customer order" is normally distributed with a mean of $100 and standard deviation of $12. If a soccer team of 16 players were to make the next batch of orders, what would be the standard error of the mean?
sigma/sqrt(n) = 12/sqrt(16) = 12/4 = 3
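The arithmetic above can be checked with a few lines of Python. The function name is illustrative; the formula is the standard error of the mean for a known population standard deviation.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


sem = standard_error(12, 16)
print(sem)  # 3.0
```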
You are collecting data via an online survey to improve educational standards at ASU. Which of the following methods will not result in data collection bias?
Anonymous data collection that hides the ASU brand in the survey questions.
Which of the following is a Type-I error?
The null hypothesis is actually true, but the hypothesis test incorrectly rejects it
In order to reject the null hypothesis, the p-value must be less than the
Alpha
What is the confidence level when the level of significance is 0.07?
0.930
The unexplained variance in the regression analysis is also known as:
Residual variance
A correlation coefficient between "college entrance exam" grades and scholastic achievement was found to be -1.08. On the basis of this, you would tell the university that:
They should hire a new statistician.
What would be the null hypothesis for testing a linear regression model with profit as the dependent variable and sales as the independent variable?
There is no linear relationship between profit and sales.
Which of the following assumptions is not true for multiple linear regression?
There will be a multi-collinearity effect.
Which of the following is true about multi-collinearity?
It is measured using a measure called variance inflation factor (VIF).
A manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. The following model was developed: y = $1500 + 0.36x. If a car is driven 15000 miles in a year, the model predicts the annual cost of the car to be:
$6900
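A short sketch of the prediction above, using the intercept and slope from the card (y = 1500 + 0.36x). The function and parameter names are illustrative.

```python
def predict_annual_cost(miles: float, intercept: float = 1500.0, slope: float = 0.36) -> float:
    """Predicted annual cost in dollars for a given number of miles driven."""
    return intercept + slope * miles


cost = predict_annual_cost(15000)
print(f"${cost:.2f}")  # $6900.00
```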
Which of the following statements is true based on the following regression equation? IQ = 4.0 + Reading Label * 5.6
A one-point increase in Reading Label will increase IQ by 5.6 points.
The correlation coefficient between the age of a vehicle and the money spent to repair it is 0.9. Which of the following statements is true?
81% of the variation in the money spent on repairs is explained by the age of the vehicle
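The 81% figure follows from squaring the correlation coefficient, which holds for simple regression with a single predictor. A minimal sketch, with an illustrative function name, that also rejects impossible coefficients such as -1.08:

```python
def coefficient_of_determination(r: float) -> float:
    """Share of variance explained in a simple (one-predictor) linear regression."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r ** 2


r_squared = coefficient_of_determination(0.9)
print(f"{r_squared:.0%}")  # 81%
```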
A market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income, and household neighborhood (urban, suburban, and rural). The "neighborhood" variable in this model is ________
an independent variable
The value of R-Squared always falls between ________ and ________, inclusive.
0 and 1
In logistic regression analysis, instead of Y as a dependent variable, we use a function of Y called ________.
Logit
Logistic regression is a specialized type of regression analysis that is designed to predict ________ variables.
a binary categorical
A loan officer wants to know if the next customer is likely to default or not on a loan. How can she assess the risk of extending the loan to that customer?
By utilizing a multiple logistic regression model developed by an in-house analyst
The odds of success are defined as ________, where p is the probability of success.
p/(1-p)
If you want to find out if body weight, calorie intake, fat intake and age have an influence on the probability of having a heart attack (yes or no), which of the following kinds of analysis will help determine the answer?
Multiple logistic regression
The ________ is often used to describe the performance of a classification model applied to a set of test data for which the true outcomes are known.
Confusion matrix
In classification analysis, we typically split the data into two mutually exclusive sets, known as ________, to investigate the strength of the developed model.
Training and validation/testing
In classification problems, the primary source for accuracy estimation of the model is ________
Confusion matrix
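To make the confusion matrix concrete, here is a minimal sketch for a binary classifier. The labels are invented test data and the function name is illustrative; accuracy is computed as correct predictions divided by all predictions.

```python
def confusion_matrix(actual: list[int], predicted: list[int]) -> dict[str, int]:
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical test set with known outcomes.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)
```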
In logistic regression, the dependent variable y is defined as:
log(p/(1-p))
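The logit is the natural log of the odds p/(1-p), and its inverse maps a log-odds value back to a probability. A small sketch with illustrative function names:

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))


print(logit(0.5))                 # 0.0, since even odds have log-odds of zero
print(inverse_logit(logit(0.8)))  # approximately 0.8
```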
In classification analysis, we are determining the probability of an observation ________
To be part of a certain class or not
Which of the following is a definition of distance between two clusters in a complete linkage clustering?
The distance between the most distant pair of objects, one from each group
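Complete linkage can be sketched as the maximum pairwise distance between points drawn one from each cluster. Euclidean distance is assumed here, and the two clusters are made-up points on a line.

```python
import math
from itertools import product

Point = tuple[float, float]


def complete_linkage(cluster_a: list[Point], cluster_b: list[Point]) -> float:
    """Distance between the most distant pair, one point from each cluster."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical clusters.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (5.0, 0.0)]

print(complete_linkage(a, b))  # 5.0, the distance from (0, 0) to (5, 0)
```

Single linkage would use `min` instead of `max` and give 3.0 for the same clusters.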
Which of the following categories of data mining would you use for spam filtering of emails?
Supervised
Which of the following is a step of agglomerative hierarchical clustering?
By joining two clusters that are closest to each other
Which of the following is true of hierarchical clustering?
The data partition does not occur in a single step
In a cluster analysis, the distance between the clusters should be:
Maximized
Which of the following is true about k-means clustering?
We choose the value for k before doing the clustering analysis
In the Target story discussed in the lecture, why did Target send the teen daughter maternity ads?
Target's analytics model suggested she was pregnant based on her buying habits
Which of the following is a false statement?
To predict sales from transactional data one should perform clustering analysis.
Which of the following statements below is false about supervised/unsupervised data analysis?
Data is not labeled for supervised analysis
Which of the following is not an application of clustering analysis?
Crime prediction analysis
The extract function in ETL reads data from
a specified source database
Which of the following is not a requirement for an ETL architecture?
data quality
The final stage of an ETL process is:
Load
Which of the following is not one of the processes involved in data cleaning?
Encrypting
Which of the following is an ETL vendor?
Teradata
In the data extraction process for an ETL tool, which of the following is not an example of a legitimate data source?
Competitors' data
One of the processes in ETL is
Load
Data transformation involves
data splitting and aggregation
In the loading phase of an ETL tool, the transformed data gets loaded into an end target, usually the _______.
Data warehouse
Which of the following is not a standard practice in the "Data Transformation" process of an ETL tool?
Data extraction from ERP
The SQL code to extract only first_name information for all records of the "Actor" table below is:
SELECT first_name FROM Actor;
"Google Doc" is an example of _______ in a cloud computing environment.
SaaS
_______ ensures that related data exist in parent table before allowing an entry into a child table.
Referential integrity
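Referential integrity can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `customer` and `purchase` tables are hypothetical, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id))"
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO purchase (customer_id) VALUES (1)")  # parent row exists

try:
    # No customer with id 99 exists, so the child row is rejected.
    conn.execute("INSERT INTO purchase (customer_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
print(rejected, purchase_count)
```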
Which of the following tools helps in periodic managerial decision-making?
OLAP
You are creating a database to store temperature and wind data from various airports. Which of the following fields is the most likely candidate to use as the basis for a Primary Key in the Airport Table?
Airport Code
Which of the following is a cloud service provider?
VMware
When you are asked to design a database for the airline ticket reservation system, based on an Entity-Relationship Data model, which of the following could be an example of "entity"?
Traveler
Which of the following is an important task of a database management system?
Provides support such as performing maintenance and routine backups.
Which of the following is not a component of the relational database?
Analysis
When you access information from two different tables connected by an identifier key, the SQL keyword you should use is _______.
INNER JOIN
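A minimal `INNER JOIN` sketch using Python's built-in `sqlite3` module. The `airport` and `reading` tables and their columns are hypothetical, loosely modeled on the airport example in an earlier card. Only rows whose key appears in both tables are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airport (code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE reading (id INTEGER PRIMARY KEY, airport_code TEXT, temp_c REAL);
    INSERT INTO airport VALUES ('PHX', 'Phoenix'), ('SEA', 'Seattle');
    INSERT INTO reading (airport_code, temp_c)
        VALUES ('PHX', 41.0), ('PHX', 39.5), ('XXX', 10.0);
""")

# Match each reading to its airport through the shared identifier key.
rows = conn.execute("""
    SELECT a.city, r.temp_c
    FROM airport AS a
    INNER JOIN reading AS r ON r.airport_code = a.code
    ORDER BY r.temp_c
""").fetchall()

print(rows)  # Seattle (no readings) and code 'XXX' (no airport) are both excluded
```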
A/B testing can help marketers to
All of the answers are correct. Increase more clicks to their website. Increase more likes to their social media sites. Increase more sales.
An experiment is said to be double-blinded if _________
neither the subject nor those working with the subject is aware of who is being given which treatment
The first step for any kind of A/B testing is
to develop a test plan for what you want to test.
After factoring out the effect of other variables known to affect SAT, such as socioeconomic status, researchers found that music students had a higher SAT score than non-music students. This is an example of __________
Observational Study
A sample study is mostly done
to estimate the parameters of the population.
Regular consumption of organic food will keep you in a good mood. In this example, the confounder could be
Money
Which of the following is true about A/B testing?
To increase conversion rate of your website traffic, A/B testing can be beneficial.
In the experimental design example "IQ Water", students are called _______.
experimental units
Which of the following statements is NOT true about experimental studies to compare two treatments?
It is not easy to control uncertainties in the comparison.
A _______________ is a relationship between two variables that appear to have interdependence or association with each other but actually do not.
spurious correlation
Over-reliance on the first piece of information is called ____________
Anchoring bias
The gambler's fallacy is ____________.
a clustering illusion
When you keep eating the food you don't like precisely because you already bought the food, you are committing _____________.
sunk-cost fallacy
Which of the following statements is true?
Experimentation is a way of analytical thinking
Which of the following biases cannot be categorized as a cognitive bias?
None of the answer selections are correct. Groupthink. Anchoring Bias. Sunk cost fallacy.
A person who is convinced he is gaining admission to Harvard by merely applying is suffering from:
Overconfidence
When you buy a new car, you value it more than the price you paid because of:
Endowment effect bias
Which of the following is not a drawback of analytical decision making?
None of the answer selections are correct. Delayed action. Lack of flexibility. Frustration in teams.
You bought a top of the line laptop because your friends were so enthusiastic about theirs. Which kind of bias is in action here?
Bandwagon effect
What kinds of bias could show up when collecting data?
All of the answer selections are correct. Self-selection bias. Sampling bias. Framing effect.
Which of the following statements below is true about supervised/unsupervised machine learning?
Supervised learning requires labeled data for training
In developing spam filter algorithms, we need
Labeled data of both spam and non-spam emails
Artificial Intelligence _______
Is a broad science of mimicking human abilities
AI is not embraced everywhere in every industry because _______
It can be operationally expensive
An ideal machine learning process needs
All other answers are true. Highly granular data. Extremely diverse data. Large volume of data.
Which of the following is an example of unsupervised machine learning?
Clustering
Which of the following is an example of association rule learning?
How frequently an item set occurs in a transaction
Which of the following statements is not true about artificial neural networks?
In the hidden layer of the networks, input data is hidden
Which of the following examples is not an application of AI?
Predicting the exam score by scanning the appropriate textbook
Which of the following techniques is a modern update of artificial neural networks?
Deep learning
