Data Science Interview Questions

IQ Stream Technologies is one of the top-quality Data Science training institutes in Bangalore, with highly experienced and skilled professional trainers delivering advanced data science and Python training through online practical lessons. Also, explore Python and Machine Learning interview questions here.

Predictive Modeling Questions:
1. (Given a Dataset) Analyze this dataset and give me a model that can predict this response variable.
2. What could be some issues if the distribution of the test data is significantly different than the distribution of the training data?
3. What are some ways I can make my model more robust to outliers?
4. What are some differences you would expect in a model that minimizes squared error, versus a model that minimizes absolute error? In which cases would each error metric be appropriate?
5. What error metric would you use to evaluate how good a binary classifier is? What if the classes are imbalanced? What if there are more than 2 groups?
6. What are various ways to predict a binary response variable? Can you compare two of them and tell me when one would be more appropriate? What’s the difference between these? (SVM, Logistic Regression, Naive Bayes, Decision Tree, etc.)
7. What is regularization and where might it be helpful? What is an example of using regularization in a model?
8. Why might it be preferable to include fewer predictors over many?
9. Given training data on tweets and their retweets, how would you predict the number of retweets a given tweet will have after 7 days, having observed only 2 days’ worth of data?
10. How could you collect and analyze data to use social media to predict the weather?
11. How would you construct a feed to show relevant content for a site that involves user interactions with items?
12. How would you design the people you may know feature on LinkedIn or Facebook?
13. How would you predict who someone may want to send a Snapchat or Gmail to?
14. How would you suggest to a franchise where to open a new store?
15. In a search engine, given partial data on what the user has typed, how would you predict the user’s eventual search query?
16. Given a database of all previous alumni donations to your university, how would you predict which recent alumni are most likely to donate?
17. You’re Uber and you want to design a heatmap to recommend to drivers where to wait for a passenger. How would you approach this?
18. How would you build a model to predict a March Madness bracket?
19. You want to run a regression to predict the probability of a flight delay, but there are flights with delays of many hours that are really messing up your model. How can you address this?
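
For question 4, one way to see the difference between the two error metrics is to look at the simplest possible model, a single constant prediction. The constant that minimizes squared error is the mean, and a constant that minimizes absolute error is the median. The sketch below uses made-up delay values (the numbers and helper names are illustrative assumptions, not taken from any real dataset) to show how one extreme value pulls the squared-error fit much further than the absolute-error fit.

```python
from statistics import mean, median


def squared_error(values, c):
    """Total squared error of predicting the constant c for every value."""
    return sum((v - c) ** 2 for v in values)


def absolute_error(values, c):
    """Total absolute error of predicting the constant c for every value."""
    return sum(abs(v - c) for v in values)


# Hypothetical delays in minutes; the last value added is a made-up outlier.
delays = [5, 7, 8, 10, 12]
delays_with_outlier = delays + [720]

# Best constant under each loss, before and after adding the outlier.
l2_fit_clean, l1_fit_clean = mean(delays), median(delays)
l2_fit_outlier = mean(delays_with_outlier)
l1_fit_outlier = median(delays_with_outlier)

print("squared-error fit :", l2_fit_clean, "->", l2_fit_outlier)
print("absolute-error fit:", l1_fit_clean, "->", l1_fit_outlier)
```

With these values the mean jumps from 8.4 to 127 while the median only moves from 8 to 9. That suggests absolute error (or another robust loss) when a few extreme observations should not dominate the fit, and squared error when large misses really should be penalized heavily.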

Programming Questions:
1. Write a function to calculate all possible assignment vectors of 2n users, where n users are assigned to group 0 (control), and n users are assigned to group 1 (treatment).
2. Given a list of tweets, determine the top 10 most used hashtags.
3. Program an algorithm to find the best approximate solution to the knapsack problem in a given time.
4. Program an algorithm to find the best approximate solution to the travelling salesman problem in a given time.
5. You have a stream of data coming in of size n, but you don’t know what n is ahead of time. Write an algorithm that will take a random sample of k elements. Can you write one that takes O(k) space?
6. Write an algorithm that can calculate the square root of a number.
7. Given a list of numbers, can you return the outliers?
8. When can parallelism make your algorithms run faster? When could it make your algorithms run slower?
9. What are the different types of joins? What are the differences between them?
10. Why might a join on a subquery be slow? How might you speed it up?
11. Describe the difference between primary keys and foreign keys in a SQL database.
12. Given a COURSES table with columns course_id and course_name, a FACULTY table with columns faculty_id and faculty_name, and a COURSE_FACULTY table with columns faculty_id and course_id, how would you return a list of faculty who teach a course given the name of a course?
13. Given an IMPRESSIONS table with ad_id, click (an indicator that the ad was clicked), and date, write a SQL query that will tell me the click-through rate of each ad by month.
14. Write a query that returns the name of each department and a count of the number of employees in each, given these tables:
EMPLOYEES containing: Emp_ID (Primary key) and Emp_Name
EMPLOYEE_DEPT containing: Emp_ID (Foreign key) and Dept_ID (Foreign key)
DEPTS containing: Dept_ID (Primary key) and Dept_Name
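
Question 5 (sampling k elements from a stream of unknown length) is commonly answered with reservoir sampling. The following is a minimal sketch of that technique, not the only valid answer; the function name and the optional `rng` parameter are illustrative choices. It keeps only k items in memory, so it uses O(k) space.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 values from a generator without knowing its length up front.
sample = reservoir_sample((x * x for x in range(10_000)), k=5)
print(sample)
```

If the stream has fewer than k elements, this sketch simply returns all of them.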

Probability Questions:

1. Bobo the amoeba has a 25%, 25%, and 50% chance of producing 0, 1, or 2 offspring, respectively. Each of Bobo’s descendants also has the same probabilities. What is the probability that Bobo’s lineage dies out?
2. In any 15-minute interval, there is a 20% probability that you will see at least one shooting star. What is the probability that you see at least one shooting star in the period of an hour?
3. How can you generate a random number from 1 to 7 with only a die?
4. How can you get a fair coin toss if someone hands you a coin that is weighted to come up heads more often than tails?
5. You have a 50-50 mixture of two normal distributions with the same standard deviation. How far apart do the means need to be in order for this distribution to be bimodal?
6. Given draws from a normal distribution with known parameters, how can you simulate draws from a uniform distribution?
7. A certain couple tells you that they have two children, at least one of which is a girl. What is the probability that they have two girls?
8. You have a group of couples that decide to have children until they have their first girl, after which they stop having children. What is the expected gender ratio of the children that are born? What is the expected number of children each couple will have?
9. How many ways can you split 12 people into 3 teams of 4?
10. Your hash function assigns each object to a number from 1 to 10, each with equal probability. With 10 objects, what is the probability of a hash collision? What is the expected number of hash collisions? What is the expected number of hashes that are unused?
11. You call 2 UberX’s and 3 Lyfts. If the time that each takes to reach you is IID, what is the probability that all the Lyfts arrive first? What is the probability that all the UberX’s arrive first?
12. I write a program that should print out all the numbers from 1 to 300, but prints Fizz instead if the number is divisible by 3, Buzz instead if the number is divisible by 5, and FizzBuzz if the number is divisible by both 3 and 5. What is the total number of numbers that are Fizzed, Buzzed, or FizzBuzzed?
13. On a dating site, users can select 5 out of 24 adjectives to describe themselves. A match is declared between two users if they match on at least 4 adjectives. If Alice and Bob randomly pick adjectives, what is the probability that they form a match?
14. A lazy high school senior types up applications and envelopes for n different colleges, but puts the applications randomly into the envelopes. What is the expected number of applications that went to the right college?
15. Let’s say you have a very tall father. On average, what would you expect the height of his son to be? Taller, equal, or shorter? What if you had a very short father?
16. What’s the expected number of coin flips until you get two heads in a row? What’s the expected number of coin flips until you get two tails in a row?
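
For question 1, a standard approach treats Bobo's lineage as a branching process: the extinction probability q is the smallest non-negative solution of q = 0.25 + 0.25q + 0.5q², because the lineage dies out exactly when the lineage of every child dies out. The sketch below finds that solution by fixed-point iteration starting from 0. The function and parameter names are illustrative assumptions.

```python
def extinction_probability(offspring_probs, iterations=500):
    """Smallest non-negative fixed point of the offspring generating function.

    offspring_probs[k] is the probability of producing exactly k offspring.
    """
    q = 0.0
    for _ in range(iterations):
        # The lineage dies out if each of the k children's lineages dies out.
        q = sum(p * q ** k for k, p in enumerate(offspring_probs))
    return q


# Bobo: 25% chance of 0 offspring, 25% of 1, 50% of 2.
bobo = extinction_probability([0.25, 0.25, 0.5])
print(round(bobo, 6))
```

The iteration converges to 0.5, which matches the smaller root of 2q² - 3q + 1 = 0 (the other root, q = 1, is always a fixed point).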

Statistical Inference Questions:

1. In an A/B test, how can you check if assignment to the various buckets was truly random?
2. What might be the benefits of running an A/A test, where you have two buckets that are exposed to the exact same product?
3. What would be the hazards of letting users sneak a peek at the other bucket in an A/B test?
4. What would be some issues if blogs decide to cover one of your experimental groups?
5. How would you conduct an A/B test on an opt-in feature?
6. How would you run an A/B test for many variants, say 20 or more?
7. How would you run an A/B test if the observations are extremely right-skewed?
8. I have two different experiments that both change the sign-up button to my website. I want to test them at the same time. What kinds of things should I keep in mind?
9. What is a p-value? What is the difference between type-1 and type-2 error?
10. You are AirBnB and you want to test the hypothesis that a greater number of photographs increases the chances that a buyer selects the listing. How would you test this hypothesis?
11. How would you design an experiment to determine the impact of latency on user engagement?
12. What is maximum likelihood estimation? Could there be any case where it doesn’t exist?
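
For question 1, one partial check is a sample ratio mismatch test: compare the observed bucket sizes against the intended split with a chi-square goodness-of-fit test. The sketch below covers only the two-bucket case and uses just the standard library; the function name, parameter names, and counts are hypothetical. It does not prove assignment was random, since it only looks at bucket sizes. Comparing pre-experiment covariates across buckets would be a natural complement.

```python
import math


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test on two bucket sizes."""
    total = control_count + treatment_count
    expected = [total * expected_control_share,
                total * (1 - expected_control_share)]
    observed = [control_count, treatment_count]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical counts for an experiment that intended a 50/50 split.
print(srm_p_value(5050, 4950))  # mild imbalance
print(srm_p_value(5200, 4800))  # larger imbalance
```

A very small p-value suggests the split deviates from the design by more than chance would explain, which is a reason to inspect the assignment mechanism before trusting the experiment's results.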

Data Analysis Questions:

1. (Given a Dataset) Analyze this dataset and tell me what you can learn from it.
2. What is R²? What are some other metrics that could be better than R² and why?
3. What is the curse of dimensionality?
4. Is more data always better?
5. What are advantages of plotting your data before performing analysis?
6. How can you make sure that you don’t analyze something that ends up meaningless?
7. What is the role of trial and error in data analysis? What is the role of making a hypothesis before diving in?
8. How can you determine which features are the most important in your model?
9. How do you deal with some of your predictors being missing?
10. You have several variables that are positively correlated with your response, and you think combining all of the variables could give you a good prediction of your response. However, you see that in the multiple linear regression, one of the weights on the predictors is negative. What could be the issue?
11. Let’s say you’re given an infeasible number of predictors in a predictive modeling task. What are some ways to make the prediction more feasible?
12. Now you have a feasible number of predictors, but you’re fairly sure that you don’t need all of them. How would you perform feature selection on the dataset?
13. Your linear regression didn’t run and communicates that there are an infinite number of best estimates for the regression coefficients. What could be wrong?
14. You run your regression on different subsets of your data, and find that in each subset, the beta value for a certain variable varies wildly. What could be the issue here?
15. What is the main idea behind ensemble learning? If I had many different models that predicted the same response variable, what might I want to do to incorporate all of the models? Would you expect this to perform better than an individual model or worse?
16. Given that you have wifi data in your office, how would you determine which rooms and areas are underutilized and overutilized?
17. How could you use GPS data from a car to determine the quality of a driver?
18. Given accelerometer, altitude, and fuel usage data from a car, how would you determine the optimum acceleration pattern to drive over hills?
19. Given position data of NBA players in a season’s games, how would you evaluate a basketball player’s defensive ability?
20. How would you quantify the influence of a Twitter user?
21. Given location data of golf balls in games, how would you construct a model that can advise golfers where to aim?
22. You have 100 mathletes and 100 math problems. Each mathlete gets to choose 10 problems to solve. Given data on who got what problem correct, how would you rank the problems in terms of difficulty?
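
For question 2, R² is usually defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. One commonly cited alternative is adjusted R², which penalizes additional predictors. The sketch below computes both from scratch; the function names and the sample values are illustrative assumptions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """R² penalized for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Made-up observations and predictions from a one-predictor model.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(actual, predicted)
print(r2, adjusted_r_squared(r2, n_samples=len(actual), n_predictors=1))
```

Predicting the mean of `actual` for every point gives an R² of 0, and a perfect fit gives 1. Because plain R² never decreases when predictors are added to a least-squares fit, adjusted R² or an out-of-sample error metric is often a more useful basis for comparing models.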
