This practice exam is intended to help you study for your upcoming exam and to get used to the
exam format on Gradescope. Please note the following important information.
Frequently Asked Questions
1. Are the format and length of the actual exam the same as the practice exams?
Not necessarily. Your actual exam will be a mix of ROUGHLY 5-10 multiple choice and 3-4
free response questions. You will have 60 minutes for the exam + 15 minutes for exam upload to
Gradescope. In addition, while the questions in this practice exam will give you a general idea
of the types of ideas that we will examine you on in the exam, the questions on the actual
exam will of course not be the same as the ones here.
2. Can I use a calculator during the exam?
No. In addition, the exam is closed-book, closed-notes, to be done individually. The only reference
that you can use is the reference sheet attached at the end of the exam.
3. Can I use the JupyterHub during the exam?
No. This is a “pen and paper”-based exam that focuses on the reasoning aspects of our
course.
4. Can I see the reference sheet ahead of the exam?
Yes! It is on the last page of this practice exam pdf.
5. Can I print the reference sheet ahead of the exam?
No, the reference sheet will be included in the exam, so you do not need to print it in advance.
In fact, we ask that you do not print it in advance; this is so that everyone is taking the exam
under conditions that are as similar as possible.
6. Will solutions to this sample exam be posted?
Yes, but not until Saturday March 27. The reason is to encourage you to wrestle with the exam
questions on your own and to discuss them with classmates first.
When we read solutions before working on the problems on our own, it is easy to feel that
we are prepared because we are able to understand the solutions. However, learning does not
happen without hands-on work where you yourself wrestle with the problems until you
really get it and are able to articulate your understanding to others. This brings us to the
last question:
7. How do I best use this sample exam to help me study?
Simply working with the sample exam is not enough. My suggestion is to start by examining how
you did in homework and labs. If there are particular ideas you struggled with, take note of them,
review them, and discuss them with your classmates, TAs, or the instructor. Keep in mind that as
you are working on these problems, you should understand why the procedure or idea makes
sense; do not simply go through the motions.
Please also practice uploading your exam as you go, so that you don’t have to worry about
technical snags during the exam. Bring your work to lab and use the opportunity to seek help
from your TA. At home, try to re-do the problems that you didn’t do correctly, to see if you
truly understand them. Explaining your ideas and articulating your understanding to classmates
in study group settings will also help you gain a more solid understanding.
Best wishes on your preparations for this exam.
Core UA 107: Probability, Statistics, and Decision-Making
Exam 2 – PRACTICE EXAM
I trust that you will honor each other’s hard work and intention to learn, and that you will rise to the
challenge by upholding the highest level of academic and personal integrity during this exam. In fairness
to students who do abide by the honor code, any violation will be reported and there will be severe
consequences to any violation.
Please carefully review the exam rules and instructions below. My very best wishes to everyone in this
exam!
Q1 Exam Rules and Instructions; Statement of Academic Integrity
Q1.1 Exam Rules and Instructions
1. This exam is closed book and closed notes. You may not consult any references during
this exam.
2. This exam is to be done individually. You may neither give assistance to nor receive assistance from
anyone during this exam.
3. You may not use the JupyterHub, a calculator, or any other computing device/assistance
during this exam.
4. You may not share the exam questions or your responses in any format (including
pdfs and screenshots) with anyone at any time, including before, during, and after
the exam.
5. You have 60 minutes to complete this exam. After 60 minutes, we will require you to stop
working. You will have an additional 15 minutes to scan and upload your work.
6. Please stay on our class Zoom link and remain clearly visible on camera until you finish
submitting your exam.
Please acknowledge that you have carefully read the above exam rules and instructions
by typing your name in the provided box in Gradescope – Exam 2.
Q1.2 Statement of Academic Integrity
“I pledge that I am observing the NYU honor code and the quiz rules listed above. In particular,
I am neither giving nor receiving unauthorized assistance during this examination.”
Please type your name in the provided box in Gradescope – Exam 2 to reaffirm that
you are observing the NYU honor code and following the exam rules listed above.
Q2 Multiple Choice
Q2.1 (2 points) Which of the statements below is true about the correlation coefficient, r?
A. The correlation coefficient r measures the strength of any type of association between
two numerical variables.
B. If r is close to -1, the linear association between the two numerical variables is very
weak.
C. The correlation coefficient r is always a number between -1 and 1.
D. If r = 1, this means that the plotted points lie on a line of slope 1.
E. None of the above
Solution: Option A is incorrect because r measures only the strength of linear associations
between two numerical variables. Option B is incorrect because r close to -1 means that
the linear association is very strong, and the two variables are negatively correlated (the
larger one variable is, the smaller the other one tends to be). Option C is correct: r always
lies between -1 and 1. Option D is incorrect; if r = 1, then the
data points lie exactly on a line, but the line can have any positive slope.
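The claims about Options A and D can be checked numerically. The sketch below is only an illustration on made-up data (the arrays and slopes are assumptions, not exam data); it uses np.corrcoef, which is not on the reference sheet, to compute r.

```python
import numpy as np

x = np.arange(1, 11)

# Perfectly linear data with two different positive slopes: r is 1 in both cases.
r_slope_half = np.corrcoef(x, 0.5 * x + 2)[0, 1]
r_slope_five = np.corrcoef(x, 5 * x - 1)[0, 1]

# A perfect but nonlinear (U-shaped) association on symmetric x values:
# r is essentially 0 even though y is completely determined by x.
x_sym = np.arange(-5, 6)
r_parabola = np.corrcoef(x_sym, x_sym ** 2)[0, 1]

print(r_slope_half, r_slope_five, r_parabola)
```

So r = 1 says nothing about the size of the slope, and a value of r near 0 does not rule out a strong nonlinear pattern.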
Q2.2 (2 points) Which of the following has the smallest (most negative) value of r?
A.
B.
C.
D.
Solution: Option A has no linear association, and a best-fit line will be downward-sloping.
Option B has no linear association, and a best-fit line will be approximately horizontal.
Option C has a linear association, and a best-fit line will be upward-sloping. Option D has
a linear association, and a best-fit line will be downward-sloping. So, Option D has the
most negative value of r, because the best-fit line is downward-sloping and there is a relatively
strong linear association.
Q2.3 (2 points) Which of the following is the correct equation for the given line?
A. y = -3x + 3
B. y = -x + 3
C. y = 3x - 3
D. y = 3x + 3
E. None of the above
Solution: The y-intercept is at b = 3, and the line is downward-sloping, so the slope must
be negative. In particular, the line passes through the point (x1, y1) = (0, 3) and the
point (x2, y2) = (1, 0) (among others), so the slope is
$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{0 - 3}{1 - 0} = -3$.
So, the equation of the line is y = mx + b: y = -3x + 3.
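The slope computation can be written out as a short calculation. This is a minimal sketch, assuming the two points read off the figure are (0, 3) and (1, 0), which is what the arithmetic in the solution uses.

```python
# Two points on the line, as used in the slope computation above (assumed
# to be read off the figure).
x1, y1 = 0, 3
x2, y2 = 1, 0

m = (y2 - y1) / (x2 - x1)  # slope: rise over run
b = y1 - m * x1            # y-intercept: solve y1 = m * x1 + b for b

print(m, b)  # -3.0 3.0
```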
Q2.4 (2 points) In the figure below, what is the Mean Square Error between the points and the
line?
A. 0
B. 6/5
C. 10/5
D. 12/5
E. None of the above
Solution: The vertical distance from the first point (x1, y1) = (-1, 4) to the line is 2,
the vertical distance from the second point (x2, y2) = (0, 4) to the line is 1,
the vertical distance from the third point (x3, y3) = (1, 0) to the line is 0,
the vertical distance from the fourth point (x4, y4) = (2, -2) to the line is 1,
and the vertical distance from the last point (x5, y5) = (3, -4) to the line is 2.
We take the average of the squares of the vertical distances between the points
and the line:
$$\frac{2^2 + 1^2 + 0^2 + 1^2 + 2^2}{5} = \frac{10}{5} = 2.$$
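The same Mean Square Error calculation can be sketched in plain Python. The figure is not reproduced here, so two things are assumptions: the line is taken to be y = -3x + 3 (the only line consistent with the five vertical distances listed), and the third point is taken to be (1, 0) so that its distance to that line is 0.

```python
# Data points and line are assumptions reconstructed from the listed distances.
points = [(-1, 4), (0, 4), (1, 0), (2, -2), (3, -4)]

def line(x):
    """Prediction of the assumed line y = -3x + 3."""
    return -3 * x + 3

# Signed vertical distance (residual) from each point to the line.
residuals = [y - line(x) for x, y in points]

# Mean Square Error: the average of the squared residuals.
mse = sum(r ** 2 for r in residuals) / len(residuals)

print(residuals)  # [-2, 1, 0, 1, 2]
print(mse)        # 2.0
```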
Q2.5 (2 points) Consider the following line. We can view this line as a model that captures the
pattern from the given data points. We can use the line to predict the value of the y variable
given the value of an x variable. If x = 2.5, what would be the line’s prediction of the value
of y?
A. y = -3
B. y = -4.5
C. y = 0.25
D. y = 4.5
E. None of the above
Solution: The point on the line with x = 2.5 has a y coordinate of y = -4.5.
Q2.6 (2 points) Consider a coin (with two sides, Heads and Tails) that we know to be fair. Suppose
that we tossed the coin three times and the outcomes are Heads, Heads, Tails.
We decided to toss the coin one more time. Which of the following statements is true?
A. The outcome of the next coin toss must be Tails.
B. The outcome of the next coin toss cannot be predicted with certainty but is more
likely to be Tails.
C. The outcome of the next coin toss cannot be predicted with certainty and is equally
likely to be Heads or Tails.
D. The outcome of the next coin toss cannot be predicted with certainty but is more
likely to be Heads.
E. Statements A-D are all false
Solution: Because the event is random, the next coin toss cannot be predicted with
certainty. Because the coin is fair, the next coin toss is equally likely to be Heads or Tails,
regardless of the outcomes of previous coin tosses.
Q2.7 (2 points) Consider a coin (with two sides, Heads and Tails) that we know to be fair. Suppose
that we tossed the coin three times.
Which of the following statements is true?
A. It is equally likely to get 3 Heads as it is to get 3 Tails.
B. It is less likely to get 3 Heads than it is to get Heads, Heads, Tails (in this order)
C. It is more likely to get Heads, Heads, Tails (in this order) than it is to get Heads,
Tails, Heads (in this order).
D. It is more likely to get 2 Heads and 1 Tails than it is to get 2 Tails and 1 Heads
E. None of the above
Solution: When tossing a coin three times, there are 8 possible outcomes, and these 8
outcomes are equally likely: HHH, HHT, HTH, THH, HTT, THT, TTH, TTT.
So, the probability of 3 Heads is 1/8, which is the same as the probability of 3 Tails. (So
A is correct.)
The probability of HHT is 1/8, which is the same as the probability of HHH. (So, B is
incorrect.)
The probability of HHT is 1/8, which is the same as the probability of HTH. (So, C is
incorrect.)
The probability of 2 Heads and 1 Tails (HTH, THH, or HHT) is 3/8. The probability of
2 Tails and 1 Heads (TTH, THT, HTT) is also 3/8. So, D is incorrect.
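The counting argument above can be reproduced by listing all 8 outcomes and counting. This is a small sketch; the helper name prob is hypothetical and only used for illustration.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses; for a fair coin these 8 are equally likely.
outcomes = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event = (outcomes in the event) / (all outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_three_heads = prob(lambda o: o == ("H", "H", "H"))
p_three_tails = prob(lambda o: o == ("T", "T", "T"))
p_hht = prob(lambda o: o == ("H", "H", "T"))
p_two_heads = prob(lambda o: o.count("H") == 2)
p_two_tails = prob(lambda o: o.count("T") == 2)

print(p_three_heads, p_three_tails, p_hht, p_two_heads, p_two_tails)
# 1/8 1/8 1/8 3/8 3/8
```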
Q2.8 (2 points) Suppose that we toss a 6-sided die (numbered with whole numbers 1 through 6)
and a 20-sided die (numbered with whole numbers 1 through 20).
How many different outcomes are there?
A. 6 + 20 = 26
B. 20 - 6 = 14
C. 6 × 20 = 120
D. 25
E. None of the above
Solution: There are 6 outcomes of the 6-sided die, which can be paired with any one of
the 20 outcomes of the 20-sided die. So, there are 6 × 20 = 120 outcomes.
(Rows: result of the 6-sided die. Columns: result of the 20-sided die.)
     |    1        2        3        4      ...     19        20
  1  | (1, 1)   (1, 2)   (1, 3)   (1, 4)    ...  (1, 19)   (1, 20)
  2  | (2, 1)   (2, 2)   (2, 3)   (2, 4)    ...  (2, 19)   (2, 20)
  3  | (3, 1)   (3, 2)   (3, 3)   (3, 4)    ...  (3, 19)   (3, 20)
  4  | (4, 1)   (4, 2)   (4, 3)   (4, 4)    ...  (4, 19)   (4, 20)
  5  | (5, 1)   (5, 2)   (5, 3)   (5, 4)    ...  (5, 19)   (5, 20)
  6  | (6, 1)   (6, 2)   (6, 3)   (6, 4)    ...  (6, 19)   (6, 20)
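The table of pairs can also be generated rather than written out by hand. A minimal sketch using itertools.product (not on the reference sheet) to build every (6-sided, 20-sided) pair:

```python
from itertools import product

d6 = range(1, 7)     # faces of the 6-sided die
d20 = range(1, 21)   # faces of the 20-sided die

# Every (6-sided result, 20-sided result) pair, in the same order as the table rows.
pairs = list(product(d6, d20))

print(len(pairs))   # 120
print(pairs[:3])    # [(1, 1), (1, 2), (1, 3)]
```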
Q2.9 (2 points) Suppose that we toss a 6-sided die (numbered with whole numbers 1 through 6)
and a 20-sided die (numbered with whole numbers 1 through 20), then add the results of the two
dice.
When we sum up the results of tossing the two dice, how many different outcomes are there?
A. 6 + 20 = 26
B. 20 - 6 = 14
C. 6 × 20 = 120
D. 25
E. None of the above
Solution: There are 6 outcomes of the 6-sided die, which can be paired with any one of
the 20 outcomes of the 20-sided die, then added together. The smallest possible sum is
1+1 = 2; the largest possible sum is 6+20 = 26. So, the outcomes are all whole numbers
from 2 to 26, inclusive. There are 25 of these numbers.
(Rows: result of the 6-sided die. Columns: result of the 20-sided die. Entries: the sum.)
     |  1    2    3    4   ...   19   20
  1  |  2    3    4    5   ...   20   21
  2  |  3    4    5    6   ...   21   22
  3  |  4    5    6    7   ...   22   23
  4  |  5    6    7    8   ...   23   24
  5  |  6    7    8    9   ...   24   25
  6  |  7    8    9   10   ...   25   26
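To double-check the count of 25, one can collect every possible sum into a set, which keeps each value only once. This is a small illustrative sketch, not part of the exam.

```python
# A set stores each distinct sum once, even though many pairs share a sum.
sums = {a + b for a in range(1, 7) for b in range(1, 21)}

print(min(sums), max(sums), len(sums))  # 2 26 25
```

The 120 pairs collapse to 25 distinct sums because different pairs, such as (1, 7) and (2, 6), give the same total.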
Q2.10 (2 points) Suppose that we toss a fair 6-sided die (numbered with whole numbers 1 through
6) and a fair 20-sided die (numbered with whole numbers 1 through 20), then add the results
of the two dice.
What is the probability that the sum is 8?
A. 1/25
B. 6/25
C. 8/25
D. 6/120
E. 8/120
F. None of the above
Solution: There are 6 outcomes of the 6-sided die, which can be paired with any one of
the 20 outcomes of the 20-sided die. So, there are 6 × 20 = 120 possible pairs of results,
and each occurs with equal probability of 1/120.
There are six different ways to get a sum of 8 (we are listing the result of the 6-sided die
first):
1+7, 2+6, 3+5, 4+4, 5+3, 6+2 (note that 7+1 is not a possibility since the 6-sided die
only goes up to 6).
Since there are six possible ways and each way occurs with probability 1/120, there is
a probability of 6/120 to get a sum of 8.
(Rows: result of the 6-sided die. Columns: result of the 20-sided die. Entries: the sum.)
     |  1    2    3    4   ...   19   20
  1  |  2    3    4    5   ...   20   21
  2  |  3    4    5    6   ...   21   22
  3  |  4    5    6    7   ...   22   23
  4  |  5    6    7    8   ...   23   24
  5  |  6    7    8    9   ...   24   25
  6  |  7    8    9   10   ...   25   26
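The probability of a sum of 8 can be checked by listing the equally likely pairs and counting those that add up to 8. A minimal sketch:

```python
from fractions import Fraction

# All 120 equally likely (6-sided, 20-sided) pairs.
pairs = [(a, b) for a in range(1, 7) for b in range(1, 21)]

# The pairs whose sum is 8, with the 6-sided die listed first.
ways = [(a, b) for a, b in pairs if a + b == 8]

p_sum_8 = Fraction(len(ways), len(pairs))

print(ways)     # [(1, 7), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
print(p_sum_8)  # 1/20
```

Fraction reduces 6/120 to its lowest terms, 1/20; the two are the same probability.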
Q2.11 (2 points) Which of the python commands below correctly simulate the following random
event?
The sum of a random toss of a fair 6-sided die (numbered with whole numbers 1 through 6)
and a fair 8-sided die (numbered with whole numbers 1 through 8).
A. a = np.random.choice( [1, 2, 3, 4, 5, 6] , 1 )
b = np.random.choice( [1, 2, 3, 4, 5, 6, 7, 8 ] , 1 )
outcome = a + b
B. a = np.random.choice( [1, 2, 3, 4, 5, 6] , 1 )
b = np.random.choice( [1, 2, 3, 4, 5, 6, 7, 8 ] , 1 )
outcome = a * b
C. outcome = np.random.choice( [ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] , 1 )
D. outcome = np.random.choice( [1, 2, 3, 4, 5, 6] + [ 2, 3, 4, 5, 6, 7, 8 ] , 1 )
E. None of the above
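Solution: A is correct: it samples each die separately and adds the two results. B is
incorrect because it multiplies the two results instead of adding them.
C is incorrect, because the different sums are not all equally likely.
D is incorrect because adding the two lists of outcomes first is not the same as sampling
randomly from each list.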
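To see why sampling each die separately matters, here is a runnable sketch of the approach in option A, followed by many repeated tosses. The seed and the number of tosses are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is repeatable

# One simulated toss: sample each die separately, then add the results.
a = np.random.choice([1, 2, 3, 4, 5, 6], 1)
b = np.random.choice([1, 2, 3, 4, 5, 6, 7, 8], 1)
outcome = a + b  # a length-1 array holding the sum

# Many tosses at once, to see that the sums are not equally likely.
n_tosses = 48000
sums = (np.random.choice([1, 2, 3, 4, 5, 6], n_tosses)
        + np.random.choice([1, 2, 3, 4, 5, 6, 7, 8], n_tosses))

freq_2 = np.mean(sums == 2)  # theoretical probability 1/48
freq_8 = np.mean(sums == 8)  # theoretical probability 6/48

print(outcome, freq_2, freq_8)
```

A sum of 2 can happen in only one way (1+1), while a sum of 8 can happen in six ways, so the simulated frequencies differ by roughly a factor of six.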
Q2.12 (2 points) Which of the python commands below correctly simulate rolling a fair 8-sided die
1000 times?
A. np.random.choice([1, 2, 3, 4, 5, 6, 7, 8 ],1000,replace=True,p=[1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8, 1/8])
B. np.random.choice([1, 2, 3, 4, 5, 6, 7, 8 ],1000,replace=False)
C. np.random.uniform([1, 2, 3, 4, 5, 6, 7, 8 ],1000,replace=True)
D. np.random.uniform( 1, 8 , 1000, replace = True )
E. None of the above
Solution: We need to sample with replacement, so B is incorrect.
np.random.uniform generates real numbers (not just whole numbers), so C and D are
incorrect.
A is correct. It is not necessary to state p = … , so this is redundant, but correct.
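The difference between sampling with and without replacement can be tried directly. This sketch uses an arbitrary seed; it also shows that asking for 1000 draws without replacement from only 8 values fails with a ValueError.

```python
import numpy as np

np.random.seed(1)  # arbitrary seed so the run is repeatable

faces = [1, 2, 3, 4, 5, 6, 7, 8]

# 1000 rolls: each roll is drawn with replacement, each face equally likely.
rolls = np.random.choice(faces, 1000, replace=True)

# How often each face came up.
values, counts = np.unique(rolls, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))

# Without replacement we cannot draw 1000 values from a list of only 8.
try:
    np.random.choice(faces, 1000, replace=False)
    without_replacement_ok = True
except ValueError:
    without_replacement_ok = False

print(without_replacement_ok)  # False
```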
Q2.13 (2 points) Suppose that we have a data frame called employees that contains salaries of all
New York City government employees. The data frame contains a column called salary,
which contains the salary of each employee.
In this case, we have the information to find out the true average salary of all New York
City government employees.
However, suppose that we would like to simulate the process of sampling from a population to
learn about the average salary of NYC government employees by doing the following:
• Take one simple random sample of size 100 (i.e., randomly select the salaries of 100
employees)
• Compute the sample average
Which of the python commands below correctly accomplishes the above task?
A. my_sample = np.random.choice( employees['salary'] , 100, replace = True )
sample_average = np.mean( my_sample )
B. my_sample = np.random.choice( employees['salary'] , 100, replace = False )
sample_average = np.mean( my_sample )
C. sample_average=np.random.choice(np.mean(employees['salary']),100,replace=False)
D. my_sample = np.random.choice( salary , 100, replace = False )
sample_average = np.mean( my_sample )
E. None of the above
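Solution: To simulate a simple random sample, we must sample without replacement, which rules
out A. C takes the mean before sampling, and D refers to a name, salary, that is not defined
on its own. So, B is correct.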
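The sampling procedure can be tried on stand-in data. In the sketch below, the employees data frame is hypothetical: it is filled with 5000 made-up salaries, since the real data set is not part of this exam.

```python
import numpy as np
import pandas as pd

np.random.seed(2)  # arbitrary seed so the run is repeatable

# Hypothetical stand-in for the real data: 5000 made-up salaries.
employees = pd.DataFrame(data={'salary': np.random.uniform(30000, 120000, 5000)})

# One simple random sample of size 100: no employee is picked twice.
my_sample = np.random.choice(employees['salary'], 100, replace=False)
sample_average = np.mean(my_sample)

# Because we hold the whole (made-up) population, we can compare with the truth.
true_average = np.mean(employees['salary'])

print(sample_average, true_average)
```

The sample average will usually land near, but not exactly on, the true average; a different seed gives a different sample.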
Q2.14 (2 points) Consider Question 13. Suppose that the sample average that we
computed in that question is 60000. Suppose that we also computed the sample
standard deviation, and this number is 10000.
Which of the python commands below would help us compute the 80% confidence interval?
(Where LOW and HIGH tell us the low and high endpoints of this interval.)
A. LOW = scipy.stats.norm.cdf( 0 , 60000, 10000 )
HIGH = scipy.stats.norm.cdf( 0.8 , 60000, 10000 )
B. LOW = scipy.stats.norm.ppf( 0.1 , 60000, 10000 )
HIGH = scipy.stats.norm.ppf( 0.9 , 60000, 10000 )
C. LOW = scipy.stats.norm.ppf( 0.2 , 60000, 10000 )
HIGH = scipy.stats.norm.ppf( 0.8 , 60000, 10000 )
D. None of the above
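As a hedged illustration of how scipy.stats.norm.ppf and scipy.stats.norm.cdf relate, the sketch below checks how much area lies between the 0.1 and 0.9 cut points of a normal curve with the numbers from this question. It only verifies the area between the two endpoints; which spread to pass as the third argument is a modelling choice that this sketch does not settle.

```python
import scipy.stats

mean, sd = 60000, 10000

# Values with 10% of the area to their left and 90% of the area to their left.
LOW = scipy.stats.norm.ppf(0.1, mean, sd)
HIGH = scipy.stats.norm.ppf(0.9, mean, sd)

# Area under the curve between LOW and HIGH: 0.9 - 0.1 = 0.8.
middle_area = (scipy.stats.norm.cdf(HIGH, mean, sd)
               - scipy.stats.norm.cdf(LOW, mean, sd))

print(LOW, HIGH, middle_area)
```

By contrast, cut points at 0.2 and 0.8 would leave only 0.6 of the area between them.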
Q2.15 (2 points) Consider Question 13.
Suppose that after sampling (which might be a different sample than the one in Question 14),
we found the 70% confidence interval to be from 55000 to 67000.
Which of the following is a correct interpretation of this result?
A. 70% of all NYC government employees have salaries between 55000 and 67000.
B. If we repeat the sampling process many times, 70% of the time, the sample average
will fall between 55000 and 67000.
C.
D. With 70% probability, the 70% confidence interval contains the true average
salary.
E. None of the above
Q2.16 (2 points) Consider Question 13.
Suppose that we repeat the sampling process 100000 times, record the sample averages,
and create a histogram to visualize the sampling distribution (i.e., the distribution of
the sample averages).
Which of the following statements is False?
A. The sampling distribution is approximately a normal distribution regardless of the
distribution of the population’s salary variable.
B. The sampling distribution is an approximately normal distribution only if the
population’s salary variable also has a normal distribution.
C. The sampling distribution is an approximately normal distribution and the peak of
the bell curve is located at a value that is close to the true average.
D. None of the above
Solution: The central limit theorem tells us that the sampling distribution is approximately
normal even when the distribution of the variable itself is not normal.
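The central limit theorem can be seen in a small simulation. Everything about the population below is an assumption made for illustration: it is a made-up, strongly right-skewed set of 10000 "salaries", and the sampling is repeated 2000 times rather than 100000 to keep the run short.

```python
import numpy as np

np.random.seed(3)  # arbitrary seed so the run is repeatable

# Hypothetical, strongly right-skewed population (not real salary data).
population = np.random.exponential(scale=50000, size=10000)

# Repeat: take a simple random sample of size 100 and record its average.
sample_averages = np.array([
    np.mean(np.random.choice(population, 100, replace=False))
    for _ in range(2000)
])

center = np.mean(sample_averages)
spread = np.std(sample_averages)

# For a bell curve, about 68% of values fall within one standard deviation.
within_one_sd = np.mean(np.abs(sample_averages - center) <= spread)

print(np.mean(population), center, within_one_sd)
```

Even though the population is far from normal, the sample averages cluster around the true average, and close to 68% of them fall within one standard deviation of their center, as expected for a bell curve.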
Q2.17 (2 points) Suppose that we have a random number generator that produces a random number
between 0 and 2, following a probability density function f(x) whose graph is below.
Suppose that we randomly generated one number according to the given probability
distribution. Which of the following statements is true?
A. The probability that the number is between 0 and 0.5 is higher than the probability
that the number is between 1.5 and 2.
B. The probability that the number is between 0 and 0.5 is less than the
probability that the number is between 0.5 and 2.
C. The probability that the number is exactly 1 is 1.
D. The probability that the number is exactly 0.5 is 0.5.
E. None of the above
Solution: The area under the graph above the given range of values is equal to the
probability that the number falls in that range.
A is incorrect, because the area under the graph between 0 and 0.5 is the same as the area
under the graph between 1.5 and 2. (Both are triangles with the same height and base.)
B is correct, because the area under the graph between 0 and 0.5 is less than the area
under the graph between 0.5 and 2.
C and D are incorrect: With a continuous probability distribution, the probability of getting
any number exactly is 0. (Why? What is the area of the region under the graph exactly
above each number? There is “no area” because there is no width (i.e., width is 0),
therefore the probability is 0. The y-value is density, not the actual probability. The
takeaway of this question is that for a continuous probability distribution, area =
probability.)
Q3 Free Response
Suppose that we have a random number generator that produces a random number between 0 and
2, following a probability density function f(x) whose graph is below.
What is the probability that a number between 0.5 and 1.5 is generated?
Solution: We want to find the area under the curve between 0.5 and 1.5. There are different
ways to do this; you can use whichever way as long as it’s correct.
One way is to note that the shape of this region is that of a rectangle with a triangle above it.
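The area of the rectangle is $\text{width} \times \text{height} = (1.5 - 0.5) \times 0.5 = 1 \times 0.5 = 0.5$.
The area of the triangle is $\frac{1}{2} \times \text{base} \times \text{height} = \frac{1}{2} \times (1.5 - 0.5) \times (1 - 0.5) = \frac{1}{2} \times 1 \times 0.5 = \frac{1}{4}$.
So, the probability is the sum of the two areas: $0.5 + \frac{1}{4} = 0.75$.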
A second way is to note that the total area under the curve is 1, and that the area under the
curve between 0.5 and 1.5 is the difference between 1 and the area of the two triangles on the
sides that are not included.
The area of the left triangle is $\frac{1}{2} \times \text{base} \times \text{height} = \frac{1}{2} \times 0.5 \times 0.5 = 0.125$. The area of the
right triangle is $\frac{1}{2} \times \text{base} \times \text{height} = \frac{1}{2} \times 0.5 \times 0.5 = 0.125$.
So, the area under the curve between 0.5 and 1.5 is
$1 - 0.125 - 0.125$,
which is 0.75.
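Both methods can be checked numerically. The graph is not reproduced here, so the density below is an assumption inferred from the areas used in the solution: a triangle on [0, 2] that rises from 0 to a peak of 1 at x = 1 and falls back to 0. The helper names f and area are hypothetical.

```python
def f(x):
    """Assumed triangular density on [0, 2] with peak f(1) = 1."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def area(a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

total = area(0, 2)        # a density must have total area 1
middle = area(0.5, 1.5)   # probability of a number between 0.5 and 1.5
left = area(0, 0.5)       # left triangle
right = area(1.5, 2)      # right triangle

print(round(total, 6), round(middle, 6), round(left, 6), round(right, 6))
```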
Q4 Free Response
A (fictional) professor records student grades out of 100 and uses the following table to assign letter
grades:
Numerical Grade        Letter Grade
90 ≤ Grade ≤ 100       A
85 ≤ Grade < 90        A-
80 ≤ Grade < 85        B+
70 ≤ Grade < 80        B
65 ≤ Grade < 70        C
0 ≤ Grade < 65         F
Suppose that there are 200 students in the class, and the professor observes the following facts
about the students’ grades:
• The grades are essentially normally distributed
• The mean is 80
• The standard deviation is 5
Please estimate (1) the number of students who receive B+’s and (2) the number of students who
receive B’s.
Please include your work and reasoning. You may include sketches/pictures to help explain your
work.
Solution: We know that in a normal distribution roughly 0.68 of all data points are within
one standard deviation of the mean. Moreover, because the bell curve is symmetric around the
mean, $\frac{0.68}{2} = 0.34$ of all data points are between $\mu - \sigma$ and $\mu$, and $\frac{0.68}{2} = 0.34$ of all data
points are between $\mu$ and $\mu + \sigma$. Here $\mu = 80$ and $\sigma = 5$.
So, $0.34 \times 200$ students have grades between 75 and 80, and $0.34 \times 200$ students have grades
between 80 and 85.
So, the number of students who receive B+’s (numerical grades between 80 and 85) is
approximately $0.34 \times 200 = 68$.
We also know that in a normal distribution, roughly 0.954 of all data points are within two
standard deviations of the mean. Moreover, because the bell curve is symmetric around the
mean, $\frac{0.954}{2} = 0.477$ of all data points are between $\mu - 2\sigma$ and $\mu$, and $\frac{0.954}{2} = 0.477$ are between $\mu$ and $\mu + 2\sigma$.
So, $0.477 \times 200$ students have grades between 70 and 80, and $0.477 \times 200$ students have grades
between 80 and 90.
So, the number of students who receive B’s (numerical grades between 70 and 80) is
approximately $0.477 \times 200 \approx 95$.
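On the exam this estimate is done by hand with the 0.68 and 0.954 rules, but the same numbers can be checked afterwards with scipy.stats.norm.cdf from the reference sheet. This is a sketch for checking the arithmetic, not a required method.

```python
import scipy.stats

n_students, mean, sd = 200, 80, 5

def area_left_of(grade):
    """Area under the normal curve to the left of the given grade."""
    return scipy.stats.norm.cdf(grade, mean, sd)

# B+: grades from 80 up to 85 (from the mean to one standard deviation above).
n_b_plus = n_students * (area_left_of(85) - area_left_of(80))

# B: grades from 70 up to 80 (from two standard deviations below to the mean).
n_b = n_students * (area_left_of(80) - area_left_of(70))

print(n_b_plus, n_b)
```

The results are about 68 students with a B+ and about 95 students with a B, matching the hand estimates.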
Probability, Statistics, and Decision-Making (NYU Core-UA-107)
Python Reference Sheet
Basic Functions and Commands
print( NAME ) : to print out the value stored in NAME
type( NAME ) : to find the type of value stored in NAME
Functions on Lists and Arrays
[ VALUE1, VALUE2, … ] : to put values into one list
LISTNAME[ N ] : to extract the value at index N in the list LISTNAME
len( LISTNAME ) : to find the number of entries in the list LISTNAME
max( LISTNAME ) : to find the largest value in the list LISTNAME
min( LISTNAME ) : to find the smallest value in the list LISTNAME
sum( LISTNAME ) : to find the sum of values in the list LISTNAME
np.mean( LISTNAME ) : to find the mean of values in the list or array LISTNAME
np.median( LISTNAME ) : to find the median of values in the list or array LISTNAME
np.std( LISTNAME ) : to find the standard deviation of values in the list or array LISTNAME
np.percentile( LISTNAME, P ) : to find the P-th percentile of the values in the list or array LISTNAME
Data Frames
pd.read_csv( 'FILENAME.csv' ) : to “read” a csv file and import it as a data frame in python
pd.read_csv( 'http://webaddress.etc' ) : to “read” a csv file from a URL (web address) and import it as a data frame in python
pd.DataFrame( data = { 'COLNAME1' : LIST1, 'COLNAME2': LIST2, … } ) : to create a data frame (“from scratch”), where the first column is COLNAME1
and consists of values in LIST1 , etc.
DATAFRAMENAME[ 'COLUMNNAME' ] : the list containing all entries in the column COLUMNNAME of the data frame DATAFRAMENAME
DATAFRAMENAME.iloc[ : , COLINDEX ] : the list containing all entries in the column whose index is COLINDEX of the data frame DATAFRAMENAME
DATAFRAMENAME[ 'COLUMNNAME' ][ ROWINDEX ] : the entry in row index ROWINDEX and column called COLUMNNAME
DATAFRAMENAME.iloc[ ROWINDEX, COLINDEX ] : the entry in row index ROWINDEX and column index COLINDEX
DATAFRAMENAME[ CRITERIA ] : a data frame containing only rows of DATAFRAMENAME that meet the given criteria. For example:
DATAFRAMENAME[ DATAFRAMENAME['COLUMNNAME'] == VALUE ] : a data frame containing only rows of DATAFRAMENAME where the value of COLUMNNAME is
exactly equal to VALUE
Attributes and Methods of Data Frames
DATAFRAMENAME.shape : to find the number of rows and columns of the data frame DATAFRAMENAME
DATAFRAMENAME.columns : to find the column names of the data frame DATAFRAMENAME
DATAFRAMENAME.dtypes : to find the types of data of each column of the data frame DATAFRAMENAME
DATAFRAMENAME.head() : to preview the first few rows of the data frame DATAFRAMENAME
DATAFRAMENAME.tail() : to preview the last few rows of the data frame DATAFRAMENAME
DATAFRAMENAME.sample() : to preview a random sample of rows of the data frame DATAFRAMENAME
DATAFRAMENAME.sort_values( 'COLUMNNAME' ) : to sort the rows of the data frame DATAFRAMENAME based on values in the column COLUMNNAME (the default
is in ascending order)
DATAFRAMENAME.sort_values( 'COLUMNNAME', ascending = False ) : to sort the rows of the data frame DATAFRAMENAME based on values in the column
COLUMNNAME in descending order
DATAFRAMENAME['COLUMNNAME'].value_counts() : for counting the frequency of each value in the COLUMNNAME column of the data frame DATAFRAMENAME
DATAFRAMENAME['COLUMNNAME'].describe() : to find the quartiles, mean, median, and standard deviation of values in the column COLUMNNAME in the data
frame DATAFRAMENAME
DATAFRAMENAME['COLUMNNAME'].describe(percentiles = LISTOFPERCENTILES ) : to find the percentiles specified in LISTOFPERCENTILES , mean, median,
and standard deviation of values in the column COLUMNNAME in the data frame DATAFRAMENAME
Data Visualization
sns.displot( data = DATAFRAMENAME, x = COLUMNNAME ) : to create a bar chart or histogram for values in column COLUMNNAME of the data frame
DATAFRAMENAME . Variations:
sns.displot( data = DATAFRAMENAME, x = COLUMNNAME, discrete = True )
sns.displot( data = DATAFRAMENAME, x = COLUMNNAME, stat = 'probability' )
sns.displot( data = DATAFRAMENAME, x = COLUMNNAME, stat = 'density' )
sns.histplot( data = DATAFRAMENAME, x = COLUMNNAME ) : to create a histogram for values in column COLUMNNAME of the data frame DATAFRAMENAME
sns.boxplot( data = DATAFRAMENAME, x = COLUMNNAME ) : to create a boxplot for values in column COLUMNNAME of the data frame DATAFRAMENAME
sns.relplot( data = DATAFRAMENAME, x = COLNAME1 , y = COLNAME2 ) : to create a scatterplot for values in columns COLNAME1 and COLNAME2 of the
data frame DATAFRAMENAME
Linear Regression
sns.regplot(data = DATAFRAMENAME, x = 'COLNAME1', y = 'COLNAME2') : to visualize a linear regression line to fit data in a scatterplot for values in
columns COLNAME1 and COLNAME2 of the data frame DATAFRAMENAME
from sklearn.linear_model import LinearRegression
MODELNAME = LinearRegression().fit( X, Y )
MODELNAME.coef_ : the slope of the best fit line
MODELNAME.intercept_ : the y-intercept of the best fit line
Probability and Sampling
np.random.choice( LIST, N , replace = TRUEORFALSE, p = PROBLIST ) : randomly select N elements from the list LIST (where TRUEORFALSE is either True
or False, and PROBLIST lists the probability of each element)
np.random.uniform( LOW, HIGH, N ) : randomly generate N numbers between LOW and HIGH
scipy.stats.norm.cdf( VALUE, MEAN, STDDEV) : Computes the area under the curve to the left of VALUE , for a normal distribution with the specified MEAN
and STDDEV
scipy.stats.norm.ppf( AREA , MEAN, STDDEV) : Finds a value such that the area under the curve to the left of this value is equal to the desired AREA , for a
normal distribution with the specified MEAN and STDDEV
[non-python] Normal Distribution with mean μ and standard deviation σ
Range                                                                      Area under the curve
Within one standard deviation from the mean (i.e. from μ-σ to μ+σ)         ≅ 0.683
Within two standard deviations from the mean (i.e. from μ-2σ to μ+2σ)      ≅ 0.954
Within three standard deviations from the mean (i.e. from μ-3σ to μ+3σ)    ≅ 0.997