Several previous postings and conversations on this forum, e.g. [1-7], are concerned with numerical errors or accuracy issues in Google Spreadsheets. The posting  is mine and at first I only repeated observations made in earlier conversations that had become closed for further replies. The initial posting in  concerned the cumulative normal distribution function, NORMDIST, but then every other day I would look at another function in the Google Spreadsheets program and find further errors or problems, which I then reported on that same conversation. The conversation  has meanwhile become a bit a unwieldy and I like to condense my observations together in this fresh posting.
Errors in the cumulative normal distribution function NORMDIST and NORMSDIST were identified by @malemi in . Note that NORMDIST takes four arguments conventionally denoted (number, mean, stdev, C). Argument C is an indicator; C=0 for the distribution function and C=1 for the cumulative distribution function. Arguments mean and stdev specify a shift and a scale. I concentrate on the case of normalized arguments, mean=0 and stdev=1, and the cumulative distribution function. For that case we have NORMSDIST as a short-hand: NORMSDIST(x) = NORMDIST(x,0,1,1).
The implementation of NORMDIST (NORMSDIST) is very seriously wrong. The function returns a negative value for x=-5 and a value greater than 1 for x=5; it is oscillatory about 0 for x.le.-5 and oscillatory about 1 for x.ge.5. As analyzed under  near 2012-03-31 it is not simply that an inaccurate approximation is used within the code; rather there seems to be a gross coding error. Different approximations are used on different intervals and I conjecture that the interval identification got messed up somehow.
The cumulative lognormal distribution, LOGNORMDIST, is simply related to the cumulative normal distribution function: lognormdist(x,0,1) = normdist(ln(x),0,1,1), = normsdist(ln(x)). This appears to be how it is computed in the Google Spreadsheets program and therefore LOGNORMDIST has the same accuracy and sign issues as does NORMDIST or NORMSDIST. For the cumulative distribution we find (see , around 2012-04-14):
=LOGNORMDIST(145,0,1) returns 1.00000003243862 (a number larger than 1)
=LOGNORMDIST(0.0069,0,1) returns -0.00000002866487 (a number less than 0)
Based on the earlier experience with NORMDIST we can expect that the cumulative function LOGNORMDIST(x,0,1) is at best of single precision accuracy and it deteriorates to basically garbage for approximately x<0.01 while 1-LOGNORMDIST(x,0,1) turns to garbage for approximately x>100.
The discrete probability distributions POISSON, BINOMDIST and HYPGEOMDIST are all messed up in a similar way, unrelated to the NORMDIST issue. The problems were identified in , [3-5] and under  around 2011-11-19 with a further look on 2012-04-14. The root of the issue is that the mathematical definition of these distributions involves a product of factorials and power and exponential functions. For example:
POISSON(n,x,0) = pow(x,n)*exp(-x)/n!
BINOMDIST(k,n,p,0) = pow(p,k)*pow(1-p,n-k)*n!/(k!*(n-k)!)
The final argument is 0 or 1; 0 for the probability and 1 for the cumulative probability and the issue affects both the density and the cumulative distribution function. The expression for HYPGEOMDIST is a bit messier, but there too there is an argument n that represents the total population and a factor n! occurs in the mathematical expression.
For quite modest values of the arguments the individual factors can overflow or underflow when standard floating point arithmetic is used, even when the desired final result is a number that is not very small. If the mathematical expressions are employed factor by factor in standard IEEE 64-bit floating point arithmetic then one should expect overflow for n! whenever n.gt.170. It is unreasonable to calculate the functions this way; there should be nothing amiss with invoking POISSON or BINOMDIST or HYPGEOMDIST for population sizes very much larger than 170. In fact, in Google Spreadsheets the actual limit is even less than 170; it is 136, and this appears to be due to the use of Stirling’s approximation factor by factor to compute the factorial and to some other issues. Anyone interested in the gory details can find it explored under .
The inverse of the cumulative normal distribution function is implemented in the Google Spreadsheets function CRITBINOM. An error in the documentation was noted in . In a reply there and also under  I noted that the implementation is affected by the same overflow errors identified for BINOMDIST and other discrete probabilities.
Variance, correlation and regression functions
The calculation the various variance (or standard deviation) functions in Google Spreadsheets is inaccurate in a manner that was fixed in EXCEL with the 2003 release. This error (inaccurate computation) affects the functions VAR, VARA, VARP, VARPA, DVAR, DVARP and the functions STDEV, STDEVA, STDEVP, STDEVPA, DSTDEV and DSTDEVP. I’ll explain it here only for STDEV, following the posting under  of 2011-11-15.
GDocs spreadsheets computes stdev(1e8,1e8+1)=0. This is wrong. The numbers 1e8 and 1e8+1 are exactly representable and the computed stdev should be invariant under a common shift. The correct and expected result is therefore the same as that of stdev(0,1), which is 0.707106781... I point out that LibreOffice Calc delivers the correct answer even for much larger shift away from the origin. Apparently in GDocs spreadsheets the standard deviation of N data points x[0..N-1] is being calculated as
It is mathematically unobjectionable and numerically unsound.
Various functions for correlation and regression are similarly inaccurate due to (one may assume) a failure to shift the arguments to the vicinity of 0 before doing the linear algebra. Under , in postings between 2011-11-17 and 2011-11-23, I investigate LINEST, CORREL, TREND, LOGEST and GROWTH and identify similar errors in all.
The complementary error function is closely related to NORMSDIST and Google Spreadsheets offers ERFC. They might have expressed one in terms of the other according to the identity NORMSDIST(-x) = ERFC(x/sqrt(2))/2 but this was not done. At first (around 2012-03-31 under ) I assumed that ERFC would just be the system-supplied erfc, but it turns out not to be the case. The Google Spreadsheets function ERFC is messed up in its own way as explored under  around 2012-04-21.
For argument 5.922 and larger ERFC returns 0. It appears that the computation of ERFC in Google Spreadsheets employs the identity ERFC(x)+ERF(x)=2 the wrong way around; they compute ERF(x) (using a hidden implementation of ERF) where the result value is close to 2 and then compute ERFC(x) = 2-ERF(x).
Near the origin the Google Spreadsheets function ERFC has another problem; it is basically of single precision accuracy and it fails the mathematical property that erfc(0)=1. Witness:
=ERFC(0) returns 1.00000003
In addition, for no reason that I can imagine, ERFC(x) returns #NUM! for x.lt.0. The mathematical function erfc is perfectly well defined for x.lt.0.
Google Spreadsheets computes =FACT(136) to be 3.66e232 while =FACT(137) retuns #NUM!. But the overflow limit for IEEE 64-bit arithmetic is about 1.7977e308. There should be no problem with =FACT(137). Indeed, in LibreOffice Calc we compute =FACT(170) to be 7.2574e306 and =FACT(171) returns #NUM! as is reasonable. The issue is explored under  around 2011-11-24, where I discuss also some other strange behavior near the overflow limit. In addition to the Stirling approximation issue it appears that the effective overflow limit in Google Spreadsheets is a factor 1e15 lower than the IEEE 64-bit overflow limit. (I regard this as a curiosity and not as a major issue.)
Google Spreadsheets seems generally to use approximations of less than full accuracy. Under  (2011-11-24) I look at the log-gamma function, GAMMALN, and I find a loss of about 7 bits relative to the implementation in LibreOffice Calc (which appears to offer full 64-bit accuracy).
In Google Spreadsheets the multinomial (MULTINOMIAL) is computed as a ratio of factorials, which again results in gratuitous overflow. MULTINOMIAL should be computed as a product of binomials using a recursion in the number of arguments. See the discussion under  around 2011-11-17.
The binomial coefficients have their own unreasonable overflow failure: =COMBIN(1e8,2) fails, as does =COMBIN(5e15,1). This is obviously less severe than the problem with the use of factorials in the discrete probability functions.
Elementary functions and documentation
The MROUND function is described as follows [*]: "MROUND(number, multiple): The result is the nearest integer multiple of the number." There is not much that can be done with this except apply common sense to decide that probably it is meant that the value of MROUND(x,y) is x rounded to the nearest integer multiple of y. Indeed =MROUND(2,5) returns 0 and =MROUND(3,5) returns 5. But why on earth should =MROUND(-2,5) return #NUM!?
The ROUNDDOWN function is described as follows: "ROUNDDOWN(number, count): Rounds the given number. Count (optional) is the number of digits to be rounded down to. If the count parameter is negative, only the whole number portion is rounded. It is rounded to the place indicated by the count." For ROUNDUP we read: "ROUNDUP(number, count): Rounds the given number up. Count (optional) is the number of digits to which rounding up is to be done. If the count parameter is negative, only the whole number portion is rounded. It is rounded to the place indicated by the count." I would not have guessed from that description that for the case of negative first argument rounddown rounds up and roundup rounds down.
=ROUNDDOWN(-0.5,0) returns 0
=ROUNDUP(-0.5,0) returns -1.
Finally one further description [*]. "CEILING(number, significance, mode): Rounds the given number to the nearest integer or multiple of significance. Significance is the value to whose multiple of ten the value is to be rounded up (.01, .1, 1, 10, etc.). Mode is an optional value. If it is indicated and non-zero and if the number and significance are negative, rounding up is carried out based on that value." They might as well say: "Something having to do with rounding, try it out to see what it does". The FLOOR function has the same issue.
Functions that still need investigation
I am not aware of any close look at the random number generator that is used in Google Spreadsheets; functions RAND and RANDBETWEEN.
It could be interesting to see how the various errors in the statistical and mathematical functions propagate into the financial analysis functions or in various financial analysis templates that have been built upon Google Spreadsheets.
As I wrote under , I think that the Google Docs and Spreadsheets team is making a big mistake with their casual attitude towards these errors in the accuracy of the Spreadsheets product. Until a few years ago Microsoft Excel was the standard awful example that would be invoked to demonstrate that spreadsheets could not be used for serious computation [9-14]. Inter alia such articles offered, I think, the implication that Microsoft Corp is not worthy of respect from scientists and engineers.
It appears to me that in the past few years Microsoft has risen quite a bit in reputation among scientists and engineers and Google is losing its privileged status in public perception. Public recognition that the numerical behavior of Google Spreadsheets is inferior (severely inferior, in fact) in 2011 (now 2012) to that of Microsoft Excel in 1997 will not help Google's standing. And the errors in Google Spreadsheets have been noticed outside this forum. Ref.  by Keeling and Pavur is relevant; in their abstract they write “Google Docs spreadsheet, while convenient, has deficiencies and should not be used for scientific statistical analysis.” (Their article appeared on-line 2012-01-24, otherwise I might have noticed it when I started on .) Probably Ref.  by McCullough and Yalta is relevant too, but all that I have been able to see of it to-date is the title.
I conclude with an observation that might be the topic of further development, e.g. with regard to the nature of present-day computer science education. It appears to be possible in the years 2007-2012 for a highly respected company that is able to attract the supposedly highest qualified engineers to develop a spreadsheet product without any concern for the numerical quality of the product and without any of their engineers (computer scientists, one assumes) noticing the gross faults in the numerical implementation. I find it disturbing.
 (2010-03-24) BINOMDIST not working ?
 (2010-10-14) NORMDIST throws negative value
 (2010-10-14) BINOMDIST does not accept TRIALS~140
 (2011-02-12) binomdist arbitrarily limits maximum argument - please correct function or help file to be consistent
 (2011-04-01) Statistical functions binomdist and hygeomdist give overflow messages with reasonable parameter values
 (2011-09-01) NORMDIST throws negative value STILL AFTER 1 YEAR!!!
 (2011-11-13) normdist throws negative value still
 (2012-03-15) Description of CRITBINOM function in Spreadsheet is incorrect.
 L. Knüsel: “On the Accuracy of Statistical Distributions in Microsoft Excel 97”, Computational Statistics & Data Analysis Vol. 26 (1998) pp. 375-377.
 A. Talha Yalta: The Accuracy of Statistical Distributions in Microsoft ® Excel 2007. Working Paper No 10-06 TOBB University of Economics and Technology Department of Economics (Ankara, Turkey).
(See also , but I don't have convenient access to the full text there)
 B.D. McCullough and David A. Heiser: "On the accuracy of statistical procedures in Microsoft Excel 2007"
 Leo Knüsel: "On the accuracy of statistical distributions in Microsoft Excel 2003"
 A. Talha Yalta: "The accuracy of statistical distributions in Microsoft® Excel 2007"
 M. G. Almiron et al.: "On the Numerical Accuracy of Spreadsheets"
 Kellie B. Keeling and Robert J. Pavur: “Statistical Accuracy of Spreadsheet Software”, The American Statistician, Vol. 65 (2011) pp. 265-273.
 A. Talha Yalta and B. D. McCullough, "Spreadsheets in the Cloud: Not Ready Yet.” (On 2012-05-06 this is listed as “working paper” or “under review” on the authors’ home pages; I haven’t seen the content, but I understand that Google Spreadsheets is featured.)