/docs/community?hl=en
/docs/community?hl=en
11/13/11
Original Poster
Bas Braams

normdist throws negative value still

Earlier relevant postings [1-3] are closed for further replies, so let me record here that NORMDIST still throws negative values (or values above 1.0) at the outer ends of its range. The example used in the earlier references is =normdist(-6,40,8,1) and also today the result is -0.000000008336133. More generally, using normalized variables, normdist(x,0,1,1) returns a negative value for x=-5.0 and it is oscillatory for smaller values of x; for x=5.0 the value exceeds 1 and it oscillates around 1 for larger values of x. Five sigma is not insanely far out, and even if it were I would say that it all points to a problem of quality control. In [1] @malemi mentions an article by Richard Stallman and I think that the reference [4] is to the point. It is hard to imagine that LibreOffice Calc would have similar problems.

2011-11-13
Community content may not be verified or up-to-date. Learn more.
All Replies (30)
Ted J
11/13/11
Ted J
Hi Bas,
  I'll see what I can do.  Do you mind creating a sharing a spreadsheet that shows the error and then shows the desired behavior?
11/14/11
Original Poster
Bas Braams
Just create an empty spreadsheet and enter =normdist(-6,40,8,1) into the A1 cell for the classic example, or enter =normdist(-5,0,1,1) for the normalized example (replace -5 by the "x" of your choice). Someone else may want to graph the behavior of normdist(x,0,1,1) for x in the vicinity of the range [-6,-5]. I believe that the desired behavior may be demonstrated by doing the same in LibreOffice Calc.

I trust that I don't have to explain that normdist should only take values in the range (0,1). It is acceptable if the numerical range is the closed range [0,1], but it is not acceptable to have values outside that closed range.
AD:AM
11/14/11
AD:AM
<<  Someone else may want to graph the behavior of normdist(x,0,1,1) for x in the vicinity of the range [-6,-5]. >>


I'll (re-?)escalate the bug.

Cheers
Adam
11/14/11
Original Poster
Bas Braams
As a matter of interest I note that GDocs spreadsheets also has the complementary error function and the implementation of that one looks, upon very cursory inspection, to be correct; it is probably just the standard system-supplied erfc. The Spreadsheets coders simply neglected to use erfc and the appropriate transformation to compute normdist(x,med,dev,1). Functions erf and erfc are also available in Google Calculator (part of the web search).

The appropriate identity is between normdist(-x,0,1,1) and erfc(x/sqrt(2))/2. For the case x=5.0 we find:

Google Docs spreadsheets: =normdist(-5.0,0,1,1) gives -2.3620e-7 (note, negative value)
LibreOffice Calc: =normdist(-5.0,0,1,1) gives 2.8665e-7
Google Docs spreadsheets: =erfc(5.0/sqrt(2))/2 gives 2.8665e-7
Google web search (Calculator): erfc(5.0/sqrt(2))/2 gives 2.8665e-7

(all numbers rounded to 5 significant digits).
11/16/11
Original Poster
Bas Braams
In the spirit of this conversation it may also be observed that according to GDocs spreadsheets,
stdev(1e8,1e8+1)=0
Note that 1e8 and 1e8+1 are exactly representable, and note that
stdev(0,1)=0.707106781...
In LibreOffice and in the real world, of course,
stdev(1e8,1e8+1)=0.707106781...
It is not difficult to guess and verify what is going on; indeed, I guessed it except for the precise placement of the factor N before doing the experiment. The standard deviation of N data points x[0..N-1] is apparently being calculated as
sqrt((N*sum(x^2)-(sum(x))^2)/(N*(N-1)))
It is mathematically unobjectionable and numerically unsound, as witness the experiment just shown.
11/17/11
Original Poster
Bas Braams
It looks like this exercise can be continued in any way that one can think of trying.

Linear regression: x-values {0,1,2}, y-values {0,0,1}; the best fit (not required to go through the origin) has a slope of 0.5. Now shift the x-range. Of course, the slope of the best fit should still be 0.5 and a careful implementation will get that right. This is really not asking for much.

In the following I don't pay attention to the y-intercept, which is also returned by LINEST.

GDocs spreadsheets:
=LINEST({0,0,1},{0,1,2}); we find a slope of 0.5 indeed.
=LINEST({0,0,1},arrayformula({0,1,2}+1e7)); we find a slope of 0.49710.
=LINEST({0,0,1},arrayformula({0,1,2}+1e8)); we get a server error; presumably a division by 0.

LibreOffice Calc
=LINEST({0,0,1},{0,1,2}+1e8); slope of 0.5 exactly.
=LINEST({0,0,1},{0,1,2}+1e14); slope of 0.5 still.
It only fails at a shift of 1e15, but then we've really gone beyond 64-bit precision.
Ted J
11/17/11
Ted J
Thanks for all of this info.  We've escalated this to the Docs team.  Feel free to keep posting relevant info!
11/17/11
Original Poster
Bas Braams
Here is a combinatorial one for a change.

=MULTINOMIAL(100,0) returns 1, and that is correct.
=MULTINOMIAL(1000,0) fails, numerical error. The correct result would again be 1.

The failure is not reasonable. The multinomial should simply computed as a product of binomials; a recursion in the number of arguments. For just two arguments there is nothing to it: multinomial(x,y) = combin(x+y,x), or just as well combin(x+y,y). Apparently this kind of recursion is not used. I note that the binomial is computed correctly.

In the case of the multinomial also LibreOffice Calc (version 3.4.3) fails.
11/19/11
Original Poster
Bas Braams
I don't really have anything new to say today; the numerical defects are quite predictable. The main purpose of today's report is to get some keywords into the conversation that aren't there yet.

Function VAR (and, I will guess, its relatives VARA, VARP and VARPA) has the same problem as function STDEV (and, I will guess again, STDEVA, STDEVP and STDEVPA). Demonstration:

=VAR(0,1) returns 0.5, which is correct.
=VAR(1e8,1e8+1) returns 0, which is an error; the correct result is 0.5.

Function CORREL has the same problem has function LINEST, and I would worry about all other functions that perform regression or correlation analysis.

=CORREL({0,0,1},{0,1,2}) returns 0.866025..., and I'l believe it.
=CORREL({1e8,1e8,1e8+1},{0,1,2}) returns an error: division by 0.
=CORREL({0,0,1},{1e8,1e8+1,1e8+2}) likewise returns division by 0.

The presumed common source of these problems with correlation and regression is failure to shift the data before doing the linear algebra. Take note, please, that in these examples we are still far away from inevitable limits due to 64-bit floating point arithmetic. If the calculations fail only for a shift of 1e14 or 1e15 then I have no complaint.

Someone might be interested to check the accuracy of some of the statistical distribution functions. Also it may be useful to see how the issues with correlation and regression are reflected in the various financial analysis functions.
AD:AM
11/19/11
AD:AM
It would appear that GSheets is using a similar (the same?) "one-pass" algorithm to pre-2003 Excel, which suffered the same issues.

<< Functions that involve sums of squares (...) In summary, if you use an earlier version of Excel, you must be concerned about round-off errors in cases where data contains many significant digits but the variance is small. The article about VAR presents contrived examples of this; you can expect that these round-off problems do not naturally occur frequently in real data. >>

19 MORE
8/3/13
Original Poster
Bas Braams
None of the errors described here have been fixed and none have been acknowledged as errors by Google staff. The lack of concern from the Google side about these errors (of which the binary search error and the normsdist/erfc complex of errors stand out the most) has led me to the conjecture [1] that the Spreadsheets project has been effectively abandoned by the Apps team; it just hasn't been announced yet. Please see [2] for a more concise version of the present sequence of reports.

[1] (2013-04-14) Prediction: Docs text editor and spreadsheets to be frozen and abandoned in favor of Quickoffice and other products

[2] (2012-09-26) Errors in the spreadsheets program
Were these replies helpful?
How can we improve them?
 
This question is locked and replying has been disabled. Still have questions? Ask the Help Community.

Badges

Some community members might have badges that indicate their identity or level of participation in a community.

 
Google Employee — Google product team members and community managers
 
Community Specialist — Google partners who help ensure the quality of community content
 
Platinum Product Expert — Community members with advanced product knowledge who help other Google users and Product Experts
 
Gold Product Expert — Community members with in-depth product knowledge who help other Google users by answering questions
 
Silver Product Expert — Community members with intermediate product knowledge who help other Google users by answering questions
 
Product Expert Alumni — Former Product Experts who are no longer members of the program
Community content may not be verified or up-to-date. Learn more.

Levels

Member levels indicate a user's level of participation in a forum. The greater the participation, the higher the level. Everyone starts at level 1 and can rise to level 10. These activities can increase your level in a forum:

  • Post an answer.
  • Having your answer selected as the best answer.
  • Having your post rated as helpful.
  • Vote up a post.
  • Correctly mark a topic or post as abuse.

Having a post marked and removed as abuse will slow a user's advance in levels.

View profile in forum?

To view this member's profile, you need to leave the current Help page.

Report abuse in forum?

This comment originated in the Google Product Forum. To report abuse, you need to leave the current Help page.

Reply in forum?

This comment originated in the Google Product Forum. To reply, you need to leave the current Help page.