The Brandeis GPS blog

Insights on online learning, tips for finding balance, and news and updates from Brandeis GPS

Tag: data-set

Standing At The Mean

Sam Halperin is currently a Programming Instructor at Thinkful. He is a 2011 graduate of the Brandeis Graduate Professional Studies Master of Science in Software Engineering program. He is working on a doctorate in Computer Science, and also blogs at

Experimentation enabled by advances in low-cost consumer virtual reality hardware and software.

A few months ago, after a long hacking session with a genetic algorithm (an algorithm that evolves a solution from “chromosomes” over time), the Unity game engine (a 3D video game engine) and an Oculus Rift immersive display, I had what I think was a unique experience. After creating a data set with the GA, writing a renderer that transformed the data into geometry, hues and color values, and piping the output to a head-mounted display, I was able to don the goggles and somewhat literally walk around, stand at the mean of the data set, and look around. For me, this view into the data was a transformative personal experience, if not a scientifically valid approach to understanding data.
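A genetic algorithm of the kind described can be sketched in a few lines. The target value, population size, and mutation scheme below are illustrative assumptions, not the author's actual setup:

```python
import random

# Minimal genetic-algorithm sketch: evolve a population of numeric
# "chromosomes" toward a target value (illustrative only).
TARGET = 42.0

def fitness(x):
    return -abs(x - TARGET)  # higher is better (closer to the target)

population = [random.uniform(0, 100) for _ in range(20)]
for generation in range(100):
    # keep the fitter half, then refill with mutated copies of the survivors
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [x + random.gauss(0, 1.0) for x in survivors]

best = max(population, key=fitness)
```

A real experiment would evolve richer chromosomes (vectors, trees), but the select-and-mutate loop is the core of the technique.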

Weeks later a second experiment emerged, this time using sensor data from a stationary bicycle to drive the view camera in a virtual environment. This apparatus had been part of a somewhat quixotic quest for a virtual-reality-based active gaming experience. Once implemented, it represented the faintest surface scratch into the vast requirements of art, engineering, sound, theatre and animation that make up a production game, but it was a uniquely satisfying experiment.

The most recent experiment in this set leveraged design training and demonstrated the architectural visualization pipeline from consumer-grade modeller (SketchUp) to virtual reality experience. This product, like the other two, was also the “first 20%” of effort (see the Pareto principle), but uniquely satisfying. The video of the work has been retweeted many times and has had over 1,800 views since it was posted, and I have received numerous requests for collaboration on similar projects.

Clearly there is a growing mass movement representing a desire for this type of virtual reality technology. The defining factor in my experience, though, as distinct from virtual reality experimentation in the past, was that this work didn’t require access to a university lab, defense contractor or space agency. This access is possible due to a sea change in VR technology driven by the release of the Oculus Rift head-mounted display.

Beginning with the release of the Oculus Rift, and followed closely by other projects, VR is beginning to spread as a consumer-level technology. My bike-VR project is actually one of several similar experiments documented in the various online communities surrounding the technology. There is a growing community of VR hackers (perhaps a better term is makers) throughout the world, and the level of experimentation has grown rapidly.

My involvement in this work is only beginning, but I am tremendously optimistic that the technology itself represents a positive force for our ability to visualize problems, to communicate with each other, and to be present in environments that we wouldn’t normally be able to experience — across history, geography, scale and any other limits.

Question: What is the value of “being present” and experiencing virtual environments in this way?  What is the value of “standing at the mean”, and how does it differ from viewing a place, a time or a dataset on a traditional computer monitor?  What are the drawbacks?

Answer: The experience of presence with this type of display is so powerful that it can actually make the viewer nauseous, producing a sort of simulator sickness approaching seasickness. At the same time, intelligently engineered virtual environments, built with this in mind, can fool the brain in a more positive direction, producing joy, fright, sadness, and even the perception of temperature changes. This is not an experience that is common to interaction with a smartphone or tablet.

Current VR work of interest is quite vibrant and diverse, spanning topics such as “redirected walking” techniques for navigating large virtual environments by walking around small laboratories[1], the study of “oculesics”, where eye movements are tracked and communicated across networks to enhance communication[2], and the exploration of very large datasets using large laboratory installations ringed by huge arrays of displays[3].

See Also

  • [1] Suma, E. A., Bruder, G., Steinicke, F., Krum, D. M., & Bolas, M. (2012). A taxonomy for deploying redirection techniques in immersive virtual environments. Virtual Reality Short Papers and Posters (VRW), 2012 IEEE, 43–46. doi:10.1109/VR.2012.6180877
  • [2] Steptoe, W., Wolff, R., Murgia, A., Guimaraes, E., Rae, J., Sharkey, P., … & Steed, A. (2008, November). Eye-tracking for avatar eye-gaze and interactional analysis in immersive collaborative virtual environments. In Proceedings of the 2008 ACM conference on Computer supported cooperative work (pp. 197-200). ACM.
  • [3] Petkov, K., Papadopoulos, C., & Kaufman, A. E. (2013). Visual exploration of the infinite canvas. Virtual Reality (VR), 2013 IEEE, 11–14. doi:10.1109/VR.2013.6549349



Is an Average of Averages Accurate? (Hint: NO!)

by: Katherine S Rowell author of “The Best Boring Book Ever of Select Healthcare Classification Systems and Databases” available now!

Originally posted:

Today a client asked me to add an “average of averages” figure to some of his performance reports. I freely admit that a nervous and audible groan escaped my lips as I felt myself at risk of tumbling helplessly into the fifth dimension of “Simpson’s Paradox” — that is, the mistaken assumption that averaging the averages of different populations produces the average of the combined population. (I encourage you to hang in and keep reading, because ignoring this concept is an all too common and serious hazard of reporting data, and you absolutely need to understand and steer clear of it!)

Imagine that we’re analyzing data for several different physicians in a group. We establish a relation or correlation for each doctor to some outcome of interest (patient mortality, morbidity, client satisfaction). Simpson’s Paradox states that when we combine all of the doctors and their results, and look at the data in aggregate form, we may discover that the relation established by our previous research has reversed itself. Sometimes this results from some lurking variable(s) that we haven’t considered. Sometimes, it may be due simply to the numerical values of the data.

First, the “lurking variable” scenario. Imagine we are analyzing the following data for two surgeons:

  1. Surgeon A operated on 100 patients; 95 survived (95% survival rate).
  2. Surgeon B operated on 80 patients; 72 survived (90% survival rate).

At first glance, it would appear that Surgeon A has a better survival rate — but do these figures really provide an accurate representation of each doctor’s performance?

Deeper analysis reveals the following: of the 100 procedures performed by Surgeon A,

  • 50 were classified as high-risk; 47 of those patients survived (94% survival rate)
  • 50 procedures were classified as routine; 48 patients survived (96% survival rate)

Of the 80 procedures performed by Surgeon B,

  • 40 were classified as high-risk; 32 patients survived (80% survival rate)
  • 40 procedures were classified as routine; 40 patients survived (100% survival rate)

When we include the lurking classification variable (high-risk versus routine surgeries), the results are remarkably transformed.

Now we can see that Surgeon A has a much higher survival rate in the high-risk category (94% v. 80%), while Surgeon B has a better survival rate in the routine category (100% v. 96%).
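The arithmetic behind this comparison is easy to check directly. A short sketch using the survival counts from the example above:

```python
# Survival counts from the surgeon example: (survived, operated) per risk class
surgeons = {
    "A": {"high-risk": (47, 50), "routine": (48, 50)},
    "B": {"high-risk": (32, 40), "routine": (40, 40)},
}

for name, classes in surgeons.items():
    # per-class survival rates
    for risk, (survived, total) in classes.items():
        print(f"Surgeon {name}, {risk}: {survived / total:.0%}")
    # the overall rate pools the raw counts, not the percentages
    survived = sum(s for s, _ in classes.values())
    total = sum(t for _, t in classes.values())
    print(f"Surgeon {name}, overall: {survived / total:.0%}")
```

Running this reproduces the figures in the text: Surgeon B beats Surgeon A in the routine category (100% vs. 96%), yet Surgeon A comes out ahead overall (95% vs. 90%) because of the risk mix.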

Let’s consider the second scenario, where numerical values can change results.

First, imagine that every month, the results of a patient satisfaction survey are exactly the same (Table 1).


The table shows that averaging each month’s result produces the same figure (90%) as calculating a weighted average (90%). This congruence exists because the numerator and denominator are identical every month, so each month contributes equally to the result.

Now consider Table 2, which also displays the number of responses received from a monthly patient-satisfaction survey, but where the number of responses and the number of patients who report being satisfied differ from month to month. In this case, taking a simple average of each month’s percentage lets every month affect the final result equally, regardless of how many responses it contains. Here, for example, we are led to believe that 70% of patients are satisfied.


All results should in fact be treated as the data set of interest, where the denominator is Total Responses (2,565) and the numerator is Total Satisfied (1,650). This approach correctly accounts for the fact that the number of responses differs each month, weights each response equally, and produces a correct satisfaction rate of 64%. That is nearly six percentage points below our previous answer of 70% — a difference of almost 145 patients!
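A quick sketch makes the difference concrete. The monthly figures below are made up for illustration (the article’s Table 2 is not reproduced here), but the two calculations are exactly the ones described:

```python
# Monthly (satisfied, responses) pairs -- illustrative numbers only
months = [(30, 50), (120, 150), (900, 1000)]

# Average of averages: each month's percentage counts equally,
# no matter how many responses the month contains.
avg_of_avgs = sum(s / r for s, r in months) / len(months)

# Weighted (pooled) average: every response counts equally.
weighted = sum(s for s, _ in months) / sum(r for _, r in months)

print(f"average of averages: {avg_of_avgs:.1%}")  # ~76.7%
print(f"weighted average:    {weighted:.1%}")     # 87.5%
```

The small, unrepresentative months drag the naive average far from the pooled rate — the same distortion the article’s Table 2 illustrates.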

How we calculate averages really does matter if we are committed to understanding our data and reporting it correctly. It matters if we want to identify opportunities to improve, and are committed to taking action.

As a final thought about averages, here is a wryly amusing bit of wisdom on the topic that also has the virtue of being concise. “No matter how long he lives, a man never becomes as wise as the average woman of 48.” -H. L. Mencken.

I’d say that about sums up lurking variables and weighted averages — wouldn’t you?


