Breakout Ib: Data-mining and Visualization of Social Networks


Breakout I.b (Bren 3526):

One of the most important uses of information technology today is to support data-mining and pattern recognition across large amounts of data, with the results often presented as a visualization for interpretation. However, data-mining vast social networks, whether in real time or asynchronously, presents challenges that are at once technological, social, and legal. Visualization is also an area that, for many developers and almost all users, is poorly understood in terms of its theory, logic, history, typologies, and constraints/affordances. How do we data-mine social-computing information today?

High-level comments from notes of the discussion (Ben), in no particular order:

The visualization side of the table raised the point that visualization research typically faces two main challenges. The first is how to give the data of interest the right "graphical expression," i.e. how to represent it graphically. This is a generic question, not particularly specific to social networks.

1. How do we visualize interactions between users? Basic connections between users are simple to show as edges in a graph. But how can we visualize more complex interactions between users?

2. Another perspective is that we want to use visualization as a navigational guide, a way-finding interface through the data, rather than a simple representation of the data. This type of "data curation" is key, and it raises many related questions.

3. For large-scale data, how do we summarize the data so as to make data curation and interactive browsing manageable?

4. Large-scale data needs to be indexed so that it can be processed in near real time on commodity resources. How do we find the right data representations to answer the "right" questions?

5. How do privacy settings in social networks affect data gathering? What can we do to understand not just what people show but, more interestingly, what people hide? We essentially need baseline data sets to measure this.

6. Large datasets are being gathered from online social networks, but the data is often incomplete or approximate because of privacy concerns and the limits of crawling technology and resources. How does this affect social-science researchers who use this data? How much "quality loss" can be tolerated given that the data itself is so much larger in scale? Does scale make up for fidelity?
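Items 1 and 3 are posed as open questions. One minimal sketch of a possible starting point, assuming interactions are available as (user, user, interaction type) records and that users have already been assigned to groups by some clustering step not shown here, is to keep per-type counts on each edge instead of a bare connection, then roll user-level edges up into a smaller group-level graph for browsing. The function names, record format, and group labels are hypothetical illustrations, not something specified in the discussion.

```python
from collections import Counter, defaultdict


def aggregate_interactions(events):
    """Collapse raw (user, user, type) events into one edge per unordered user
    pair, keeping a count per interaction type instead of a bare connection."""
    edges = defaultdict(Counter)
    for src, dst, kind in events:
        pair = tuple(sorted((src, dst)))  # assumption: interactions treated as undirected
        edges[pair][kind] += 1
    return edges


def summarize_by_group(edges, group_of):
    """Roll user-level edges up to group-level edges so the browsable graph has
    one node per group. Interactions within a group become a self-loop."""
    summary = defaultdict(Counter)
    for (a, b), kinds in edges.items():
        key = tuple(sorted((group_of[a], group_of[b])))
        summary[key].update(kinds)  # add this pair's per-type counts to the group pair
    return summary


# Hypothetical example data; the group labels stand in for the output of some
# clustering or community-detection step that is not shown here.
events = [
    ("alice", "bob", "message"),
    ("bob", "alice", "message"),
    ("alice", "bob", "tag"),
    ("alice", "carol", "comment"),
    ("dave", "carol", "message"),
]
groups = {"alice": "A", "bob": "A", "carol": "B", "dave": "B"}

edges = aggregate_interactions(events)
summary = summarize_by_group(edges, groups)
for pair, kinds in sorted(summary.items()):
    print(pair, dict(kinds))
```

With the sample data, four users collapse into two group nodes: group A gets a self-loop with two messages and one tag, group B gets a self-loop with one message, and the A-B edge carries the single comment. Per-type edge counts are one way to show richer interactions than a plain edge, and the roll-up trades per-user detail for a graph small enough to browse interactively; choosing a meaningful grouping is where most of the real difficulty would lie.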

