class: center, middle, inverse, title-slide

# Social Network Analysis
## Network Data, Centrality, Transitivity, Clustering
### Paulo Serôdio
### 2019-05-20

---

# Goals of this Seminar

+ Introduction to Core Social Network Concepts
+ Overview of the field and the tools
+ Mathematical foundations
+ SNA Data & Survey Design
+ Centrality
+ Social Capital
+ Cohesion
+ Subgroups
+ Equivalence (Role & Position)
+ Hypothesis testing
+ Introduction to network analysis in R

---

# Structure of the Course

+ **Monday**: Introduction, Global & Local Network properties
  + Algebra
  + Graph theory
  + Network data
  + Centrality & Centralization
  + Transitivity & Clustering
+ **Tuesday**: Social Capital, Brokerage and Equivalences
+ **Wednesday**: Cohesion, hypothesis testing and inferential networks

---

# Objectives of the course

- Build intuition
- Expose key concepts
- Highlight big questions
- Provide abstract examples
- Some pointers to other studies
- *NOT* a substitute for technical work

---

# Introduction

+ Name
+ Affiliation
+ Discipline
+ SNA Experience/Knowledge
+ Phenomena of interest

---

<img src="assets/img/image3.gif" width="70%" />

---

# shortest path Amersham to Woolwich Arsenal

<img src="assets/img/image4.png" width="70%" />

---

And people understand them intuitively

---

<img src="assets/img/image5.png" width="90%" style="display: block; margin: auto;" />

---

<img src="assets/img/image6.png" width="100%" />

---

<img src="assets/img/image7.jpeg" width="70%" />

---

# Growth in Multiple Areas

+ Pop Culture
  + Kevin Bacon
  + Online “social network sites”
+ Business Practitioners
  + New consulting tools
  + Knowledge management
+ Academics
  + In multiple fields from communication to epidemiology

![](assets/img/image8.jpeg){width=10px}

---

# What Defines SNA?
+ Phenomenon studied
  + distinctive type of data
+ Perspective taken
  + Perhaps one perspective, but multiple theories
+ Methodological toolkit
  + new concepts, new tools

---

# Reasoning about Networks

+ What can we achieve from studying networks?
  + Patterns and statistical properties of network data;
  + Design principles and models;
  + Understand the organisation of networks;
+ How can we reason about networks?
  + **Empirical**: study data; measure and quantify;
  + **Mathematical models**: graph theory & statistics; distinguish surprising from expected phenomena
  + **Algorithms**: for hard computational challenges

---

# how mathematicians reason about networks

- Mathematicians are concerned with the abstract structure of a graph.
- Mathematicians define operations to analyze and manipulate graphs. Moreover, they develop theorems based upon structural axioms.

<img src="assets/img/image9.png" width="50%" style="display: block; margin: auto;" />

---

# how physicists reason about networks

- Physicists are concerned with modeling real-world structures with networks.
- Physicists define algorithms that compress the information in a network into simpler values (e.g. statistical analysis).

<img src="assets/img/image10.png" width="50%" style="display: block; margin: auto;" />

---

<img src="assets/img/image12.png" width="100%" style="display: block; margin: auto;" />

---

# Much of the World has a Graphical/Network Structure

- **Social networks**: define how persons interact (collaborators, friends, kin).
- **Biological networks**: define how biological components interact (protein, food chains, genes).
- **Transportation networks**: define how cities are joined by air and road routes.
- **Dependency networks**: define how software modules use each other.
- **Communication networks**
- **Language networks**: define the relationships between words.
---

# History of SNA

+ 1736: Euler
+ 1930s: Sociometry
+ 1940s: Psychologists
+ 1950s & 60s: Anthropologists
+ 1970s: Rise of Sociologists
  + Small Worlds, Strength of weak ties
+ 1980s: IBM computation
  + Computer programs developed
+ 1990s: Ideas spread
  + UCINET released, spread of network analysis to multiple fields, social capital, embedded ties
+ 2000s: Physicists jump on the bandwagon

---

# What is a Network?

+ A set of dyadic ties, all of the same type, among a set of actors (nodes)

<img src="assets/img/image14.png" width="100%" style="display: block; margin: auto;" />

---

# Popular Social Network Theories

+ Small World Phenomenon (6 degrees of separation)
+ Strength of Weak Ties (information diffusion; job market)
+ Embeddedness (“What would economic life be like if people didn’t have social relationships?”)
+ Social Capital (cooperation and social networks have value)

---

# Relations Matter

Attributes vs. Relations

(Discovery of HIV: sexual contact among gay men with unusual cancer, traced by Darrow at the CDC)

<img src="assets/img/image15.png" width="50%" style="display: block; margin: auto;" />

---

# Structure Matters

Medieval trade on Russian rivers

![](assets/img/image17.jpeg)

---

![](assets/img/image18.jpeg)

---

# Why Study Networks?

+ Prevent the spread of disease
+ Make the world a better place
+ Improve organizational effectiveness

---

# Prevent the spread of disease

<img src="assets/img/image19.png" width="70%" style="display: block; margin: auto;" />

---

# Improve Organizational Outcomes

![](assets/img/image20.png)

---

# Make the World A Better Place

![](assets/img/image21.png)

---

# Tools & Software

+ DISCLAIMER:
  + This course focuses on **R** software and, if there is time, **Gephi**.
    They are not the only tools for social network analysis and visualization, but they are very popular and strike a good balance between usability and capability.

---

# Other (beginner) Software tools

+ UCINET + NetDraw
  + *The* social network analysis software
+ Domain specific
  + SIENA (time series analysis of networks)
  + Pajek (better at computational analysis of really large networks)
  + E-Net (analyzing ego networks)
  + KeyPlayer (influencing or disrupting networks)

---

# Defining & Describing a network

+ In social network analysis, we regularly draw on two major areas of mathematics:
  + **Matrix Algebra**
      + Tables of numbers
      + Operations on matrices enable us to draw conclusions we couldn’t just intuit
  + **Graph Theory**
      + Branch of discrete math that deals with collections of ties among nodes and gives us concepts like paths

---

# Network vs. Case Perspective

+ One of the biggest differences between the SNA perspective and more traditional social science perspectives is the nature of the data
+ Instead of individual cases, where we collect the same information about each person,
+ here we collect information about the interactions between pairs of people

---

# Mainstream Logical Data Structure

+ 2-mode rectangular matrix in which rows (cases) are entities or objects and columns (variables) are attributes of the cases
+ Analysis consists of correlating columns
+ Emphasis on explaining one variable

<table class="table" style="font-size: 20px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Age </th>
   <th style="text-align:left;"> Education </th>
   <th style="text-align:left;"> Salary </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 2 </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
 </tbody>
</table>

---

# Network Logical Data Structures

<img src="assets/img/image22b.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – simple undirected

<img src="assets/img/image23.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – complex

<img src="assets/img/image24.jpeg" width="100%" style="display: block; margin: auto;" />

---

# representing networks – directed networks

<img src="assets/img/image25.jpeg" width="100%" style="display: block; margin: auto;" />

---

# representing networks – bipartite networks

<img src="assets/img/image26.jpeg" width="100%" style="display: block; margin: auto;" />

---

# describing networks

<img src="assets/img/image27.jpeg" width="100%" style="display: block; margin: auto;" />

---

# describing networks

<img src="assets/img/image28.jpeg" width="100%" style="display: block; margin: auto;" />

---

# describing networks

<img src="assets/img/image29.png" width="100%" style="display: block; margin: auto;" />

---

# describing networks

<img src="assets/img/image30.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – link types

<img src="assets/img/image31.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – network modes

<img src="assets/img/image32.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – directed networks

<img src="assets/img/image33.png" width="100%" style="display: block; margin: auto;" />

---

# representing networks – symmetric networks

<img src="assets/img/image34.png" width="100%"
style="display: block; margin: auto;" />

---

# representing networks – affiliation networks

<img src="assets/img/image34.png" width="100%" style="display: block; margin: auto;" />

---

# Matrix Algebra

In this section, we will cover:

- Matrix Concepts, Notation & Terminology
- Adjacency Matrices
- Transposes
- Matrix Operations

---

# Matrices

+ Symbolized by a capital letter, like A
+ Each cell in the matrix is identified by row and column subscripts: `\(a_{ij}\)`
+ First subscript is row, second is column

<table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> ID </th>
   <th style="text-align:left;"> Age </th>
   <th style="text-align:left;"> Gender </th>
   <th style="text-align:left;"> Income </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:left;"> Mary </td>
   <td style="text-align:left;"> a_11 </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Bill </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> John </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> a_32 </td>
   <td style="text-align:left;"> </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Larry </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> </td>
  </tr>
 </tbody>
</table>

---

# Vectors

+ Each row and each column in a matrix is a vector
  + Vertical vectors are column vectors, horizontal are row vectors
+ Denoted by a lowercase bold letter: **y**
+ Each cell in the vector is identified by a subscript: `\(y_i\)`

---

# Ways and Modes

+ Ways are the dimensions of a matrix.
+ Modes are the sets of entities indexed by the ways of a matrix

<img src="assets/img/image36b.png" width="85%" style="display: block; margin: auto;" />

---

# Proximity Matrices

+ Proximity Matrices record “degree of proximity”.
+ Proximities are usually among a single set of actors (hence, they are 1-mode), but they are not limited to 1s and 0s in the data.
+ What constitutes the *proximity* is user-defined.
+ Driving distance is one form of proximity; other forms might be number of friends in common, time spent together, number of emails exchanged, or a measure of similarity in cognitive structures.

---

# Proximity Matrices

+ Proximity matrices can contain either *similarity* or *distance* (or *dissimilarity*) data.
+ Similarity data, such as number of friends in common or correlations: a larger number represents more similarity or greater proximity
+ Distance (or dissimilarity) data, such as physical distance: a larger number represents more dissimilarity or less proximity

---

# Transposes

+ The transpose `\(M'\)` (also written `\(M^T\)`) of a matrix `\(M\)` is the matrix flipped across its main diagonal.
+ The rows become columns and the columns become rows
+ So the transpose of an m by n matrix is an n by m matrix.

---

# Transpose Example

<img src="assets/img/image37b.png" width="100%" style="display: block; margin: auto;" />

---

# Dichotomizing

+ X is a valued matrix, say a 1 to 10 rating of strength of tie
+ Construct a matrix Y of ones and zeros such that
`\(y_{ij} = 1\)` if `\(x_{ij} > 5\)`, and `\(y_{ij} = 0\)` otherwise

<table class="table" style="font-size: 18px; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> </th>
   <th style="text-align:left;"> EVE </th>
   <th style="text-align:left;"> LAU </th>
   <th style="text-align:left;"> THE </th>
   <th style="text-align:left;"> BRE </th>
   <th style="text-align:left;"> CHA </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:left;"> EVELYN </td>
   <td style="text-align:left;"> 8 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> LAURA </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> THERESA </td>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 8 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> BRENDA </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 6 </td>
   <td style="text-align:left;"> 7 </td>
   <td style="text-align:left;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> CHARLOTTE </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> 3 </td>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> 4 </td>
   <td style="text-align:left;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> </td>
   <td style="text-align:left;"> EVE </td>
   <td style="text-align:left;"> LAU </td>
   <td style="text-align:left;"> THE </td>
   <td style="text-align:left;"> BRE </td>
   <td style="text-align:left;"> CHA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> EVELYN </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> LAURA </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> THERESA </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> BRENDA </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> CHARLOTTE </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
 </tbody>
</table>

---

# Symmetrizing

+ Applies when a matrix is not symmetric, i.e., `\(x_{ij} \neq x_{ji}\)` for some pair
+ A matrix can be symmetrized in several ways. Set `\(y_{ij}\)` and `\(y_{ji}\)` to:
  - Maximum, `\(\max(x_{ij}, x_{ji})\)`: union rule;
  - Minimum, `\(\min(x_{ij}, x_{ji})\)`: intersection rule;
  - Average, `\((x_{ij} + x_{ji})/2\)`;
  - Lower half: choose `\(x_{ij}\)` when `\(i > j\)` and `\(x_{ji}\)` otherwise

---

# Symmetrizing Example

What rule are we using here?

<img src="assets/img/figure38b.png" width="100%" style="display: block; margin: auto;" />

---

# Matrix Multiplication

- Matrix products are not generally commutative (i.e., AB does not usually equal BA)
- Notation: `\(C = AB\)`
- Only possible when the number of columns in A equals the number of rows in B; such matrices are said to be conformable.
The product is calculated by summing over the shared index `\(k\)`:

`$$c_{ij} = \sum_{k} a_{ik} b_{kj}$$`

---

# Matrix multiplication example i

<img src="assets/img/figure38c.png" width="100%" style="display: block; margin: auto;" />

---

# Matrix multiplication example ii

<img src="assets/img/figure38d.png" width="100%" style="display: block; margin: auto;" />

---

# Products of matrices & their transposes

`\(XX^T\)` = product of matrix `\(X\)` by its transpose `\(X^T\)`

- Computes sums of products of each pair of rows (cross-products)
- Gives similarities among rows

<img src="assets/img/figure39b.png" width="100%" style="display: block; margin: auto;" />

---

<img src="assets/img/figure39.png" width="100%" style="display: block; margin: auto;" />

---

# squaring an adjacency matrix

<img src="assets/img/figure40.png" width="100%" style="display: block; margin: auto;" />
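---

The matrix operations from the last few slides can be sketched in a few lines of code. This is a minimal illustration in plain Python (no libraries), not the course's R material. The two small matrices `A` and `X` are made-up examples and are not the ones shown in the figures, and the helper names `transpose` and `matmul` are hypothetical. It relies on two standard facts: entry (i, j) of the squared adjacency matrix counts the walks of length 2 from i to j, and entry (i, j) of a matrix times its transpose is the cross-product of rows i and j.

```python
def transpose(m):
    """Return the transpose of m: rows become columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Return C = AB, with c[i][j] = sum over k of a[i][k] * b[k][j]."""
    if len(a[0]) != len(b):
        raise ValueError("not conformable: columns of A must equal rows of B")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Made-up undirected network on four nodes with ties 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Squaring the adjacency matrix: entry (i, j) counts walks of length 2.
A2 = matmul(A, A)

# For an undirected network without self-loops, the diagonal holds the degrees.
degrees = [A2[i][i] for i in range(len(A))]
print(degrees)  # [2, 2, 3, 1]

# Made-up 2-mode matrix: 3 people (rows) by 2 events (columns).
X = [
    [1, 1],
    [1, 0],
    [0, 1],
]

# Row cross-products: entry (i, j) is the number of events persons i and j share.
XXt = matmul(X, transpose(X))
print(XXt)  # [[2, 1, 1], [1, 1, 0], [1, 0, 1]]
```

Calling `matmul(X, X)` raises a `ValueError`, because a 3 by 2 matrix is not conformable with itself. In R, the same steps would use the built-in `%*%` operator and `t()`.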