Here’s an interesting article, just published on May 1st 2018 by Matthew Das Sarma of Stanford University, on extracting relational structure from unstructured data.
In it, the researcher(s) discuss the approach and benefits of taking random walks through unstructured data to find relationships between data nodes. The frequency with which vertices appear in walks through real-world graphs is reported to follow a power-law distribution, much as word frequencies do in natural language, so the walks carry a form of language structure. Each walk is like a sentence that communicates the latent semantics of the characteristics of a thing. It’s a wonderful concept. Approaches from natural language processing and language modelling (NLP/NLM) can then be applied to this structure to estimate the probability that a vertex appears in a given walk. An arbitrary constraint limits the depth (length) of each walk to keep the problem tractable, and a final structure can be extracted from the probability model. It’s wonderful & I hope it works; of course, if we can find a way to set those constraints automatically, then all the better.
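To make the random-walk idea concrete, here is a minimal Python sketch of length-limited walks over a small graph. Everything in it is an assumption for illustration: the toy graph, the function names `random_walk` and `generate_walks`, and the parameters `walks_per_vertex` and `walk_length` are hypothetical and not taken from the article. It also estimates the chance that a vertex appears in a walk by simple counting, which is a stand-in for the language-model approach, not the method itself.

```python
import random
from collections import Counter

def random_walk(adjacency, start, walk_length, rng):
    """Walk from `start`, visiting at most `walk_length` vertices in total."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = adjacency[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk

def generate_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Start `walks_per_vertex` walks from every vertex, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_vertex):
        vertices = list(adjacency)
        rng.shuffle(vertices)
        for vertex in vertices:
            walks.append(random_walk(adjacency, vertex, walk_length, rng))
    return walks

# Hypothetical undirected toy graph, stored as adjacency lists.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

# walk_length is the arbitrary constraint that keeps the problem tractable.
walks = generate_walks(toy_graph, walks_per_vertex=10, walk_length=5)

# Each walk plays the role of a "sentence"; each vertex plays the role of a "word".
vertex_counts = Counter(vertex for walk in walks for vertex in walk)

# Empirical share of walks that contain each vertex (plain counting, no language model).
appearance_prob = {
    vertex: sum(vertex in walk for walk in walks) / len(walks)
    for vertex in toy_graph
}
```

In practice the walks would be fed to a word-embedding style model rather than counted, but the walk generation step would look much the same.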
Once the data has a graph structure, graph convolutional network (GCN) algorithms can learn from it directly, without further feature extraction.
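For a feel of what a GCN does with that structure, here is a rough pure-Python sketch of a single graph-convolution layer. It assumes the widely used propagation rule from Kipf and Welling (add self-loops, normalise symmetrically by degree, aggregate neighbour features, apply a weight matrix and a ReLU); the article may have a different variant in mind. The names `matmul` and `gcn_layer` and the tiny example inputs are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def gcn_layer(adjacency, features, weights):
    """One layer: ReLU(D^-1/2 (A + I) D^-1/2 X W), assuming the Kipf-Welling rule."""
    n = len(adjacency)
    # Add self-loops so each vertex keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation stops high-degree vertices from dominating.
    a_norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    aggregated = matmul(a_norm, features)     # mix each vertex with its neighbours
    transformed = matmul(aggregated, weights)  # weights would be learned in training
    return [[max(0.0, value) for value in row] for row in transformed]

# Two connected vertices, one feature each, and a 1x1 weight matrix.
example_output = gcn_layer(
    adjacency=[[0.0, 1.0], [1.0, 0.0]],
    features=[[1.0], [3.0]],
    weights=[[1.0]],
)
```

Here the raw vertex features go straight in, and the graph structure itself does the work that hand-built feature extraction would otherwise do.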
Attribution: Image courtesy of the author and article published on thegradient.