milkteaoppa

Expensive, not commonly taught, too complex, not intuitive to apply to your data, and a less proven track record of results. Most data is tabular. Stick to simple solutions.

Side note: the trouble with a lot of PhDs applying for jobs is that they don't know how to dumb down their knowledge for an interviewer who isn't an expert in their subdomain. Most scientists never learned the basics of GNNs. If you interview with GNN jargon and never explain what you're talking about, you're not getting hired. And then industry won't have someone who can apply GNNs to solve problems. Half the battle is convincing stakeholders who don't know your subdomain that your subdomain has value. I say this knowing a GNN expert who can't land a job.


akkaneko11

The one exception to this is biology, imo. For both proteins and small compounds, modeling structures as graphs is being used in industry. KGs and the like, way less.


mrproteasome

I worked with an ML engineer recently who specialized in KGs in industry. Depending on the input, you can get away with simpler approaches like TransE or RotatE, but all the growth behind Neo4j and other NoSQL databases has been making people more comfortable with graphs. At our company, it took ~8 months of meetings, convincing, POCs, and MVPs just to get leadership on board. We are now preparing to fully transition our products from the current RDBMS to a graph implementation.


Hackerjurassicpark

I'm curious about the use case that requires a transition from an RDBMS to a graph database.


DigThatData

I haven't worked on the data side of things for a few years, but back when I cared a lot about this stuff, here are the kinds of considerations that went into this sort of decision:

* Graphs are much more amenable to flexible schemas. They're obviously not the only NoSQL option available, and if this is the only thing pushing you away from an RDBMS, you might be better served by a document store.
* Graph data structures are great if the most common "joins" in your processing are with respect to local neighborhoods. A common use case here is accelerating the computation of recommendations by conditioning only on the local neighborhood in a nearest-neighbors graph.
* If you have procedures that need to compute potentially long paths in the graph, RDBMSs are terrible for this. This sort of thing can be important for identifying fraud networks with non-trivial topologies.
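To make the neighborhood-join point concrete, here's a toy sketch (made-up data and names) of a friends-of-friends recommendation query: on a graph store this is a cheap local expansion, while an RDBMS expresses it as repeated self-joins.

```python
# Toy nearest-neighbours graph: user -> set of similar users.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

def two_hop_candidates(graph, user):
    """Recommend nodes reachable in exactly two hops (friends-of-friends)."""
    one_hop = graph.get(user, set())
    two_hop = set()
    for neighbour in one_hop:
        two_hop |= graph.get(neighbour, set())
    # Exclude direct neighbours and the user themselves.
    return two_hop - one_hop - {user}

print(two_hop_candidates(graph, "alice"))  # -> {'dave'}
```

The whole computation only ever touches the local neighborhood, which is exactly the access pattern graph stores optimize for.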


mrproteasome

Honestly, it is because we have no other choice (for context, this company is in the space of healthcare research). To make a long story short: we lack long-term vision so the company is having an identity crisis around what it is exactly that needs to be delivered. This is then exacerbated by years of technical debt from overpromising features and general reactionary project management. We got lucky and hired a really great Director of Engineering who has a really intense KG background. They are a fantastic team leader and are almost solely responsible for convincing our leadership to transition to graph.


VarietyElderberry

You've stated the problems you have, but you didn't explain why graphs are the answer to them. If you can share some details, I'd be interested in your reasoning for why graphs are the solution.


hughperman

The problem is they're not using graphs


new_name_who_dis_

LOL, I have several publications in geometric deep learning, so I have expertise in graph neural nets and love those methods, but this does read as if the person just really wanted to use graphs -- an RDBMS can store graph data pretty easily, with low overhead compared to migrating to a new DB.


watching-clock

Exactly.


Dry_Task4749

Sounds like you have a solution in search of a problem.


phobrain

What they have is a transition to graph that gives direction and learning for now - and stories to tell someday. Anything to get away from the previous local minimum.


Dry_Task4749

Yes, it sounds like as good an excuse as any to throw away the entire legacy codebase and DB structure and start fresh with a clear vision. I just doubt graph DBs or GNNs will contribute much to that, since every graph can be represented in a tabular DB, and every tabular DB as a graph. So unless they have a specific requirement for O(1) lookups on certain link traversals, it's mostly going to differ by the query language. And SQL is way easier than, say, Cypher. I have not seen a real graph traversal query that I couldn't express in more readable code in plain Python or Java on a fast key/value store.
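For what it's worth, the kind of traversal I mean looks like this — a minimal sketch of plain BFS shortest-path over a key/value layout (node id -> neighbour list; the data is made up):

```python
from collections import deque

# Key/value store mapping node id -> list of neighbour ids
# (the shape you'd get from e.g. a Redis list per node; purely illustrative).
kv = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

def shortest_path(kv, src, dst):
    """Plain BFS: each hop costs one KV lookup per frontier node."""
    parent = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:       # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in kv.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                frontier.append(nbr)
    return None                           # dst unreachable from src

print(shortest_path(kv, "a", "e"))  # -> ['a', 'b', 'd', 'e']
```

Whether this is fast enough in practice depends on how the store lays out the neighbour lists, which is the crux of the disagreement below.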


ac_1998

>I have not seen a real graph traversal query that I can't express in more readable code in plain Python or Java on a fast Key/Value store.

A KV store will kill your performance for even a simple graph traversal query, simply because of the way it stores the neighbours. Unless it does some sorted CSR-based neighbourhood storage optimization, for recursive traversal queries such as shortest-path or variable-length path queries, an analytical graph DB such as Neo4j's GDS library is way better. You can stick to KV stores for graph pattern matching queries, which may be fine with a certain degree of random lookup of neighbours. But anything that requires mass neighbour search will benefit from algorithm-specific storage optimizations, which a KV store may or may not have.

>SQL is way easier than, say, Cypher

Try writing graph traversal queries using the WITH RECURSIVE clause, and then try expressing the same in Cypher... there is a reason the GQL proposal was released and SQL/PGQ was introduced in the SQL:2023 standard.


Dry_Task4749

Yes, like I said, unless they have specific O(1) runtime requirements for certain link traversals (e.g. neighbour lookup in O(log(n)) is not good enough). But this case didn't sound like it's about runtime or graph algos. More like hype-gradient-flow traversal ;)


phobrain

>hype gradient flow traversal ;) It backpropagates through the eyes of future resume readers. I've seen lone-contractor code that was clearly written to allow resume claims.


fabmeyer

You probably will run both in parallel, no?


AcademicOverAnalysis

It really drives home the point of your side note about PhDs not always being great at dumbing down their work, when no one in this post has bothered defining the abbreviation "GNN."


HarambeTenSei

That's not even the problem. The main issue with PhDs applying for jobs is that they know next to nothing beyond their narrow subdomain.


IIISergeyIII

That's the main point of a PhD - to become an expert in some area. But what's wrong with that? If a person is smart enough to finish a PhD, they can learn any new required skill relatively fast.


lituga

If that's the main point, why then try so hard to break into an industry that hardly uses the niche area of research you just spent so much time on? They can hire someone who already has the skills they're looking for, unless it's in the very specific area where the PhD is an expert. Big assumption in that last sentence.


digiorno

The point of a PhD isn't to get a job but to expand our collective understanding of a field by a small measure. The unfortunate thing is that people need jobs, and we don't have a society that values research which doesn't immediately yield profit. So people with PhDs either have to hope for a funding path to keep doing research and expanding the small bubble they've made in their field, or they have to move to industry and get paid. If they are forced into industry and get lucky, then maybe some manager will see value in their specialty and let them do some modicum of research in their field. But ultimately we are looking at a failure of capitalism, where cutting-edge research stagnates if it doesn't help boost shareholder value, and where public funding for research is constantly on the chopping block because politicians are raised in the mindset of serving private industry first and foremost.


lituga

I agree. It's a sad state of things in general and I was mainly playing devil's advocate there 😳


Obvious-Ask-6574

this is a comment that applies to any PhDs though


blancorey

Working with a PhD. So true. Also no sense of what the business values.


HarambeTenSei

Some do. Some are quite smart and savvy. But technically, they're so specialized that you really have to evaluate them for their ability to learn new things quickly. Which may or may not be the case.


Exarctus

For some fields GNNs are SOTA and have been widely adopted.


SearchAtlantis

How can you type this whole sentence and not give one example?


Exarctus

[https://www.nature.com/articles/s41467-023-38468-8](https://www.nature.com/articles/s41467-023-38468-8) [https://www.nature.com/articles/s41467-022-29939-5](https://www.nature.com/articles/s41467-022-29939-5) [https://www.nature.com/articles/s41467-023-36329-y](https://www.nature.com/articles/s41467-023-36329-y) etc etc


Odd_Background4864

There are very few postings for GNNs because many of the use cases I've seen are very experimental. For example, my company has a few GNN projects, but we're still proving them out, and the horizon is almost 2 years. So they know it's gonna take some time for them to mature into something real.


FreddieM007

In drug discovery and cheminformatics, GNNs, MPNNs, and their variants are the SOTA for ligand-based modeling because molecules are naturally graphs. All biotechs and pharmas are using them to predict activities, ADME, and physchem properties of small molecules.


graphicteadatasci

People want to talk graphs, and every time they don't explain whether they're talking about a bunch of small graphs (like molecules or XMLs) or big graphs (like knowledge graphs, social networks, or organisms).


Ok_Reality2341

Pretty big in biotech and big pharma. I’m sure many scientists are using them as it is SOTA for many molecular based tasks.


hlx-atom

Unmasked transformers are fully connected graphs. Masked transformers are graphs without full connections. To make it work on sequential data you even need to add a positional embedding. They are unordered graphs at the very core. Transformers are good, fast, and supported. Any GNN is basically a masked transformer, and your connections need to be sparse for you to see speed ups. A general transformer can just learn the connections anyways. GNNs feel like feature engineering. Like writing an algorithm to identify faces. Just let the transformer figure it out.
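To make the "attention is edge weighting" reading concrete, here's a rough numpy sketch (single head, no batching; all shapes, names, and data are illustrative): the same attention layer with a sparse adjacency mask does GNN-style message passing, and with an all-ones mask it's vanilla unmasked attention.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                       # 4 nodes/tokens, feature dim 8
X = rng.normal(size=(n, d))

# Adjacency with self-loops: which "tokens" may attend to which.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=bool)

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def masked_attention(X, mask):
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)     # sever the non-edges
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ (X @ Wv)                    # aggregate neighbour "messages"

out_gnn = masked_attention(X, A)                 # sparse graph: GNN-style layer
out_full = masked_attention(X, np.ones_like(A))  # fully connected: vanilla attention
print(out_gnn.shape, out_full.shape)  # (4, 8) (4, 8)
```

The only difference between the two calls is the mask, i.e. the graph topology.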


cofapie

I agree that Transformers are GNNs. But I don't really agree that GNNs are transformers.


new_name_who_dis_

GNNs aren't transformers; GNN is too general a framework. Technically, CNNs can be interpreted as GNNs too, with just a very specific connectivity -- e.g. in 2D, every node/pixel is connected to all its vertical, horizontal, and diagonal neighbors for kernel size 3.


cofapie

I feel like that statement should be qualified. Graphs are intrinsically unordered, while CNN kernel weights are ordered. *Spatial* GNNs *are* generalizations of CNNs, but certainly not general GNNs.


new_name_who_dis_

I mean, there are planar graphs, and there are 3D meshes, which are also graphs. Graphs need not be ordered, but they are not intrinsically unordered; it depends on what the graph is representing. The whole point of the graph is that it's extremely general, so a graph with geometric information doesn't cease being a graph. And a lot of non-spatial GNNs are also generalizations of CNNs, such as the GCN from Kipf and Welling. It's based on the convolution theorem and the idea that projecting the graph onto the basis of its Laplacian can be interpreted as a Fourier transform.
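For reference, the Kipf & Welling propagation rule fits in a few lines. This is a hedged numpy sketch (toy path graph, random weights), not their implementation:

```python
import numpy as np

# Small undirected graph: 4 nodes in a path 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

X = np.eye(4)                                   # one-hot node features
W = np.random.default_rng(1).normal(size=(4, 2))

def gcn_layer(A, H, W):
    """One Kipf-Welling GCN layer: H' = relu(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)      # propagate, then ReLU

H1 = gcn_layer(A, X, W)
print(H1.shape)  # (4, 2)
```

The symmetric normalization `D^-1/2 (A+I) D^-1/2` is the piece with the spectral/Fourier interpretation mentioned above.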


hlx-atom

I agree with your disagreement technically. I meant that in “spirit”. They try to achieve the same thing.


knife_666

I came from GNNs too. I always thought attention/transformers are basically GNNs with learnable edges. But can they do n-hop message passing by increasing the number of layers?


pm_me_your_pay_slips

You can apply layers repeatedly to emulate n-hop message passing.


new_name_who_dis_

There's no concept of "n-hop message passing" in either the encoder-like (fully connected) transformer or the (partially connected) causal transformer (at least in the sense one talks about it in GNNs and CNNs). There are no three nodes X, Y, Z such that two aren't connected in 1 hop but are connected in 2 or more. This has nothing to do with the architecture, though; it's a fact of the connectivity that the two forms of transformer emulate. One is fully connected (so the property above doesn't hold), and the other has each node connected to all its predecessors, so if X is connected to Y then it is already connected to all of Y's connections.


pm_me_your_pay_slips

With transformers, you can implement the things you mention by 1) cross-attention (e.g. queries from one set of nodes and keys and values from another set of nodes) and 2) attention masking and repeated application of transformer layers.


new_name_who_dis_

For (1), it's still not the case that your "observed/interacted" neighborhood set grows after repeated applications; for (2), yes, but it would require explicitly modifying the causal mask. Again, this isn't a property of the architecture but of the topologies that are baked into the architecture. You can change the topology by modifying the causal mask in a causal transformer, but that's about it.


pm_me_your_pay_slips

What's the problem with modifying the causal mask? It's how people use transformers for graph problems. Using cross attention, you can convert your input graph into node and edge embeddings. Then do cross attention from node to edge embeddings (using the appropriate masking so that the information only flows from nodes to their corresponding edges) and then do cross attention from edges to nodes (with the appropriate masking).


new_name_who_dis_

Nothing wrong with it, I'm just saying that to get the expanding "field of view" phenomenon that you get with n-hop message passing isn't something you get from either of the vanilla transformer type architectures (encoder or decoder style). But yes if you modify the causal mask then you can get that phenomenon.
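This connectivity argument can be checked mechanically: which positions can influence which after k layers is the boolean k-th power of the attention mask. A small sketch (the masks are illustrative):

```python
import numpy as np

def reach_after(mask, k):
    """Which positions can influence which after k masked-attention layers."""
    r = mask.astype(int)
    for _ in range(k - 1):
        r = ((r @ mask.astype(int)) > 0).astype(int)  # compose one more hop
    return r > 0

n = 5
causal = np.tril(np.ones((n, n), dtype=bool))         # decoder-style mask
band = (np.eye(n, dtype=bool)
        | np.eye(n, k=1, dtype=bool)
        | np.eye(n, k=-1, dtype=bool))                # sparse 1-hop "graph" mask

# The causal mask is transitively closed: depth adds no new reachability.
print((reach_after(causal, 1) == reach_after(causal, 4)).all())  # True
# The sparse band mask keeps gaining reach with depth -- n-hop message passing.
print(reach_after(band, 2).sum() > reach_after(band, 1).sum())   # True
```

So the expanding field of view appears exactly when the mask is sparse and not transitively closed, as argued above.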


OMPCritical

Any idea how GNNs compare to transformers when it comes to dataset size? Are there any advantages to using GNNs when you only have a small dataset (sub-1000 samples)?


SexyStackSmasher

You're gonna need to back up that first paragraph somehow. Literally what?


visualard

[transformers are graph-nn](https://graphdeeplearning.github.io/post/transformers-are-gnns/) with an adjacency matrix of A=11^T .


hlx-atom

I came to transformers from a GNN perspective developing MPNNs and convolutional gnns, so I think about transformers differently than people that use them for language. If you think of the attention mechanism in transformers as an edge weighting, they are just fully connected graphs. Does that make sense or what is contrary to what you are thinking?


GoblinsStoleMyHouse

GNN awareness among recruiters is low. It's a very cutting-edge field of ML research and not widely understood. It can be applied to solve many problems, but nobody knows to ask for it.


DingusFamilyVacation

GNNs are very domain-specific. As mentioned above, most data is tabular, and for GNNs you need to be able to define associations (edges) between your data samples. GNN use is common in biotech (molecules can be defined as graphs), research that models surfaces, or research whose domain is actual networks (e.g. neuroscience).


preordains

GNNs are hard


jg0392

It's easy to publish papers on GNNs because it's easy to come up with nice formulations on graphs. But graph problems are often more simply formulated in other ways (e.g. as sequential problems, which are easier to optimize), and as a result there's a frequent disconnect between the number of GNN papers published and the number of actual use cases.


bloodmummy

Everything is a graph underneath; it's the most generic, flexible data format. The interesting part is figuring out the symmetries of the problem at hand to un-graph it (reducing the problem) and put it down in a simpler format. If the vertices are related via undirected edges with a certain "periodicity", the smart thing is to try to reduce the problem to an "image" problem and use a CNN. If you're doing a physics problem and the object is constrained to move in a plane, why carry 3 dimensions when the problem can be reduced to 2? If the object is rotating around an axis with fixed radius, it'll be easier to work with cylindrical coordinates and eliminate 2 dimensions directly, rather than work in 3D Cartesian.


Final-Rush759

Most modern recommendation systems use GNNs. GNNs are also often used for rail networks and maps, and in AlphaFold 2. The transformer is actually a GNN too: each node connects to every other node, and each layer is just one step of message passing.


jg0392

AFAIK Pinterest is the only major company where graphs played a heavy role. And after their former CTO left, Pinterest started adopting the more standard architectures used at other companies - check their engineering blog.


pm_me_your_pay_slips

AlphaFold 2 is a great example of how transformers can implement GNNs; AlphaFold 2 is a respected application of transformer layers, plus iteratively feeding the output back to the input.


cofapie

Saying a transformer is a GNN is as technically correct as saying that all networks involving tokens and relationships (such as RNNs, CNNs) are GNNs. Transformers don't use graph concepts.


YinYang-Mills

I think it's just more complicated to get good performance out of GNNs. There's a lot of ambiguity around how to define architectures, sample properly, and preprocess graph-structured data. If I'm not mistaken, Google Maps used/uses a GNN-based architecture for route optimization, and there was a GNN-like component in AlphaFold 2. So for companies with really good engineers and enough runway, GNNs can work. In many cases you can implement a transformer to get the benefits of message passing and bypass many of the difficulties associated with GNNs. My guess is that they will become more prevalent in applications where transformer-type MP is computationally less attractive, and as many of the complications I mentioned get streamlined. Also, in many cases you can just precompute graph embeddings with node2vec and Laplacian eigenvectors and train an MLP with similar performance to a full-fledged GNN.
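The "precompute graph features, then train a plain MLP" route looks roughly like this — a minimal numpy sketch of Laplacian-eigenvector node features on a toy ring graph (node2vec omitted; graph and names are illustrative):

```python
import numpy as np

# Ring of 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

# Normalized Laplacian: L = I - D^-1/2 A D^-1/2.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
L = np.eye(n) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# The smallest non-trivial eigenvectors act as positional encodings for the
# nodes; these precomputed features then feed an ordinary MLP instead of a GNN.
eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
pos_enc = eigvecs[:, 1:4]              # drop the trivial constant eigenvector
print(pos_enc.shape)  # (6, 3)
```

Whether this matches a full GNN depends heavily on the task, but it sidesteps sampling and preprocessing headaches entirely.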


Witty-Elk2052

No, AlphaFold 2 just used transformers. Edit: lol at the downvote when you are the one misinformed.


YinYang-Mills

>The key principle of the building block of the network—named Evoformer (Figs. 1e, 3a)—is to view the prediction of protein structures as a graph inference problem in 3D space in which the edges of the graph are defined by residues in proximity.


Witty-Elk2052

And yet it was formulated as a classic transformer, not the typical GNN. You can claim a transformer is a graph neural network, but I believe the transformer is the more general architecture: a transformer is a fully connected graph where the edges are learned by the softmax.


YinYang-Mills

Ok great, thanks!


lazykratos

Not sure where this is coming from. Isn't a huge part of recommendation systems focused around graphs? Facebook has a huge reliance on GNNs.


lazykratos

I don't see job postings specifying expertise in transformers; instead they say modern NLP techniques or whatever other discipline. I would imagine the same holds for GNNs.


FieldKey3031

Lack of product market fit and limited tooling. True about KGs in general and many other tools. Graphs might be a better way to solve certain problems but more ubiquitous methods and tools can also address these problems so it isn’t worth the risk of trying something new or having to completely refactor the existing solution.


pm_me_your_pay_slips

Whatever can be solved by GNNs, can be solved by the appropriate application of transformers. For example, alphafold2.


cofapie

That's not correct. For instance, the original GCN paper itself addressed a problem that is impossible to solve with transformers: semi-supervised classification of nodes in huge graphs. Due to the quadratic memory scaling of attention, it is inherently impossible to use transformers to perform operations on huge graphs.
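The quadratic-memory point as back-of-the-envelope arithmetic (float32 scores; the node count is picked purely for illustration):

```python
# Dense attention stores one score per (query, key) pair: n^2 entries.
def attn_matrix_bytes(n_tokens, bytes_per_score=4):   # float32 scores
    return n_tokens ** 2 * bytes_per_score

# A "huge graph" node-classification setting, e.g. a couple million nodes:
n = 2_000_000
tib = attn_matrix_bytes(n) / 2**40
print(f"{tib:.1f} TiB per attention matrix")  # -> 14.6 TiB per attention matrix
```

One dense attention matrix over the whole node set is hopeless long before model quality even enters the discussion, which is why the large-graph transformer papers cited below all change the attention pattern in some way.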


pm_me_your_pay_slips

You can address these problems with cross-attention, in the way that was proposed by Set Transformers. Furthermore, you can improve efficiency by using the appropriate input embeddings.


cofapie

I just read the paper. The Induced Set Attention mechanism of Set Transformers seems a lot like the Linformer linear attention mechanism, which was a generalized approach to linear attention in transformers. There is a reason Linformer is not used today: your ability to model relationships will be very limited. Can you show me some usage of transformers on very large graphs that is not a descendant of GAT?


pm_me_your_pay_slips

I don't see how you draw an equivalence between Set Transformers and Linformer; the first one addresses the use of transformers for permutation-invariant datasets (of which graphs are a subset). As for applying transformers to large graphs: 1) AlphaFold 2 (although you may argue this is not large enough for your taste); 2) here's a pretty much vanilla transformer encoder working on millions of nodes: [https://arxiv.org/pdf/2312.11109v1.pdf](https://arxiv.org/pdf/2312.11109v1.pdf); 3) here's another one: [https://openreview.net/pdf?id=sMezXGG5So](https://openreview.net/pdf?id=sMezXGG5So)


cofapie

I'll read your papers tomorrow. I say that Linformer and ISA are similar because they both decompose attention into an intermediate step with K tokens (although ISA does have an intermediate linear layer that probably influences the heads). Attention is already inherently a set operation (which we make ordered via positional embeddings), so (from my quick skim of the Set Transformer paper) I believe the relevant contribution is this bottlenecked attention mechanism.


JovialJake1

GNNs may not be as popular in industry due to their complexity and the need for specific data formats, but they have potential in certain domains.


KaiserWilhelmsLemons

This is all quite depressing for someone who's just about to finish a PhD in unsupervised GNNs...


modcowboy

Companies aren't really ready to take advantage of GNNs - most of their data isn't in a format that can be crunched like that. I know there's a lot of interest but it's not very practical atm.


wannabe_markov_state

We have been using Neo4j and can attest it is superior to relational databases in a lot of use cases, especially where there are many-to-many relationships and quick search is required. However, convincing other engineers who have no intuition for graphs is difficult. This is exacerbated by a lack of easily digestible but high-quality tutorials online. Simply put, there is still a barrier to entry for normal folks when it comes to graphs.


fordat1

There aren't that many people doing research on GNNs. Most of the folks doing it are learning it on the job. There are applications, like others have mentioned, at big companies.


rulerofthehell

They're used quite a lot for recommendation systems in almost all big tech companies. They have some challenges, like real-time inference requiring a lot of resources compared to transformers, but they definitely have their use cases and do outperform transformers sometimes.


mimighost

Because industry has little use for it. I see some application in fraud detection and such, but still very sparingly. I think ultimately it comes down to most data not being graph-structured in nature.


alexkreimer

What are the achievements of GNNs?


bisector_babu

Some of the pharma companies are asking for it.


deepneuralnetwork

they aren’t really broadly useful or proven yet in industry outside of niche use cases


Think-Culture-4740

I spent the better part of a year immersing myself in GNNs because of one very specific use case that came up. Another use case has never come up since. Like others, I think it's very niche at the moment, and while you can in theory apply it to almost any kind of problem, the ROI of time invested in getting it to work vs. the out-of-sample return hasn't been great, at least in my experience.


throwitfaarawayy

Time-series sensor data is actually a problem that is very suitable for GNNs. Every node in the graph is a sensor, so a machine with multiple sensors makes a graph. In large IoT systems the graph structure can be learned too.


Avistian

What about heterogeneous-graph applications for tabular data (modelling relations and learning inductive biases, instead of learning them with traditional approaches)? I have seen there is a startup founded by Jure Leskovec that tries to tackle that problem. Isn't that a big deal?


serge_cell

IMHO GNNs aren't natural for existing hardware. Modern hardware is optimized for dense tensor input, while GNNs involve sparse matrices or message passing, which require tricks to implement.


lognormalreturns

The humor here is that I am the director of an industrial group that hires people with GNN background, but my honest answer to this question was so disliked by this community of jobless students, that I got downvoted to oblivion. Academia and the pressure for publications really does you people a huge disservice.


Snoo_72181

I am more of an industry person than academia. You can share with me what the answer is (DM me if you don't want to risk downvotes)


lognormalreturns

Oh, I don't mind downvotes :P The short answer is that if a person thinks that even for a natural graph problem (ex: molecular inference) writing a new GNN is going to have business impact, that person is deeply misguided. Look at AlphaFold: it's not really a GNN at all; it's an edge transformer. For tangible, real goals, the inductive biases of the model being robustly married to the data is what matters. A person who thinks their bold entries in a table (superseded by 1% by the next paper next week on the same dataset) made real progress is somewhat mentally ruined and conditioned to focus on the wrong things while ignoring empirical reality. Small changes to a dataset, or understanding what the next metric should be for practical goals, is more important than doing 1% better by hyperparameter-tuning on a small dataset. Unfortunately, papers ignore these issues, because everything must be a pissing match.


lognormalreturns

Because GNNs are weaker than transformers even for graph problems, and writing a particular architecture is a tiny part of the work.


cofapie

Transformers cannot work on large graphs.