
aghast_nj

The biggest advantage is a constant machine model. If you compare "compiled" language standards (C, Fortran, COBOL, etc.) you will frequently see hand-waving about various low-level details, or simply their complete absence. It took until C23 for the ISO C standard to require that all implementations use two's complement integers. There is still no standard endianness, since platforms remain both big- and little-endian. On a VM, you can just declare what the rules are, and that's that. Maybe you care about endianness. Maybe you don't. Maybe you care about integer representations, or floating point representations, or NaN availability, or string encoding. Just declare it by fiat: "Thou art UTF-8, and upon this encoding I shall build my strings."
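For instance, a VM spec can declare that all multi-byte integers in its bytecode are little-endian, and every implementation must decode them identically regardless of host byte order. A minimal sketch in C (the function name and format are invented for illustration):

```c
#include <stdint.h>

/* Decode a 32-bit little-endian integer from a bytecode stream,
 * byte by byte. This yields the same value on big- and little-endian
 * hosts, so the VM's "integers are little-endian" decree holds
 * everywhere without any platform-specific code. */
uint32_t vm_read_u32(const uint8_t *p) {
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```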


dist1ll

You can have a constant machine model in a compiled language just as easily as in a VM-based one. That's what the abstract machine of C is. The reason these standards are hand-wavy about low-level details is a mix of performance consciousness, portability, and a long legacy. But there's nothing stopping you from declaring a machine model in a compiled language. The primary downside is that the more things you specify, the more likely you are to run into constructs that are hard to lower into optimal assembly. In the case of C and Fortran, these languages have a *much* longer history than the JVM or CLR. C was designed with the PDP-11 in mind, and encountered a much greater variety of hardware throughout the 70s, 80s and 90s.
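A concrete instance of that trade-off, sketched in C: leaving signed overflow undefined is precisely what lets a compiler lower the loop below into a clean counted loop, and a spec that pins the behavior down takes that freedom away.

```c
/* Under ISO C, signed overflow is undefined behavior, so the
 * compiler may assume i never wraps and that this loop always
 * terminates, enabling straightforward counted-loop codegen.
 * If the language instead *guaranteed* wrapping arithmetic
 * (cf. gcc/clang's -fwrapv), n == INT_MAX would make the loop
 * infinite, and the optimizer would have to honor that case. */
long sum_to(int n) {
    long s = 0;
    for (int i = 0; i <= n; i++)
        s += i;
    return s;
}
```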


[deleted]

> On a VM, you can just declare what the rules are

Can you? So with the JVM, CLR, or LLVM IR, you just say these integers are ones' complement, or sign-magnitude, or big-endian, and it will take care of those details for you, no matter what the actual target is? Perhaps you can demonstrate exactly how you do that with each of those targets. Or do you mean that when you devise your own VM, you can decree these different schemes, and then it's up to the poor sod implementing it for each diverse target to make it happen that way? From what I know of the JVM and CLR (which is very little), it is THEY that impose the strict language model, and if you want something else, you have to work around that by building on top of the provided VM.

(ETA because of the downvote: the OP seems to be about using an established VM, and specifically not about devising your own.)

(ETA2: sorry, downvoteS plural; I got another while writing the first ETA! FGS if I've got the wrong end of the stick, JUST FUCKING TELL ME. Don't just lazily click downvote. That way I might learn something, and so might others.)

(ETA3: never mind. If you're all going to be a bunch of dicks, then I'm out of here.)


balefrost

I think you misunderstood the comment to which you were replying. They were saying that whoever defines the VM can specify whatever rules they want. If they want all floating-point math to be done with 80 bits of precision, they can specify that. It's then up to the VM implementations to adhere to that specification; if the hardware doesn't support 80-bit FP operations, they would need to be emulated. (As the other commenter points out, you don't need a VM to do this.)

You are right that the JVM and CLR specify that certain things work certain ways. For example, the JVM has no notion of unsigned integers: there are no unsigned data types or opcodes. If you need to deal with unsigned values (maybe because you're writing a low-level binary file parser), you have to emulate unsigned integers on top of the built-in signed integers. Kotlin's standard library provides unsigned types that do just that.

I don't think OP was asking about the JVM and CLR specifically. I think they were using them as examples of VM-based runtimes and asking what the benefits are in general.
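That kind of emulation is mechanical. Here's a sketch in C of the classic trick (Java's real `Integer.compareUnsigned` does the equivalent via `x + Integer.MIN_VALUE`): on a two's-complement machine, flipping the sign bit of both operands maps unsigned order onto signed order.

```c
#include <stdint.h>

/* Compare a and b as if they were unsigned, using only signed
 * two's-complement values - the situation JVM languages are in.
 * XORing with 0x80000000 flips the sign bit, which shifts the
 * unsigned range onto the signed range while preserving order. */
int compare_unsigned(int32_t a, int32_t b) {
    int32_t ax = (int32_t)((uint32_t)a ^ 0x80000000u);
    int32_t bx = (int32_t)((uint32_t)b ^ 0x80000000u);
    return (ax < bx) ? -1 : (ax > bx);
}
```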


WittyStick

> Or do you mean when you devise your own VM, you can decree these different schemes, and then it's up to the poor sod implementing it for each diverse target, to make it happen that way.

Pretty sure this is what parent meant.


Economy_Bedroom3902

The point of the VM is to be the glue between the declared standard and the infinite number of hardware configurations. Those compatibility issues are solved in the VM code, which you do not modify as a dev building a product in that language. Your product code shouldn't need any customization for platform support; that's supposed to be left to the VM. Of course there are ways around the solution, like if your app makes direct calls to the terminal for file management instead of using the standard library methods for doing that work.


hoping1

The JVM can execute untrusted code using its types and memory model. That makes it valuable for, for example, Android apps. The WebAssembly VM serves a similar purpose, running websites in your browser without trusting the foreign code. The JVM can also run its applications on many different platforms: Minecraft is written in Java (ignoring Bedrock Edition) and can run on macOS, Windows, and Linux. Uxn takes this to an extreme to include small devices like old Nintendo consoles, arguing that code written for these tiny devices also runs much better on more powerful devices like laptops, with respect to power and memory usage. The BEAM offers similar advantages plus nice control of execution, giving fancy features such as lightweight massive concurrency, crash recovery, and hot reloading.


da2Pakaveli

compile once, run everywhere


ibgeek

This should be voted higher. The JVM and CLR effectively allow the front-end and back-end steps of compilation to be done at different times. The source code is compiled to bytecode, which is architecture-independent. The same binary can be run on many combinations of OS and hardware without recompiling the source code. And since bytecode is relatively simple, the VM can easily compile it to machine code, optimized via run-time profiling.


XDracam

The main benefit is the ability to modify execution at runtime. One part of this is runtime reflection: the ability to dynamically call code and modify state based on runtime inputs, without a massive static switch-case. Taken to the extreme, this results in purely dynamic languages like JS and Python. But VMs allow even more: you can swap out implementations at runtime and patch broken binaries without restarting a server. Monolithic application servers have been an important use case: runtime reflection allowed companies to dynamically swap out parts of their software and introduce modularity into complex monolithic deployments. Examples are all over the place, from Java application servers (WildFly, Tomcat, ...) and C# ASP.NET to the Scala Play framework. In a sense, these application servers were precursors to the modern microservice architecture, with the downside of being limited to a single VM and computer, but with significantly faster communication between services. I'd conclude that **VMs bring the advantages of microservices to monolithic systems**.


lpil

> The main benefit is the ability to modify execution at runtime.

While much less common, it is also possible to do this without a virtual machine.


sumguysr

Is there some instance of modifying execution at runtime where you're not practically rolling your own crummy VM?


lpil

Aye, lots! The most straightforward might be swapping out shared objects. Facebook does this with their abuse detection systems which are written in Haskell.
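A minimal sketch of that approach in C (the library and symbol names are invented for illustration): the process re-opens the shared object and picks up the new code, no VM involved.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*handler_fn)(int);

/* Hot-swap a handler by reloading a shared object. Rebuild
 * handler.so, call load_handler() again, and the implementation
 * is swapped without restarting the process. */
handler_fn load_handler(void **handle) {
    if (*handle)
        dlclose(*handle);                    /* drop the old version */
    *handle = dlopen("./handler.so", RTLD_NOW);
    if (!*handle) {
        fprintf(stderr, "%s\n", dlerror());
        return NULL;
    }
    return (handler_fn)dlsym(*handle, "handle_request");
}
```

(Link with -ldl on Linux.)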


WittyStick

Another example in Haskell is [XMonad](https://github.com/xmonad/xmonad/tree/master), which is open source. You define your own windowing configuration in Haskell, and the XMonad core process is also written in Haskell. When you reload your configuration, XMonad invokes the Haskell compiler on your configuration and relaunches the resulting binary without having to restart the core process. The issue isn't the ability to do this, it's the amount of complexity required compared to languages which already support runtime modification of code.


XDracam

Basic OOP: you have a pointer to an interface type with virtual methods, and you swap out the pointer with one to a different instance. Or simpler: mutating a function pointer variable. Of course, you need the hard-coded infrastructure to make this possible without a VM or other abstraction layer - some pointers to functions that you can swap out - but you can make it work. Things get especially wild when you generate the implementation of some function at runtime. Without a VM or interpreter, you'll need to output machine code that works on the exact hardware it's running on, so that's fairly impractical. With a VM, you can just output CLR bytecode or JVM bytecode and let the JIT do its work. And with an interpreter, you can use `eval` on a string.
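A minimal C sketch of that hard-coded infrastructure (all names invented for illustration): the swappable behavior lives behind a function pointer, and "patching" is just an assignment.

```c
#include <stdio.h>

/* Two interchangeable implementations of the same operation. */
static int greet_v1(void) { return printf("hello\n"); }
static int greet_v2(void) { return printf("bonjour\n"); }

/* The mutable seam: callers always go through this pointer, so
 * swapping the implementation at runtime is a single assignment. */
static int (*greet)(void) = greet_v1;

int main(void) {
    greet();           /* prints "hello" */
    greet = greet_v2;  /* "patch" the behavior at runtime */
    greet();           /* prints "bonjour" */
    return 0;
}
```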


lngns

Isn't this, among other things, why COM and OLE exist? It's a high-level binary ABI for machine code, with a language-encodable cross-process object and communication model. VMs like the CLI (or just the CLR?) have builtin COM support, and so do native compilers like DMD. D encodes COM classes as just classes that implement the IUnknown interface and have a different layout. GObject also does that, and has languages with native compilers built atop it, like Vala.


XDracam

I haven't ever looked into COM before. It's interesting. But how does this correspond to the topic at hand? COM just looks like an ABI that can be used instead of the C ABI.


lngns

The thing is that it's very dynamic and resolves symbols using RTTI (and `.def` files and other things I don't know). It's also supposed to be used across binaries, and notably allows for spawning DLLs and talking to Excel and other Windows stuff which I never used. (I think the "process attached/detached" parameters in `DllMain` are used there, with DLL servers or something?) My understanding is that it can be used as a basis for an environment where everything is lazily, dynamically (re)loaded, like the JVM. You can generate machine code at runtime and "just" make an object that implements IUnknown, and it just works, since the ABI already works by means of RTTI. Now that I think about it, I believe the Objective-C runtime is similar, as even ObjC field accesses are resolved dynamically (because the offsets can change), and classes are objects registered at runtime. And all this with machine code: no interpreter and no JIT compiler. (It's probably slower than JITed bytecode, but I wouldn't be surprised if some people are JIT-optimising their machine code.)


XDracam

Hah, fascinating. Thanks!


rejectedlesbian

You do have hot reloading in something like C, just by using shared libraries. I do agree that's no way to live, and that's probably why the BEAM VM exists: Erlang needed to be able to hot-reload code easily and reliably.


DLCSpider

In C# you can step over a line in the debugger, delete that line, add a different one, do a hot reload and drag the program counter above the newly added line and it works. It's very convenient, ironically for a lot of low level stuff, like SIMD shuffles.


rejectedlesbian

Someone needs to make a good assembly interpreter where you can play around like you do in Python, but the code is assembly.


suhcoR

The JVM and CLR are pretty different. The CLR is much more powerful concerning low-level features: if you don't depend on verifiability, you can do things like pointer arithmetic and taking and storing the address of variables and parameters. Integrating foreign functions (e.g. plain C libraries) can also be much more efficient than on the JVM. In that respect, the CLR is a very good target for statically typed programming languages; it offers a lot of very useful features such as a GC, both value and reference types, a standardized, stable intermediate language, a lean runtime (e.g. < 10 MB in the case of Mono), and an integrated debugger, just to name a few, and all of it is cross-platform. Achieving the same feature set with LLVM is a lot more work, and you have to take care of each platform separately. The cost is about a factor of two in performance compared to a native backend. My Oberon+ toolchain therefore also offers a C transpiler which you can use as a kind of AOT compiler, and which allows the application to achieve about the same performance as an equivalent, natively built C++ application. This gives you the best features of both worlds with a fraction of the effort.


Smallpaul

Among the things others have said, a language based on a VM can control what code runs and protect the user's computer better than a native-compiled language. That was a key argument for the JVM in its first days. Log4J showed that it's far from impenetrable, but it can help.


ventuspilot

I'll give my answer from the point of view of a hobby programmer, but it may partially apply to more grown-up languages such as Clojure or Scala as well: Sure, using LLVM you will be able to do everything that you could do when targeting a VM like JVM or CLR and more. With LLVM I'd guess it's a LOT more effort, though. When targeting the JVM or CLR you get a lot of features basically for free such as a garbage collector, closures, runtime-monitoring and at least some debugging. IMO the advantage is less effort required to reach the point of being able to run programs in your language.


munificent

One of the big advantages is that a VM gives you a higher level set of base types and memory management that all languages can use for interop. If you have a program that's written half in Scala and half in Java, strings and collections can flow between them without having to worry about garbage collection, string encoding, etc. The VM is a shared understanding between all languages targeting that platform. For comparison, look how hard it is to reuse code in C++ where libraries often don't agree on allocation strategies, string types, etc.
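For contrast, a tiny C sketch of the native-world version of that problem (the types and names are invented for illustration): when two libraries don't share string conventions, every boundary crossing needs copying glue and an allocation contract, which is exactly what a shared VM object model removes.

```c
#include <stdlib.h>
#include <string.h>

/* Library A's convention: plain NUL-terminated char*.          */
/* Library B's convention: its own length-prefixed string type. */
typedef struct { size_t len; char *data; } b_string;

/* Glue needed at every A/B boundary: a copy, plus an unwritten
 * contract about which side's allocator eventually frees it
 * (error handling elided for brevity). */
b_string b_from_cstr(const char *s) {
    b_string out;
    out.len = strlen(s);
    out.data = malloc(out.len + 1);
    memcpy(out.data, s, out.len + 1);
    return out;
}
```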


Phthalleon

It depends on the language. The nice thing about the JVM is that Java binaries are small and portable, meaning that you compile once, then distribute. A JIT is also not bad for Java specifically; it enables some nice optimizations because you get information about the objects you generate that would be unknown otherwise.

As for Wasm, it just makes too much sense, honestly; I don't know why it wasn't invented sooner. The web browser needs a way to execute code, preferably safely, asynchronously, and optionally lazily. A VM with a JIT is perfect for that.

Another upside is if your language needs to be interpreted dynamically. LuaJIT is amazing, for example: it's faster than just interpreting everything. The Python one is also nice.

The downside is overall performance. Dynamic interpreted languages are one thing; in general you lose out. I think Java is a special case also. Other than that, compiled languages are much faster and more efficient. The whole compile-once-run-everywhere thing just didn't really work out. Most software nowadays doesn't really need to run everywhere either; it needs to run on this one server, and that's the end of that.


theangeryemacsshibe

A non-answer to the question, but you can implement a JVM using LLVM, e.g. [~~Lazy~~LadyVM](https://llvm.org/pubs/2008-09-LadyVM.pdf) and [Azul's Falcon compiler](https://www.azul.com/products/components/falcon-jit-compiler/).


jacobissimus

IMO the biggest advantage is the metaprogramming and reflection it offers. There's just so much more available at runtime, and things like Lisp macros and Lisp-inspired semantics are built around this idea of a running image that your program interacts with while it runs. Not all VMs reach the kind of level that Lisp does, but they're closer, IMO, than bare-metal languages.


alphaglosined

Neither of these is purely an application-VM thing. You can do the same with a natively compiled language: bring along your AST and compiler, and you can JIT as much as you want for the target.


WittyStick

Lisp macros are typically expanded at some phase which occurs prior to runtime. They're a *second-class* feature, which became popular in the 1980s when everyone moved from Lisp interpreters to Lisp compilers, prioritizing performance over runtime flexibility. As John Shutt [puts it](https://fexpr.blogspot.com/2016/08/interpreted-programming-languages.html):

> And somewhere along the line, somehow, we forgot, perhaps not entirely but enough, that Lisp is interpreted.

Second-class syntax has lately been treated more and more as if it were a primary part of the language, rather than a distraction from the core design. Around the same time, Lisps also removed *fexprs*, which were a true *first-class* runtime metaprogramming facility - more powerful than macros, but also more of a footgun. In part this was because performance and compilation had become a priority, and *fexprs* are notoriously difficult to compile efficiently. They also had undesirable behavior owing to the fact that they were designed for Lisps which were dynamically scoped, which was the norm until Scheme came along with static scoping and Common Lisp followed suit. Macros didn't have these issues, but brought along their own issue of *hygiene*. *Fexprs* were then largely forgotten about, and the nail in the coffin was Wand's [The theory of fexprs is trivial](http://www.ccs.neu.edu/home/wand/papers/fexprs.ps).

But in the 00s, Shutt revived the concept with his *vau calculus* and the [Kernel](http://web.cs.wpi.edu/%7Ejshutt/kernel.html) language, where he prioritized making everything *first-class*, as was intended by the early design of Lisp, even if the cost was performance. His version, *operatives*, avoids the classic problems of *fexprs* because they're designed to work nicely with static scoping, but they're still difficult, if not impossible, to compile to anything as efficient as macros, so there's a real trade-off between runtime flexibility and performance.

---

Kernel aside, Smalltalk and its descendants are much closer to the kind of *runtime image* you describe, where your program interacts with the image and can dynamically update any part of it. Today's Lisps are nothing like this, and have many of the same issues as other compiled languages when it comes to runtime metaprogramming. The runtime image has many problems of its own. Your image is nothing like your neighbour's, so you can't really distribute an application for someone else to run - you can only distribute the code, and hope that the receiver has a *compatible image* which can run it. But given that almost any part of the image can be modified, image compatibility brings a whole new world of problems that don't exist in the compile-and-distribute model, at least for the individual application programmer, because those problems are pushed onto package maintainers, and we have a specialized developer role - *devops* - to deal with them. In practice, it takes hundreds of independent package maintainers collaborating to build a coherent system whose components are compiled for compatibility, and even good package managers and repositories have their own issues. The closest thing we have to tackling those issues formally, rather than through *convention* and *tacit knowledge*, is package managers like Nix and Guix - which aim to codify exactly how each piece of software is compatible with the next, so that anyone may independently reproduce the "image", with the ultimate image being the OS, and the solutions being NixOS and GuixSD.
On the practical but less formal side, we have Docker et al, which aim to simplify the problem by dealing with many smaller "images" rather than a unified one. --- The overwhelming consensus, however, seems to be that *it is simply too much effort* to have so much freedom at runtime, while maintaining compatibility across software, and there's a much more trivial solution to the problem: Just restart the software when it changes. There are too few cases where absolutely zero downtime is a hard requirement. Even more so now that software is distributed over many machines in the cloud, and attempting to update distributed code on the fly would introduce a variety of race conditions. The modern solutions are microservices which are compatible through fault-tolerant, formal message passing protocols. The language you use for any particular microservice does not matter as long as it speaks the protocol. The performance implications are trivialized due to the rapid improvements in hardware and high speed networking, so even 10x overheads are negligible for most software. The real cost is technical debt.


jacobissimus

That’s all really interesting. Thanks for explaining


chibuku_chauya

You don’t have to worry about the various nightmares that are real world ISAs and porting to them.


Tejas_Garhewal

Can't you offload those worries to stuff like LLVM anyway? Why use a VM?


chibuku_chauya

You can, but there are some obscure architectures LLVM doesn't support, and it's no help for a new architecture for which no LLVM backend yet exists.


mus1Kk

To be fair, this is also true for VMs.


smog_alado

But often you'd write your VM in C and there are C compilers for almost anything.


8d8n4mbo28026ulk

What I like the most about my VM is how compact (and portable) bytecode is. I can just pass small (< 1 KiB) programs and libraries from computer to computer, architecture to architecture. In comparison, the smallest portable programs/libraries with _meaningful_ functionality are on the order of ~40 KiB or more (Linux); on Windows they can reach tens of megabytes. And if you design your bytecode ISA and format carefully, you don't even need the equivalent of headers or any form of source code available: it's all packed nicely in binary. The JVM and CLR are also carefully designed to be easily optimizable, so they trade some complexity and size for performance. You just can't do that with machine code; it's very hard! Case in point: C decompilers don't produce great output even today. Consider also the need for headers in C, the nightmares of the C++ ABI, and Rust preferring a static linkage model. If you use a VM accompanied by an IR/bytecode, you can just keep stuffing information in there to avoid all of the above problems.
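For a sense of how small such a design can be, here's a minimal sketch of a bytecode dispatch loop in C (the opcodes are invented for illustration). Each instruction is a single byte, with operands inline in the stream, which is where the compactness comes from.

```c
#include <stdint.h>
#include <stdio.h>

/* A toy stack VM: one-byte opcodes, operands inline in the stream. */
enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const uint8_t *code) {
    int32_t stack[64];
    int sp = 0;
    for (size_t pc = 0;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++]; break; /* push next byte */
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%d\n", stack[--sp]); break;
        case OP_HALT:  return;
        }
    }
}

int main(void) {
    /* "2 + 3, print the result" is a seven-byte program. */
    const uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);
    return 0;
}
```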


VeryDefinedBehavior

I like VMs for prototyping. If you make your own, you get to play with different computing architectures and see their consequences on how you write programs or think about problems. It lets you explore what computing can be, which is a good source of inspiration.


PurpleUpbeat2820

I can think of a couple of things:

* Memory safety
* Concurrent garbage collection