While out for a walk today, and while listening to lectures about economics, my thoughts wandered to the structure of the Java language. (They do that. I don’t always know why…) From writing assembly for various processors and also from writing Pascal, C, and C++ on machines with limited resources I was used to thinking about the various areas of memory for the operating system and for code, fixed data, the stack, and the heap. For some reason it occurred to me that since variables in Java are all declared within classes and methods there might not be a need for a standard data segment. Upon Googling “Java Memory Model” it appears that this is roughly the case: class-level data is kept in areas managed by the JVM rather than in a traditional data segment. It also appears that the memory model includes some additional complexities I wasn’t used to, which could not help but be interesting (and also necessary to know). I’ve also spent some time thinking about the internal architecture of dedicated discrete-event simulation tools (continuous simulation programs are actually easier to implement from a framework standpoint) but that is a discussion for a different day.
Before proceeding it’s important to consider that the Java Memory Model itself has evolved over the years. Single-threaded Java programs didn’t present much of a problem and could be analyzed in much the same way as the programs I was used to dealing with. The main complexity arises when handling multi-threaded operations. I had long worked with real-time systems but I only ever had to go so deep. I had to ensure that memory areas were locked when individual processes wrote to or read from them but otherwise let the operating system worry about making the many single-threaded processes play nice. As long as the individual programs were not too large there was never a problem. The most complicated multi-threaded program I designed was one that separated a periodic communication process from the UI for that program (process), so if the user went crazy manipulating a slider or doing something else unusual there would be no interference with the part of the program that was doing the work. Java is a more recent development that encourages the use of multi-threading, so naturally it pays to understand how the model works in some detail.
Upon reading further it appears that the internal architecture of the JVM is intended to handle the fact that Java straddles the idea of being compiled and interpreted. .java files containing human-readable source code are translated into .class files containing bytecode, which is the “compiled” part of the process. The bytecode is then “interpreted” (and, in most modern JVMs, also compiled to native code at run time) by the JVM on each machine/OS, which allows for a high degree of portability. By contrast, languages like C++ are typically compiled ahead of time to native machine code, while JavaScript, Perl, and Python are usually described as interpreted, although their modern implementations often compile to an internal bytecode or to native code at run time as well.
I also learned that the current usage of the term “memory model” has a very specific meaning related to how memory and operations are managed in multi-threaded systems. It turns out that the C++ memory model was worked out and adopted after this was done for Java. What I had traditionally thought of as a memory model is described in a Wikipedia entry titled “Memory Address.”
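As a rough illustration of what the multi-threaded sense of “memory model” governs, here is a minimal sketch of several threads updating one shared counter. The class and method names (SharedCounterDemo, runWorkers) are made up for this example. It assumes only the standard guarantee that a synchronized method provides both mutual exclusion and visibility of earlier writes to the next thread that takes the same lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not taken from any of the sources mentioned above.
public class SharedCounterDemo {
    private int count = 0;

    // synchronized: only one thread at a time runs these methods on a given
    // instance, and writes made before releasing the lock are visible to the
    // next thread that acquires it.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Starts the given number of threads, each incrementing the same counter,
    // waits for all of them, and returns the final total.
    public static int runWorkers(int threads, int incrementsPerThread) {
        SharedCounterDemo counter = new SharedCounterDemo();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for this worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10_000)); // prints 40000
    }
}
```

Without the synchronized keyword the count++ operations from different threads could interleave and lose updates, which is exactly the kind of behavior the memory model exists to define.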
This highly-rated article provides some insight into diagnosing various types of memory errors thrown by the Java Virtual Machine (JVM), and discusses the behavior of different JVMs. The article, and many other sources, describe six possible areas of memory referenced by Java:
- Program Counter Register: Each thread has its own register, which stores the address of the JVM instruction that thread is currently executing. If the thread is executing a native method, the value of this register is undefined.
- Java Virtual Machine Stack: Each thread has its own stack in this area. The stack holds one frame per method call in progress, and each frame stores that call’s local variables and intermediate results.
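- Heap: This area stores all of the objects (including arrays) instantiated by all threads. It is shared by every thread in the JVM instance and is reclaimed by the garbage collector.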
- Native Method Stack: Native code is code written in a different language (say, C++) for various reasons, and threads that run such code will have their own stack allocated for it. (A dedicated program counter will presumably be maintained as well.)
- Method Area: Information about classes, including their methods and associated data elements, is stored in this area. All threads share a single method area for each JVM instance. (Q: Can there be multiple JVM instances on one machine? A: Yes. Each program or process gets its own JVM instance; threads within a program share the same JVM instance.)
- Runtime Constant Pool: As a JVM instance loads each type definition it stores the related descriptive elements in this memory area. The Runtime Constant Pool is itself allocated within the Method Area.
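To make a few of these areas a little more concrete, here is a small sketch that touches the per-thread stack and the shared heap, and asks the JVM how much heap and non-heap memory is in use. The class MemoryAreasDemo and its helper names are invented for this illustration. The calls to ManagementFactory.getMemoryMXBean() and Runtime.getRuntime().maxMemory() are standard library APIs, but exactly which internal areas a given JVM counts as “non-heap” is implementation-specific, so treat the numbers as indicative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical example class used only for illustration.
public class MemoryAreasDemo {
    // Instances created with `new` are allocated on the shared heap.
    static final class Box {
        final int value;
        Box(int value) { this.value = value; }
    }

    // Each recursive call pushes a new frame, with its own copy of `n`,
    // onto the calling thread's JVM stack.
    public static int sumTo(int n) {
        return n == 0 ? 0 : n + sumTo(n - 1);
    }

    // Bytes currently used on the heap (objects and arrays).
    public static long heapUsedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed();
    }

    // Bytes currently used outside the heap; the method area is counted here.
    public static long nonHeapUsedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getNonHeapMemoryUsage().getUsed();
    }

    // Upper limit the JVM will try to use for the heap, in bytes.
    public static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        // `box` is a local reference held in main's stack frame;
        // the Box object it points to lives on the heap.
        Box box = new Box(sumTo(100));
        System.out.println("sum stored in heap object: " + box.value); // 5050
        System.out.println("heap used (bytes):     " + heapUsedBytes());
        System.out.println("non-heap used (bytes): " + nonHeapUsedBytes());
        System.out.println("max heap (bytes):      " + maxHeapBytes());
    }
}
```

The program counter register and native method stack are not directly observable from Java code, so they are left out of the sketch.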
Links for future reference:
Free online chapters from “Inside the Java Virtual Machine” by Bill Venners
As clarified at “Does java -Xmx 1G mean 1 GB or 2^30 B?”, the unambiguous way to express how much memory you start with via
??