Understanding the technique used to pass information between variables and into methods can be a difficult task for a Java developer, especially one accustomed to a more explicit programming language, such as C or C++. In these expressive languages, the developer is solely responsible for determining the technique used to pass information between different parts of the system. For example, C++ allows a developer to explicitly pass a piece of data either by value, by reference, or by pointer. The compiler simply ensures that the selected technique is properly implemented and that no invalid operation is performed.
In the case of Java, these low-level details are abstracted, which both reduces the onus on the developer to select a proper means of passing data and increases the security of the language (by inhibiting the manipulation of pointers and the direct addressing of memory). At the same time, this level of abstraction hides the details of the technique performed, which can obscure a developer's understanding of how data is passed in a program. In this article, we will examine the various techniques used to pass data and deep-dive into the technique that the Java Virtual Machine (JVM) and the Java Programming Language use to pass data, as well as explore some examples that demonstrate in practical terms what these techniques mean for a developer.
In general, there are two main techniques for passing data in a programming language: (1) passing by value and (2) passing by reference. While some languages consider passing by reference and passing by pointer two different techniques, in theory, one technique can be thought of as a specialization of the other, where a reference is simply an alias to an object, whose implementation is a pointer.
Passing by Value
The first technique, passing by value, is defined as follows:
Passing by value constitutes copying of data, where changes to the copied value are not reflected in the original value
For example, if we call a method that accepts a single integer argument and the method makes an assignment to this argument, the assignment is not preserved once the method returns. While one might expect that the assignment is preserved after the method returns, the assignment is lost because the value placed on the call stack was a copy of the value passed into the method, as illustrated in the snippet below:
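A minimal Java sketch of this behavior might look like the following. The class name PassByValueSketch and the method name addTen are hypothetical and chosen only for illustration. The method assigns to its parameter and returns the modified copy, so both the copy and the untouched original can be inspected by the caller.

```java
class PassByValueSketch {

    // Assigns a new value to the parameter. The parameter is a copy of the
    // caller's value, so only the copy is modified.
    static int addTen(int value) {
        value = value + 10;
        return value;
    }

    public static void main(String[] args) {
        int number = 5;
        int result = addTen(number);

        System.out.println("number = " + number); // number = 5
        System.out.println("result = " + result); // result = 15
    }
}
```

The returned value reflects the assignment made inside the method, while the variable in the calling scope is unchanged.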
If we execute this code, the output shows that the variable in the calling scope still holds its original value after the method returns.
We see that the change made to the argument passed into the function was not preserved after we exited the scope of the function. This loss of data was due to the fact that a copy of the value held by the variable was placed on the call stack prior to the execution of the function. Once the function exited, this copy was popped from the call stack and the changes made to it were lost, as illustrated in the figure below:
Additionally, the action of popping the call stack at the completion of the method is illustrated in the figure below. Note that the value copied as the argument to the method is lost (reclaimed) once the call stack is popped, and therefore, all changes made to that value are in turn lost during the reclamation step.
Passing by Reference
The alternative to passing by value is passing by reference, which is defined as follows:
Passing by reference constitutes the aliasing of data, where changes to the aliased value are reflected in the original value
Unlike passing by value, passing by reference ensures that a change made to a value in a different scope is preserved when that scope is exited. For example, if we pass a single argument into a method by reference, and the method makes an assignment to that value within its body, the assignment is preserved when the method exits. This can be demonstrated using the following snippet of C++ code:
If we run this code, the output shows that the variable holds the newly assigned value after the function returns.
In this example, we can see that when we exit the function, the assignment we made to the argument that was passed by reference is preserved outside of the scope of the function. In the case of C++, under the hood, the compiler has passed a pointer into the function that points to the variable. Thus, when this pointer is dereferenced (as happens during reassignment), we are making a change to the exact location in memory that stores the variable. This principle is demonstrated in the illustrations below:
Passing Data in Java
Unlike C++, Java does not have a means of explicitly differentiating between pass by reference and pass by value. Instead, the Java Language Specification (Section 4.3) declares that the passing of all data, both object and primitive data, is defined by the following rule:
All data is passed by value
Although this rule may be simple on the surface, it requires some further explanation. In the case of primitive values, the value is simply the actual data associated with the primitive (e.g., an integer, a floating-point number, or a boolean), and the value of the data is copied each time it is passed. For example, if we assign an integer literal to a variable of type int, the variable holds the literal value itself. In the case of objects in Java, a more expanded rule is used:
The value associated with an object is actually a pointer, called a reference, to the object in memory
For example, if we assign a newly created object to a variable, the variable does not hold the object created, but rather, a pointer value to the created object. The value of this pointer to the object (what the Java specification calls an object reference, or simply reference) is copied each time it is passed. According to the Objects section (Section 4.3.1) of the Java Language Specification, only the following can be performed on an object reference:
- Field access, using either a qualified name or a field access expression
- Method invocation
- The cast operator
- The string concatenation operator +, which, when given a String operand and a reference, will convert the reference to a String by invoking the toString method of the referenced object (using "null" if either the reference or the result of toString is a null reference), and then will produce a newly created String that is the concatenation of the two strings
- The instanceof operator
- The reference equality operators == and !=
- The conditional operator ? :
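The operations listed above can be sketched in a few lines of Java. The Ball class, its color field, and its describe method are hypothetical and exist only to give the reference something to point to.

```java
class ReferenceOperationsSketch {

    // Hypothetical class used only for illustration.
    static class Ball {
        String color = "red";

        String describe() {
            return "a " + color + " ball";
        }

        @Override
        public String toString() {
            return "Ball(" + color + ")";
        }
    }

    static String demo() {
        Object ref = new Ball();
        Ball ball = (Ball) ref;                 // cast operator
        String color = ball.color;              // field access
        String description = ball.describe();   // method invocation
        String joined = "Got " + ball;          // string concatenation (invokes toString)
        boolean isBall = ref instanceof Ball;   // instanceof operator
        boolean same = (ref == ball);           // reference equality
        Ball chosen = isBall ? ball : null;     // conditional operator
        return color + "|" + description + "|" + joined + "|"
                + isBall + "|" + same + "|" + (chosen != null);
    }

    // Concatenating a null reference yields the text "null".
    static String concatNull() {
        Ball none = null;
        return "Got " + none;
    }

    public static void main(String[] args) {
        System.out.println(demo());       // red|a red ball|Got Ball(red)|true|true|true
        System.out.println(concatNull()); // Got null
    }
}
```

None of these operations rebinds a variable that lives in another scope; each one either reads the reference or follows it to the object.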
In practice, this means that we can change the fields of the object passed into a method and invoke its methods, but we cannot change which object the caller's reference points to. Since the pointer is passed into the method by value, the original pointer is copied to the call stack when the method is invoked. When the method scope is exited, the copied pointer is lost, and any change made to the copied pointer value is lost with it.
Although the pointer is lost, the changes to the fields are preserved because we are dereferencing the pointer to access the pointed-to object: The pointer passed into the method and the pointer copied to the call stack are identical (although independent) and thus point to the same object. Thus, when the pointer is dereferenced, the same object at the same location in memory is accessed. Therefore, when we make a change to the dereferenced object, we are changing a shared object. This concept is illustrated in the figure below:
This should not be confused with passing by reference: If the pointer were passed by reference, the argument inside the method would be an alias to the caller's variable, and changing the object that the argument points to would also change the object that the caller's variable points to. In this case, though, a copied pointer is passed into the function, and thus, the change to the pointer value is lost once the call stack is popped.
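One way to see the difference from passing by reference is a swap method, sketched below under the assumption of a hypothetical Ball class with a name field. If the references were aliases of the caller's variables, the swap would be visible to the caller; because they are copies, it is not.

```java
class SwapSketch {

    // Hypothetical class used only for illustration.
    static class Ball {
        final String name;

        Ball(String name) {
            this.name = name;
        }
    }

    // Swaps only the local copies of the two references.
    static void swap(Ball first, Ball second) {
        Ball temp = first;
        first = second;
        second = temp;
    }

    // Returns the names seen by the caller after attempting the swap.
    static String namesAfterSwap() {
        Ball a = new Ball("A");
        Ball b = new Ball("B");
        swap(a, b);
        return a.name + b.name;
    }

    public static void main(String[] args) {
        System.out.println(namesAfterSwap()); // AB
    }
}
```

The caller's variables still point to their original objects, which is the behavior expected when the pointer values are copied rather than aliased.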
While it is crucial to understand the concepts behind passing data in a programming language (Java in particular), many times it is difficult to solidify these theoretical ideas without concrete examples. In this section, we will cover four primary examples:
- Assigning primitive values to a variable
- Passing primitive values to a method
- Assigning object values to a variable
- Passing object values to a method
For each of these examples, we will explore a snippet of code accompanied by print statements that show the value of the primitive or object at each major step in the assignment or the argument-passing process.
Primitive Type Example
Since Java primitives are not objects, primitives and objects are treated as two separate cases with respect to data binding (assignment) and argument-passing. In this section, we will focus on binding primitive data to a variable and passing primitive data to a simple method.
Assigning Values to Variable
If we assign the value of an existing primitive variable to a new variable, the primitive value is copied to the new variable. Since the value is copied, the two variables are not aliases of one another, and therefore, when the original variable is changed, the change is not reflected in the new variable:
If we execute this snippet, the output shows that the new variable still holds the value that was copied into it, while the original variable holds its newly assigned value.
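A minimal sketch of such an assignment follows. The variable names original and copy, and the values 7 and 42, are assumptions made for illustration.

```java
class PrimitiveAssignmentSketch {

    // Returns { original, copy } after the original has been reassigned.
    static int[] run() {
        int original = 7;
        int copy = original;   // the value 7 is copied into copy
        original = 42;         // only original changes
        return new int[] { original, copy };
    }

    public static void main(String[] args) {
        int[] values = run();
        System.out.println("original = " + values[0]); // original = 42
        System.out.println("copy = " + values[1]);     // copy = 7
    }
}
```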
Passing Values to Method
Similar to making primitive assignments, the arguments for a method are bound by value, and thus, if a change is made to the argument within the scope of the method, the changes are not preserved when the method scope is exited:
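The following sketch assumes a method named process, as referred to in the text. The class name, the variable name, and the values 5 and 50 are hypothetical.

```java
class PrimitiveParameterSketch {

    // Assigns a new value to the argument; the caller's variable is unaffected.
    static void process(int value) {
        System.out.println("Inside, before assignment: " + value);
        value = 50;
        System.out.println("Inside, after assignment: " + value);
    }

    // Calls process and returns the caller-side value afterwards.
    static int valueAfterProcess(int start) {
        int someValue = start;
        process(someValue);
        return someValue;
    }

    public static void main(String[] args) {
        int someValue = 5;
        System.out.println("Before call: " + someValue); // Before call: 5
        process(someValue);
        System.out.println("After call: " + someValue);  // After call: 5
    }
}
```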
If we run this code, we see that the original value of the variable is preserved when the scope of the process method is exited, even though the argument was assigned a new value within the method scope.
Object Type Example
While all values, both primitive and object, are passed by value in Java, there are some nuances in passing objects by value that are made explicit when seen in an example. Just as with primitive types, we will explore both assignment and argument binding in the following examples.
Assigning Values to Variable
The variable binding semantics for objects and primitives are nearly identical, but instead of binding a copy of the primitive value, we bind a copy of the object address. We can see this in action in the following snippet:
In this example, we expect that assigning a new object to the first variable (after assigning the first variable to the second) does not change the value of the second variable, since the second variable holds a copy of the address of the original object. When the address stored in the first variable changes, no change is made to the second variable because the copied value in the second variable is completely independent of the address value stored in the first. If we execute this code, we see our expected results (note that the address of each object will vary between executions, but the addresses printed on the first and third lines of output should be identical, regardless of the specific address value).
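A sketch of this binding is shown below. The Ball class and the variable names someBall and anotherBall are hypothetical, and the identity string printed by the default toString method stands in for the object's address.

```java
class ObjectAssignmentSketch {

    // Hypothetical empty class; the default toString prints an identity string.
    static class Ball {}

    // Checks that the copied reference still points to the first object.
    static boolean copyKeepsOriginalObject() {
        Ball someBall = new Ball();
        Ball firstObject = someBall;   // kept only for the comparison below
        Ball anotherBall = someBall;   // copies the address, not the object
        someBall = new Ball();         // rebinds someBall only
        return anotherBall == firstObject && anotherBall != someBall;
    }

    public static void main(String[] args) {
        Ball someBall = new Ball();
        System.out.println("someBall before reassignment = " + someBall);

        Ball anotherBall = someBall;
        someBall = new Ball();

        System.out.println("someBall after reassignment = " + someBall);
        System.out.println("anotherBall = " + anotherBall);
        // The first and third lines refer to the same object and therefore
        // print the same identity string.
    }
}
```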
Passing Values to Methods
The last case we must cover is that of passing an object into a method. In this case, we see that we are able to change the fields associated with the passed in object, but if we try to reassign a value to the argument itself, this reassignment is lost when the method scope is exited.
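Both cases can be sketched as follows. The Vehicle class, its name field, and the methods rename and replace are hypothetical names introduced for illustration.

```java
class ObjectParameterSketch {

    // Hypothetical class used only for illustration.
    static class Vehicle {
        String name;

        Vehicle(String name) {
            this.name = name;
        }
    }

    // Changes a field of the shared object through the copied reference.
    static void rename(Vehicle vehicle) {
        vehicle.name = "Renamed";
    }

    // Rebinds only the local copy of the reference; the caller's object is untouched.
    static void replace(Vehicle vehicle) {
        vehicle = new Vehicle("Replacement");
    }

    static String nameAfterRename() {
        Vehicle vehicle = new Vehicle("Original");
        rename(vehicle);
        return vehicle.name;
    }

    static String nameAfterReplace() {
        Vehicle vehicle = new Vehicle("Original");
        replace(vehicle);
        return vehicle.name;
    }

    public static void main(String[] args) {
        System.out.println("After rename: " + nameAfterRename());   // After rename: Renamed
        System.out.println("After replace: " + nameAfterReplace()); // After replace: Original
    }
}
```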
If we execute this code, the output shows the state of the object before and after each method call.
Although there is a large volume of output, if we take each line one at a time, we see that when we make a change to the fields of an object passed into a method, the field changes are preserved, but when we try to reassign a new object to the argument, the change is not preserved once we leave the scope of the method.
In the former case, the address of the object created outside the method is copied to the argument of the method, and thus both point to the same object. If this pointer is dereferenced (which occurs when the fields of the object are accessed or changed), the same object is changed. In the latter case, when we try to reassign the argument with a new address, the change is lost because the argument is only a copy of the address of the original object, and thus, once the method scope is exited, the copy is lost.
A secondary principle can be formed from this mechanism in Java: Do not reassign arguments passed into a method (codified by Martin Fowler in the refactoring Remove Assignments to Parameters). To ensure that no such reassignment of method arguments is made, the arguments can be marked as final in the method signature. Note that a new local variable can be used instead of the argument if reassignment is required:
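A minimal sketch of this practice, using a hypothetical clampToZero method, could look like this:

```java
class FinalParameterSketch {

    // The parameter is final, so any assignment to it is rejected by the compiler.
    static int clampToZero(final int value) {
        // value = 0;  // would not compile because value is final

        // A new local variable is used where reassignment is needed.
        int result = value;
        if (result < 0) {
            result = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(clampToZero(-3)); // 0
        System.out.println(clampToZero(4));  // 4
    }
}
```

Marking the parameter final does not change how the argument is passed; it only turns an accidental reassignment into a compile-time error.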
Although fundamental principles such as data binding schemes and data passing schemes can seem abstract in the realm of daily programming, these concepts are essential in avoiding subtle mistakes. Unlike other programming languages (such as C++), Java simplifies its data binding and passing scheme into a single rule: Data is always passed by value. Although this rule can be a harsh restriction, its simplicity, and understanding how to apply this simplicity, can be a major asset when accomplishing a slew of daily tasks.
When a variable is used as an argument to a method, its content is always copied. (Java has only call-by-value.) What's important to understand here is that you can only refer to objects through references. So what actually happens when you pass a variable referring to an object is that you pass the reference to the object (by value!).
Someone may tell you "primitives are passed by value" and "non primitives are passed by reference", but that is merely because a variable can never contain an object to begin with, only a reference to an object. Once this is understood, it becomes clear that even variables referring to objects are passed by value.
From Is Java "pass-by-reference" or "pass-by-value"?
Java is always pass-by-value. The difficult thing can be to understand that Java passes objects as references passed by value.
Java does manipulate objects by reference, and all object variables are references. However, Java doesn't pass method arguments by reference; it passes them by value.
In Java, there is no counterpart to the C++ "reference type" for primitives.