Description
ECE/CS 5510 Multiprocessor Programming Homework 1
Part I: Problems [50 points]:
Learning objective: Reinforcing understanding of the problem of mutual exclusion, its definition, concurrent reasoning and coordination, and Amdahl's Law. Solve the following problems [Problems 1 to 5 = 8 points each, Problem 6 = 10 points]:
1. For each of the following, state whether it is a safety property or a liveness property. Identify the bad or good thing of interest.
a. Two processes with critical sections c0 and c1 may not be in their critical sections at the same time.
b. Clients may not retain resources forever.
c. It is always cloudy before it rains.
d. Only one person can sit on the driver's seat in a car.
e. A red traffic light (stop) will turn green (proceed) sometime in the future.
f. Cars entering a roundabout yield to cars already in the roundabout.
g. A car in a roundabout eventually leaves the roundabout.
h. At a four-way intersection, no more than one direction/side should be green at any given time.
2. A programmer has two computer systems available to run his/her application. The first system has a single-core processor that executes N*X instructions per second. The second system has 2*N-core processor where each core executes X instructions per second. Using Amdahl’s law, find the lower bound on application parallelism, so that it is advantageous to run the application on the second system from a performance standpoint.
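For reference, Amdahl's law in its standard form: if p is the fraction of the program that can be executed in parallel and n is the number of processors, the speedup is S = 1 / ((1 − p) + p/n).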
3. An application takes T′ seconds to execute on an N-core processor. When the sequential part of the application is optimized and made 10x faster, the program takes T′/4 seconds to execute on the same N-core processor. What portion of the overall execution time (i.e., of the equivalent execution on a single-core processor) did the sequential part account for before the optimization (as a function of N)? Also find the lower bound on the number of cores N for which this scenario holds true.
4. An application has a last level cache miss rate of 50%; i.e., 50% of the executed memory instructions go to the DRAM to fetch data. Assume that the execution of 30% of such instructions is stalled because of the refresh mechanism in DRAM. What is the expected performance benefit for this application if the penalty incurred by DRAM refresh is fully eliminated?
5. In the following code snippet, can the inner loop be parallelized? Can the outer loop be parallelized? Give reason. Assume matrix a is already initialized and has appropriate dimension for the code snippet to work.
for (i = 1; i < n; ++i) {
    for (j = 0; j < n; ++j) {
        a[j][i] += a[j][i+1] - a[j][i-1];
    }
}
6. There are 12 students taking an exam. Each student has their own examination room. Each student is individually led to a common room, only one at a time. During this time, each of the remaining 11 students is still in their respective examination room. The common room contains two table lamps that can be set to an ON or OFF position.
The only way for the students to pass is by going into the room and stating that all other students have been in the room. If the statement is true, then all students will pass. If the statement is false, all students will fail. The only way for students to communicate state information is using the state of table lamps (on or off). That is, students can only use their own memory and the state of the table lamps to decide if all of the other students have been in the room.
Students can be taken to the room infinitely many times, and in no particular order. The important thing to note is that the students know about the exam ahead of time and can coordinate a plan beforehand to pass. Suggest a plan that ensures all students succeed in the examination.
Part II: Programming assignment [50 points]
Learning objective: Getting started programming in Java, reinforcing Java concepts, getting used to reading the Java documentation, setting up IntelliJ IDE. You will write a (single-threaded) console-based inventory manager application, called Inventory.
You may need to use the following Java classes: Date, DateFormat, String, ArrayList, Scanner/BufferedReader/FileReader, Matcher, and others. Inventory will manage an in-memory database of items in a shop. It will operate upon this database by reading and executing commands from an input file named in.txt.
The program will also produce results, which will be written to a file named out.txt. For this program, it is sufficient to store the records in a simple in-memory Java collection in the order they are added (e.g., an ArrayList would be a good choice). Inventory must support the following commands. <Argument> represents an argument passed to the command, e.g., 5, i7-8700K, 18/2/1995, "Intel Corp". Arguments will be given without angle brackets and will be separated by whitespace; arguments that contain whitespace (for example, a company name that has a space) must be double-quoted. Arguments will not contain escaped whitespace (e.g. '\ ') or escaped quotes (e.g. \").

LOAD <filename>
● Loads the contents of the database from the named file, replacing the existing contents.
● The file is in a comma-separated value (CSV) format, with one entry per line.
● Each entry matches the following format exactly: <Name>,<Company>,<ReleaseDate>,<Quantity>
● Name and Company are strings. They may not contain commas or double quotes.
● ReleaseDate is given as MM/DD/YYYY.
● Quantity is a number.
● At the end of the file, there is exactly one blank line (i.e., the text of all entries is followed by a newline character \n).
● You can assume the CSV file is always given in the correct format. Example of an entry in the CSV file: i7-8700K,Intel Corp,05/02/2017,12
● The following line should be written to the output file, where N is the number of records loaded: LOAD: OK <N>

STORE <filename>
● Stores the contents of the database in the named file. Overwrites if the file already exists.
● The format must be exactly as described above.
● The following line should be written to the output file (not the csv file!), where N is the number of records stored: STORE: OK <N>

CLEAR
● Clears the contents of the database.
● The following line should be written to the output file: CLEAR: OK

ADD <Name> <Company> <ReleaseDate>
● Adds the given entry to the database.
● Quantity is initialized to 0. An entry for an item must be added before BUY or SELL.
● Duplicate entries are not allowed, i.e., adding the same NAME/COMPANY combo twice results in an error. (This also applies if the items were loaded from a CSV file.)
● The following line should be written to the output file: ADD: OK <Name> <Company>

STATUS
● Lists all added items.
● STATUS will write the following to the output file, where M is the number of items:
STATUS: OK <M>
<Name>,<Company>,<ReleaseDate>,<Quantity>
…
<Name>,<Company>,<ReleaseDate>,<Quantity>

BUY <Name> <Company> <Quantity>
● Buys an item. BUY searches for the record identified by Name and Company and adds the specified quantity to the old quantity.
● Quantity should be >= 1.
● The following line should be written to the output file: BUY: OK <Name> <Company> <NewQuantity>

SELL <Name> <Company> <Quantity>
● Sells an item. SELL searches for the record identified by Name and Company and subtracts the specified quantity from the old quantity.
● Quantity should be >= 1.
● Cannot sell a quantity more than the quantity in the inventory.
● The following line should be written to the output file: SELL: OK <Name> <Company> <NewQuantity>

QUAN GREATER <N>
QUAN FEWER <N>
QUAN BETWEEN <N1> <N2>
● These commands list all items whose quantity is greater than, fewer than, or in between the specified quantities.
● Note: GREATER means >, not >=. Similarly implement FEWER and BETWEEN by ignoring the equivalent case.
● The following will be written to the output file, where M is the number of items matching the request:
QUAN: OK <M>
<Name>,<Company>,<ReleaseDate>,<Quantity>
…
<Name>,<Company>,<ReleaseDate>,<Quantity>

SEARCH <Needle>
● Search for Needle as a substring in either Name or Company.
● The following lines will be written to the output file. The format for each record is the CSV format as described for LOAD. M is the number of matching records:
SEARCH: OK <M>
<Name>,<Company>,<ReleaseDate>,<Quantity>
…
<Name>,<Company>,<ReleaseDate>,<Quantity>

Error handling: In case of errors for an individual command, Inventory will write the following line instead of OK, and the command will not be executed: <COMMAND>: ERROR <ERROR_CODE>
For example: ADD: ERROR DUPLICATE_ENTRY
You must handle at least the following errors (error code shown first):
1. INVALID_DATE: Invalid dates by value or format, for example: 11/31/2011, 14/23/2008, 11-31-2011, 5/5, applesauce.
2. WRONG_ARGUMENT_COUNT: Too few or too many arguments were given to a command. For example:
ADD Intel 11/11/2011
STATUS 5 10 15
BUY "AMD" 2
3. CANNOT_BUY_BEFORE_ADD: When attempting to buy an item before adding it.
4. CANNOT_SELL_BEFORE_ADD: When attempting to sell an item before adding it.
5. INVALID_QUANTITY: When the quantity for a BUY/SELL command is 0, negative, or doesn't match the format. For example:
BUY "Ryzen 5 2600X" AMD -44
BUY "Ryzen 5 2600X" AMD 05/02/2017
6. CANNOT_SELL_QUANTITY_MORE_THAN_STOCK: When attempting to sell a greater quantity of an item than what's available in the inventory.
7. FILE_NOT_FOUND: When the LOAD command tries to load a .csv file that does not exist.
8. DUPLICATE_ENTRY: When an ADD command results in duplicate entries, for example, adding the same item twice:
ADD "Ryzen 7 2700" AMD 05/06/2018
ADD "Ryzen 7 2700" AMD 05/09/2018
(Note: ReleaseDate is ignored for this check.)
9. UNKNOWN_COMMAND: When the command is invalid. For example:
PING kernel.org
QUAN LESS 5
QUAN BTWEN 5
QUAN BTWEN 5 10

Example execution:

in.txt:
ADD Intel 11/1/2011
STATUS 5 10 15
BUY "Table Lamp" 2
ADD "Ryzen 7 2700X" AMD 05/06/2018
ADD "Ryzen 7 2700" AMD 05/06/2018
ADD "Ryzen 7 2700" AMD 05/06/2018
ADD "Ryzen 5 2600X" AMD 05/04/2018
ADD "Ryzen 5 2400G" AMD 05/04/2018
ADD i7-8700K "Intel Corp" 05/02/2017
ADD i5-8600K "Intel Corp" 05/02/2017
ADD i3-8350K "Intel Corp" 05/02/2017
STATUS
BUY "Ryzen 5 2600X" AMD -44
BUY "Ryzen 5 2600X" AMD 05/02/2017
BUY "Ryzen 5 2600X" AMD 55
BUY "Ryzen 5 2600X" AMD 22
SELL "Ryzen 5 2600X" AMD 22
SELL "Ryzen 5 2600X" AMD 56
SELL "Ryzen 5 2600X" "AMD" 55
BUY i7-8700K "Intel Corp" 12
BUY i5-8600K "Intel Corp" 17
BUY i3-8350K "Intel Corp" 24
STATUS
QUAN GREATER 5
QUAN FEWER 15
QUAN BETWEEN 10 18
STORE out.csv

out.txt:
ADD: ERROR WRONG_ARGUMENT_COUNT
STATUS: ERROR WRONG_ARGUMENT_COUNT
BUY: ERROR WRONG_ARGUMENT_COUNT
ADD: OK "Ryzen 7 2700X" AMD
ADD: OK "Ryzen 7 2700" AMD
ADD: ERROR DUPLICATE_ENTRY
ADD: OK "Ryzen 5 2600X" AMD
ADD: OK "Ryzen 5 2400G" AMD
ADD: OK i7-8700K "Intel Corp"
ADD: OK i5-8600K "Intel Corp"
ADD: OK i3-8350K "Intel Corp"
STATUS: OK 7
Ryzen 7 2700X,AMD,05/06/2018,0
Ryzen 7 2700,AMD,05/06/2018,0
Ryzen 5 2600X,AMD,05/04/2018,0
Ryzen 5 2400G,AMD,05/04/2018,0
i7-8700K,Intel Corp,05/02/2017,0
i5-8600K,Intel Corp,05/02/2017,0
i3-8350K,Intel Corp,05/02/2017,0
BUY: ERROR INVALID_QUANTITY
BUY: ERROR INVALID_QUANTITY
BUY: OK "Ryzen 5 2600X" AMD 55
BUY: OK "Ryzen 5 2600X" AMD 77
SELL: OK "Ryzen 5 2600X" AMD 55
SELL: ERROR CANNOT_SELL_QUANTITY_MORE_THAN_STOCK
SELL: OK "Ryzen 5 2600X" AMD 0
BUY: OK i7-8700K "Intel Corp" 12
BUY: OK i5-8600K "Intel Corp" 17
BUY: OK i3-8350K "Intel Corp" 24
STATUS: OK 7
Ryzen 7 2700X,AMD,05/06/2018,0
Ryzen 7 2700,AMD,05/06/2018,0
Ryzen 5 2600X,AMD,05/04/2018,0
Ryzen 5 2400G,AMD,05/04/2018,0
i7-8700K,Intel Corp,05/02/2017,12
i5-8600K,Intel Corp,05/02/2017,17
i3-8350K,Intel Corp,05/02/2017,24
QUAN: OK 3
i7-8700K,Intel Corp,05/02/2017,12
i5-8600K,Intel Corp,05/02/2017,17
i3-8350K,Intel Corp,05/02/2017,24
QUAN: OK 5
Ryzen 7 2700X,AMD,05/06/2018,0
Ryzen 7 2700,AMD,05/06/2018,0
Ryzen 5 2600X,AMD,05/04/2018,0
Ryzen 5 2400G,AMD,05/04/2018,0
i7-8700K,Intel Corp,05/02/2017,12
QUAN: OK 2
i7-8700K,Intel Corp,05/02/2017,12
i5-8600K,Intel Corp,05/02/2017,17
STORE: OK 7

out.csv contents in the above example:
Ryzen 7 2700X,AMD,05/06/2018,0
Ryzen 7 2700,AMD,05/06/2018,0
Ryzen 5 2600X,AMD,05/04/2018,0
Ryzen 5 2400G,AMD,05/04/2018,0
i7-8700K,Intel Corp,05/02/2017,12
i5-8600K,Intel Corp,05/02/2017,17
i3-8350K,Intel Corp,05/02/2017,24
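Since arguments may be double-quoted, splitting an input line into arguments takes a little care. The sketch below shows one possible way to do it with java.util.regex; the class and method names are illustrative only, not a required structure.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandTokenizer {
    // A token is either a double-quoted run (captured without the quotes)
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String line) {
        List<String> args = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            args.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return args;
    }
}

For example, tokenize("ADD \"Ryzen 7 2700X\" AMD 05/06/2018") yields [ADD, Ryzen 7 2700X, AMD, 05/06/2018].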
Submission instructions: Your Java files should all be grouped in a single root folder called hw1. Make sure to give your program a package name (which must start with hw1). For example, your files may be organized as follows:
hw1/
hw1/Inventory.java      package hw1;
hw1/Database.java       package hw1;
hw1/util/Errors.java    package hw1.util;
hw1/util/CSV.java       package hw1.util;
Your code should be compilable and executable on a GNU/Linux Ubuntu system with Java 1.8 as follows (according to https://stackoverflow.com/questions/6623161/javac-option-to-compileall-java-files-under-a-given-directory-recursively/8769536#8769536 ):
find -name "*.java" > sources.txt
javac @sources.txt
java hw1.Inventory
Note that the java command must be run from the folder containing your source package(s); if using the default IntelliJ project layout, that would be the src folder, which is also where in.txt needs to be. (When running your program through IntelliJ, in.txt will need to be in the folder one level above; that is, the folder containing src/ and out/. Note: the IntelliJ license is free for students.)
Zip up the PDF from Part I, together with the source code for Part II, and save it in a file called hw1_<PID>.zip. Submit the archive as Homework 1 on the Canvas page before the deadline.
ECE/CS 5510 Multiprocessor Programming Homework 2
Part I: Problems [50 points]:
Solve the following problems:
1. [10 points] Figure 2.9 (Chapter 2.6, page 32) of the book explains the Bakery lock algorithm. Line 15 of the algorithm is as follows:
while ((∃k != i)(flag[k] && (label[k],k) << (label[i],i))) {};
This line ensures that a thread spins until it determines that no thread with a raised flag has a lexicographically smaller label and id pair.
a) What do you think will happen if, instead of using a lexicographical order on the pairs of label and thread id, we change line 15 to a less-than comparison only on the labels?
while ((∃k != i)(flag[k] && (label[k] < label[i]))) {};
b) What do you think will happen if, instead of using a lexicographical order on the pairs of label and thread id, we change line 15 to a less-than-or-equal-to comparison only on the labels?
while ((∃k != i)(flag[k] && (label[k] ≤ label[i]))) {};
2. [10 points] Consider the Bakery Algorithm, in which both the safety (mutual exclusion) and liveness (guarantees progress and bounded waiting) properties hold. The proof for these properties depends on the following assumption: any process will stay in the critical section only for a finite amount of time. Suppose a process crashes while inside its critical section (for example, the thread performs a division that causes an uncaught division-by-zero exception), while other processes are still running.
a. Does this violate the safety property of the solution?
b. Does this violate the liveness property?
c. Does this violate the bounded-waiting part of liveness?
For each of the questions above, provide a very brief explanation for your answer.
3. [8 points] The following pseudocode describes a lock algorithm.
nextTurn := 0
nowServing := 0
Lock() {
    myTurn := AtomicFetchAndIncrement(nextTurn)
    while (myTurn != nowServing) {
        // wait
    }
}
Unlock() {
    nowServing = nowServing + 1
}
Describe the doorway and waiting sections in this lock algorithm. Is this lock starvation-free? Why/why not?
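For concreteness, one possible Java rendering of the pseudocode above (purely illustrative; the class name and the use of AtomicInteger and volatile are my own choices, not part of the problem):

import java.util.concurrent.atomic.AtomicInteger;

class TicketLock {
    private final AtomicInteger nextTurn = new AtomicInteger(0); // nextTurn := 0
    private volatile int nowServing = 0;                         // nowServing := 0

    public void lock() {
        int myTurn = nextTurn.getAndIncrement(); // AtomicFetchAndIncrement(nextTurn)
        while (myTurn != nowServing) {
            // wait
        }
    }

    public void unlock() {
        nowServing = nowServing + 1;
    }
}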
4. [6 points] Peterson's mutual exclusion algorithm, which ensures no starvation, is presented in Figure 1. Consider the algorithms in Figures 2 and 3. Either prove or refute the following claims about the algorithms in Figures 2 and 3.
a. The algorithm provides mutual exclusion.
b. The algorithm provides deadlock freedom.
c. The algorithm provides starvation freedom.
(Figures 1, 2, and 3 are algorithm listings in the original handout and are not reproduced here.)
5. [6 points] A solution to the mutual exclusion problem has been proposed in Figure 4 (a code listing in the original handout, not reproduced here); the line numbers are on the right. Does this solution satisfy the safety and liveness properties of the mutual exclusion problem? Give an informal proof or a violating adversary schedule for each of these properties.
6. [10 points] The following algorithm (Fig. 5, an algorithm listing in the original handout, not reproduced here) is for two threads 1 and 2, and it makes use of two registers: x, which can hold three values (0, 1, and 2); and y, which can hold two values (0 and 1). Both threads can read and write registers x and y. The symbol i is used to designate the thread id and can be 1 or 2.
a) Show that it satisfies mutual exclusion and deadlock-freedom for two threads.
b) Does it satisfy starvation-freedom for two threads?
c) Does it satisfy deadlock-freedom for three threads? That is, i can be 1, 2 or 3.
d) Does it satisfy mutual exclusion for three threads? That is, i can be 1, 2 or 3.
For each question, you should either sketch a proof, or display an execution where it fails.
Part II: Programming assignment [50 points]
Attached with this homework is a Java project that contains the implementation of the LockOne, LockTwo, Peterson, and Filter locks, and simple benchmarks for testing the correctness and performance of these algorithms using a shared counter. Familiarize yourself with the code. Please configure IntelliJ to use JDK 8 to run the provided code and for your development.
(https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
Do the following tasks:
1. [10 points] Implement the Bakery lock. Compare its performance with the Filter lock for different thread counts – 2 to as many threads as your machine supports (e.g., my machine supports 8 threads) – and preferably a high counter value.
For this purpose, you may modify Test2.java, which measures the runtime of each thread and reports the average. Show how the average thread runtime scales with the number of threads for both the Filter and Bakery locks. Fix the instantiation code for the Filter lock so that it can be easily instantiated for any number of threads. Between the two, Bakery lock and Filter lock, which lock performs better? Explain the reason.
2. [15 points] For each locking algorithm – LockOne, LockTwo, Peterson, Filter and Bakery – run the benchmark (Test.java) 5-10 times and "measure" the following properties:
a. mutual exclusion
b. fairness
c. deadlock freedom
d. starvation freedom
An example (and simple) way to measure these properties is to record a sample execution pattern of threads through print statements. Feel free to modify or add probes to the code to detect thread execution patterns.
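One simple probe (illustrative only; the class and its names are mine, not part of the provided project) is to record each acquisition and release in a shared synchronized log and inspect the pattern afterwards: two consecutive "enter" events without an intervening "exit" would indicate a mutual exclusion violation, while a long run in which some thread never appears would suggest starvation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AcquisitionLog {
    private final List<String> events =
            Collections.synchronizedList(new ArrayList<>());

    // Call right after acquiring the lock under test.
    public void enter(int threadId) { events.add("enter " + threadId); }

    // Call right before releasing the lock under test.
    public void exit(int threadId) { events.add("exit " + threadId); }

    // Dump the recorded execution pattern once the run finishes.
    public void dump() {
        for (String e : events) System.out.println(e);
    }
}

Use a probe like this only for the correctness/fairness runs, not for the timing runs, since the shared log is itself a point of contention.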
3. [25 points] In the given benchmark, we are using only 2 threads. Consider a generalization of the two-thread Peterson lock by arranging a number of 2-thread Peterson locks in a binary tree, as follows. Each thread is assigned a leaf lock, which it shares with one other thread. Each lock treats one thread as thread 0 and the other as thread 1. In the tree-lock’s acquire method, the thread acquires every two-thread Peterson lock from that thread’s leaf to the root.
The tree-lock's release method unlocks each of the 2-thread Peterson locks that thread has acquired, from the root back to its leaf. At any time, a thread can be delayed for a finite duration. (In other words, threads can take "naps", or even "vacations", but they do not "drop dead".)
a. Run your new implementation against 16 competing threads by modifying the Test class.
Does each of the four properties mentioned above hold? Sketch a proof that it holds, or describe a, possibly infinite, execution where it is violated (you may also measure it as described previously). b. Is there an upper bound on the number of times the tree-lock can be acquired and released between the time a thread starts acquiring the tree-lock and when it succeeds? c. The Test2 class measures the runtime of each thread and reports the average. Modify Test2 to run the new lock implementation. Run this benchmark class on your local machine as well as a high-core count server for different THREAD_COUNT (4, 8, 16, 32, and 64).
Make sure you do not run any other application when running the benchmark on your local machine. Include the core count of your local machine in the report. You may use any machine in the Rlogin cluster to satisfy the high-core count server requirement. The Rlogin cluster consists of 24 40-core server machines and 2 64-core server machines.
Your report must indicate the core-count of the machine you used. The instructions to set up an account to access these machines have been communicated in a previous announcement. Rlogin is a shared cluster, so there may be multiple users using the same machine at any given time, interfering with your benchmark. To overcome this, you have to run your benchmark at 4-5 different times of the day (for example, 12AM, 6AM, 9AM, 12PM, 6PM, 9PM). You have to report these results in a table similar to the one below for each THREAD_COUNT.

Iterations | 12AM | 6AM | 9AM | 12PM | 6PM
1          |      |     |     |      |
2          |      |     |     |      |
3          |      |     |     |      |

Note that every time you log in to Rlogin, you will be provided a random machine in the cluster. You must choose a particular machine for all your tests. Indicate the machine name in your report. You can use "hostname" to find the machine name initially. Then, even if you are given a different machine, you can connect to the intended machine with the following command: ssh <your_pid>@<machine_name>
You must also plot the average runtime vs THREAD_COUNT as a bar graph. There must be two bars per THREAD_COUNT (one for the local machine and another for the server machine).
Essentially, you should compute the average value of each table and construct the graph. A sample appears in the original handout (a bar graph with average runtime on the y-axis, THREAD_COUNT = 4, 8, 16, 32, 64 on the x-axis, and Local/Remote bars per thread count). Note that as you move to higher THREAD_COUNT values on your local machine, you might experience machine freezes. In such cases, you may ignore reporting that and other higher THREAD_COUNT values. (For example, if you experience intermittent freezes when running with THREAD_COUNT=16, it is advisable to ignore THREAD_COUNT=16, 32, 64.)
However, you should not have any issues with the Rlogin machines since they are capable of handling such high thread counts.
Hint: The benchmark may use the provided binary tree implementation using Java generics. Balance the binary tree using an appropriate order of node insertions. Feel free to implement your own classes for this problem.
Zip up the PDF together with the source code for Part II and save it in a file called hw2_<PID>.zip.
Submit the archive as Homework 2 on the class's Canvas page before the deadline.
Rlogin Registration instructions
• Create an account on https://admin.cs.vt.edu/
• Create the SLO password according to the password policy (7 characters, one uppercase letter, one lowercase letter, one special character or number). This is the password that will be used to access Rlogin.
• Wait for an hour, and use the following command in a terminal (Windows users, see below) to log in: ssh your_pid@rlogin.cs.vt.edu
• In case you can't log in to Rlogin, you will have to use Virginia Tech's Remote Access VPN: https://www.nis.vt.edu/ServicePortfolio/Network/RemoteAccess-VPN.html
• Rlogin accounts of non-CS majors are deleted at the end of every semester. So, if you are a non-CS major and had an account previously, you will have to create a new account.
For Windows users:
• You can either use PuTTY or MobaXterm to log in to Rlogin.
• Or you can install a Unix-like environment for Windows such as Cygwin or msys2. I suggest msys2 (follow the guide at https://www.msys2.org/). Install OpenSSH through the following command: pacman -S openssh
• You also have the option of using 'Windows Subsystem for Linux': https://docs.microsoft.com/en-us/windows/wsl/install-win10
ECE/CS 5510 Multiprocessor Programming Homework 3
class VolatileExample {
    int x = 0;
    volatile boolean v = false;

    public void writer() {
        x = 42;
        v = true;
    }

    public void reader() {
        if (v == true) {
            int y = 100 / x;
        }
    }
}
Listing 1: Volatile field example from section 1
1 Java memory model [6 points]
Consider the class shown in listing 1. According to what you have been told
about the Java memory model, will the reader method ever divide by zero?
2 Sequential Consistency (1) [2 points]
The following is a history of a FIFO queue. The operations are enq(x)/void
and deq()/x. Is this history a valid sequential history? Why not?
A: q.enq(x)
B: q.enq(y)
A: q:void
B: q:void
3 Sequential Consistency (2) [6 points]
P1 P2 P3
x = 1; y = 1; z = 1;
print(y,z); print(x,z); print(x,y);
The variables x, y and z are stored in a memory system that offers sequential
consistency. Every operation, including the print statements, is atomic (i.e., no operation can overlap with other operations).
Explain whether the following are legal output sequences or not.
1. 001011
2. 001111
3. 001110
For a legal output, show one possible interleaving of the instructions that leads
to the said output. For an illegal output, explain why there is no possible
interleaving that leads to the said output.
4 Consistency (1) [6 points]
Consider a memory object that encompasses two register components. We
know that if both registers are quiescently consistent, then so is the memory
object. Does the converse hold? If the memory object is quiescently consistent,
are the individual registers quiescently consistent? Outline a proof or give a
counterexample.
5 Consistency (2) [6 points]
Give an example of an execution that is quiescently consistent but not
sequentially consistent, and another that is sequentially consistent but not
quiescently consistent.
6 Linearizability (1) [6 points]
The following is a history of a stack. The operations are push(x)/void and
pop()/x. Is the history linearizable? If yes, then find a legal sequential history.
A: s.push(x)
B: s.push(y)
B: s:void
B: s.pop()
A: s:void
A: s.pop()
A: s:y
7 Linearizability (2) [6 points]
The following is a history of a FIFO queue. The operations are enq(x)/void
and deq()/x. Is the history linearizable? If yes, then prove. Is it sequentially
consistent?
A: r.enq(x)
A: r:void
B: r.enq(y)
A: r.deq()
B: r:void
A: r:y
8 Linearizability (3) [6 points]
The following is a history of a FIFO queue. The operations are enq(x)/void
and deq()/x. Is the history linearizable? If yes, then prove.
A: q.enq(x)
B: q.enq(y)
A: q:void
B: q:void
A: q.deq()
C: q.deq()
A: q:y
C: q:y
9 Compositional Linearizability [6 points]
Prove the "only if" part of Theorem 3.6.1, reprinted below.
H is linearizable if, and only if, for each object x, H|x is linearizable.
(A reproduced textbook page, including Fig. 4.9, "Sequential consistency is not a local property", appears here in the original handout.)
10 More histories (1) [10 points]
Figure 1 shows a History H with two processes: p1 and p2. The processes are
accessing two concurrent queues Q and Q′.
Answer the following questions:
1. Is H|Q sequentially consistent?
2. Is H|Q′ sequentially consistent?
3. Is the entire history H sequentially consistent? If not, then why? If yes,
then explain.
11 More histories (2) [6 points]
Assume you have a multiprocessor system with three processors. Each
processor does not stall when it encounters a last-level cache (LLC) miss; i.e.,
each processor continues executing instructions that are independent of the
data that needs to be fetched from DRAM due to the LLC miss.
W stands for a write instruction and R stands for a read instruction: W(x, v) writes the value v to location x, and R(x, v) reads location x and observes the value v. Assume the data in the memory locations a and b is initialized to zero. The three processors execute the following sequences:
Processor 1: W(a, 1); R(b, 0);
Processor 2: W(b, 1); R(b, 1); R(a, 1);
Processor 3: R(b, 1); R(a, 0);
Is this multiprocessor system sequentially consistent?
12 More Histories (3) [6 points]
Is this a linearizable execution?
Time Task A Task B
0 Invoke q.enq(x)
1 Work on q.enq(x)
2 Work on q.enq(x)
3 Return from q.enq(x)
4 Invoke q.enq(y)
5 Work on q.enq(y)
6 Work on q.enq(y)
7 Return from q.enq(y)
8 Invoke q.deq()
9 Return x from q.deq()
13 More Histories (4) [6 points]
Is this a linearizable execution?
Time Task A Task B
0 Invoke q.enq(x)
1 Work on q.enq(x) Invoke q.enq(y)
2 Work on q.enq(x) Return from q.enq(y)
3 Return from q.enq(x)
4 Invoke q.deq()
5 Return x from q.deq()
14 More Histories (5) [6 points]
Is this a linearizable execution?
Time Task A Task B
0 Invoke q.enq(x)
1 Return from q.enq(x)
2 Invoke q.enq(y)
3 Invoke q.deq() Work on q.enq(y)
4 Work on q.deq() Return from q.enq(y)
5 Return y from q.deq()
15 AtomicInteger [7 points]
The AtomicInteger class (in the java.util.concurrent.atomic package) is
a container for an integer value. One of its methods is boolean compareAndSet
(int expect, int update).
This method compares the object’s current value to expect. If the values
are equal, then it atomically replaces the object’s value with update and
returns true. Otherwise, it leaves the object’s value unchanged and returns
false. This class also provides int get(), which returns the object’s actual
value.
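A tiny usage example of the two methods just described (standalone, not part of the queue below):

import java.util.concurrent.atomic.AtomicInteger;

class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(0);
        boolean first  = value.compareAndSet(0, 1); // true: value was 0, becomes 1
        boolean second = value.compareAndSet(0, 2); // false: value is 1, not the expected 0
        System.out.println(first + " " + second + " " + value.get()); // prints: true false 1
    }
}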
Consider the FIFO queue implementation shown in listing 2. It stores its
items in an array items, which, for simplicity, we will assume has unbounded
size. It has two AtomicInteger fields: head is the index of the next slot from
which to remove an item, and tail is the index of the next slot in which
to place an item. Give an example showing that this implementation is not
linearizable.
16 Herlihy/Wing queue [9 points]
This exercise examines a queue implementation (listing 3) whose enq()
method does not have a linearization point.
The queue stores its items in an items array, which for simplicity we will
assume never hits CAPACITY. The tail field is an AtomicInteger, initially
zero. The enq() method reserves a slot by incrementing tail and then
storing the item at that location. Note that these two steps are not atomic:
there is an interval after tail has been incremented but before the item has
been stored in the array.
class IQueue<T> {
    AtomicInteger head = new AtomicInteger(0);
    AtomicInteger tail = new AtomicInteger(0);
    T[] items = (T[]) new Object[Integer.MAX_VALUE];

    public void enq(T x) {
        int slot;
        do {
            slot = tail.get();
        } while (!tail.compareAndSet(slot, slot + 1));
        items[slot] = x;
    }

    public T deq() throws EmptyException {
        T value;
        int slot;
        do {
            slot = head.get();
            value = items[slot];
            if (value == null)
                throw new EmptyException();
        } while (!head.compareAndSet(slot, slot + 1));
        return value;
    }
}
Listing 2: IQueue implementation
The deq() method reads the value of tail, then traverses the array in
ascending order from slot zero to the tail. For each slot, it swaps null with
the current contents, returning the first non-null item it finds. If all slots
are null, the procedure is restarted.
Give an example execution showing that the linearization point for enq() cannot occur at line 14. (Hint: give an execution where two enq() calls are not linearized in the order they execute line 14.) Give another example execution showing that the linearization point for enq() cannot occur at line 15. Since these are the only two memory accesses in enq(), we must conclude that enq() has no single linearization point. Does this mean enq() is not linearizable?
1  public class HWQueue<T> {
2      AtomicReference<T>[] items;
3      AtomicInteger tail;
4      static final int CAPACITY = 1024;
5
6      public HWQueue() {
7          items = (AtomicReference<T>[]) Array.newInstance(AtomicReference.class, CAPACITY);
8          for (int i = 0; i < items.length; i++) {
9              items[i] = new AtomicReference<T>(null);
10         }
11         tail = new AtomicInteger(0);
12     }
13     public void enq(T x) {
14         int i = tail.getAndIncrement();
15         items[i].set(x);
16     }
17     public T deq() {
18         while (true) {
19             int range = tail.get();
20             for (int i = 0; i < range; i++) {
21                 T value = items[i].getAndSet(null);
22                 if (value != null) {
23                     return value;
24                 }
25             }
26         }
27     }
28 }
Listing 3: Herlihy/Wing queue
ECE/CS 5510 Multiprocessor Programming Homework 4
I. Theory [40 points]
1. [10 points] A proposed algorithm for mutual-exclusion works as follows:
Let there be a multi-writer register roll, whose range is {−1, 0, . . . , N −
1} and the initial value is −1. To enter the critical section, a process
i spins on roll, until the value is −1. And when it finds the value to
be −1, it sets the value of roll to i, within one time unit of the read.
After that, the process waits for more than one time unit, and then
reads roll. If it still has the value i, then it enters the critical section.
Otherwise, it returns to spinning on roll until the value is −1. When
a process exits its critical section, it sets the value of roll to −1.
(a) Does the algorithm guarantee mutual exclusion?
(b) Is it livelock-free?
(c) What are the drawbacks of this algorithm (in terms of performance
and requirement)?
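To make the steps concrete, here is a rough Java transcription of the protocol described above (the class name, the AtomicInteger register, and the DELTA_MS constant standing in for "one time unit" are my own choices; the algorithm's timing assumptions are only approximated):

import java.util.concurrent.atomic.AtomicInteger;

class RollLock {
    static final long DELTA_MS = 1;                   // assumed "time unit"
    final AtomicInteger roll = new AtomicInteger(-1);

    void lock(int i) throws InterruptedException {
        while (true) {
            while (roll.get() != -1) { /* spin until roll == -1 */ }
            roll.set(i);                              // within one time unit of the read
            Thread.sleep(2 * DELTA_MS);               // wait for more than one time unit
            if (roll.get() == i) {
                return;                               // still i: enter the critical section
            }
            // otherwise go back to spinning on roll
        }
    }

    void unlock() {
        roll.set(-1);                                 // exit: reset roll
    }
}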
2. [10 points] Consider a four-socket system comprising four 10-core
processors. These processors are connected with each other through
a point-to-point interconnect. Cache coherency within a socket is
maintained through snooping. Coherency across different sockets is
implemented through a broadcast mechanism.
Assume a high contention
scenario where n threads simply try to acquire a single lock. n is varied
from 2 to 40.
Based on this, answer the following questions:
(a) What will happen to the throughput for threads n > 10? Give
reason.
(b) Which lock(s) among TAS, TTAS, ALock, CLH and MCS do you expect to be strong against contention as n varies from 2 to 40? Why?
3. [5 points] Figure 1 shows a modified implementation of ALock, with the only change being that the two statements (Line 24 and Line 25) in the unlock procedure are swapped. Prove/disprove that this implementation is correct. (Consider size = 3.)
1 public class ALock implements Lock {
2
3 ThreadLocal < Integer > mySlotIndex = new ThreadLocal <
Integer > () {
4 protected Integer initialValue () {
5 return 0;
6 }
7 };
8 AtomicInteger tail ;
9 boolean [] flag ;
10 int size ;
11
12 public ALock (int capacity ) {
13 size = capacity ;
14 tail = new AtomicInteger (0) ;
15 flag = new boolean [ capacity ];
16 flag [0] = true;
17 }
18 public void lock () {
19 int slot = tail . getAndIncrement () % size ;
20 mySlotIndex . set ( slot ) ;
21 while (! flag [ mySlotIndex . get () ]) {};
22 }
23 public void unlock () {
24 flag [( mySlotIndex . get () + 1) % size ] = true;
25 flag [ mySlotIndex . get () ] = false;
26 }
27 }
Figure 1: Anderson’s Lock
4. [15 points] MCS Lock (Figure 2)
(a) Consider a modified implementation of MCS Lock when getAndSet
is not atomic, i.e., get() and set() are separate. Prove/disprove
that this implementation is correct.
(b) Consider a modified implementation of MCS Lock when compareAndSet
is not atomic, i.e., compare() and set() are separate. Prove/disprove that this implementation is correct.
(c) Prove that the unlock procedure of the MCS lock is deadlock-free.
(d) Consider a modified implementation of MCS Lock where Line 16
and Line 17 are swapped. Prove/disprove that this implementation
is correct.
II. Implementation [50 points]
1. [10 points] Imagine n threads, each of which executes method foo()
followed by method bar(). Suppose we want to make sure that no
thread starts bar() until all threads have finished foo(). For this kind
of synchronization, we place a barrier between foo() and bar().
Barrier implementation: We have an n-element array b, all 0. Thread
zero sets b[0] to 1. Every thread i, for 0 < i ≤ n − 1, spins until b[i − 1]
is 1, sets b[i] to 1, and waits until b[n − 1] becomes 2, at which point
it proceeds to leave the barrier. Thread n − 1, upon detecting that b[n − 1] is 1, sets b[n − 1] to 2 and leaves the barrier.
Implement the barrier and measure the barrier time (i.e., the time between foo() and bar()) as a function of the number of threads (up to 16; use Rlogin).
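A minimal sketch of the barrier described above (the AtomicIntegerArray is my choice so that element updates are visible across threads; the handout only says an n-element array, and the timing/measurement harness is omitted):

import java.util.concurrent.atomic.AtomicIntegerArray;

class ArrayBarrier {
    private final int n;
    private final AtomicIntegerArray b;

    ArrayBarrier(int n) {
        this.n = n;
        this.b = new AtomicIntegerArray(n); // all elements start at 0
    }

    // Called by thread i (0 <= i < n) between foo() and bar().
    void await(int i) {
        if (i == 0) {
            b.set(0, 1);                                  // thread 0 starts the chain
        } else {
            while (b.get(i - 1) != 1) { /* spin until b[i-1] is 1 */ }
            b.set(i, 1);
        }
        if (i == n - 1) {
            b.set(n - 1, 2);                              // last thread releases everyone
        } else {
            while (b.get(n - 1) != 2) { /* wait until b[n-1] becomes 2 */ }
        }
    }
}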
2. [20 points] Implement the four locks below (you may reuse textbook
code) and measure the time to execute an empty critical section as a
function of number of threads for all locks specified below, and plot the
data (similar to Figure 7.4 in the textbook). Collect data for up to 16 threads (use Rlogin).
Note: You will actually measure throughput (in terms of the number
of locks acquired) over a two second period. Then you can calculate
the average waiting time, which you need to plot. Also, you might
1 public class MCSLock implements Lock {
2 AtomicReference < QNode > queue ;
3 ThreadLocal < QNode > myNode ;
4 public MCSLock () {
5 queue = new AtomicReference < QNode >( null) ;
6 myNode = new ThreadLocal < QNode >() {
7 protected QNode initialValue () {
8 return new QNode () ;
9 }
10 };
11 }
12 public void lock () {
13 QNode qnode = myNode . get () ;
14 QNode pred = queue . getAndSet ( qnode ) ;
15 if ( pred != null) {
16 qnode . locked = true;
17 pred . next = qnode ;
18 while ( qnode . locked ) {}
19 }
20 }
21 public void unlock () {
22 QNode qnode = myNode . get () ;
23 if ( qnode . next == null) {
24 if ( queue . compareAndSet ( qnode , null) )
25 return;
26 while ( qnode . next == null) {}
27 }
28 qnode . next . locked = false;
29 qnode . next = null;
30 }
31
32 static class QNode {
33 boolean locked = false;
34 QNode next = null;
35 }
36 }
Figure 2: MCS Lock
want to account for JVM warm-up in your measurements. Simply take measurements 3 times (through a loop), and consider the last one. (An illustrative timing-loop sketch follows the list of locks below.)
(a) testAndSet lock
(b) testAndTestAndSet lock
(c) CLH queue lock
(d) MCS queue lock
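Following the measurement note above, here is an illustrative per-thread timing loop (the names are my own; Lock here is whatever lock interface your implementations of (a)-(d) share):

import java.util.concurrent.locks.Lock;

class ThroughputWorker implements Runnable {
    private final Lock lock;        // the lock under test
    volatile long acquisitions;     // read by the harness after the run

    ThroughputWorker(Lock lock) { this.lock = lock; }

    public void run() {
        long end = System.currentTimeMillis() + 2000;  // two-second window
        long count = 0;
        while (System.currentTimeMillis() < end) {
            lock.lock();
            try {
                // empty critical section
            } finally {
                lock.unlock();
            }
            count++;
        }
        acquisitions = count;       // total acquisitions in the window
    }
}

The average waiting time can then be derived from the per-thread totals, as the note describes.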
3. [20 points] Priority-based CLH Lock
The CLH lock provides FCFS fairness, which is a useful property for
many applications. But some applications may attach priorities to
threads (e.g., some thread may have a higher priority than others).
Design and implement a priority CLH lock that ensures that the highest-priority thread that is waiting for the lock is always granted access to the critical section before all other waiting threads.
Note that we don’t care about the priority of the thread that is currently
holding the lock, which may or may not have a higher priority than the
highest priority waiting thread.
Thus, whenever the lock is released, the
highest priority waiting thread is granted access first. (If you include
the priority of the lock holder in the total priority ordering of the lock,
then it may require aborting the lock holder, executing roll-back logic,
etc., which is outside the scope of this homework.)
Besides the lock() and unlock() methods for your CLH lock, also implement a trylock() method that attempts to acquire the lock (respecting
thread priorities), and if it can’t acquire it within a predefined amount
of time (constructor argument in ms), it fails/aborts. trylock() will
return a boolean value indicating whether the lock was successfully
acquired.
For each thread, measure the waiting time before starting critical section
execution, and multiply it by its priority (where priority=1 is the highest
priority, and priority=5 is the lowest).
Obtain the average of your calculated data as a function of number of
threads and plot the data (similar to Figure 7.4 of textbook). Include
both priority CLH lock (only the lock() method) and FCFS CLH lock
in the plot.
Hints: Don’t try to implement the P-CLH lock the same way the CLH
lock is implemented. Instead, use the principles behind the CLH lock.
This is easier if you use Java’s PriorityBlockingQueue, and let threads
spin on their own node, instead of the predecessor’s.
ECE/CS 5510 Multiprocessor Programming Homework 5
1 Problems (56 points)
1.1 Set (1) [8 points]
In listing 3, two entries are locked before key presence is determined. If no entries were locked and it instead returned true/false based on key existence, would this alternative still be linearizable? If so, explain; if not, give a counterexample.
1.2 Set (2) [8 points]
Will listing 1, by itself, function correctly if we switch the locking order on line 9? What about in the context of listings 2 and 3? If it has issues in context, how can that be fixed?
1.3 Set (3) [8 points]
Show that listing 1 only needs to lock pred.
1.4 Lock-free Linked List (1) [8 points]
Listings 4 and 5 show the add() and remove() methods of the LockFreeList class. Describe the two execution scenarios where the compareAndSet() operation in line 11 of the add() method fails because of:
1. the expected reference (1st argument)
2. the marked value (3rd argument)
1.5 Lock-free Linked List (2) [8 points]
Listings 4 and 5 show the add() and remove() methods of the LockFreeList class. Describe the two execution scenarios where the compareAndSet() operation in line 11 of the remove() method fails because of:
1. the expected reference (1st argument)
2. the marked value (3rd argument)
1  public boolean add(T item) {
2      int key = item.hashCode();
3      while (true) {
4          Node pred = head;
5          Node curr = pred.next;
6          while (curr.key < key) {
7              pred = curr; curr = curr.next;
8          }
9          pred.lock(); curr.lock();
10         try {
11             if (validate(pred, curr)) {
12                 if (curr.key == key) {
13                     return false;
14                 } else {
15                     Node node = new Node(item);
16                     node.next = curr;
17                     pred.next = node;
18                     return true;
19                 }
20             }
21         } finally {
22             pred.unlock(); curr.unlock();
23         }
24     }
25 }
Listing 1: Add item
1  public boolean remove(T item) {
2      int key = item.hashCode();
3      while (true) {
4          Node pred = head;
5          Node curr = pred.next;
6          while (curr.key < key) {
7              pred = curr; curr = curr.next;
8          }
9          pred.lock(); curr.lock();
10         try {
11             if (validate(pred, curr)) {
12                 pred.next = curr.next;
13                 return true;
14             } else {
15                 return false;
16             }
17         } finally {
18             pred.unlock(); curr.unlock();
19         }
20     }
21 }
Listing 2: Remove item
1  public boolean contains(T item) {
2      int key = item.hashCode();
3      while (true) {
4          Node pred = this.head;
5          Node curr = pred.next;
6          while (curr.key < key) {
7              pred = curr; curr = curr.next;
8          }
9          pred.lock(); curr.lock();
10         try {
11             if (validate(pred, curr)) {
12                 return curr.key == key;
13             }
14         } finally {
15             pred.unlock(); curr.unlock();
16         }
17     }
18 }
Listing 3: Containment check
1  public boolean add(T item) {
2      int key = item.hashCode();
3      while (true) {
4          Window window = find(head, key);
5          Node pred = window.pred, curr = window.curr;
6          if (curr.key == key) {
7              return false;
8          } else {
9              Node node = new Node(item);
10             node.next = new AtomicMarkableReference(curr, false);
11             if (pred.next.compareAndSet(curr, node, false, false)) {
12                 return true;
13             }
14         }
15     }
16 }
Listing 4: LockFreeList add()
1  public boolean remove(T item) {
2      int key = item.hashCode();
3      boolean snip;
4      while (true) {
5          Window window = find(head, key);
6          Node pred = window.pred, curr = window.curr;
7          if (curr.key != key) {
8              return false;
9          } else {
10             Node succ = curr.next.getReference();
11             snip = curr.next.compareAndSet(succ, succ, false, true);
12             if (!snip)
13                 continue;
14             pred.next.compareAndSet(curr, succ, false, false);
15             return true;
16         }
17     }
18 }
Listing 5: LockFreeList remove()
1.6 Lock-free Linked List (3) [8 points]
A programmer notices that the add() method of the LockFreeList (Listing 4) never finds a marked node with the same key. Therefore, to save the need to insert a new node, he/she modifies the algorithm to simply insert the newly added object into the existing marked node with the same key, if such a node exists in the list. Explain why this will not work.
1.7 Compare-And-Swap [8 points]
Some naïve approaches to lock-free deletion in linked lists use a single CAS, swapping the element to be deleted with its next node, as shown below (deleting 20):
H -> 10 -> 30 -> T
Unfortunately, this can lead to the following situation:
H -> 10 -> 20 -> 30 -> T
Why can this happen? Why is it a problem?
2 Programming (44 points)
In this implementation, you will write a microbenchmark to test each of the linked-list algorithms for sets described in chapter 9 of the textbook; for the locking algorithms, replace usage of explicit locks with synchronized blocks that provide the same semantics as the explicit lock usage (this will not be possible in all cases).
Don't forget to add volatile to those variables that need it (but keep usage to a minimum, as otherwise performance may suffer). Note that there may be errors in the code that you will need to correct as well. Use sets of integers (Set<Integer>) as the implementation to benchmark. A worker thread does an operation ITER times.
The operation on the set could be add, remove or contains with a number passed as a parameter.
ITER = 1000 and operating with numbers (a random choice) from 0 to 100 is a good choice for a worker thread, as it ensures some degree of contention. You will need to vary workload parameters (e.g. the ratio of contains/add/remove operations) and find trends in the behavior of all algorithms. Vary the percentage of contains operations from 20% to 80% with the following spacing: 20%, 40%, 60% and 80%. Divide the rest of the operations between add and remove equally wherever possible. Benchmark from 4 threads to 40 threads with reasonable spacing and at least 8 different thread counts (use Rlogin). There are two ways of approaching mixed workloads. Choose whichever you feel comfortable with. (1) Have worker threads that only perform one operation. For example, for 20 threads, you can have 12 threads that always invoke contains, and 4 threads each that always invoke add and remove. Approximate wherever needed. (2) Each thread chooses between contains, add and remove through a uniformly distributed random number generator. Hint: Read about java.util.concurrent.ThreadLocalRandom. Use it in the run() method of your thread (a sketch appears after the plot list below).
Don't forget proper benchmarking practices (accounting for JVM warm-up – the more the better – and no points of contention beyond the set object itself, etc.); you do not need to use JMH, but you may find it useful to avoid the pitfalls discussed in https://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html.
Provide the following plots:
1. throughput vs. threads; threads varying from 4 to 40 (at least 8 different thread counts); contains % = 20; all algorithms.
2. throughput vs. threads; threads varying from 4 to 40 (at least 8 different thread counts); contains % = 40; all algorithms.
3. throughput vs. threads; threads varying from 4 to 40 (at least 8 different thread counts); contains % = 60; all algorithms.
4. throughput vs. threads; threads varying from 4 to 40 (at least 8 different thread counts); contains % = 80; all algorithms.
5. throughput vs. contains %; contains = 20%, 40%, 60%, 80% for a fixed thread count = 20; all algorithms.
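A sketch of approach (2): each iteration picks contains/add/remove with the configured probabilities via ThreadLocalRandom. The set is typed as java.util.Set<Integer> only so the sketch compiles on its own; in the assignment it would be one of the chapter 9 list-based set implementations, and the split of the non-contains fraction between add and remove follows the even division described above.

import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

class Worker implements Runnable {
    static final int ITER = 1000;
    private final Set<Integer> set;
    private final double containsFrac;   // e.g. 0.20, 0.40, 0.60, 0.80

    Worker(Set<Integer> set, double containsFrac) {
        this.set = set;
        this.containsFrac = containsFrac;
    }

    public void run() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < ITER; i++) {
            int key = rnd.nextInt(0, 101);            // numbers from 0 to 100
            double p = rnd.nextDouble();
            if (p < containsFrac) {
                set.contains(key);                    // contains %
            } else if (p < containsFrac + (1.0 - containsFrac) / 2) {
                set.add(key);                         // half of the remainder
            } else {
                set.remove(key);                      // other half of the remainder
            }
        }
    }
}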
Analyze your findings in a writeup, which should not exceed two pages in length.
Your code and writeup (with filename hw5_<PID>.pdf) should be included in a zip file named hw5_<PID>.zip to be uploaded to Canvas by the announced due date; as with homework 1, your code should be part of a hw5 package and should be runnable from the command line (document the exact command(s) to use).
ECE/CS 5510 Multiprocessor Programming Homework 6
1 Problems (50 points)
1.1 ABA problem [10 points]
Many processors implement special versions of load and store instructions called linked-load (LL) and store-conditional (SC), respectively. The LL instruction reads from memory location p. A subsequent SC instruction to p attempts to store a new value at p, but succeeds only if p is untouched since that thread issued the LL to p.
How do you think these instructions can help in solving the ABA problem? Give a clear example. How do you think LL/SC are implemented in hardware?
1.2 Stack (1) [10 points]
A standard stack data structure has the following operations: push() and pop(). Is each of the executions presented in Figure 1 (for two threads) linearizable and/or sequentially consistent? Provide an explanation.
1.3 Stack (2) [10 points]
The implementation of a lock-free unbounded stack is in Figures 2 and 3. The ABA problem could arise in the pop() method with improper garbage collection. How can this happen? Give a scenario. Also, explain how the code can be fixed to avoid the ABA problem.
1.4 Queue (1) [10 points]
An implementation of an unbounded queue has been provided in Figure 4. The deq() is blocking in the sense that it spins until there is an item to dequeue. Assume that the bucket array is too large to be filled and last is the index of the next unused position in bucket.
• Explain whether the enq() and deq() methods are wait-free or lock-free.
• What are the linearization points for enq() and deq()? Are they execution dependent or independent?
Figure 1: Executions on stack
1.5 Queue (2) [10 points]
Figure 5 shows the deq() method of an unbounded lock-based queue. Explain why holding the lock is necessary to check whether the queue is empty.
2 Programming (50 points)
Linearizability is a useful and intuitive consistency correctness condition that is widely used to reason and prove correctness of concurrent implementations of common data structures.
Though linearizability is non-blocking, it may impose a synchronization requirement that is stronger than what is needed for some applications, limiting scalability. For example, in highly concurrent servers, a set of threads (called a “thread pool”) is often used to execute concurrent entities of the application, which are called “tasks”.
A thread pool typically uses a queue, where tasks are stored. Threads dequeue tasks from this task queue for execution. Such a queue need not have strict FIFO behavior. A queue that does not allow one task to be bypassed by another task "too much" is often sufficient. For example, it is acceptable to dequeue a task T from the queue even if a task T′ that has been enqueued k places "before" T has not yet been dequeued.
In this assignment, you will construct an n-semi-linearizable queue. To understand the semi-linearizable queue, consider the concurrent executions shown in the original handout (the execution diagrams are not reproduced here): one execution of a linearizable queue, a second execution that is possible for a 2-semi-linearizable queue, and a third execution that is not possible for a 2-semi-linearizable queue but is acceptable for a 3-semi-linearizable queue. Implement an n-semi-linearizable queue and measure its throughput (number of enqueue/dequeue operations per second), and compare it against any linearizable queue (e.g., you can use java.util.concurrent.ConcurrentLinkedQueue from the JDK). Plot your throughput as a function of the number of threads. (At least 8 different thread counts, threads varying from 4 to 40.
Use Rlogin.) Hint: A possible implementation for an n-semi-linearizable queue is by using a queue where each of its elements is an n-element array. Thus, enqueue will insert multiple nodes in the same queue element, and they will be considered equally by dequeue (so you can randomly select any of them).
Another possible implementation is by using a normal queue. For dequeue, you can pick any of the last n nodes and return it. Note that the last n elements are treated equally in an n-semi-linearizable queue. Your program should be invoked as:
java QueueTest <qname> <threads> <duration> [<n>]
where qname is either LQueue for the linearizable queue or SLQueue for the semi-linearizable queue, threads > 0 is the number of test threads, duration > 0 is the duration of the test in seconds, and n is the value of n for the semi-linearizable queue (n is not required when LQueue is supplied as the qname). Each thread should perform both enqueue and dequeue repeatedly with equal probability.
As output, your program should (from a single thread) emit a line containing three numbers separated by spaces: the first value should be the total number of enqueues done by all threads, the second should be the total number of successful dequeues done by all threads, and the third the total number of nodes remaining in the queue. A second line of output should emit the throughput. No other output is acceptable.
Your code and writeup (with filename hw6_<PID>.pdf) should be included in a zip file named hw6_<PID>.zip to be uploaded to Canvas by the announced due date.
public class LockFreeStack<T> {
    AtomicReference<Node> top = new AtomicReference<Node>(null);
    static final int MIN_DELAY = 10;
    static final int MAX_DELAY = 100;
    Backoff backoff = new Backoff(MIN_DELAY, MAX_DELAY);

    protected boolean tryPush(Node node) {
        Node oldTop = top.get();
        node.next = oldTop;
        return (top.compareAndSet(oldTop, node));
    }

    public void push(T value) {
        Node node = new Node(value);
        while (true) {
            if (tryPush(node)) {
                return;
            } else {
                backoff.backoff();
            }
        }
    }
Figure 2: Lock-free Unbounded Stack
    protected Node tryPop() throws EmptyException {
        Node oldTop = top.get();
        if (oldTop == null) {
            throw new EmptyException();
        }
        Node newTop = oldTop.next;
        if (top.compareAndSet(oldTop, newTop)) {
            return oldTop;
        } else {
            return null;
        }
    }

    public T pop() throws EmptyException {
        while (true) {
            Node returnNode = tryPop();
            if (returnNode != null) {
                return returnNode.value;
            } else {
                backoff.backoff();
            }
        }
    }

    public class Node {
        public T value;
        public Node next;
        public Node(T value) {
            this.value = value;
            this.next = null;
        }
    }
Figure 3: Lock-free Unbounded Stack
public class YetAnotherQueue<T> {
    AtomicReference<T>[] bucket;
    AtomicInteger last;

    public void enq(T item) {
        int idx = last.getAndIncrement();
        bucket[idx].set(item);
    }

    public T deq() {
        while (true) {
            int end = last.get();
            for (int idx = 0; idx < end; idx++) {
                T value = bucket[idx].getAndSet(null);
                if (value != null) {
                    return value;
                }
            }
        }
    }
}
Figure 4: YetAnotherQueue – Unbounded
public T deq() throws EmptyException {
    T result;
    deqLock.lock();
    try {
        if (head.next == null) {
            throw new EmptyException();
        }
        result = head.next.value;
        head = head.next;
    } finally {
        deqLock.unlock();
    }
    return result;
}
Figure 5: UnboundedQueue – deq() method