# CPSC 418 / MATH 318 Introduction to Cryptography ASSIGNMENT 1 solved

\$30.00

Original Work ?

## Description

Problems 1-4 ,
Problems 7 and 8 only
Written Problems for CPSC 418 and MATH 318
Problem 1 — Password length and entropy (20 marks)
ASCII, short for American Standard Code for Information Interchange, is a method for
encoding any character into a string of 7 bits. For example, the ASCII codes for ‘A’, ‘a’,
and ‘?’ are 1000001, 1100001, and 0111111, respectively1
. ASCII has long been superseded
by the much more modern and extensive UTF-8 encoding method; ASCII codes constitute a
very small subset of UTF-8 codes.
A password can only contain printable characters that appear on a standard US English
keyboard. These include 26 upper case letters, 26 lower case letters, 10 numerical digits
and the 32 special characters `'”.,;:!?~@#\$%^&*_-+=(){}[]<>\/|. Not all ASCII codes
correspond to printable characters.
a. (2 marks) What is the total number of ASCII encodings of 8-character strings?
b. What is the number of passwords of length 8 consisting of
(i) (2 marks) any printable characters?
(ii) (2 marks) lower case letters only?
c. Approximately what percentage of 8-character ASCII encodings are passwords consisting
of
(i) (2 marks) any printable characters?
(ii) (2 marks) lower case letters only?
d. Assuming that each character in a password is chosen equally likely, what is the entropy
of the space of passwords consisting of
(i) (2 marks) 8 printable characters?
(ii) (2 marks) 8 lower case letters?
e. Suppose we want a password space with entropy 128, i.e. a total of 2
the minimum number of characters in a password, to achieve this entropy, assuming that
(i) (3 marks) any printable characters?
(ii) (3 marks) lower case letters only?
1Source: https://www.ascii-code.com.
2For modern cryptosystems, 2
128 is a typical key space size. Of course the assumption that all passwords
are equally likely is a stretch: most of us use only a small handful of our favourite passwords.
2
Problem 2 — One-time pad without the all-zeros key (6 marks)
Recall that in the one-time pad, plaintexts, ciphertexts and keys are n-bit blocks for some
n ∈ N. For any key K, plaintext block M and ciphertext block C, the encryption of M with
key K is EK(M) = M ⊕ K and decryption of C with key K is DK(C) = C ⊕ K, where
⊕ denotes bitwise x-or (or equivalently, bitwise addition modulo 2). Assume as always that
keys are chosen with equal likelihood.
When using the one-time pad with the key K = 0n
consisting of n zeros, we have EK(M) =
M⊕K = M for every paintext M, i.e. M is left unencrypted. It has been suggested to improve
the security of the one-time pad by only encrypting with non-zero keys, i.e. removing 0
n
from
the key space.
a. (4 marks) Use either the definition of perfect secrecy,
p(M) = p(M|C) for all plaintexts M and ciphertexts C with p(C) > 0
or its equivalent characterization
p(C) = p(C|M) for all plaintexts M with p(M) > 0 and ciphertexts C with p(C) > 0
to formally prove that this modification of the one-time pad does not provide perfect
secrecy. Specifically, you will need to give a counterexample, i.e. exhibit a pair (M, C)
that violates the definition or the above characterization of perfect secrecy. Answers that
argue informally or do not use this characterization will receive very little if any credit.
b. (2 marks) Explain (informally) why allowing K = 0n does not weaken the security of the
one-time pad, even though using that key does not hide the plaintext when encrypting.
(Note: the assertion that choosing K = 0n happens so rarely that we don’t have to worry
Problem 3 — Weak collisions (16 marks)
The cryptographic relevance of this problem will become evident when we cover hash functions
in class.
For each question below, provide a brief explanation and a compact formula for your answer.
For part (d), give a numerical value that is an integer.
Let n be positive integer. Consider an experiment involving a group of participants, where
we assign each participant a number that is randomly chosen from the set {1, 2, . . . , n} (so all
these assignments are independent events). Note that we allow for the possibility of assigning
the same number to two different participants.
Now pick your favourite number N between 1 and n. When any one of the participants is
assigned the number N, we refer to this as a weak collision (with N). In this problem, we
determine how to ensure at least a 50% chance of a weak collision in our experiment.
3
a. (2 marks) What is the probability that a given participant is assigned your favourite
number N?
b. (2 marks) What is the probability that a given participant is not assigned the number N?
c. (3 marks) Suppose k people participate in the experiment (for some positive integer k).
What is the probability that none of them is assigned the number N, i.e. that there is no
weak collision?
d. (4 marks) Intuitively, the more people participate, the likelier we encounter a weak collision. We wish to find the minimum number K of participants required to ensure at least
a 50% chance of a weak collision.
Suppose n = 10. What is the threshold K in this case?
e. (5 marks) Generalizing part (d) from n = 10 to arbitrary n, prove that if the number
of participants is above log(2)n ≈ 0.69n, then there is at least a 50% chance of a weak
collision. Use (without proof) the inequality
1 − x < exp(−x) for x > 0 . (1)
(This inequality comes from the Taylor series
exp(−x) = 1 − x +
x
2
2! −
x
3
3! +
x
4
4! − · · · + · · · ,
and the sum of the terms from x
2/2 onwards is positive.)
Concluding Remark. In (1), when x is small, exp(−x) is very close to 1 − x; for example,
exp(0.01) and 1−0.01 are within 4 decimal places of each other. This implies that for large n,
the threshold 0.69n of part (e) is close to optimal, i.e. we expect the necessary and sufficient
number of participants for ensuring a weak collision with at least 50% probability to be on
the order of n. Compare this to the corresponding result for (strong) collisions derived in the
next problem.
Problem 4 — (Strong) collisions (20 marks)
The results of this problem bear relevance to Problem 7 below. Further cryptographic significance will become evident when we cover hash functions in class.
For each question below, provide a brief explanation and a compact formula for your answer.
For part (c), give a numerical value that is an integer.
We consider the same experiment as in the previous problem, although here, you don’t need
to pick a favourite number N. When at least two of the participants are assigned identical
numbers, we refer to this as a strong collision or just a collision. In this problem, we determine
how to ensure at least a 50% chance of a collision in our experiment.
4
a. (4 marks) What is the probability that among k participants, no collision occurs?
b. (2 marks) What is the probability of a collision among k participants?
c. (4 marks) Once again, intuitively, the more people participate, the likelier we encounter
a collision. We wish to find the minimum number K of participants required to ensure
at least a 50% chance of a collision.
Suppose n = 10. What is the the threshold K in this case?
d. (5 marks) Let P be the probability of no collisions as computed in part (a). Prove that
P ≤ exp 

k(k − 1
2n

.
First, if your expression obtained for P in part (a) is not already in this form, rewrite P
as
P =
k
Y−1
i=1
(1 − zi) ,
where 0 < zi < 1 for 1 ≤ i ≤ k − 1 (you’ll have to figure out the appropriate expression
for zi). Then use inequality (1) from the previous problem.
e. (5 marks) Similarly to the previous problem, when n is much larger than the number k
of participants, the upper bound on P in part (d) is a very close approximation to P.
When in addition k is not too small (but still much smaller than n), k is very close to
k − 1. This means that the quantity exp(−k
2/2n) is a very close approximation to the
probability P of part (a).
Generalizing part (c) from n = 10 to arbitrary n, prove that if the number of participants
is above p
log(4)n ≈ 1.177√
n, then there is at least a 50% chance of a collision. You
may replace the actual expression for P derived in part (a) by exp(−k
2/2n).
Note: you can solve this question even if you didn’t attempt parts (a)-(d).
Concluding Remark. As in the previous problem, for large n, the threshold 1.177√
n of part
(e) is close to optimal, i.e. we expect the necessary and sufficient number of participants for
ensuring a collision with at least 50% probability to be on the order of √
n. Compare this to
the corresponding result for weak collisions derived in the previous problem.
5
Written Problems for MATH 318 only
Submit your answers to this problem in a separate PDF file. Do not include the solution
to this problem in the PDF file containing your solutions to Problems 1-4.
Let F2 = {0, 1} with the usual arithmetic modulo 2. The set F
n
2 = {0, 1}
n
consisting of all
n-bit vectors (vectors of length n with entries 0 or 1) is an n-dimensional vector space over
F2, with the standard basis for example. Linear algebra in F
n
2
as a vector space over F2 works
just like linear algebra in R
n as a vector space over R, except that linear combinations of
vectors in F
n
2 have coefficients 0 and 1, and all arithmetic is done modulo 2.
For any n ∈ N, let Gln(F2) denote the set of invertible n × n matrices with zeros and ones
as entries. Calculations involving such matrices again work exactly the same as the familiar
matrix arithmetic over R that you learned in first year linear algebra, except that arithmetic
using real numbers is again replaced by arithmetic modulo 2.
Now fix n ∈ N and consider the class of linear ciphers with M = C = F
n
2
, K = Gln(F2), and
for all plaintexts ~m (interpreted as n-bit column vectors with entries 0 and 1), encryption
under a key matrix K is
EK(~m) = K ~m . (2)
Then for all ciphertexts ~c, decryption under K is obviously
DK(~c) = K−1
~c .
Problem 5 — Transposition ciphers are linear (8 marks)
Recall that a transposition cipher produces a ciphertext by permuting the plaintext symbols.
For a transposition cipher operating on bit strings of some length n ∈ N, encryption of a bit
string (m1, m2, . . . , mn) can thus be written as
Eσ(m1, m2, . . . , mn) = (mσ(1), mσ(2), . . . , mσ(n)
) ,
where σ is a secret key permutation on n symbols.
a. (3 marks) Let n = 4 and let σ be the permutation sending 1 to 2, 2 to 4, 3 to 3 and 4 to
1. Then transposition encryption of a 4-bit string using the key permutation σ is given
by
Eσ(m1, m2, m3, m4) = (m2, m4, m1, m3) . (3)
Now write 4-bit strings (x1, x2, x3, x4) as column vectors ~x =
 x1
x2
x3
x4

. Prove that the
cipher given in (3) is a linear cipher by explicitly writing down a permutation matrix K
that satisfies (2) for this cipher. A permutation matrix is a a square matrix whose entries
consist of zeros and ones such that there is exactly one one and n − 1 zeros in each row
and in each column.
b. (5 marks) Generalizing part (a), formally prove that a transposition cipher operating on
bit strings of any length n is a special case of a linear cipher where the matrix K in (2)
is a permutation matrix. Use formal reasoning and good notation for your proof.
6
Problem 6 — Cryptanalysis of linear ciphers (30 marks)
In this problem, we explore attacks on linear ciphers as given in (2).
a. (3 marks) If each key matrix K is chosen equally likely, does this class of linear ciphers
provide perfect secrecy? Formally prove your claim.
b. (3 marks) Explain how an attacker Eve can mount a chosen plaintext attack on a cipher
of the form (2). The goal of this attack is to chose one or more plaintexts, obtain their
encryptions under some unknown key matrix K, and derive K. How should Eve choose
her plaintexts, and how many does she need to choose in order to be successful?
c. (3 marks) Let ~m1, ~m2, . . . , ~mi be any collection of i linearly independent vectors in F
n
2
for
some i with 1 ≤ i ≤ n. Prove that there are 2
i vectors ~mi+1 ∈ F
n
2
such that the vectors
~m1, ~m2, . . . , ~mi
, ~mi+1 are linearly dependent.
d. (2 marks) As in part (c), let ~m1, ~m2, . . . , ~mi be linearly independent vectors in F
n
2
. How
many vectors ~mi+1 ∈ F
n
2
are there so that the vectors ~m1, ~m2, . . . , ~mi
, ~mi+1 are linearly
independent?
e. (5 marks) Prove that the number of sets consisting of n linearly independent vectors in
F
n
2
is
1
n!
nY−1
i=0
(2n − 2
i
) .
f. (5 marks) Prove that the probability that any set of n vectors in F
n
2
is linearly independent
is
Pn =
nY−1
i=0
(2n − 2
i
)
nY−1
i=0
(2n − i)
.
g. (2 marks) Suppose n = 4. What is the probability that any set of 4 vectors in F
4
2
is
linearly dependent? Express the result as a fraction.
h. (6 marks) Part (b) considered a chosen plaintext attack, whereas we will now consider the
scenario of mounting a known plaintext attack on a linear cipher as given in (2). Given a
set of known plaintext/ciphertext pairs where all the plaintexts were encrypted to their
respective corresponding ciphertexts using the same unknown key matrix K, the goal of
this attack is to find K.
Assume that linear dependence of any collection of n plaintexts in F
n
2
represents independent events, i.e. if p is the probability that any n plaintexts are linearly dependent,
then p
2
is the probability that any two collections of n plaintexts are linearly dependent.
Explain how Eve can use multiple attempts at a known plaintext attack to find the key
matrix. For n = 4, how often does Eve have to try to achieve a 99 percent chance of
success? In other words, what is the maximum number k of failed attempts such that
Eve has a 99% chance of finding K on the (k + 1)-st attempt?
7
Programming Problems for CPSC 418 only
Specifications. Design and implement your solutions using Python 3 and the template
programs provided on the Piazza “Resources” tab. Please do not to alter the filenames,
functions, or their input and output types. Submissions that do not comply with these
specifications will be penalized or not marked at all. Use the latest version of the Python
cryptography library found at
https://cryptography.io/en/latest/
This link can also be found on the “references” page of our course website. Use this library
for all required cryptographic primitives. This includes encryption, decryption, hashing and
The cryptography library has its own interface to which you are expected to adhere. You
must make use of the hazardous materials layer, not the recipes layer. Make sure to use
good coding practices. You can split your program up into multiple source files, so long as
the main executable retains the same name as the template program.
The testing of your program will largely be done on Gradescope using python3. An autograder will be made available to test component functions, while the overall problem solutions
will be tested by the TAs separately.
You may assume that for any input file requirements, Gradescope will use text files of at
most 1 MB in size and that all byte encoding is done using UTF-8.
Submission. Submit a description of your implementation for both problems in a separate
README file in text format. Do not include any written portions of the programming problems in the PDF file containing your solutions to any of the written
problems. Your description must include the following:
• A list of the files you have submitted that pertain to the problem, and a short description
of each file.
• A list of what is implemented in the event that you are submitting a partial solution,
or a statement that the problem is solved in full.
• A list of what is not implemented in the event that you are submitting a partial solution.
• A list of known bugs, or a statement that there are no known bugs.
• Any other answers to questions specified in the problem.
8
Problem 7 — Prefix collision finder (12 marks)
Overview. Bob is interested in storing the passwords for users of his website in hashed
form using SHA-2 224 to produce the hashed outputs. However, for space reasons, he is
considering storing only the first 5 bytes of each SHA-2 224 hash tag. As his friend, you want
to convince Bob that this is a terrible idea by demonstrating that you can easily find two
inputs whose SHA2 224 outputs have the same first 5 bytes.
Problem. Finish the template program named
prefix_collision.py
which finds two distinct bytes objects whose images under SHA-2 224 have the same 5-byte
(40-bit) prefix.
Specifications. You will need to complete three separate functions:
1. hash_bytes(), which takes a sequence of bytes and returns a SHA-2 224 hash value.
2. compare_bytes(), which compares the first n bytes of two sequences and checks if they
match.
3. find_collision(), which uses the other functions to find two byte sequences with hash
values that match for the first n bytes.3
In order for your code to run efficiently you must utilize memory in a smart way. The number
of SHA-2 224 tags that you compute and store should be on the order of √
2
b = 2b/2
, where b
is the number of bits in the prefix. You should also make use of an appropriate data structure
for quickly detecting duplicates.
Submission. Please submit a completed version of the template program, with the filename
prefix_collision.py
If you’ve spread your code across multiple source files, submit all of them. If your code takes
more than 2 minutes on Gradescope’s servers for a 5-byte prefix (n = 5), it will be timed
out, so you will need to do something smarter than hashing every single input one-by-one.
Problem 8 — Password attack on authenticated encryption (26 marks)
Overview. Suppose Bob has designed his own authenticated encryption scheme, using the
hash-then-encrypt paradigm, with AES-128 in CBC mode for encryption, and SHA-2 224 for
the key derivation function and message hash tag. On input the name of a plaintext file and
a password p, Bob’s program does the following:
1. Converts the plaintext file to a byte array B.
3
If you’re an experienced coder, you might be happy to learn that we won’t verify you actually used those
prior functions to help create this one. They exist to provide guidance and partial marks for beginning coders.
9
2. Computes a hash tag t on the plaintext by applying SHA-2 224 to the byte array B, then
appends t to B to obtain an extended byte array B0 = B||t (here, as always, “||” denotes
concatenation).
3. Derives an encryption key by applying SHA-2 224 to the password p and truncating the
result to the appropriate length for use in AES-128.
4. Generates a random 16-byte initial value IV (for use in CBC mode) and writes it to a
file F.
5. Pads the extended byte array B0 using the PKCS7 format, then encrypts the padded array
with AES-128-CBC and appends the resulting ciphertext to the file F.
Out of laziness, Bob is known to use strings of the form YYYYMMDD, which mark certain
dates from his life, as his passwords. Bob was born in the year 1984, and always includes the
string FOXHOUND in his communications.
Problem. Create a Python 3 program that takes as input a file produced by Bob’s hashthen-encrypt routine and performs the following tasks:
1. Determines the password used to derive the encryption key, and prints it out.
2. Decrypts the ciphertext using this key.
3. Checks the resulting plaintext for the phrase CODE-RED. If present, the program replaces
this phrase with the phrase CODE-BLUE and writes the modified plaintext to a new file.
If not present, the plaintext is left unchanged.
4. Computes a new hash tag on the possibly-modified plaintext.
5. Generate a new 16-byte IV.
6. Re-encrypt the possibly-modified plaintext plus new tag using the same password and
exactly the same process as Bob’s hash-then-encrypt program.
7. Writes the result to a new file. Recall that AES-128-CBC has a block size of 16 bytes.
Specifications. Fill in the empty functions in the template program encrypt_modify.py.
Once complete, you should be able to replicate Bob’s encryption scheme by running
python3 encrypt_modify.py –encrypt [PLAINTEXT] –password “YYYYMMDD”
[CYPHERTEXT]
where PLAINTEXT is the name of the file to by encrypted, YYYYMMDD is a number Bob would
use for a password, and CYPHERTEXT is name of the file to store the output. Cracking the
password, doing the text substitution, and reencrypting the result can be performed by
running
python3 encrypt_modify.py –modify [ORIGINAL CYPHERTEXT] [NEW CYPHERTEXT]
where the input file ORIGINAL CYPHERTEXT is the output of the encryption process, and NEW
CYPHERTEXT is an encrypted version of the possibly-modified plaintext. Don’t forget to print
out the password to the terminal.
In detail, the functions you’ll need to fill in are:
10
1. create_iv( length ), which creates an IV that’s length bytes long.
2. derive_key( input ), which takes a string and hashes it to form a key for AES-128.
3. pad_bytes( input ), which pads input so it can be encoded by AES-128-CBC.
4. encrypt_bytes( input, key, iv ), which encrypts input with the given key and IV.
5. hash_pad_then_encrypt( input, string, iv ), which uses the above functions to perform Bob’s encryption algorithm.
6. check_tag( input ), which verifies the tag appended to input matches the hash of the
remaining bytes.
7. decrypt_unpad_check( input, string ), which reverses Bob’s encryption algorithm,
and both verifies and strips off the tag.
8. generate_passwords( year, month, day ), which generates a list of passwords to try,
according to Bob’s format, with the earliest starting on year/month/day.
9. determine_password( input ), which tries to brute-force decrypt input using generate_passwords(
year, month, day ) and decrypt_unpad_check( input, string ).
10. attempt_substitute( input, codeword, target, substitute ), which combines all
of the above to do the decryption, substitution, and re-encryption.
Additional details are in the template program encrypt_modify.py, on the Piazza resources
page.
Submission. Please submit a completed version of the template program, with the filename
encrypt_modify.py
If you’ve spread your code across multiple source files, submit all of them.
11