Project Euler Problem 89 Statement
For a number written in Roman numerals to be considered valid there are basic rules which must be followed. Even though the rules allow some numbers to be expressed in more than one way there is always a "best" way of writing a particular number.
For example, it would appear that there are at least six ways of writing the number sixteen:
IIIIIIIIIIIIIIII VIIIIIIIIIII VVIIIIII XIIIIII VVVI XVI
However, according to the rules only XIIIIII and XVI are valid, and the last example is considered to be the most efficient, as it uses the least number of numerals.
The 11K text file, roman.txt (right click and 'Save Link/Target As...'), contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; see About... Roman Numerals for the definitive rules for this problem.
Find the number of characters saved by writing each of these in their minimal form.
Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.
Solution
An indirect approach
This problem can be solved with a clever trick: instead of fully converting the Roman numerals, we simply perform targeted string replacements to minimize them.
The process involves applying a sequence of replacements in a specific order to condense the numerals into their most compact form. Below are the replacements:
DCCCC to CM, LXXXX to XC, VIIII to IX, IIII to IV, XXXX to XL, and CCCC to CD
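As a minimal sketch of the replacement sequence listed above, the snippet below applies each substitution in turn with plain str.replace. The names REPLACEMENTS and minimize are made up for this illustration, and the sketch assumes the input is a valid numeral with no more than four consecutive identical units, as the problem statement promises.

```python
# Ordered (pattern, replacement) pairs. The five-character patterns come
# first so that 'DCCCC' is rewritten before its substring 'CCCC' is touched.
REPLACEMENTS = [
    ("DCCCC", "CM"), ("LXXXX", "XC"), ("VIIII", "IX"),
    ("CCCC", "CD"), ("XXXX", "XL"), ("IIII", "IV"),
]

def minimize(numeral):
    """Shorten a Roman numeral by applying the replacements in order."""
    for pattern, replacement in REPLACEMENTS:
        numeral = numeral.replace(pattern, replacement)
    return numeral

print(minimize("MMMCCCCXXVIIII"))  # MMMCDXXIX
```

Running the five-character patterns first is what makes the order matter here: if 'CCCC' were replaced before 'DCCCC', then DCCCC would become DCD instead of CM.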
We read the Roman numerals from the Project Euler website as one long string and use these replacements to shorten it. In fact, we don't bother substituting the exact Roman symbols: since every replacement is two characters long, we simply use two asterisks.
By comparing the lengths of the original and minimized strings, we find the total character savings.
Let's consider this example: reduce MMMCCCCXXVIIII to its minimal form and calculate the number of characters saved in doing so. The initial string length is 14 characters.
- The first target replacement is 'CCCC' and it was replaced with 2 asterisks. That saves 2 characters. MMM**XXVIIII
- The next target replacement is 'VIIII' and it was replaced with 2 asterisks. That saves 3 characters. MMM**XX**
- No more targets are found, and our resultant string is "MMM**XX**" with a length of 9 characters. A savings of 5 (14−9) characters.
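The walk-through above can be reproduced with a few lines of Python. This is only a sketch for the single example string; TARGETS and chars_saved are hypothetical names chosen for the illustration.

```python
import re

# One alternation covering all six target patterns; each match is
# replaced by two placeholder asterisks.
TARGETS = re.compile("DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII")

def chars_saved(text):
    """Return how many characters the replacements remove from text."""
    return len(text) - len(TARGETS.sub("**", text))

example = "MMMCCCCXXVIIII"
print(TARGETS.sub("**", example))  # MMM**XX**
print(chars_saved(example))        # 5
```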
This solution is a result of lateral thinking: it approaches the problem from a much different perspective. Now, I know you're thinking that counting substrings would be faster, but it's not. Try it to find out why.
import re
import urllib.request

file_url = 'https://projecteuler.net/project/resources/p089_roman.txt'
rows = urllib.request.urlopen(file_url).read().decode('utf-8')

# Every target shrinks to two characters, so '**' stands in for the real symbols.
print("Project Euler 89 Solution =",
      len(rows) - len(re.sub("DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII", '**', rows)))
HackerRank requires a translation
HackerRank presents us with a different challenge: convert a string of Roman numerals into its most efficient canonical form. In this problem, the given Roman numerals are already arranged in descending order and do not include subtractive notations like 'IX' or 'IV'.
We use a data structure that provides an integer value, i, and its respective Roman symbol, r. The structure is organized in such a way that even valued indexes (starting from zero) represent single character symbols and the odd indexes the two character subtractive symbols. Also, the numerical breaks, 1000, 900, 500, 400, 100, ..., 1, ensure that the subtractive symbols will only be used once and the single symbols at most 3 times (except for 'M').
Conversion from Roman numerals to an integer
We count the occurrences of each single-character numeral and multiply it by its corresponding value. For example, "MMMCCCCXXVIIII" translates to (1000 * 3) + (100 * 4) + (10 * 2) + (5 * 1) + (1 * 4) = 3,429.
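A minimal sketch of this counting step, assuming (as stated above) that the input contains no subtractive pairs. The names VALUES and additive_to_int are hypothetical and used only for this illustration.

```python
# Value of each single-character numeral.
VALUES = {"M": 1000, "D": 500, "C": 100, "L": 50, "X": 10, "V": 5, "I": 1}

def additive_to_int(numeral):
    """Sum count * value per symbol; only valid for purely additive numerals."""
    return sum(numeral.count(symbol) * value for symbol, value in VALUES.items())

print(additive_to_int("MMMCCCCXXVIIII"))  # 3429
```

Because the symbols are only counted, their order in the string is never inspected. That is why this shortcut would give a wrong answer for subtractive forms such as 'IX'.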
Conversion from an integer to an efficient Roman numeral
The conversion from an integer to a Roman numeral is powered by the divmod function. We loop through each symbol in the roman[] list and see how many of that symbol we can use. Each time through the loop we reduce the original integer value to the remainder of the previous divmod operation.
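To make the loop concrete, here is a small trace of the divmod steps for 3429. It is a sketch for illustration only: ROMAN mirrors the value/symbol table described earlier, and int_to_roman_steps is a hypothetical helper that records only the symbols actually used.

```python
ROMAN = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'),
         (90, 'XC'), (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'),
         (5, 'V'), (4, 'IV'), (1, 'I')]

def int_to_roman_steps(n):
    """Return (symbol, count, remainder) for every symbol that is used."""
    steps = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)  # how many fit, and what is left over
        if count:
            steps.append((symbol, count, n))
    return steps

for symbol, count, remainder in int_to_roman_steps(3429):
    print(symbol, count, remainder)
# M 3 429
# CD 1 29
# X 2 9
# IX 1 0
```

Concatenating each symbol repeated by its count gives MMMCDXXIX.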
HackerRank version
HackerRank Project Euler 89 has us format various Roman numerals up to 1,000 characters long to their minimal form.
Python Source Code
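# (value, symbol) pairs in descending order; even indexes hold the single
# symbols, odd indexes hold the two-character subtractive symbols.
roman = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
         (50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]

def intToRoman(n):
    romanNum = ''
    for i, r in roman:
        f, n = divmod(n, i)   # f = how many times symbol r fits; n = remainder
        romanNum += r * f     # string repetition, not multiplication
    return romanNum

s = input().upper()
# Count only the single-character symbols (even indexes) to get the integer value.
d = sum(s.count(r) * i for i, r in roman[0::2])
print(intToRoman(d))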
Last Word
- The r*f may cause some confusion as it is not a multiplication but a method of repeating a string. So, 'X' * 3 yields 'XXX'.
- There are many other symbols defined for Roman numerals. These are just the ones most used.