### Project Euler Problem 89 Statement

For a number written in Roman numerals to be considered valid there are basic rules which must be followed. Even though the rules allow some numbers to be expressed in more than one way there is always a "best" way of writing a particular number.

For example, it would appear that there are at least six ways of writing the number sixteen:

VIIIIIIIIIII VVIIIIII XIIIIII VVVI XVI

However, according to the rules only XIIIIII and XVI are valid, and the last example is considered to be the most efficient, as it uses the least number of numerals.

The 11K text file, roman.txt (right click and 'Save Link/Target As...'), contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; see About... Roman Numerals for the definitive rules for this problem.

Find the number of characters saved by writing each of these in their minimal form.

Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.

### Solution

#### An indirect approach

We can solve this problem by using a simple string substitution to determine the number of characters saved. No actual translation is required.Here’s the table of replacements and their equivalents in order of execution:

DCCCC to CM, LXXXX to XC and VIIII to IX IIII to IV, XXXX to XL and CCCC to CD

The file is read from the source URL on Project Euler web site as one long concatenated string. Next, we use a simple regular expression to perform all the character replacements with two arbitrary characters, such as spaces.

By subtracting the length of the new string with replacements from the length of the initial string will equate to the number of characters saved.

Let's consider this example: reduce MMMCCCCXXVIIII to its minimal form and calculate the number of characters saved in doing so. The initial string length is 14 characters.

- The first target replacement we found was 'CCCC' and it was replaced with 2 spaces. That saves 2 characters.
- The next target replacement is 'VIIII' and it was replaced with 2 spaces. That saves 3 characters.
- No more targets are found and our resultant string is "MMM__XX__" with a length of 9 characters. A savings of 5 characters.

This solution is a result of lateral thinking. Providing a solution from a much different perspective. Now, I know you're thinking that counting substrings would be faster, but it's not. Try it to find out why.

```
import re
import urllib.request
file_url = 'https://projecteuler.net/project/resources/p089_roman.txt'
rows = urllib.request.urlopen(file_url).read().decode('utf-8')
print ("Project Euler 76 Solution =", \
len(rows) - len(re.sub("DCCCC|LXXXX|VIIII|CCCC|XXXX|IIII", ' ', rows)))
```

##### HackerRank requires a translation

HackerRank provides a valid string of single character Roman numerals in descending order and wants us to translate it into the most efficient form possible. A big benefit is not having to deal with two character subtractive symbols such as 'CM' and 'XL' in the input.

We use a data structure that provides an integer value, `i`

, and its respective Roman symbol, `r`

. The structure is organized in such a way that even valued indexes (starting from zero) represent single character symbols and the odd indexes the two character subtractive symbols. Also, the numerical breaks, 1000, 900, 500, 400, 100, ..., 1 mandate that the subtractive symbols will only be used once and the single symbols at most 3 times (with the exception of 'M').

###### Conversion from Roman numerals to an integer

Converting to an integer is simply counting each single character symbol and multiplying it by its respective integer value. We loop through the even indexes so that only single character symbols are considered. Looking at our example, 'MMMCCCCXXVIIII', we would calculate the integer value as 1000*3 + 100*4 + 10*2 + 5*1 + 1*4 = 3,429.###### Conversion from an integer to an efficient Roman numeral

The conversion from an integer to a Roman numeral is powered by the`divmod`

function. We loop through each symbol in the `roman`

list and see how many of that symbol we can use. Each time through the loop we reduce the original integer value to the remainder of the previous `divmod`

. The `r*f`

may cause some confusion as it is not a multiplication but a method of repeating a string. So, 'X' * 3 yields 'XXX'.
Again, using our example of 3,429 would result in 'MMMCDXXIX'. Run this number through the algorithm, `function intToRoman`

, to visualize its mechanics.

#### HackerRank version

Re-format various Roman numerals up to 1,000 characters long to their minimal form.

### Python Source Code

```
roman = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'), (100, 'C'), (90, 'XC'),
(50, 'L'), (40, 'XL'), (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]
def intToRoman(n):
romanNum = ''
for i, r in roman:
f,n = divmod(n,i)
romanNum+= r*f
return romanNum
s = input().upper()
d = sum(s.count(r)*i for i,r in roman[0::2])
print (intToRoman(d))
```

### Last Word

- There are many other symbols defined for Roman numerals. These are just the ones most used.