Over the last 6 months I have focused on rebuilding the majority of the backend code that deals with calculating parts reuse between sets (i.e. the main reason behind Rebrickable!). I am happy, and quite a bit relieved, to finally announce that this new functionality is now live and I am going to go ahead and call it v2 :)
Alternate Parts/Mold Variations
A major change included in v2 is that we now distinguish between the different mold variations of parts over time. In the past, I force-converted all of these variations into a single identifier so that the matching would give the best results. However, this resulted in set inventories that were not 100% accurate. It was never Rebrickable's original intention to become yet another authority on inventories, but over time the need for it became clearer. In fact, Rebrickable now powers the inventories displayed on sites such as Brickset, and anyone else can get the same data via the API.
Similarly, we recently began renumbering some parts and splitting up some minifigs (don't hate me!) for various reasons. More details are in the forum.
So now that set inventories contain different molds of the same part, what happens to the build results? v1 could not tell the difference between, e.g., a 4085c and a 4085d, so the % matches would suffer. That's where Part Relationships, and the power of the v2 code, come into play.
These relationships are what tie together the various molds of a single LEGO part design. In addition to alternate molds, there are other relationship types that can be defined:
- Mold = Alternate mold which can be used as a functional drop-in replacement.
- Print = Printed or Painted surface of the part.
- Pattern = Marbled color, embossed or molded patterns. These are currently treated the same as Printed parts in all calculations, but are recorded separately.
- Pair = One of a pair of related parts, e.g. tyre+wheel, left+right panels, etc.
- Alternate = Similar part that can usually be used as a replacement, but is not necessarily functionally compatible.
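One way to picture these relationships is as typed edges between part numbers. The sketch below is hypothetical: the `RelType` enum, the `Relationship` record and the choice of a base 3626 linked to its molds are illustrative assumptions, not Rebrickable's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class RelType(Enum):
    """The relationship types listed above."""
    MOLD = "mold"            # functional drop-in replacement
    PRINT = "print"          # printed or painted surface
    PATTERN = "pattern"      # marbled, embossed or molded pattern
    PAIR = "pair"            # one of a pair (tyre+wheel, left+right)
    ALTERNATE = "alternate"  # similar part, not always compatible


@dataclass(frozen=True)
class Relationship:
    """Hypothetical edge linking two part numbers."""
    parent: str
    child: str
    rel_type: RelType


# Illustrative subset of the 3626 Minifig Head family (assumed structure).
RELATIONSHIPS = [
    Relationship("3626", "3626a", RelType.MOLD),
    Relationship("3626", "3626b", RelType.MOLD),
    Relationship("3626", "3626c", RelType.MOLD),
    Relationship("3626c", "3626cpr0001", RelType.PRINT),
]


def related(part: str, rel_type: RelType) -> list[str]:
    """Parts directly linked to `part` by an edge of this type, in either direction."""
    out = []
    for r in RELATIONSHIPS:
        if r.rel_type is not rel_type:
            continue
        if r.parent == part:
            out.append(r.child)
        elif r.child == part:
            out.append(r.parent)
    return out
```

For example, `related("3626", RelType.MOLD)` returns the three mold variants, and `related("3626cpr0001", RelType.PRINT)` returns `["3626c"]`.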
Several months ago, I released this feature to the admin team so they could start populating it. We now have over 20,000 relationships defined, and they continue to be refined over time.
Some of the more common parts will have a hierarchy of relationships such as:
You can see some of the results of this effort on the part details popups and pages.
This has already made submitting inventories far simpler than it used to be, and the process will continue to evolve to become simpler and more transparent.
Building in v2
With those prerequisite features out of the way, we can move on to the core of the v2 improvements: the Build Engine. This is the code that finds sets you can build with your parts, provides your parts match % on each set/MOC you visit, and suggests LEGO sets to buy to improve your matches. It now takes the part relationships into account and offers more flexibility in your build options, such as ignoring printed/patterned part differences and mold variations.
To enable this, the Build page has had a bit of a redesign:
You'll notice that the 6 boxes for Set IDs have been moved behind the "You may also enter some sets manually" link, and there are now only three of them. Usage analysis shows three to be enough, and I hope this encourages people who need more to sign up anyway!
There are a number of new default settings now in use that you can change by clicking the "Show advanced options" link.
The first three options make use of the new part relationships.
- Ignore printed and patterned part differences - Any printed parts from the same mold will be treated as if they were the same part for the matching calculations.
- Ignore mold variations in parts - Any parts that are different molds of the same basic design will be treated the same. Note that these should all be 100% functionally identical and are usually things like strengthening changes in the mold.
- Consider alternate parts that can usually be used as replacements, but are not always functionally compatible - This uses the Alternate relationships, of which there aren't many yet, though more will be added over time.
If you have all three options turned on, the Build Engine can for example equate a printed part from mold A with a non-printed part from mold B.
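A minimal sketch of how the three options could translate into the set of relationship types the matcher is allowed to follow. `BuildOptions` and `allowed_rel_types` are hypothetical names, and leaving Pair relationships out entirely (a tyre is not a substitute for its wheel) is our assumption.

```python
from dataclasses import dataclass


@dataclass
class BuildOptions:
    """Hypothetical container for the three relationship-based options."""
    ignore_prints: bool        # "Ignore printed and patterned part differences"
    ignore_molds: bool         # "Ignore mold variations in parts"
    consider_alternates: bool  # "Consider alternate parts ..."


def allowed_rel_types(opts: BuildOptions) -> set[str]:
    """Relationship types the matcher may traverse under these options."""
    allowed: set[str] = set()
    if opts.ignore_prints:
        # Patterns are treated the same as prints in all calculations.
        allowed |= {"print", "pattern"}
    if opts.ignore_molds:
        allowed.add("mold")
    if opts.consider_alternates:
        allowed.add("alternate")
    return allowed
```

With all three options on, a printed part from mold A can reach a plain part from mold B by following a print edge and then mold edges.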
The next two options are used to filter out parts completely from any comparisons.
- Exclude Minifigs and Minifig Accessory parts - Any parts of these types are taken out of the calculations. The idea is that they usually don't play an important part in a build, although in some cases they might.
- Exclude Non-LEGO parts such as stickers, gear, etc - Similarly, these types of parts are taken out of the calculations. I can't imagine why you would want these included, but I've left it as an option anyway.
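These exclusions act before any matching happens, by dropping whole categories of parts. A small sketch, assuming hypothetical category labels (the site's real category names may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryPart:
    part_num: str
    category: str  # hypothetical category label
    quantity: int


# Hypothetical category names, for illustration only.
MINIFIG_CATEGORIES = {"Minifigs", "Minifig Accessories"}
NON_LEGO_CATEGORIES = {"Stickers", "Non-LEGO"}


def filter_parts(parts, exclude_minifigs: bool, exclude_non_lego: bool):
    """Remove excluded categories so they never enter the comparison."""
    excluded = set()
    if exclude_minifigs:
        excluded |= MINIFIG_CATEGORIES
    if exclude_non_lego:
        excluded |= NON_LEGO_CATEGORIES
    return [p for p in parts if p.category not in excluded]
```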
The next set of options are used to filter your search results.
They should all be pretty self-explanatory. The two new choices that weren't in v1 are the B-Models and Premium MOCs filters. You can also now sort the output in a few different ways, although I don't see much value beyond the default of % match.
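As a rough illustration, the result filters and the default % match ordering could be applied like this. The `BuildResult` fields and the `min_pct` threshold parameter are assumptions for the sketch, not the site's actual option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    set_num: str
    pct_match: float      # % of the set's parts the user has
    is_b_model: bool
    is_premium_moc: bool


def filter_results(results, min_pct=0.0, include_b_models=True,
                   include_premium_mocs=True):
    """Apply the result filters, then sort by % match, highest first."""
    kept = [
        r for r in results
        if r.pct_match >= min_pct
        and (include_b_models or not r.is_b_model)
        and (include_premium_mocs or not r.is_premium_moc)
    ]
    return sorted(kept, key=lambda r: r.pct_match, reverse=True)
```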
Another new feature here is the "Save these settings as your default" link, which does exactly what it says. The current values of all the advanced options are saved and used as the defaults the next time you start a new build search, as well as for any automatic build calculations on the site.
How does it work?
Let's follow an example. You are calculating the % of parts you have for a specific set. This set contains part 3626cpr0001, but you only have 3626b. You have chosen to ignore printed part differences, so will this part be matched? No, because they are different molds. If you have chosen to ignore both printed part and mold differences, the two parts will be matched successfully.
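The example above can be reproduced with a breadth-first search that only follows relationship types permitted by the chosen options. The edge list is a hypothetical slice of the 3626 tree (assuming 3626b and 3626c are both linked as molds of a base 3626), and `parts_match` is an illustrative name.

```python
from collections import deque

# Hypothetical slice of the 3626 tree: (part_a, part_b, relationship type).
EDGES = [
    ("3626", "3626b", "mold"),
    ("3626", "3626c", "mold"),
    ("3626c", "3626cpr0001", "print"),
]


def parts_match(a: str, b: str, allowed: set[str]) -> bool:
    """True if a and b are linked via any chain of edges of allowed types."""
    if a == b:
        return True
    # Undirected adjacency list restricted to the allowed relationship types.
    adj: dict[str, list[str]] = {}
    for x, y, rel in EDGES:
        if rel in allowed:
            adj.setdefault(x, []).append(y)
            adj.setdefault(y, []).append(x)
    seen = {a}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Ignoring prints alone only reaches 3626c from 3626cpr0001, so there is no match with 3626b; adding molds opens the path 3626cpr0001 → 3626c → 3626 → 3626b.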
Using the image of the 3626 Minifig Head relations tree above, the following rules would apply. There are a total of 1674 parts in the tree. With the old v1 code, that would result in 1674 distinct parts, none of which could be substituted for another. With the new v2 Build Engine, we have:
- Ignoring Molds = 1 pool of 4 parts (3626, 3626a, 3626b, 3626c) + 1670 various printed parts = 1671 part groups.
- Ignoring Prints = 1 part (3626) + 1 pool of 3626a prints (16 parts) + 1 pool of 3626b prints (1238 parts) + 1 pool of 3626c prints (419 parts) = 4 distinct part groups.
- Ignoring Molds + Prints = 1 pool with all parts in it.
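These group counts can be checked with a union-find over a synthetic version of the tree. The per-mold print counts (15, 1237, 418) are derived from the pool sizes quoted above, assuming each pool holds the base mold plus its prints; the print part numbers are made up.

```python
def build_3626_tree():
    """Synthetic stand-in for the 3626 tree: 4 molds plus 1670 printed parts."""
    edges = [("3626", m, "mold") for m in ("3626a", "3626b", "3626c")]
    for mold, n_prints in (("3626a", 15), ("3626b", 1237), ("3626c", 418)):
        # Hypothetical print numbers such as "3626bpr0001".
        edges += [(mold, f"{mold}pr{i:04d}", "print") for i in range(1, n_prints + 1)]
    return edges


def count_groups(edges, allowed):
    """Number of interchangeable part groups when only `allowed` edges are followed."""
    parent: dict[str, str] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, rel in edges:
        ra, rb = find(a), find(b)  # registers both parts even if the edge is skipped
        if rel in allowed and ra != rb:
            parent[ra] = rb
    return len({find(x) for x in list(parent)})
```

Following no edges gives 1674 groups (the v1 situation), molds only gives 1 + 1670 = 1671, prints only gives 4, and molds plus prints gives a single pool.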
Obviously, these calculations are now heavily influenced by the part relationships that have been set up by the admin team. This will continue to evolve over time, and many of the old v1 parts that were forcibly normalised will continue to be fixed.
So to calculate the percentage of matching parts in a single set for a single user, the Build Engine has to perform the following:
- For every part in the set:
  - For every part in the user's list of sets and loose parts:
    - Look up the two parts in the relationship tree and see if they are linked together via any number of relationships that meet the criteria being used in this calculation.
  - If any of them match, yay! If none do, display it as a missing part.
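The loop above, written as a naive Python sketch. `same_group` is a hypothetical predicate standing in for the relationship-tree lookup, and consuming each of the user's pieces once it is matched is our addition so that one owned part cannot satisfy several set parts. Greedy consumption is only guaranteed optimal when matching forms clean groups (an equivalence relation).

```python
from collections import Counter


def match_percentage(set_parts, user_parts, same_group):
    """Return (% of set parts matched, list of missing set parts).

    set_parts / user_parts: part numbers, one entry per piece.
    same_group(a, b): hypothetical check that the relationship tree links a and b.
    """
    available = Counter(user_parts)  # user's pieces not yet used
    missing = []
    for needed in set_parts:               # every part in the set
        for have in available:             # every part the user owns
            if available[have] > 0 and same_group(needed, have):
                available[have] -= 1       # consume one matching piece
                break
        else:
            missing.append(needed)         # nothing matched: a missing part
    matched = len(set_parts) - len(missing)
    pct = 100.0 * matched / len(set_parts) if set_parts else 0.0
    return pct, missing
```

For instance, with 3626b and 3626c treated as the same group, a set needing two 3626c and one 3001 against a user owning one 3626b and one 3001 scores 2 of 3 parts, with one 3626c missing.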
If the set has 200 parts, the user has 10,000 parts, and there are approximately 5 relationships per part, that makes 10 million database comparisons. A build search needs to perform this across every set in the database, which equates to about 50 billion database comparisons. At any one time, there are between 30 and 100 users online running searches and build calculations simultaneously. And that doesn't include the complexity introduced by fuzzy colour matching, users' lost parts, etc.
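The arithmetic behind these figures, written out. The second line simply divides the two quoted totals, so it implies roughly 5,000 sets of this size; that set count is our inference, not a figure stated above.

```latex
\begin{aligned}
200 \times 10{,}000 \times 5 &= 10^{7} \text{ comparisons per set} \\
\frac{5 \times 10^{10} \text{ comparisons}}{10^{7} \text{ comparisons per set}} &= 5{,}000 \text{ sets}
\end{aligned}
```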
And yet, the average build search time is under 1 sec :)
Of course, there are many tricks involved in making it do so much in such a short time, and these will remain the "secret sauce" of Rebrickable!
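The actual tricks stay secret, but one generic technique (not a description of Rebrickable's implementation) shows why the pairwise cost need not be paid at query time: precompute a group id for every part under each option combination, then compare per-group counts. `group_of` is a hypothetical precomputed mapping, and this only works when the matching forms clean groups, as in the pools described earlier.

```python
from collections import Counter


def group_counts(parts, group_of):
    """Collapse a parts list into piece counts per group id."""
    # Parts with no entry in group_of form a group of their own.
    return Counter(group_of.get(p, p) for p in parts)


def match_percentage_fast(set_parts, user_parts, group_of):
    """Match by group id: linear in the number of parts, no pairwise comparisons."""
    if not set_parts:
        return 0.0
    need = group_counts(set_parts, group_of)
    have = group_counts(user_parts, group_of)
    # Each group contributes as many matches as the user can supply.
    matched = sum(min(n, have[g]) for g, n in need.items())
    return 100.0 * matched / len(set_parts)
```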
There are a few small tweaks still to be made around the site to make better use of this new functionality, and these will appear gradually over time.
Since this massive change has been my main focus for so long, there is quite a backlog of other smaller fixes and features that need addressing. I will try to prioritise these and get to them soon.
Thank you to all my regular users who provide valuable feedback, even when I don't ask for it ;) These changes are a direct result of your suggestions so keep them coming!