Importing Polygon Data
Time to read: ~10 min
The general procedure and the available options on the Import tab are discussed on the Import Page, and the majority of those topics
will not be repeated here.
This page discusses some of the specifics to consider when working with Polygon data as it relates to importing into the InStep Studio Application.
Polygon data files contain, in general, a large collection of primitive shapes, the most common being triangles, which define a surface patch by means of three points.
More 'complicated' shapes are Quadrilaterals (defined by four points) and Polygons (defined by a larger number of points). A triangle is the ideal Polygon in this scenario
as it is guaranteed to define a flat shape. Quadrilaterals and Polygons, in general, do not guarantee flatness, as any one of the points could lie
outside of the plane. The InStep application assumes that higher-level shapes uphold this 'same plane' definition. If they do not, polygon shapes
will start to show strange artifacts due to the way that the data is extrapolated from this assumption. Similarly, for Quadrilaterals it is assumed that
the four points lie on a plane, as later operations consider the normal vector (the vector that points away from the surface) to be defined by the order of the first three points.
Indeed, for any shape, the direction of the surface (or 'feature' going forward) is defined by the 'Right Hand Rule' applied to the order of the points. This rule
is quite simple: point the fingers of your right hand from the first point toward the second point and curl them toward the third point. If you now
extend your thumb, it points in the direction of the normal (and since this normal is perpendicular to the surface, scaling it to a length of one gives the unit normal).
Mathematically, this is the normalized cross product of the two vectors going from point one to point two and from point one to point three.
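The cross product rule above can be sketched in a few lines. This is a minimal illustration using only the standard library; the tuple-based point format is an assumption for the example, not the application's internal representation.

```python
# Unit normal of a triangle via the right-hand rule: the normalized cross
# product of the vectors (p2 - p1) and (p3 - p1).

def subtract(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit_normal(p1, p2, p3):
    n = cross(subtract(p2, p1), subtract(p3, p1))
    length = (n[0] ** 2 + n[1] ** 2 + n[2] ** 2) ** 0.5
    return (n[0] / length, n[1] / length, n[2] / length)

# Counter-clockwise order in the XY plane yields a normal along +Z.
print(unit_normal((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # (0.0, 0.0, 1.0)
```

Reversing the order of any two points flips the resulting normal, which is exactly why the point order in the file matters.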
Herein lies one of the first problems that is sometimes encountered:
some applications 'cheat' when they generate STL data. To produce simpler shapes, they sometimes generate a triangle whose two sides are collapsed onto the
same line, with one edge being longer than the other. This way, a shape that would otherwise be a Quadrilateral with three points on the same line can be represented by a
single triangle. Though this works fine for representing STL data, it is not acceptable for use in boundary representation data (where the edges of features need to be continuous).
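Such a collapsed triangle has zero area, which an importer can detect before attempting a boundary representation. A sketch of the check, using the fact that the cross product's magnitude equals twice the triangle's area; the tolerance value is an illustrative assumption.

```python
# Detect a 'collapsed' (degenerate) triangle: all three points on one line.

def is_degenerate(p1, p2, p3, tol=1e-12):
    ux, uy, uz = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    vx, vy, vz = (p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2])
    # Cross product magnitude equals twice the triangle's area.
    cx, cy, cz = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return (cx * cx + cy * cy + cz * cz) ** 0.5 <= tol

print(is_degenerate((0, 0, 0), (1, 0, 0), (2, 0, 0)))  # True: collinear
print(is_degenerate((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # False
```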
Another consideration comes from the lack of units in the majority of Polygon (Mesh) data files. Though the relative scale is fully defined by the
relative arrangement of the values, it is often uncertain what the overall size represents. It is possible to make an assumption based on where or by whom the file was generated,
but this information is not available to the applications reading it. For many applications that work with mesh data, the actual size of the shapes is of little
concern, which is probably why units were not considered when the formats were initially developed. For mechanical CAD applications, however, the overall size is important.
Though the application has the ability to re-size the data, simply looking at this as a way to scale the shapes trivializes the underlying behavior.
For one, the data being loaded usually has limited precision. In fact, the InStep application assumes that all data being loaded is of Single (aka Float) precision, which
means that only about six or seven significant digits are stored (a sign, a mantissa, and an exponent to multiply the value by).
The outcome is that where two points may originally have been in very close proximity, once the data is scaled they may no longer be within
a distance that the application considers 'equal', and a small crack opens at a location that the application that generated the data considered closed.
Though the application can heal/repair some of these, it may be beneficial to keep the scale at unity (1.0), work through the process, export the data, and only then re-scale the
shapes once they are loaded into their destination application. This is not a requirement, but something to keep in mind when working with data.
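The crack-opening effect can be demonstrated by round-tripping coordinates through a 4-byte IEEE 754 float, which is how an STL file stores them. The tolerance value below is illustrative, not the application's actual setting.

```python
# Two points that coincide within tolerance at unit scale may no longer do
# so once the coordinates are scaled and rounded back to 32-bit floats.
import struct

def to_float32(x):
    # Round-trip a value through a 4-byte IEEE 754 float.
    return struct.unpack('<f', struct.pack('<f', x))[0]

a, b = 0.1, 0.1 + 5e-9       # effectively equal at single precision
tol = 1e-7                   # illustrative 'equality' tolerance

print(abs(to_float32(a) - to_float32(b)) <= tol)                  # True
print(abs(to_float32(a * 1000.0) - to_float32(b * 1000.0)) <= tol)  # False
```

At unit scale both values round to the same 32-bit float; after scaling by 1000, they land on different floats whose gap exceeds the tolerance.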
Something that deserves additional consideration is texture and color. Creators of these files sometimes spend a large amount of time defining the exact look of the end
result, either by generating intricate detail on the surface of the mesh itself or by superimposing a texture that uses a
graphic/image to represent small details. Image-based textures are not supported, as there is no realistic way to convert them to a geometric representation.
Mesh-based detail, on the other hand, can generate a different set of challenges: it often leads to a very high triangle (poly) count, which slows down the application
and ultimately results in very large files that most CAD applications have trouble processing. A reasonable range for most applications is around 10,000 facets, though
several applications can easily handle larger data sets, with 100,000 to 1,000,000 facets not being unheard of.
With files that contain very smooth surfaces and a lot of detail, it might be best to first start with a part that
has less detail and then see whether the additional detail is really needed. If lower-detail files are not readily available, the InStep application has a few different ways of
reducing the level of detail by iteratively removing detail based on how much influence a feature has on the overall shape. A number of different simplification tools and options
can be used for this process (using the Simplify option).
Lastly, some loaded data originates from 3D scanner output. Depending on the software used (assuming that the point data wasn't imported as a point cloud file), the data may simply represent a
number of surfaces that collectively do not form a closed (aka 'water-tight' or 'manifold') body or shape. In these cases, it is difficult to 'repair' the data without making some
fairly rough assumptions. Some of the tools can close holes and similar gaps on the way to a solid body, but in several cases, simply adding data at the closest defined location
does not generate surfaces that are valid or useful.
For these scenarios, the most likely way to recover something useful is to use a method named Thicken. This, as the name implies, generates a second set of surfaces
that is offset from the first, original set, and then joins the edges that are otherwise boundaries. By doing so, a thin solid body is generated that exactly represents the original
data, but in a way that is acceptable for a BRep type body. The downside is that the amount of data more than doubles, which again can lead to issues.
File Size Increase
It is tempting to think that, because BRep files (like the STEP file format) accurately define solid bodies by means of a boundary definition, the files will be
compact and lightweight. Indeed, this is usually the case when comparing files generated in a CAD application with files representing the same bodies but exported to
an STL or OBJ format. The reason for the difference is that the STEP file represents just a handful of well-defined surfaces (for example circles,
cylinders and similar), while the STL file requires many triangles to achieve a similar level of accuracy by approximating locations on an otherwise perfect circle, arc, or similar.
The reverse operation, however, is not as simple. It is difficult to search an otherwise unordered set of triangles for a group of them
that can collectively be assumed to represent such a feature. This very method of searching through such a 'pile' of triangles to find geometries that are otherwise
well defined is the value proposition of this application and something that will receive additional focus as it matures.
Nevertheless, under the assumption that the imported data is ideal and that the algorithms can accurately find such features, the data can be greatly reduced as approximate items are
replaced with exact representations. The difficulty lies in the fact that the data frequently comes from applications or systems where more or less tolerance
is given to the details of these approximations or, in the case of 3D-scanned data, the process introduces a certain amount of noise, often
seen as a very fine, wave-like surfacing on the data. Processing this kind of data is inherently complicated and should be taken into
consideration when working with these types of files.
In terms of going from STL to STEP files, if no feature detection (conversion of triangles to geometric features) is applied, the file size will always
be larger than the input, as each feature relies on a complex feature tree that defines things such as cartesian coordinates, vertices, edges, oriented edges, directions,
vectors, edge loops, closed edge loops, planes, etc., all of which combine to define a solid body (assuming that the data indeed defines such). The STL files simply
define triplets of points: in the binary format, 4 bytes are used for each of a point's three coordinates (12 bytes per point, 36 per triangle), another 12 bytes for the triangle's normal vector, plus another two bytes for a non-standard color entry.
Compared to the plain text format used in STEP files, this is many times more compact when the same data is being represented.
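The per-triangle byte counts above make the size of a binary STL file fully predictable from its facet count: an 80-byte header, a 4-byte triangle count, then 50 bytes per triangle. A quick sketch:

```python
# Size of a binary STL file from its triangle count:
# 80-byte header + 4-byte (uint32) count + 50 bytes per triangle
# (12-byte normal + 3 x 12-byte vertices + 2-byte attribute field).

def binary_stl_size(triangle_count):
    return 80 + 4 + 50 * triangle_count

print(binary_stl_size(10_000))  # 500084 bytes, i.e. roughly 0.5 MB
```

This is one reason the 10,000-facet range mentioned earlier stays comfortable: the raw mesh is only about half a megabyte before any BRep conversion inflates it.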
Ascii vs. Binary STL
One point of confusion about file size that deserves some explanation involves Ascii files. Ascii (American Standard Code for Information Interchange) formatted
files are simply files written in a way that humans can understand their content. As such, point data is represented by writing out a value like "1.2345" rather than using a four-byte block to represent the same data in
an IEEE 754 compliant way. In an Ascii file, the vertex locations are explicitly given on a line of text, one byte per character (an Ascii character code is a value
between 0 and 255 and thus fits in one byte of data), while in the binary form each coordinate is represented by just four bytes.
For this reason, binary files are far more compact but carry exactly the same information as Ascii files. The only exception to this is that binary files reserve two bytes of data
after each triangle that could be used to define the color of the triangle. Unfortunately, there is no accepted standard by which this color is implemented, and thus its use varies between applications.
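The byte cost per coordinate can be compared directly. The Ascii rendering below is illustrative (exact formatting varies by exporter), while the binary form is always exactly four bytes:

```python
# One coordinate: human-readable text vs a packed IEEE 754 single.
import struct

value = 1.2345
ascii_form = b"1.2345000e+00"           # a typical Ascii STL rendering (illustrative)
binary_form = struct.pack('<f', value)  # little-endian, always 4 bytes

print(len(ascii_form), len(binary_form))  # 13 4
```

With twelve coordinates plus keywords per facet, the roughly threefold-or-more per-value overhead compounds quickly across a large mesh.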