Publication

Faraz Faruqi, Maxine Perroni-Scharf, Jaskaran Singh Walia, Yunyi Zhu, Shuyue Feng, Donald Degraen, Stefanie Mueller.

TactStyle: Generating Tactile Textures with Generative AI for Digital Fabrication
Published at ACM CHI '25.

DOI     PDF     Video     Slides    


TactStyle: Generating Tactile Textures with Generative AI for Digital Fabrication


Teaser figure depicting an Airpods cover stylized with four different textures

Figure 1. TactStyle allows creators to stylize 3D models with image input while incorporating the tactile properties of the texture in addition to its color. Here, we show four different textures applied to the same 3D model, an AirPods cover, with the image stylization prompt shown on the bottom right. The different textures used are: a) round stone roof, b) layered brown rock, c) herringbone wood, and d) colorful hexagonal tiles.


ABSTRACT

Recent work in Generative AI enables the stylization of 3D models based on image prompts. However, these methods do not incorporate tactile information, leading to designs that lack the expected tactile properties. We present TactStyle, a system that allows creators to stylize 3D models with images while incorporating the expected tactile properties. TactStyle accomplishes this using a modified image-generation model fine-tuned to generate heightfields for given surface textures. By optimizing 3D model surfaces to embody a generated texture, TactStyle creates models that match the desired style and replicate the tactile experience. We utilize a large-scale dataset of textures to train our texture generation model. In a psychophysical experiment, we evaluate the tactile qualities of a set of 3D-printed original textures and TactStyle's generated textures. Our results show that TactStyle successfully generates a wide range of tactile features from a single image input, enabling a novel approach to haptic design.


INTRODUCTION

With the growing popularity of 3D printing research within HCI, there is also increasing interest in developing tools that enable users to customize 3D models. Open-source repositories, such as Thingiverse, are a useful resource for ready-to-print 3D models. However, their customization is limited to changing predefined parameters. Recent advances in Generative AI allow users to more freely customize their 3D models using text prompts or images as user-provided style descriptions. However, these existing frameworks for stylizing 3D models primarily focus on modifying the model to match a desired visual appearance as described via the provided text or image prompt.

One underexplored area of model customization is texture, specifically the 'tactile feedback' of printed structures, such as whether a surface feels smooth, rough, or potentially reminiscent of materials like wood grain or stone. Our ability to sense these textures through touch plays a crucial role in our interactions with the physical world, shaping not only how we perceive and manipulate objects but also influencing our emotional and cognitive responses to them. Therefore, augmenting printed structures with appropriate tactile properties can enrich interaction with physical objects, especially when mimicking materials that differ from the 3D printing material.

Recent advances in computer vision have proposed methods for capturing high-fidelity visual properties from images, enabling digital replication of textures from real-world surfaces. However, these techniques are limited to digital replication, as highlighted in TextureDreamer, where normal maps are not optimized to avoid details that are inconsistent with the target mesh. In the field of digital fabrication, researchers have proposed techniques to capture surface microgeometry as a heightfield and use this data to replicate the texture using fabrication methods such as 3D printing. However, these approaches require sophisticated equipment, such as photometric sensing setups, to capture the surface microgeometry of each texture, limiting their usability. Thus, image-based replication is currently limited to the visual elements of textures, while physical replication requires access to each texture's digital surface microgeometry data. We hypothesize that by learning a correlation between a texture's visual image and its heightfield (surface microgeometry), we can replicate the tactile properties of textures directly from an input image.

We present TactStyle, a system that allows creators to stylize 3D models with texture images while incorporating the expected tactile properties. TactStyle accomplishes this by separating the visual and geometry stylization and augmenting the process with a novel geometry stylization module that replicates the tactile properties of textures based on user input. This module uses a fine-tuned variational autoencoder (VAE) that translates the user-provided visual image of a texture into a surface microgeometry, or heightfield. The model then uses this heightfield to manipulate the geometry and create the tactile properties on the 3D model surface. Separately, the visual appearance of the 3D model is optimized with a method that has been shown to accurately replicate visual qualities. Thus, by optimizing 3D model surfaces to embody both the color and tactile properties of a given input image, TactStyle allows creators to generate stylized models that not only visually match the desired style but also replicate the tactile experience.


FORMATIVE STUDY

We hypothesize that current stylization frameworks that leverage latent representations such as CLIP are efficient at replicating the visual appearance of a texture but ineffective at replicating its tactile properties. We first test this hypothesis by performing a formative study. Although stylization frameworks allow both text and image-based stylization, we consider only image-based stylization for our experiments. This is because text-based stylization methods require a captioning technique to generate textual descriptions of textures, which may not express all its details. This limitation was also highlighted by TextureDreamer. Thus, in this formative study, we focus on testing the stylization of a 3D model based on image prompts.

Dataset and Stylization Baseline

To investigate the accuracy of texture replication, we use a large-scale dataset of PBR (Physically Based Rendering) textures from CGAxis. This dataset contains both visual and heightfield information about textures. We collect a total of 500 textures spanning the categories 'Parquets', 'Wood', 'Rocks', 'Walls', and 'Roofs'. For each of these 500 textures, we take the visual texture and its associated heightfield as ground-truth pairs.

For the stylization framework, we use Style2Fab, which allows users to personalize 3D models based on text prompts. We modified Style2Fab's system to take image prompts instead of text by changing the hyperparameters in the stylization module.

Procedure

For consistency, we perform stylization of a single tile of size 5 x 5 x 1 cm. To create the ground-truth set of textures, we apply the heightfield from our dataset to the tile surface following the technique from Degraen et al. We take 50 random textures from our dataset (10% of the dataset size) and stylize the tile with the texture image as the prompt. We subdivide the tile surface to 25k faces for accurate texture generation and run the stylization process for 1500 iterations, as specified in Style2Fab. We apply stylization to only one face of the tile, the same face as in the ground-truth textures, and freeze the geometry on the remaining faces, retaining a flat surface. This allows for a consistent comparison. Stylization iteratively modifies the geometry and color channels of the 3D model, using the CLIP loss to assess stylization quality. At the end of the study, we have 50 modified 3D tiles created using 50 random textures from our dataset.

Results

To quantitatively assess the fidelity of the stylized textures in replicating the ground-truth textures, we compare the Root Mean Square (RMS) values of the textures' heightfields, as RMS has been shown to correlate with surface roughness. We take the 50 heightfields associated with the texture images used to stylize the 3D tiles. To extract the heightfield from a stylized tile, we take the boolean difference with the original, unstylized tile and then map the displacement of the modified vertices onto the grayscale range (0-255). The RMS values capture the overall surface variation, allowing us to evaluate the differences between the original textures and the stylized outputs.
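As a concrete illustration of this metric, the minimal sketch below computes the RMS of a grayscale heightfield image and maps vertex displacements onto the 0-255 range. It assumes 8-bit grayscale heightfield images and computes RMS about the mean height, a detail not specified above.

```python
import numpy as np
from PIL import Image

def heightfield_rms(path):
    """RMS deviation of a grayscale heightfield about its mean height,
    a common proxy for overall surface roughness."""
    h = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    return float(np.sqrt(np.mean((h - h.mean()) ** 2)))

def displacement_to_grayscale(displacements):
    """Map per-vertex displacements (stylized tile minus flat tile) to 0-255."""
    d = np.asarray(displacements, dtype=np.float64)
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)
    return (d * 255.0).astype(np.uint8)
```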

Formative Study figure showing RMS surface values of original and stylized textures

Figure 2. The boxplot shows the distribution of Root Mean Square (RMS) values for original and stylized textures, representing surface roughness. The original textures exhibit a wider range of RMS values, indicating higher variability in surface roughness. In contrast, the stylized textures have consistently higher RMS values with less variability, indicating rougher and more uniform surfaces as a result of the stylization process.

Figure 2 presents a boxplot comparing the RMS distributions for the original textures and their stylized counterparts. We observe that the stylized textures generally exhibit higher RMS values compared to the original textures. RMS values can be interpreted as a metric for surface roughness, suggesting that the stylization process results in rougher surfaces. Moreover, the RMS values for the original surface textures span a wider range, showing higher variability, whereas the stylized surfaces show less variability, indicating more uniform surfaces.

To determine the statistical significance of the observed differences, we perform a Welch's t-test between the RMS values of the original and stylized textures. The test reveals a statistically significant difference between the two groups (t = 11.89, p < 0.0001), indicating that the stylized textures have significantly different RMS values compared to the original heightfields.
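This test can be reproduced with SciPy as sketched below, where equal_var=False yields Welch's variant; the arrays here are placeholders standing in for the 50 RMS values per condition.

```python
import numpy as np
from scipy import stats

# Placeholder arrays standing in for the 50 RMS values per condition.
rng = np.random.default_rng(0)
rms_original = rng.uniform(10, 60, size=50)
rms_stylized = rng.uniform(40, 70, size=50)

# equal_var=False gives Welch's t-test (no equal-variance assumption).
t_stat, p_value = stats.ttest_ind(rms_stylized, rms_original, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```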

This result suggests that Style2Fab and similar stylization strategies do not accurately modify the surface geometry to replicate a specific texture's heightfield. Further refinement in the stylization process could enhance the replication of texture variation for more accurate texture replication in 3D models. In the next sections, we present TactStyle, a system that allows creators to accurately replicate the tactile properties via a new geometry stylization approach.


SYSTEM OVERVIEW

Prior work has shown that Generative-AI based stylization methods closely approximate the user's style visually. However, our formative study found that such geometry modifications do not accurately replicate the desired texture represented by its surface microgeometry. We designed TactStyle to enable the replication of a surface's microgeometry, and by extension, its tactile properties.

TactStyle System Diagram

Figure 3. TactStyle augments traditional 3D model stylization techniques by introducing a novel geometry stylization module that replicates the tactile properties of textures based on user input. (a) The system takes an input model and a stylization prompt (e.g., an image of a texture) and applies two separate stylization processes: (1) Color Stylization and (2) Geometry Stylization. The color stylization modifies the model's visual appearance, while the geometry stylization alters its surface to reflect tactile properties. The two modules operate in tandem, creating a stylized 3D model that replicates both the visual and tactile aspects of the texture. (b) The geometry stylization module uses a variational autoencoder (VAE) to generate heightfields from texture images, which are then applied to modify the model's surface geometry, enabling co-optimization of geometry and color for a unified tactile and visual experience.

TactStyle augments existing stylization methods with a new approach to modifying the geometry of 3D models that replicates the tactile properties of the texture described by the user. It accomplishes this by optimizing the geometry and the color channels in two separate modules: (1) the color stylization module and (2) the geometry stylization module. The focus of this paper is the geometry stylization module.

Our main challenge was to design a geometry stylization module that modifies a 3D model's geometry to replicate the tactile properties of a texture. We leverage the fact that heightfields can be represented as images, and thus TactStyle accomplishes this goal by fine-tuning an image generation model to generate heightfields based on visual images of a texture. This heightfield is then used to modify the 3D model's geometry using an approach based on UV mapping. Thus, the color and geometry stylization modules work in tandem, stylizing the color and geometry of the 3D model to replicate both the visual appearance and tactile properties of a given texture (Figure 3a).


HEIGHTFIELD GENERATION TECHNIQUE

In this section, we describe our novel heightfield generation model. This model takes a texture image as a prompt and generates the associated heightfield. For this purpose, we fine-tune a trained diffusion-based Image-to-Image model and integrate it into the TactStyle system. In the following subsections, we describe the modified architecture of the diffusion model and the dataset used to train and test the system.

Diffusion Model

We approach this problem as an image generation task and use a modified version of the Stable Diffusion model, a popular open-source image generation model. Specifically, we use the image-to-image generation model proposed in SDEdit. This deep-learning model uses a diffusion model to synthesize new realistic images. Given an input image along with a user prompt in the form of text or an image, SDEdit first adds noise to the input and then denoises the result to generate a modified image based on the user prompt. At the core of this diffusion-based generative model is a variational autoencoder (VAE), which encodes images into a latent representation and decodes that latent representation back into an image.

The VAE is trained to encode an image into a compact latent representation that can then be 'decoded' by another network, the decoder, to generate another image. More details on the architecture and training approach are available in Meng et al. and Kingma et al. Our goal with this model was to generate a heightfield given an image of a texture. Since heightfields are traditionally represented as grayscale images, we (1) modify the VAE architecture to generate representative grayscale images, i.e., heightfields, and (2) fine-tune the trained model on our texture image-heightfield pairs.
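To illustrate the building block involved, the sketch below loads a pre-trained Stable Diffusion VAE through the Diffusers library and round-trips an image through its encoder and decoder. The checkpoint name is illustrative and not necessarily the one used in our system.

```python
import torch
from diffusers import AutoencoderKL

# Illustrative checkpoint; any Stable Diffusion VAE from the Diffusers library works similarly.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

image = torch.rand(1, 3, 512, 512) * 2 - 1            # texture image scaled to [-1, 1]

with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # compact latent representation
    recon = vae.decode(latents).sample                # decoded back into an RGB image
```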


Modified Model Architecture

We use a trained open-source image-to-image generation model available through the Diffusers library. As described above, this model's essential component is the VAE, which encodes an image into a latent representation and then decodes it into another image. This VAE is structured to generate images in 3 (RGB) channels. We modify the architecture's decoder module by adding 4 additional layers to learn heightfield features and modify the final layer to output single-channel grayscale images. This approach was motivated by the fact that the pre-trained model was trained to generate colored images, so it needs additional capacity to learn heightfield-specific features.
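A minimal PyTorch sketch of such a modification is shown below: four extra convolutional layers appended after the decoder output, ending in a single-channel map. The channel widths, activations, and output normalization are assumptions; only the number of added layers and the grayscale output are specified above.

```python
import torch
import torch.nn as nn

class HeightfieldHead(nn.Module):
    """Four additional conv layers appended to the VAE decoder output,
    ending in a single-channel (grayscale) heightfield."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),   # single-channel output
        )

    def forward(self, decoded_rgb):
        # Heights normalized to [0, 1]; rescaled to 0-255 when saved as an image.
        return torch.sigmoid(self.layers(decoded_rgb))

# Usage (with the VAE from the previous sketch):
# heightfield = HeightfieldHead()(vae.decode(latents).sample)
```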

In fine-tuning our modified image generation model, our goal was both to maximize the similarity in intensity between the target and generated heightfield and to minimize their perceptual difference. For comparing overall intensities, we use the Mean Squared Error (MSE) loss, a standard in regression and image generation tasks. For the perceptual similarity metric, we use the Structural Similarity Index Measure (SSIM). These two loss functions serve two different purposes. MSE calculates an average of per-pixel similarity that guides the model toward similar intensities in generated images. However, training with MSE alone does not generate high-quality heightfields because it assumes pixel-wise independence; for instance, blurred images can have a large perceptual difference but a small MSE loss. SSIM, on the other hand, takes into account the luminance, contrast, and structure of the two images being compared, highlighting local structural differences. Thus, a combination of these two loss functions allows us to generate heightfields that are similar in both overall intensity (MSE) and local structural features (SSIM). In training our model, we use both loss measures.
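A sketch of the combined objective follows; it assumes the third-party pytorch_msssim package for SSIM and an equal weighting of the two terms, neither of which is specified above.

```python
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (assumption)

def heightfield_loss(pred, target, alpha=0.5):
    """Combine per-pixel MSE with structural dissimilarity (1 - SSIM).
    `alpha` balances the two terms and is an illustrative choice."""
    mse = F.mse_loss(pred, target)
    structural = 1.0 - ssim(pred, target, data_range=1.0)
    return alpha * mse + (1.0 - alpha) * structural
```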


Training Methodology

The variational autoencoder (VAE) is fine-tuned to generate accurate heightfields using the PBR dataset of texture image-heightfield pairs. The associated heightfields serve as ground-truth representations of the tactile features, and our model learns the correlation between visual appearance and tactile properties.

We fine-tune the model by updating the decoder parameters over 60 epochs, using a batch size of 10 images and an RMSprop optimizer. We use a lower learning rate (1e-5) for fine-tuning the existing layers in the VAE model and a higher learning rate (1e-3) for the new layers. This is because the original layers are already trained and need only small adjustments, whereas the new layers are randomly initialized and require larger changes. Since our goal was to modify the decoder module of the VAE, we froze the weights of the encoder module and trained only the decoder.
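The sketch below illustrates this training setup, reusing the vae, HeightfieldHead, and heightfield_loss from the earlier sketches; the dataloader yielding batches of 10 (texture image, heightfield) pairs is assumed.

```python
import torch

heightfield_head = HeightfieldHead()

# Freeze the encoder; only the decoder and the new layers are updated.
for p in vae.encoder.parameters():
    p.requires_grad = False

# Two learning rates: 1e-5 for the pre-trained decoder, 1e-3 for the new layers.
optimizer = torch.optim.RMSprop([
    {"params": vae.decoder.parameters(), "lr": 1e-5},
    {"params": heightfield_head.parameters(), "lr": 1e-3},
])

for epoch in range(60):                            # 60 epochs
    for images, heightfields in dataloader:        # batches of 10 image-heightfield pairs
        latents = vae.encode(images).latent_dist.sample()
        pred = heightfield_head(vae.decode(latents).sample)
        loss = heightfield_loss(pred, heightfields)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```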


Dataset

To train our model on realistic textures, we utilize the CGAxis repository, which contains a wide range of textures designed to provide accurate real-world simulations of materials in 3D environments. We collected 500 pairs of texture images and corresponding heightfields in 4k resolution. The dataset contains 5 different material types: `Parquets', `Wood', `Rocks', `Walls', and `Roofs', containing 100 textures each. This allows for a diverse set of textures to train our model. For each of these 500 textures, we collect the visual texture and its associated heightfield as ground-truth pairs. These heightfields represent the tactile features of the textures and are critical for learning the correlation between visual appearance and haptic properties.

We split our dataset into a train and test set, using a 90% - 10% split, resulting in 450 textures to train our model and 50 textures to test it. We also augment the train set by rotating each image-heightfield pair by 90 degrees three times, effectively generating four variations for each texture and resulting in a total of 1,800 textures in our train set. This augmentation allows us to increase the diversity of the data, providing a more comprehensive set of examples for training our model without introducing synthetic artifacts. This enables the model to learn more robust and invariant representations of visual and tactile features, improving its ability to generalize across different orientations of textures.
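A minimal sketch of the split and rotation augmentation is shown below; the texture names and file paths are placeholders.

```python
import random
from PIL import Image

def rotated_pairs(texture_path, heightfield_path):
    """Yield the original image-heightfield pair plus its 90, 180, and 270 degree
    rotations, giving four variants per texture."""
    tex = Image.open(texture_path)
    hf = Image.open(heightfield_path).convert("L")
    for k in range(4):
        yield tex.rotate(90 * k), hf.rotate(90 * k)

# 90% - 10% split over the 500 textures (names are placeholders).
names = [f"texture_{i:03d}" for i in range(500)]
random.Random(42).shuffle(names)
train_names, test_names = names[:450], names[450:]   # 450 train x 4 rotations = 1,800
```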


Texture Application

To apply our textures to 3D models, we displace each vertex along its normal based on the corresponding height value, sampled from a UV map normalized to fit the texture map. This produces a final texturized object that is ready to be 3D printed, as shown in the geometry stylization step of Figure 3a.
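A simplified NumPy sketch of this displacement step is shown below; it assumes per-vertex normals, UV coordinates in [0, 1], and nearest-neighbor sampling of an 8-bit heightfield.

```python
import numpy as np

def displace_vertices(vertices, normals, uvs, heightfield, scale=1.0):
    """Displace each vertex along its normal by the heightfield value sampled
    at its UV coordinate. `scale` plays the role of the Texture Magnification
    Factor exposed in the UI (default 1.0)."""
    h, w = heightfield.shape
    u = np.clip((uvs[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    v = np.clip((uvs[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    heights = heightfield[v, u].astype(np.float64) / 255.0   # normalize 8-bit heights
    return vertices + normals * (heights * scale)[:, None]
```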



USER INTERFACE AND WORKFLOW

User Interface
Figure 4: TactStyle's user interface, implemented as a Blender plugin, allows users to (a) load an original 3D model and (b) stylize it with image prompts. To use TactStyle, the user (c) loads the model, (d) uploads an image of the desired texture, and (e) optionally adjusts the Texture Magnification Factor to control the level of height displacement applied to the 3D model. (f) Finally, the user clicks the "Stylize" button, which starts the stylization process using TactStyle's integrated color and geometry stylization modules.

TactStyle has been implemented as a plugin for the open-source 3D design software Blender to allow easy integration with makers' existing workflows. Figure 4 shows a view of the interface. To stylize a model with TactStyle, the user (1) loads their model, (2) uploads the image prompt of the desired texture, and (3) clicks the Stylize button. TactStyle then processes the model and stylizes it using the integrated color and geometry stylization modules. The stylized model is rendered next to the original model and can be exported for fabrication.

Preprocessing

Once the user has loaded an OBJ file of their 3D mesh into the plugin, the model is automatically pre-processed for stylization. The model is first standardized to a unit-sized cube. Next, we use Pymeshlab to increase the model's resolution by subdividing it to 25k faces, following the standardization protocol from Style2Fab. This enables accurate stylization by increasing the number of vertices on the model, which are then modified in both color and geometry to approximate the style desired by the user.
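A rough sketch of this preprocessing with Pymeshlab follows. The filter names follow recent Pymeshlab releases and may differ in older versions, and the exact parameters used by our implementation are not specified here.

```python
import pymeshlab

def preprocess(path_in, path_out, target_faces=25_000):
    """Normalize a mesh to a unit bounding box and subdivide it until it has
    roughly `target_faces` faces."""
    ms = pymeshlab.MeshSet()
    ms.load_new_mesh(path_in)

    # Scale the model into a unit-sized bounding box.
    ms.compute_matrix_from_scaling_or_normalization(unitflag=True)

    # Midpoint subdivision until the face budget is reached (iteration cap avoids
    # looping forever if the default edge-length threshold stops refinement early).
    for _ in range(10):
        if ms.current_mesh().face_number() >= target_faces:
            break
        ms.meshing_surface_subdivision_midpoint(iterations=1)

    ms.save_current_mesh(path_out)
```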


Stylization

TactStyle uses two modules, the color stylization module and the geometry stylization module, to optimize both visual and tactile properties. As shown in Figure 3, TactStyle uses Style2Fab for iterative color optimization. Here, the model's geometry is frozen, and the generative AI model modifies the color channels of the vertices to approximate the style in the image. Next, the geometry stylization module uses the modified image generation model to generate a heightfield from the texture image prompt provided by the user. This heightfield is applied to the model using the technique described in the Texture Application section. The completed model is rendered alongside the original model for review. Furthermore, the segmentation tool from Style2Fab has been integrated into TactStyle, which allows the user to apply multiple textures to the same model: the user segments the model through Style2Fab's segmentation and then applies TactStyle to individual segments.


Fine-Tuning and Export

Users can iterate on this process and apply new styles using new image prompts as needed. Users can also optionally adjust the amount of height displacement via the 'Texture Magnification Factor' slider shown in Figure 4e, which exaggerates or diminishes the applied texture. This factor defaults to 1.0, which corresponds to the value used in our study, following the height displacement values from Degraen et al.

Finally, the user can export the stylized model and fabricate it.


TECHNICAL EVALUATION

To validate the effectiveness of our system, we conducted a quantitative evaluation. We evaluate TactStyle's performance on its ability to replicate the surface microgeometry represented by the ground-truth heightfield. We evaluate TactStyle's results using two metrics: (1) the RMS error, which represents the difference in surface roughness, and (2) the Mean Squared Error (MSE), which captures the average per-pixel intensity error between the textures. In the following subsections, we discuss the results of the quantitative evaluation.

Technical Evaluation of TactStyle and Stylized results
Figure 5: Quantitative comparison of the Original, Stylized, and TactStyle textures. (a) Comparison of RMS values for Original, Stylized, and TactStyle textures, demonstrating that TactStyle replicates surface roughness more closely to the Original textures. (b) Box plot of MSE loss between the Original textures and the Stylized and TactStyle textures. TactStyle exhibits significantly lower MSE compared to the stylized method, indicating more accurate texture replication. (****p < 0.0001)

Analyzing Root Mean Square Error: We evaluate the Root Mean Square (RMS) values of the generated heightfields, which allow us to compare the overall surface roughness of the generated textures. Figure 5a shows the comparison between the Original, Stylized, and TactStyle textures. The original textures exhibit a wide range of RMS values, reflecting the inherent variability in surface roughness across different textures.

To evaluate the differences in RMS values between the Original, Stylized, and TactStyle textures, we performed a Welch's ANOVA test, which indicated a statistically significant difference (F = 47.58, p < 0.0001). Next, we conducted a Games-Howell post-hoc analysis. We found a significant difference between Stylized and Original textures (T = 6.79, p < 0.0001) and between TactStyle and Stylized textures (T = 14.34, p < 0.0001). However, we found no significant difference between the TactStyle and Original textures (p > 0.05). These results suggest that the stylization process produces textures with significantly higher surface roughness than both the original textures and TactStyle's generated textures.
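The sketch below shows how such an analysis can be run with the pingouin library; the data frame holds placeholder RMS values rather than the study's actual measurements.

```python
import numpy as np
import pandas as pd
import pingouin as pg

# Long-format table of RMS values per condition (placeholder data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "condition": ["Original"] * 50 + ["Stylized"] * 50 + ["TactStyle"] * 50,
    "rms": np.concatenate([rng.uniform(10, 60, 50),
                           rng.uniform(40, 70, 50),
                           rng.uniform(10, 60, 50)]),
})

anova = pg.welch_anova(data=df, dv="rms", between="condition")             # Welch's ANOVA
posthoc = pg.pairwise_gameshowell(data=df, dv="rms", between="condition")  # Games-Howell
print(anova)
print(posthoc)
```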

Analyzing Mean Squared Error: MSE measures per-pixel intensity differences, where 0 indicates identical per-pixel intensities, and 1 indicates completely different intensities. As shown in Figure 5b, TactStyle exhibits lower MSE (M = 0.03, std-dev = 0.03) compared to the stylized method (M = 0.10, std-dev = 0.05), indicating more accurate texture replication. To evaluate statistical significance, we conducted a Welch's t-test, and found that the MSE Loss for TactStyle's results was significantly lower than that of Stylized results (F = 6.79, p < 0.0001).


PERCEPTION STUDY

To evaluate TactStyle's accuracy in replicating the tactile feedback of textures, we performed a psychophysical experiment of the kind used to evaluate texture replication techniques. The goal of our study was to understand whether the reconstructed heightfield from TactStyle creates tactile perceptions similar to those of the original heightfield. Our second goal was to evaluate whether the tactile perception of TactStyle's heightfields is similar to the tactile perception expected from just looking at a visual image of the texture. To compare tactile properties, we take a representative set of descriptors from Degraen et al.

3D-printed texture samples used in the perception study
Figure 6: 3D-printed samples of 15 textures from our test set used in the perception study. We created four different sets: the 'visual set', the 'original set', the 'TactStyle set', and the 'stylization set'. The original set was created with the heightfield associated with the texture and served as the ground truth. The TactStyle set was created using TactStyle, with the texture image as input. The visual set was created using printed texture images. Finally, the stylization (baseline) set was created using Style2Fab, with the texture image as input.


Conditions

We created four distinct conditions to evaluate the tactile and visual characteristics of textures. These conditions were designed to isolate specific aspects of perception, allowing us to better understand how each modality contributes to the overall experience of texture replication. The conditions were:

  • Visual (No Heightfield): The texture image was printed with no heightfield on glossy paper and pasted on flat tiles.
  • Original Heightfield: The texture was printed with the heightfield originally provided with the texture (groundtruth).
  • TactStyle: The texture was printed with the heightfield created from TactStyle with the texture image as input.
  • Stylization: The texture was printed with the heightfield generated from Style2Fab using the texture image as input.

This separation of conditions was motivated by the literature on visuo-haptic stimuli integration by humans. When humans explore objects with their hands, vision and touch both provide information for estimating the properties of the object. Vision frequently dominates the integrated visual-haptic percept. To address this, we kept the visual and tactile conditions separate to mitigate cross-modality influence and isolate modality-specific effects to assess tactile perception. Here, condition 4 (stylization) is our baseline to be compared with TactStyle's results.

In this experiment, we investigate the following research questions:

  • RQ1: How accurately does TactStyle replicate the tactile properties of a texture, as represented by its original heightfield?
  • RQ2: To what extent do TactStyle-generated textures align with user expectations based on their visual appearance?
  • RQ3: How do the tactile expectations derived from a texture's visual appearance differ from its actual tactile properties, as represented by the heightfield?


Dataset

We collected 15 random samples from our test set, matching the size of the sample set used by Degraen et al. in their study on the perceptual similarity between real textures and their digital replicas. As these textures were not used to train the model, they can be used to evaluate TactStyle's ability to replicate unseen textures. Figure 6 shows the models used in our study. This gave us a total set of 60 models (4 conditions for each of the 15 textures). All conditions were presented to the user on tiles of the same size: 5 cm x 5 cm x 1 cm. Our models were printed on an SLA printer (Elegoo Saturn 3 Ultra). To keep our printed objects comparable to previous studies, we used Elegoo Resin Standard 2.0 - Grey, which has a Shore hardness of 80-86 (Scale D); for comparison, Degraen et al. used a material with a Shore hardness of 83-86 (Scale D). We printed all samples at a layer resolution of 30 μm.


Study Design

We used a within-subjects experimental design. To control for carry-over effects, we counter-balanced conditions using round-robin ordering between participants. Our study was structured as a self-assessment test in which participants compared and recorded perceptual attributes of the 3D-printed texture samples from the different conditions.

During the study, each participant recorded their ratings of the sample in terms of hardness, roughness, bumpiness, stickiness, scratchiness, uniformity, and how isotropic the surface is, each on a 1-to-9 Likert scale, 1 indicating a low assessment and 9 indicating a high assessment of the respective variable. To rate these dimensions, participants were asked the following questions:

  • Q1: How hard does this surface feel? (1 meaning extremely soft, 9 meaning extremely hard)
  • Q2: How rough does this surface feel? (1 meaning extremely smooth, 9 meaning extremely rough)
  • Q3: How bumpy does this surface feel? (1 meaning extremely flat, 9 meaning extremely bumpy)
  • Q4: How sticky does this surface feel? (1 meaning extremely slippery, 9 meaning extremely sticky)
  • Q5: How scratchy does this surface feel? (1 meaning extremely dull, 9 meaning extremely scratchy)
  • Q6: How uniform does this surface feel? (1 meaning extremely irregular, 9 meaning extremely uniform)
  • Q7: How isotropic does this surface feel? (1 meaning extremely anisotropic, 9 meaning extremely isotropic)

Hardness, Roughness, and Stickiness are motivated by related work indicating that these are the base dimensions of tactile discrimination. Bumpiness and Scratchiness are informed by the notion that roughness can be divided into macro and micro dimensions, respectively. The inclusion of Uniformity and Isotropy stems from the fact that our textures embed some directionality and localized variations, which affect perception during tactile exploration. While other works have considered the additional dimension of Hairiness, we excluded this descriptor since none of the textures in our dataset were representative of it.


Apparatus

Our apparatus was built to limit visual cues and ensure accurate recording of purely tactile perceptual attributes of the textures. Participants were positioned in front of a screen that separated them from the experimenter, as shown in Figure 7. A small opening in the screen, covered by a piece of cloth, allowed participants to reach through and access the samples, placed by the experimenter. On the other side, the experimenter arranged the samples for the participants to explore. The samples were held in place with a laser-cut wooden frame.

Experimental setup for the perception study
Figure 7: Experimental Setup for perception study: a) Experimenter side, b) Participant side

Participants

A total of 15 participants (6 female, 9 male; ages 22-38 years, M = 27.3, SD = 5.4) were recruited for our study. When asked about their hand dominance, 14 participants indicated they were right-handed and 1 participant indicated ambidexterity. All participants chose to use their right-hand index finger for the study and were informed that they could only use this finger throughout the study for consistency in perception. All participants indicated that, to the best of their knowledge, they do not suffer from any impairment to haptic perception. Participants were compensated at $20 an hour for the 90-minute study.


Study Procedure

During this stage, the experimenter placed one sample at a time at a fixed location behind the screen. The participant could insert their hand through the opening in the screen and feel the texture but was not allowed to see it. The participant was then asked to explore the texture and rate its tactile properties based on the 7 descriptors. During the visual perception stage, the samples were placed on a board next to the screen, where participants could see the texture but were not allowed to touch it. This allowed us to isolate visual and tactile perception for the textures evaluated in the study and mitigate cross-modal influence. Prior work has shown that human fingers are particularly sensitive in perceiving and distinguishing textures; since all participants chose their right hand for the study, they were requested to use only their right index finger throughout the study for consistency. The interaction window was limited to 5 seconds per sample so that participants' first impressions could be captured. All participants answered the descriptor questions for all 60 textures.

Ethical approval for this study was obtained from the Ethical Review Board of the author's institute.



RESULTS

Box Plots Perception Results
Figure 8: Box Plots showing the individual assessments on Hardness, Roughness, Bumpiness, Stickiness, Scratchiness, Uniformity, and Isotropy. Tactile Correlations are shown as heatmaps showing the correlations between the 4 conditions for each descriptor. (*p < 0.05, **p < 0.01, ***p < 0.001)

In the following section, we describe the analysis and the obtained results from our texture perception study.


Comparing Visual and Tactile Ratings

In this section, we present the results of the perception study. To analyze the individual tactile assessments, we conducted Friedman tests with post-hoc analysis using Wilcoxon signed-rank tests and Bonferroni-Holm correction. Figure 8 shows box plots for each assessment. The assessment data for each descriptor is provided in the Appendix of our paper.
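As a sketch of this analysis pipeline with SciPy and statsmodels, the code below runs the omnibus and pairwise tests for one descriptor; the ratings are placeholders standing in for the per-participant data.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Placeholder ratings for one descriptor: one value per participant per condition.
rng = np.random.default_rng(0)
conditions = ["Original", "Visual", "TactStyle", "Stylization"]
ratings = {c: rng.integers(1, 10, size=15).astype(float) for c in conditions}

# Omnibus Friedman test across the four matched conditions.
chi2, p = stats.friedmanchisquare(*ratings.values())

# Pairwise Wilcoxon signed-rank tests with Bonferroni-Holm correction.
pairs = [(a, b) for i, a in enumerate(conditions) for b in conditions[i + 1:]]
pvals = [stats.wilcoxon(ratings[a], ratings[b]).pvalue for a, b in pairs]
reject, p_adj, _, _ = multipletests(pvals, method="holm")
```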

Hardness: Users perceived significant differences in Hardness between Original vs. Visual, Original vs. Stylization, Visual vs. TactStyle, Visual vs. Stylization, and TactStyle vs. Stylization. However, users did not perceive significant differences in Original vs. TactStyle.

Roughness: Users perceived significant differences in Roughness in all comparisons except between the Visual and the TactStyle conditions.

Bumpiness: Users perceived no significant differences in bumpiness between Original vs. Visual, or Visual vs. TactStyle. However, they perceived significant differences between Original vs. TactStyle, Original vs. Stylization, Visual vs. Stylization, and TactStyle vs. Stylization.

Scratchiness: Users perceived significant differences in Scratchiness between Original vs. Visual, Original vs. Stylization, Visual vs. TactStyle, Visual vs. Stylization, and TactStyle vs. Stylization. However, they did not perceive any significant difference between Original vs. TactStyle.

Stickiness: Users perceived no significant difference in Stickiness between Original vs. Visual, Visual vs. TactStyle, or TactStyle vs. Stylization conditions. However, significant differences were found between Original vs. TactStyle, Original vs. Stylization, and Visual vs. Stylization.

Uniformity: Users perceived no significant differences in uniformity between Original vs. Visual, Original vs. Stylization, Original vs. TactStyle, or TactStyle vs. Stylization. However, significant differences were found between Visual vs. TactStyle and Visual vs. Stylization.

Isotropy: Users perceived no significant difference in isotropy between Original vs. Visual, Original vs. TactStyle, or Visual vs. TactStyle samples. However, they perceived significant differences between Original vs. Stylization, Visual vs. Stylization, and TactStyle vs. Stylization.


Perceptual Correlations

To uncover relationships between different tactile perceptions in our samples, we performed a Spearman's rank-order correlation analysis. For each descriptor, we evaluate the relationship between the ratings of the Original, TactStyle, Visual, and Stylization samples. This analysis helped determine whether the tactile ratings of the textures were correlated across different conditions. Figure 8 shows correlation plots for each assessment. The assessment data for each descriptor is provided in the Appendix of our paper.
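For one descriptor and one pair of conditions, this correlation can be computed with SciPy as follows, reusing the placeholder ratings dictionary from the previous sketch.

```python
from scipy import stats

# Correlate ratings of one descriptor across two conditions,
# e.g., Original vs. TactStyle samples.
rho, p = stats.spearmanr(ratings["Original"], ratings["TactStyle"])
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
```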

Hardness: There were significant correlations in perception of hardness between all pairs of textures. This can be explained by the fact that the samples were printed with the same material. We found that users perceived significant differences between all pairs except Original vs TactStyle samples. This suggests that TactStyle effectively replicates the tactile properties of hardness from the Original textures, while stylization does not.

Roughness: There were significant correlations in the perception of roughness between Original vs. Visual, Original vs. TactStyle, and Visual vs. TactStyle. We found that users perceived significant differences in roughness in all comparisons except the Visual vs. TactStyle condition. Thus, TactStyle effectively replicates the tactile perception of roughness expected from a texture's visual appearance.

Bumpiness: There were significant correlations in perception of bumpiness between Original vs Visual, Original vs TactStyle and Visual vs TactStyle. However, we found significant differences in comparing Original vs TactStyle, but not in Visual vs TactStyle, and Original vs Visual. Thus, TactStyle effectively replicates bumpiness of textures from their visual expectations.

Scratchiness: There were significant correlations in perception of scratchiness between Original and Visual, and Original and TactStyle. We found that users did not perceive significant differences between Original and TactStyle samples. Thus, TactStyle effectively replicates Scratchiness of Original textures.

Stickiness: There were significant correlations in perceived stickiness of textures between Original and Visual, Original and TactStyle, Visual and TactStyle, Visual and Stylization. Since users did not perceive any significant differences in perceived stickiness for Visual and TactStyle, this shows that TactStyle effectively replicates perceived stickiness of textures from their visual expectations.

Uniformity: There were significant correlations in perceived uniformity, between Original and Visual, Original and TactStyle, and Visual and TactStyle conditions. Since we found that users did not perceive any significant differences in perceived uniformity between Original and TactStyle, TactStyle effectively replicates perceived uniformity of textures from their Original textures.

Isotropy: There were significant correlations in perceived isotropy between Original and TactStyle, Original and Visual, and Visual and TactStyle samples. Since users did not perceive any significant differences between Original vs TactStyle, Original vs Visual, and Visual vs TactStyle, this shows us that TactStyle is able to effectively replicate perceived isotropy from both Visual expectations and Original textures.


Discussion

We conducted a comparative analysis of the 4 texture sets - Visual, Original, TactStyle, and Stylization, identifying which textures are significantly different on various tactile descriptors, and which of them are correlated. We found that TactStyle effectively replicates visual and Original textures for several key descriptors, and outperforms the baseline stylization method. Based on our analysis, we were able to answer all our research questions.

RQ1: TactStyle accurately replicates several Original Texture Descriptors: TactStyle effectively replicated the tactile experiences expected from visual cues for several descriptors. Hardness, Scratchiness, Uniformity, and Isotropy are correlated between Original and TactStyle samples without exhibiting significant perceptual differences. This suggests that TactStyle is capable of closely replicating the tactile sensations associated with these descriptors from the original textures.

The analysis also showed that Stylization does not perform well in replicating the tactile features of Original textures. Between the Stylization and Original samples, significant differences across most descriptors (Hardness, Roughness, Bumpiness, Stickiness, and Scratchiness) indicate poor alignment between the tactile experiences produced by Stylization and those of the Original textures.

RQ2: TactStyle accurately replicates several Visual Texture Descriptors: TactStyle effectively replicates several tactile features based on the visual textures, as demonstrated by significant correlations and the absence of perceptual differences in descriptors such as Roughness, Bumpiness, Stickiness, and Isotropy. This indicates that TactStyle can reproduce the tactile experiences expected from visual cues based on these descriptors.

In contrast, Stylization does not reliably replicate the tactile expectations derived from Visual samples, with most descriptors showing significant differences.

RQ3: Differences Between Tactile Expectations from Visual Textures and Actual Tactile Perceptions: We also compared the expected tactile perceptions of visual textures, and the tactile perceptions of the original textures. We found that most descriptors such as Bumpiness, Stickiness, Uniformity, and Isotropy align closely between visual perceptions and actual original surface tactile experiences. In contrast, descriptors like Hardness, Roughness, and Scratchiness show significant perceptual differences between Visual and Original samples, although moderate correlations indicate some level of consistency. These results suggest that while certain tactile experiences can be anticipated based on visual cues alone, others may require direct tactile interaction for accurate perception.


APPLICATIONS

Application Examples
Figure 9: Application examples for TactStyle: a) a phone stand stylized with a wood-parquet texture, b) a granite-textured vase, c) an AirPods case stylized with textures of 'round stone roof' and 'layered brown rock', d) two tiles, one styled with a volcanic rock texture and the other with stone from the Grand Canyon, e) a walking stick handle stylized with a rough rock texture.

In this section, we showcase how TactStyle's stylization technique allows users to stylize 3D models with accurate tactile properties for fabrication. We demonstrate five application scenarios across four categories: home decor, personal accessories, tactile learning tools, and personalized health applications. All 3D models shown in Figure 9 were stylized with TactStyle using textures not present in the training dataset and printed on a Stratasys J55 printer.

Home Decor

TactStyle can be used to apply textures to objects downloaded from platforms like Thingiverse, enabling users to enhance the tactile experience of 3D-printed items at home. This allows individuals to create customized, textured versions of everyday objects, adding both aesthetics and functionality. We illustrate two applications of TactStyle on functional home objects. Figure 9a shows a wood-parquet-textured phone stand, demonstrating how organic textures can be applied to enhance the visual appeal and usability of frequently handled items. Figure 9b shows a granite-textured vase. By combining TactStyle with digital fabrication techniques, users can personalize their objects or prototype specific tactile properties in addition to the aesthetics of their home decor.

Personalizing Accessories

Personal accessories are a popular domain for personalized fabrication. TactStyle enables creators to replicate both the 'look' and the 'feel' of textures from image input, allowing them to customize their accessories with specific textures and fabricate them using digital fabrication. In Figure 9c, we showcase an AirPods case stylized with two different textures: one from an image of a 'round stone roof' (top) and another from an image of a 'layered brown rock'. These textures not only provide visual distinction but also carry the distinct surface microgeometry associated with each texture.

Tactile Learning Tools

TactStyle has the potential to create educational tools that enhance learning in subjects such as geometry, topography (e.g., the texture of different terrains), and biology (e.g., the texture of animal skins). To exemplify this concept, we present two examples in Figure 9d: the top surface features a 'volcanic rock' texture, and the bottom surface replicates the texture of stone from the Grand Canyon. Neither texture is present in the dataset; however, both are samples of classes that TactStyle was trained on. Thus, TactStyle can effectively generalize across textures within the classes represented in its training data. Tangible learning materials are well known to improve educational outcomes, particularly by engaging multiple senses. TactStyle offers a new way to create such materials, allowing educators to bring textures and surfaces to life in the classroom. By giving students the ability to physically interact with these textures, TactStyle could help them better understand the tactile properties of objects, making abstract concepts more concrete and accessible.

Customizable Assistive Devices

In the fields of "Medical Making" and "DIY Assistive Technology," personalized fabrication by non-technical experts is an emerging and critical domain. TactStyle can be employed to customize assistive devices with specific textures, enhancing grip, comfort, or usability tailored to the unique needs of the user. Figure 9e illustrates this by applying a 'rough rock' texture to the handle of a walking stick. This texture generates a rough heightfield which, post-fabrication, significantly increases surface friction, thereby improving grip and stability for the user. Such tactile enhancements are particularly valuable for assistive devices, where safety and ease of use are critical, offering a practical solution that can be tailored to the specific requirements of individuals with mobility challenges.


DISCUSSION AND FUTURE WORK

TactStyle demonstrates an ability to replicate both visual and tactile features from an image input, allowing creators to stylize their 3D models for both accurate color replication, and expected tactile properties. In this section, we discuss TactStyle's current limitations, and its possible extensions in the future.

Opportunities for Richer Datasets

TactStyle's performance and robustness across a diversity of textures depend on the quality and diversity of its training dataset. Currently, the model utilizes the CGAxis repository, which provides 500 texture-heightfield pairs across five material categories: Parquets, Wood, Rocks, Walls, and Roofs. The high quality of the available images and their corresponding heightfields allowed us to train the image-generation model to a high accuracy. While this dataset offers a diverse selection of real-world textures, it does not cover all types of textures encountered by humans. Additional material categories such as fabrics, metals, and organic surfaces could enhance the model's generalizability and allow personalization of tactile surfaces in fashion, automotive design, and beyond. Moreover, expanding the dataset to include dynamic material properties, such as elasticity, thermal responsiveness, or friction, could enable TactStyle to model textures with more complex interactions, further improving the reproducibility of their tactile properties.

Cross-Modal Texture Design

TactStyle is able to replicate specific tactile descriptors both from the expected tactile features conveyed by a texture's visual appearance and from the perceived tactile properties correlated with the original heightfield. This combined replication of expected and perceived tactile properties allows for a cross-modal design of textures. Recent work in VR and haptics has explored novel ways to design and map user-defined tactile properties in virtual reality, such as through voice. TactStyle approaches a similar problem in the fabrication domain, allowing users to apply textures that have 'expected' tactile properties. In the future, this approach could be extended to text prompts, allowing users to describe their expected tactile response and fabricate a 3D model with those tactile properties.

Incorporating Material Properties of Textures

The material properties of textures play a critical role in defining their tactile experience. Hardness and Scratchiness, for instance, are closely related to the rigidity and resistance of a material, directly affecting how a surface feels when touched. Currently, TactStyle operates by taking images as input to generate tactile features. While this method effectively aligns visual and tactile experiences, there is potential to enhance its accuracy by incorporating material descriptors as input. In this study, we standardized the material used for all texture samples for consistency, to evaluate the accuracy of generated heightfields in replicating tactile perception. However, tactile perception is also influenced by material-specific properties such as compliance, thermal conductivity, and surface friction. Future work could explore how these properties can be replicated by predicting material types that approximate their tactile properties. Additionally, integrating novel approaches like metamaterials could provide new avenues for tailoring and enhancing texture replication across diverse applications.

Analyzing Visuo-Haptic Properties Together

TactStyle currently evaluates visual and tactile perceptions separately, identifying key differences between expected tactile properties based on visual cues and the actual tactile perceptions of textures. These findings highlight an interplay between visual and haptic modalities in shaping texture perception. Future work could explore this property of tactile perception, and leverage visuo-haptic mismatches to create novel experiences, such as "impossible materials" that visually appear soft but feel rigid, defying conventional expectations. Additionally, photochromic materials have been used in prior work to create re-programmable multi-color surfaces. Such materials offer opportunities to dynamically link visual and tactile feedback to create novel dynamic textures.

3D Model Generation with Accurate Texture Information

Recent Generative AI methods have enabled users to generate novel 3D models from scratch based on image and text prompts. However, while current systems excel at generating visual representations of textures, they often lack the capacity to accurately generate the tactile properties of these materials. Since TactStyle also works with the image modality, an extension of TactStyle could allow creators to provide an image or description of a novel object and its expected tactile properties, enabling them to create not only novel digital artifacts but also fabricatable designs with accurate texture information. By extending generative tools to encode material properties, these models could also propose materials to fabricate the object such that the tactile experience is closely approximated.


CONCLUSION

In this paper, we present TactStyle, a system that allows users to stylize 3D models using image prompts, replicating both visual appearance and tactile properties. By extending generative AI techniques, TactStyle generates tactile features as a heightfield and applies them to 3D models. A quantitative study demonstrates significant improvements over traditional stylization methods. In a psychophysical experiment with 15 participants, we evaluate TactStyle's ability to create textures perceived as similar to both visually expected tactile properties and the original texture's tactile features.

Our findings show that TactStyle successfully aligns visual and tactile properties, enabling more realistic 3D model personalization. This work opens up new possibilities in cross-modal design, and future work can expand TactStyle by incorporating material descriptors to further enhance its tactile accuracy.


ACKNOWLEDGMENTS

We would like to extend our sincere gratitude to the MIT-Google Program for Computing Innovation for their generous support, which made this research possible. Furthermore, we thank Varun Jampani, Yingtao Tian, Vrushank Phadnis, Yuanzhen Li, and Douglas Eck from Google for their valuable insights and feedback on this research.