We introduce Shape Tokens, a 3D representation that is continuous, compact, and easy to integrate into machine learning models. Shape Tokens serve as conditioning vectors, representing shape information within a 3D flow-matching model. This flow-matching model is trained to approximate probability density functions corresponding to delta functions concentrated on the surfaces of 3D shapes. By incorporating Shape Tokens into various machine learning models, we can generate new shapes, convert images to 3D, align 3D shapes with text and images, and render shapes directly at variable, user-specified resolutions. Additionally, Shape Tokens enable a systematic analysis of geometric properties, including normals, density, and deformation fields. Across tasks and experiments, models built on Shape Tokens achieve strong performance compared to existing baselines.
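To make the flow-matching objective concrete, below is a minimal training sketch, assuming a straight-line probability path between Gaussian noise and surface points and a velocity network `velocity_net` conditioned on Shape Tokens; the names and conventions are illustrative assumptions, not the exact implementation.

```python
import torch

def flow_matching_loss(velocity_net, x_surface, shape_tokens):
    """x_surface: (B, N, 3) points sampled i.i.d. on the shape surface.
    shape_tokens: (B, K, D) conditioning vectors produced by the tokenizer."""
    noise = torch.randn_like(x_surface)                  # sample in "uvw" noise space
    t = torch.rand(x_surface.shape[0], 1, 1, device=x_surface.device)
    x_t = (1.0 - t) * noise + t * x_surface              # straight-line probability path
    target_v = x_surface - noise                         # constant target velocity
    pred_v = velocity_net(x_t, t, shape_tokens)          # per-point velocity estimate
    return torch.mean((pred_v - target_v) ** 2)
```

Minimizing this loss pushes the conditional velocity field toward one whose terminal distribution concentrates on the shape surface, which is what the delta-function interpretation above refers to.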
Figure 1: Our Shape Tokens representation can be readily used as input/output to machine learning models in various applications, including single-image-to-3D (left), neural rendering of normal maps (top right), and 3D-CLIP alignment (bottom right). The resulting models achieve strong performance compared to baselines for the individual tasks.

Video 1: The video shows our single-image-to-3D point cloud results. The images are unseen objects from the Objaverse test set. Each video first shows the input image, then the generated point cloud. [Credits]

Video 2: From the same unseen input image, we generate multiple point clouds independently. [Credits]
Figure 2: Overview of our architecture. (Left) We model a 3D shape as a probability density function concentrated on its surface, i.e., a delta function in 3D. (Right) Our tokenizer uses cross-attention to aggregate information from the point cloud sampled on the shape into Shape Tokens (ST). The velocity estimator uses only cross-attention and MLPs, maintaining independence between points.
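A minimal sketch of the velocity estimator described in the caption, assuming PyTorch and placeholder layer sizes: each noisy point cross-attends to the Shape Tokens and is then processed by an MLP, with no attention between points, so every point is handled independently. This illustrates the design, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PointVelocityEstimator(nn.Module):
    """Illustrative sketch: points query the Shape Tokens via cross-attention,
    then an MLP predicts a 3D velocity per point. Layer sizes and the time
    embedding are placeholder assumptions."""
    def __init__(self, token_dim=512, hidden=512, heads=8):
        super().__init__()
        self.point_embed = nn.Linear(3 + 1, hidden)          # xyz coordinates + time t
        self.cross_attn = nn.MultiheadAttention(hidden, heads,
                                                kdim=token_dim, vdim=token_dim,
                                                batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                 nn.Linear(hidden, 3))       # 3D velocity output

    def forward(self, x_t, t, shape_tokens):
        # x_t: (B, N, 3) noisy points, t: (B, 1, 1) time, shape_tokens: (B, K, token_dim)
        q = self.point_embed(torch.cat([x_t, t.expand(-1, x_t.shape[1], 1)], dim=-1))
        h, _ = self.cross_attn(q, shape_tokens, shape_tokens)   # points attend to tokens
        return self.mlp(h)                                       # one velocity per point
```

Because there is no point-to-point attention, the cost scales linearly in the number of points and any number of points can be sampled at inference time.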
Figure 3: Reconstruction, densification, and normal estimation of unseen point clouds in the GSO dataset. For each row, we are given a point cloud containing 16,384 points (xyz only); we compute ST and i.i.d. sample the resulting p(x|s) to obtain 262,144 points. Different columns render the input and the sampled point clouds from different viewpoints. As indicated by the labels in parentheses, we color the input points by their xyz coordinates and the sampled points by their initial noise's uvw coordinates or by their estimated normals (last two columns). Note that we do not provide normals as input to the shape tokenizer.

Video 4: We compute Shape Tokens on input point clouds (16,384 points) from unseen Google Scanned Objects, then sample 16x more points (262,144 points). The video shows the uvw-to-xyz trajectory of the flow-matching sampling process, i.e., the ODE trajectory. We color the points by their initial positions in the noise space (uvw). [Credits]
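Densification of this kind only requires drawing more initial noise samples and integrating the learned ODE. A hedged sketch using a plain Euler solver (the step count and solver choice are assumptions):

```python
import torch

@torch.no_grad()
def sample_points(velocity_net, shape_tokens, num_points=262_144, steps=64):
    """Sketch of i.i.d. sampling from p(x | s): draw Gaussian noise in "uvw"
    space and integrate the learned velocity field to the data ("xyz") space."""
    B = shape_tokens.shape[0]
    x = torch.randn(B, num_points, 3, device=shape_tokens.device)  # initial uvw noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((B, 1, 1), i * dt, device=x.device)
        x = x + dt * velocity_net(x, t, shape_tokens)              # Euler update
    return x  # points approximately on the shape surface
```

Running the same integration in reverse, from the data points at t = 1 back to t = 0, gives the data-to-noise (xyz-to-uvw) mapping visualized in Figure 4.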
Figure 4: The ODE integration trajectory defines a mapping from xyz (data) to uvw (noise).

Video 3: The video shows recent single-image-to-3D methods on Google Scanned Objects, which are unseen by all methods.
From left to right:
Input image
Splatter-image (CVPR 2024): trained on Objaverse
Point-e (2022): trained on several million proprietary 3D meshes.
Make-a-shape (ICML 2024): trained on 18 datasets, including Objaverse
Ours: trained on Objaverse
Note that this video is not intended as a comparison between individual methods: these models differ in their training data (e.g., Point-e was trained on proprietary 3D meshes) and mechanisms (e.g., Splatter-image is not a generative model, and our method assumes a known camera model). We provide the results for the viewer's reference. [Credits]

Video 5: The video shows neural rendering results on unseen point clouds. From Shape Tokens, we use a neural network to independently estimate each ray's intersection point and surface normal.
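A sketch of a per-ray rendering head along the lines of the Video 5 description, assuming a cross-attention-plus-MLP design analogous to the velocity estimator; the layer sizes, the 6-channel output, and the absence of hit/miss handling are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RayIntersectionHead(nn.Module):
    """Illustrative sketch: each ray (origin, direction) attends to the Shape
    Tokens, and an MLP predicts the ray-surface intersection point and its
    normal. Rays are processed independently, so output resolution is set
    simply by how many rays are cast."""
    def __init__(self, token_dim=512, hidden=512, heads=8):
        super().__init__()
        self.ray_embed = nn.Linear(6, hidden)                 # origin (3) + direction (3)
        self.cross_attn = nn.MultiheadAttention(hidden, heads,
                                                kdim=token_dim, vdim=token_dim,
                                                batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                  nn.Linear(hidden, 6))       # hit point xyz + normal

    def forward(self, ray_o, ray_d, shape_tokens):
        # ray_o, ray_d: (B, R, 3); shape_tokens: (B, K, token_dim)
        q = self.ray_embed(torch.cat([ray_o, ray_d], dim=-1))
        h, _ = self.cross_attn(q, shape_tokens, shape_tokens)  # rays query the tokens
        out = self.head(h)
        hit_xyz = out[..., :3]
        normal = F.normalize(out[..., 3:], dim=-1)             # unit-length normal
        return hit_xyz, normal
```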