We introduce a novel method for estimating the
structure and joint parameters of articulated objects from a
single casual video that may be captured by a moving camera.
Unlike previous works that rely on multiple static views
or a priori knowledge of the object category, our approach
leverages 2D point tracking and depth map prediction to
generate 3D trajectories of points on the object. By analyzing
these trajectories, we generate and evaluate hypotheses
about joint parameters, selecting the best combination using
the Bayesian Information Criterion (BIC) to avoid overfitting.
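For concreteness, these two steps admit a compact illustration. The sketch below is a minimal, hypothetical rendition, not the paper's implementation: it assumes known pinhole intrinsics K, models residuals as isotropic Gaussians so that the standard criterion BIC = k ln(n) - 2 ln(L), with k model parameters, n observations, and L the maximized likelihood, reduces to a residual-sum-of-squares form, and it simplifies hypothesis selection to a static-vs-rigid choice for a single part, whereas the paper's hypotheses concern joint types and parameters.

```python
import numpy as np

def lift_tracks_to_3d(tracks_2d, depths, K):
    """Lift 2D point tracks to per-frame 3D camera-space points.

    tracks_2d: (T, P, 2) pixel positions of P tracked points over T frames
    depths:    (T, P) depth at each track, sampled from predicted depth maps
    K:         (3, 3) pinhole intrinsics (assumed known here)
    returns:   (T, P, 3) 3D trajectories in camera coordinates
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (tracks_2d[..., 0] - cx) / fx
    y = (tracks_2d[..., 1] - cy) / fy
    rays = np.stack([x, y, np.ones_like(x)], axis=-1)
    return rays * depths[..., None]

def bic(rss, n, k):
    # BIC under an isotropic Gaussian residual model; constants dropped.
    return n * np.log(max(rss, 1e-12) / n) + k * np.log(n)

def kabsch(P, Q):
    # Best-fit rotation R and translation t with R @ p + t ~= q.
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, Q.mean(0) - R @ P.mean(0)

def select_motion_model(traj):
    """Pick 'static' vs. 'rigid' for one part's (T, P, 3) trajectories by BIC."""
    T, P, _ = traj.shape
    n = T * P * 3                        # number of scalar residuals
    # Static hypothesis: each point stays at its mean position.
    rss_static = ((traj - traj.mean(0)) ** 2).sum()
    # Rigid hypothesis: a 6-DoF pose per frame relative to frame 0.
    rss_rigid = 0.0
    for t in range(1, T):
        R, tr = kabsch(traj[0], traj[t])
        rss_rigid += ((traj[0] @ R.T + tr - traj[t]) ** 2).sum()
    k_static, k_rigid = 3 * P, 3 * P + 6 * (T - 1)
    return "rigid" if bic(rss_rigid, n, k_rigid) < bic(rss_static, n, k_static) else "static"
```

In the same spirit, candidate revolute or prismatic joints could be fit and scored with the same bic helper, with k counting their axis, pivot, and per-frame state parameters.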
We then optimize a dense 3D model of the object
using Gaussian Splatting, guided by the selected joint hypotheses.
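Rendering a part in a new articulation state then reduces to a rigid transform determined by the selected joint. Below is a minimal sketch of posing points under a revolute joint via the standard Rodrigues formula; the function name and interface are hypothetical, and the paper's Gaussian Splatting model would also transform the Gaussians' covariances, which this omits.

```python
import numpy as np

def articulate_revolute(points, axis, pivot, angle):
    """Rotate a part's points by `angle` (radians) about a joint axis.

    points: (N, 3) part geometry (e.g., Gaussian centers)
    axis:   (3,) joint direction; pivot: (3,) any point on the axis
    """
    a = np.asarray(axis, dtype=float)
    a /= np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    # Rodrigues rotation: R = I + sin(angle) K + (1 - cos(angle)) K^2
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return (points - pivot) @ R.T + pivot
```

A prismatic joint would instead translate the part along the unit axis, i.e., points + state * a.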
Our method accurately recovers the object's geometry,
its segmentation into parts, the joint parameters, and the
motion of each part, enabling rendering of the object from
novel viewpoints and in novel articulation states. Extensive
evaluations on several datasets demonstrate the effectiveness
of our approach.