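Conclusions and significance
It is well established that microbes are transmitted to the infant mainly from the mother right after birth. This study is the first to demonstrate, using large-scale metagenomic data, that social interaction among peers at nursery is a major additional driver of gut microbiome development in babies under 1 year of age.
The study sheds new light on how social experience in infancy affects gut health and immune development, and suggests that nursery attendance may play a positive role in increasing microbial diversity.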
https://www.dongascience.com/news/76063
Baby-to-baby strain transmission shapes the developing gut microbiome
Nature volume 651, pages 191–200 (2026)
Abstract
The early infant microbiome is largely primed by microbial transmission from the mother between birth and the first few weeks of life1,2,3, but how interpersonal transmission further shapes the developing microbiome in the first year remains unexplored. Here we report a metagenomic survey to model microbiome transmission in the nursery setting among babies attending the first year, their educators and their families (n = 134 individuals). We performed dense longitudinal microbiome sampling (n = 1,013 faecal samples) during the first year of nursery and tracked microbial strain transmission within and between nursery groups across 3 different facilities. We detected extensive baby-to-baby microbiome transmission within nursery groups even after only 1 month of nursery attendance, with nursery-acquired strains accounting for a proportion of the infant gut microbiome comparable to that from family by the end of the first term. Baby-to-baby transmission continued to grow over the nursery year, in an increasingly intricate transmission network with single strains spreading in some classes, and with multiple baby-acquisition and species-transmissibility patterns. Having siblings was associated with higher microbiome diversity and reduced strain acquisition from nursery peers, while antibiotic treatment was the condition that most accounted for the increased influx of strains. This study shows that microbiome transmission between babies is extensive during the first year of nursery, and points to social interactions in infancy as crucial drivers of infant microbiome development.
Main
The early infant microbiome assembles via intricate and partially stochastic microbial acquisitions that have the mother as the primary source and other family members as additional ones1,2,3,4,5. The infant microbiome then evolves during the following few years with complex dynamics that later result in a more stable adult-like microbiome6. Although early family-to-baby microbiome strain transmission has been quite extensively investigated1,2,3,4,7, later infant developmental stages, including those involving interaction with other peers in social contexts, have received very little attention.
As person-to-person intra-generational microbiome transmission has recently been shown to be extensive and to affect the personal microbiome make-up8, we predicted that early social contexts such as nurseries might exert a large impact on infant microbiomes via baby-to-baby transmission. Beyond work on pathogen spreading9,10 and linked immune competence development11,12,13, microbiome investigations in nurseries have been limited to observing increased microbial diversity among attendees14. This leaves a major gap in the understanding of the dynamics of human microbiome maturation during the key first 1,000 days of life15.
Here we present microTOUCH-baby, a strain-resolved longitudinally dense metagenomic study modelling interpersonal gut microbiome transmission between babies attending the nursery for the first time and their close contacts, including family members and nursery educators.
The microTOUCH-baby study
We set up the microTOUCH-baby cohort to study the dynamics of microbiome development and transmission among babies of about 1 year of age and their close social interactions network (Methods). Participants included 43 babies attending the first year of nursery (median age at nursery admission 10 months), 7 co-living siblings, 39 mothers and 30 fathers of the babies, and 5 pets from the participants’ houses, as well as 10 nursery educators (134 volunteers in total; Fig. 1a and Supplementary Table 1). Baby participants were enrolled from three public nurseries in Trento (Italy). Babies spent on average 8 hours per weekday (after the ‘settling-in period’; Methods) in the nursery, with limited activities and spaces shared between the two classes in the same nursery, which are followed by different educational staff.
Fig. 1: The microTOUCH-baby study and species-level microbiome configurations before and after nursery attendance.
a, microTOUCH-baby study design and overview. b, Species-level microbiome composition overview of the microTOUCH-baby cohort during first term (principal coordinate analysis on Jaccard dissimilarity, n = 646). Samples are coloured by host categories and shapes indicate the nursery. Baby samples’ colour intensity is according to time point (from initial T01 to final T15). c,d, Average SGB richness across all timepoints and for all individuals in each family member category having versus not having a sibling (c) and having versus not having a pet (d). e,f, Change in alpha-diversity (SGB richness; e) and beta-diversity (Jaccard dissimilarity; f) across participant types between the beginning and the end of the first term, with n indicating the number of individual–individual pairs. Beta-diversity refers to the all-versus-all within-nursery dissimilarities. In the box plots (c–f), box edges show the lower and upper quartiles, the centre line indicates the median, and whiskers extend to the most extreme data point within 1.5× the interquartile range (IQR). P values are reported where statistically significant (two-sided Mann–Whitney U-test in c and d and two-sided Wilcoxon signed-rank tests for e and f); all other comparisons are non significant.
Sampling started before the beginning of the first term (T01), hence before participants from different families had any nursery-related contact among them, and ended after the Christmas nursery closure (Fig. 1a). During nursery attendance, we collected stool samples of the babies on a weekly basis, whereas educators and parents were less densely sampled (Methods). For all participants in group 1 of nursery A, sample collection continued through the second term. Two additional follow-up samples were collected for all participants at nursery year’s conclusion (TA) and at the end of the summer break (TB) (Fig. 1a).
Overall, we collected and metagenomically sequenced 1,013 microbiome samples (average sequencing depth 15.61 Gbp; Methods). Host metadata information included exact age, past and current host-health data, antibiotic exposures, maternal delivery information (Methods, Fig. 1a, and Supplementary Tables 2 and 3), and diet questionnaires (Methods). Metagenomes were processed via the MetaPhlAn 4 computational tool16 to generate taxonomic profiles at species-level genome bin (SGB)17 resolution (Supplementary Table 4 and Extended Data Fig. 1a), including yet-to-characterize species (that is, unknown SGBs accounting for 46.37% of total SGBs). We then used the StrainPhlAn 4 computational tool16,18 to generate strain-level phylogenies for 311 known SGBs and 201 unknown SGBs that were used to infer microbiome strain transmission8 (Methods).
Compositional baby microbiome landscape
We first observed expected microbiome structures1,19,20, with large compositional divergence between adults and babies (Fig. 1b, Extended Data Figs. 1b, 2 and 3, and Supplementary Table 5), age-dependent differences in babies (Extended Data Fig. 4a–e), and diet-dependent microbial stratification in adults (Extended Data Fig. 4f), but not in babies after accounting for age (Supplementary Table 6). Interestingly, at T01 (median age 10 months), the impact of maternal intrapartum antibiotic prophylaxis against group B Streptococcus and of mode of delivery on alpha-diversity was already not detected as statistically significant (Mann–Whitney U-test, n = 37, U = 137, P = 0.68 and n = 37, U = 109, P = 0.89, respectively; Extended Data Fig. 5a–d).
Some compositional patterns were suggestive of a role of microbiome transmission. Babies having a sibling had, for example, an overall higher SGB richness compared with babies without brothers and sisters (n = 40, U = 271, P = 0.012; Fig. 1c and Supplementary Table 6), further supporting previous observations21,22 and suggesting that siblings may provide important sources for infant microbiome enrichment. In contrast, babies with pets showed lower overall SGB richness (n = 40, U = 61, P = 0.012; Fig. 1d), but significance was lost after adjusting for age (Supplementary Table 6). Babies’ alpha-diversity increased during the 3 months of nursery attendance (Fig. 1e), and although the total pool of microbial species detected among babies in the nursery did not change noticeably throughout the study (Extended Data Fig. 5e), the inter-baby beta-diversity decreased significantly (7% average decrease; Wilcoxon signed-rank test n = 116 baby pairs, W = 1,026, P = 7.0 × 10^−11; Fig. 1f). As overall this might be indicative of baby microbiome convergence influenced by inter-individual transmission, we performed strain-level transmission analysis to investigate this hypothesis.
Mapping strain-sharing in the nursery
Extending our StrainPhlAn-based validated pipeline8 (Methods), we defined a strain-sharing event as the identification of the same strain (that is, differing by a genetic distance lower than the pre-computed optimal species-specific threshold distinguishing between inter- and intra-individual genetic distance distributions) in different microbiome samples. Strain-sharing rates (SSRs) are computed as the number of strains shared between a pair of microbiome samples over the number of species with profiled strains present in both samples (Methods). Applied on the task of inferring mother–baby transmission, the pipeline estimated a 50% median SSR for babies at the beginning of the study, which is highly consistent with previous results irrespective of population (Extended Data Fig. 5f).
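The SSR definition above can be illustrated with a minimal sketch. It assumes that, for one pair of samples, a genetic distance is already available for every SGB with a profiled strain in both samples, together with a species-specific same-strain threshold. The function name, the distance values, the thresholds and the identifiers SGB1814 and SGB4933 are hypothetical placeholders, not outputs of the study's pipeline.

```python
from typing import Dict, Optional


def strain_sharing_rate(
    distances: Dict[str, float],
    thresholds: Dict[str, float],
) -> Optional[float]:
    """SSR for one pair of samples.

    distances:  SGB id -> genetic distance between the strains profiled in
                both samples (only SGBs strain-profiled in both are listed).
    thresholds: SGB id -> species-specific same-strain cut-off.
    Returns None when the two samples have no strain-profiled SGB in common
    (an assumption of this sketch: the rate is then left undefined).
    """
    if not distances:
        return None
    # A strain-sharing event: distance strictly below the species threshold.
    shared = sum(1 for sgb, d in distances.items() if d < thresholds[sgb])
    return shared / len(distances)


# Illustrative values only (not taken from the study).
pair = {"SGB9226": 0.0004, "SGB2301": 0.0310, "SGB1814": 0.0002, "SGB4933": 0.0450}
cutoffs = {"SGB9226": 0.0010, "SGB2301": 0.0050, "SGB1814": 0.0010, "SGB4933": 0.0080}
ssr = strain_sharing_rate(pair, cutoffs)  # 2 of 4 common SGBs share a strain
```

In this toy pair, two of the four commonly profiled SGBs fall below their thresholds, so the SSR is 0.5.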
Overall, we captured over 9.47 million instances of the same SGB typed at the strain level in different samples (including those from the same participant and from different participants), with a total of 5.97% of cases in which the same strain of the SGB was present, resulting in 565,258 detected strain-sharing events (Supplementary Tables 7 and 8). Within-individual strain-sharing accounted for 27.9% of the total (157,599 events, with 99% likelihood of samples from the same individual sharing at least 1 strain, and 87% at least 5) but also strain-sharing between different individuals in the same family was very high (51,483 events, 9.1% of the total, with 86% likelihood of sharing at least 1 strain, 47% at least 5), with rarer between-family strain-sharing instances at T01 (46% likelihood of sharing at least 1 strain and 3% at least 5; Extended Data Fig. 5g). Although most strain-sharing over the first term was observed among individuals from different families (356,176 events, 63%), this reflected the >75-times greater number of between-family comparison pairs; after normalizing for the number of comparisons, one order of magnitude fewer strains were shared between families versus within family (0.7 versus 7.9 strains shared per sample pair; Supplementary Table 7). The 0.7 average strains shared by unrelated individuals represent the cohort’s microbiome sharing background, including untraced social interaction before T01, clonal strains spreading into the nursery-associated local community and possible false-positive instances, among other factors.
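As a quick consistency check, the percentages in the paragraph above can be recomputed from the quoted counts. The sketch below uses only numbers stated in the text. The denominator of 9.47 million is the rounded figure given there, and the implied ratio of comparison pairs is a back-of-the-envelope estimate from the rounded per-pair means (0.7 and 7.9), so it is approximate.

```python
# Counts quoted in the paragraph above.
same_sgb_comparisons = 9_470_000  # "over 9.47 million" (rounded in the text)
events_total = 565_258
events = {
    "within individual": 157_599,
    "within family": 51_483,
    "between families": 356_176,
}


def pct(part: float, whole: float, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


overall_rate = pct(events_total, same_sgb_comparisons, digits=2)
shares = {name: pct(n, events_total) for name, n in events.items()}

# Rounded mean strains shared per sample pair, as reported in the text.
per_pair = {"within family": 7.9, "between families": 0.7}
# Implied number of comparison pairs = events / mean events per pair.
implied_pair_ratio = (events["between families"] / per_pair["between families"]) / (
    events["within family"] / per_pair["within family"]
)
```

The three event counts sum exactly to 565,258, the shares come out at 27.9%, 9.1% and 63.0%, and the implied ratio of between-family to within-family comparison pairs is in line with the ">75-times" figure in the text.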
Tracking multi-host strain transmission
As a representative example of the combined capabilities of our study design and metagenomic pipeline to trace complex strain transmission chains, we illustrate the interpersonal transfer of a nursery-acquired strain of Akkermansia muciniphila (SGB9226) in group 1 of nursery B. A strain from this species was first introduced in the nursery group by a baby (B05) who probably obtained it from their mother, passed to another baby (B06), to then be found in their mother (M06) and father (F06), in the latter replacing another A. muciniphila strain (Fig. 2a). A. muciniphila strains contain CRISPR arrays that can be used as unique genetic tags for strains23,24 that further confirmed A. muciniphila strain identity across volunteers (Methods). Metagenomic assembly also validated such transmission patterns for the limited number of strains (8 out of 19 StrainPhlAn-positive samples) that could be reconstructed into draft genomes of sufficient quality, with high genomic similarity between assemblies from samples with the same strain according to StrainPhlAn (pairwise average nucleotide identity (ANI) 99.97%, which aligned with same-strain boundaries independently estimated elsewhere25,26). We note that the missing detection of A. muciniphila strains (grey circles in Fig. 2a) was overall consistent with the absence of the species as shown in a high-sensitivity, SGB-specific polymerase chain reaction (PCR) (Methods and Extended Data Fig. 6a). Within this example, we found only one sample in which we missed the metagenomic strain profiling to be PCR-positive at the SGB-level, concordantly with a non-zero relative abundance (0.04%) in its MetaPhlAn profile (B05_T08; Extended Data Fig. 6b), being thus the single case in Fig. 2a of SGB9226 falling below the limit of detection for strain profiling. Another transmission chain example involved Alistipes finegoldii (SGB2301) and included an educator (Extended Data Fig. 6c), further contributing to show the potential of our approach to recapitulate microbial transmission in nurseries.
Fig. 2: Inter-individual strain transmission and nursery spreading during the first term.
a, Strain-level profiling for A. muciniphila SGB9226 (left) uncovers the chain of transmission events of one strain of this species in group 1 of nursery B (right). Participant types are identified by shape (mother, diamond; baby, circle; father, square) containing participant identifiers composed of the first letter indicating participant type (M, mother; B, baby; F, father) and the family number; familiar relations are also highlighted by same-colour filling. On the right, each circle represents a sample collected from the participants depicted, with colour filling indicating the identity of the strain of A. muciniphila detected in the sample (except grey, used to indicate that the SGB was not detected/typable at the strain level) and arrows indicating the most likely transmission event. The light orange and grey circle identifies SGB9226-positive sample (B05_T08) in which a strain could not be profiled by StrainPhlAn. The identification of shared CRISPR spacers of the target strain of A. muciniphila (orange circles) across different samples is indicated by an asterisk. b, Strains present at most in one baby before nursery admission (T01) and spreading to other participants in the same nursery, reaching ≥50% prevalence in the following time points, until T15. Left and right y axes show the proportion and number of babies in which the outbreaker strain was detected, respectively. The left y axis also refers to the proportion of babies in which the SGB was detected (that is, their prevalence in the nursery). S. intestinalis, Sellimonas intestinalis.
We also explored potential gut microbiome transmission between household pets and their families. Although only five pets were included, which makes this analysis anecdotal, we identified a low total number of pet–human strain-sharing events, with intra-family pet–baby strain-sharing significantly higher than inter-family (Fisher’s exact test, n = 211, P = 0.005; Extended Data Fig. 6d,e). Strains found to be transmitted between babies and pets belonged to human-associated species that had also been previously detected in pet gut microbiomes (Faecalimonas umbilicata, Ruminococcus gnavus, Clostridium sp. AT4 and Phocaeicola vulgatus27,28,29,30), indicating they may be ecologically fit to overcome host-species boundaries.
Strain-spreading patterns in the nursery
We then examined the changes in the collective composition of the human microbiome in nurseries. First, we found the overall pool of distinct strains to decrease over time (that is, average nursery strain heterogeneity decreasing from 0.91 at T01 to 0.77 at T15, Mann–Whitney U-test, n = 454, U = 34,312, P = 1.3 × 10^−11). Considering that the total reservoir of microbial species did not increase (Extended Data Fig. 5e), this indicates that some strains within the same species may have spread among babies and prevailed over other strains initially present (Extended Data Fig. 7a).
We then focused on strains that showed efficient spreading within a nursery. We found 8 cases of strains initially detected in no more than one baby before nursery start (T01) reaching ≥50% prevalence afterwards (Fig. 2b). Among these, a Streptococcus gallolyticus (nursery A) and a Bifidobacterium pseudocatenulatum (nursery B) strain were introduced in the nursery after approximately the first month of attendance and progressively spread to seven and eight babies, respectively (Fig. 2b). Although S. gallolyticus spread appeared to dwindle after reaching the maximum diffusion, B. pseudocatenulatum presence was steadily detected, consistent with the high prevalence of the Bifidobacterium genus in the infant population2. Other cases of bacterial strain diffusion involved Escherichia coli and Veillonella dispar in nursery B, and Clostridium innocuum in nursery C, which was possibly limited in its spread by other conspecific strains and niche preemption dynamics31.
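The selection criterion for efficiently spreading strains (at most one carrier at T01, then at least 50% prevalence at a later time point) can be sketched as a simple filter. The data layout, the function name and the strain and baby labels below are hypothetical. The sketch also assumes that every non-baseline time point is later than T01 and that prevalence is computed over all babies in the nursery.

```python
from typing import Dict, List, Set


def find_spreading_strains(
    carriers: Dict[str, Dict[str, Set[str]]],
    n_babies: int,
    baseline: str = "T01",
    min_prevalence: float = 0.5,
) -> List[str]:
    """Return strains with <=1 carrier at baseline and >=min_prevalence later.

    carriers: strain id -> time point -> set of babies carrying the strain.
    n_babies: number of babies in the nursery (prevalence denominator).
    """
    hits = []
    for strain, by_time in carriers.items():
        # Exclude strains already present in more than one baby at baseline.
        if len(by_time.get(baseline, set())) > 1:
            continue
        later = [babies for t, babies in by_time.items() if t != baseline]
        if any(len(babies) / n_babies >= min_prevalence for babies in later):
            hits.append(strain)
    return sorted(hits)


# Toy nursery with six babies (illustrative data only).
carriers = {
    "strain_x": {"T01": set(), "T05": {"B01"}, "T10": {"B01", "B02", "B03"}},
    "strain_y": {"T01": {"B01", "B02"}, "T15": {"B01", "B02", "B03", "B04", "B05"}},
    "strain_z": {"T01": {"B03"}, "T15": {"B03", "B04"}},
}
spreading = find_spreading_strains(carriers, n_babies=6)
```

Here only strain_x qualifies: strain_y already had two carriers at T01, and strain_z never reaches half of the babies.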
Baby microbiomes built via transmission
Quantification of strains shared between babies attending the same nursery over time revealed they had, on average, more shared strains at the end of the first term than before nursery admission (the average number of strains shared with any other baby was 2.5 at T01 and 7.2 at T15, or 8.8 at T15 when disregarding strains already present at T01 and only for babies with samples available at both time points; Fig. 3a). Accordingly, whereas at T01 baby strain-sharing relations were not recapitulating nursery attendance, at T15 they clustered consistently with it (Fig. 3b, Supplementary Table 9 and Methods). We thus found strong evidence of quantitatively relevant acquisition of nursery-specific microbial profiles by babies, occurring via inter-individual strain transmission even in the relatively short time frame of the first nursery term.
Fig. 3: Strain-sharing across hosts before, during and after the first term of nursery attendance.
a, Average number of strains shared between each baby and other participants at T01 and T15. The triangles under the boxes report the average number of strains shared between the baby and any participant in ‘family’ (mother, father, sibling) or ‘nursery’ (other babies, educator). b, Average number of shared strains between baby pairs in the same versus different nursery; P values (two-sided permutation test for means; Methods) for intra- versus inter-nursery comparisons are shown in italic in the circle. The statistics for a and e are in Supplementary Tables 9 and 10. Networks are built on strain-sharing matrices among all babies at T01 and T15. c, Strain replacement rate (one minus the SSR) between initial and final time points. P values are reported where statistically significant (two-sided Mann–Whitney U-tests); all other pairwise comparisons are non significant. In the box plots c–e, box edges show the lower and upper quartiles, the centre line indicates the median, and whiskers extend to the most extreme data point within 1.5× the IQR. d, Baby–baby SSR and average number of strains shared throughout the first term. In d and e, statistical significance asterisks refer to the highest significant P value adjusted for multiple comparisons (two-sided permutation test for medians; Methods) for the set of comparisons indicated in the legend, with **P < 0.01 and ***P < 0.001. Left and right y axes indicate average strains shared and average common SGBs. e, Baby–baby SSR and average number of strains shared at T01, at the beginning and the end of the second term (T15 and TA), and after the summer break (TB), across all babies in all nurseries. At the top, P values are reported where statistically significant (two-sided Wilcoxon signed-rank test evaluating longitudinal SSR for paired baby–baby pairs attending the same nursery).
Investigating longitudinal gut microbiome changes, babies showed the lowest rate of SGB retention (defined as the Jaccard similarity between samples from initial and final time points of the same individual; Extended Data Fig. 7b) and the highest rate of strain replacement (defined as 1 − SSR) among the retained SGBs (Fig. 3c) compared with adults. A median 44.4% of the retained SGBs in babies showed baseline strain replacement during the 5 months of the study. In contrast, all other participants replaced a much lower fraction of strains in their gut (medians below 11.1%), with strain replacement rates correlated although non-significantly with age among non-baby participants (Spearman’s test, n = 68, ρ = 0.22, P = 0.071; Extended Data Fig. 7c). This reflects the expected high plasticity of the infant gut microbiome with its rapidly evolving ecosystem and limited colonization resistance6,32.
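The two longitudinal quantities defined above, SGB retention (Jaccard similarity between the initial and final samples) and strain replacement (1 − SSR among retained SGBs), can be written out as a minimal sketch. As a simplification, each sample is represented as a mapping from SGB to a strain label, and two strains count as the same when their labels match. In the study, strain identity is decided by species-specific genetic-distance thresholds, and not every detected SGB can be typed at the strain level. All names and labels here are hypothetical.

```python
from typing import Dict, Optional


def sgb_retention(first: Dict[str, str], last: Dict[str, str]) -> float:
    """Jaccard similarity between the SGB sets of two samples."""
    a, b = set(first), set(last)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def strain_replacement_rate(
    first: Dict[str, str], last: Dict[str, str]
) -> Optional[float]:
    """1 - SSR, computed over SGBs present in both samples.

    Returns None when no SGB is retained (rate left undefined in this sketch).
    """
    retained = set(first) & set(last)
    if not retained:
        return None
    same = sum(1 for sgb in retained if first[sgb] == last[sgb])
    return 1 - same / len(retained)


# Toy initial and final samples of one participant (illustrative labels).
t_first = {"sgb_A": "a1", "sgb_B": "b1", "sgb_C": "c1", "sgb_D": "d1"}
t_last = {"sgb_A": "a1", "sgb_B": "b2", "sgb_C": "c1", "sgb_E": "e1"}

retention = sgb_retention(t_first, t_last)              # 3 shared of 5 SGBs
replacement = strain_replacement_rate(t_first, t_last)  # 1 of 3 retained SGBs
```

In this toy case, three of five SGBs are retained (retention 0.6) and one of the three retained SGBs carries a different strain (replacement rate of one third).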
To assess the extent to which nursery attendance affects microbiome assembly in babies via microbiome transmission, we quantified and compared the SSR between pairs of babies within the same group or nursery, and across different nurseries at each time point (Fig. 3d). Strain sharing among babies in the same nursery group was significantly higher after approximately only 1 month of nursery attendance compared with babies from different nurseries (median SSR 8.3% versus 0% at T04; permutation test for medians, n = 249, P = 0.001). This is all the more noteworthy in view of the first 2 weeks of the nursery’s ‘settling-in period’ during which babies attend discontinuously and for shorter periods. In addition, at the end of the first term (T15), the SSR in the same nursery group reached an average of 20.2%, significantly higher than the SSR between babies attending different nurseries (4.6%; permutation test for medians, n = 312, P < 0.001) and higher than the SSR among babies attending the same nursery but in different groups (16.1%; permutation test for medians, n = 122, P = 0.079, significant at T08 P = 0.026, T10 P < 0.001 and T13 P = 0.001).
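For readers unfamiliar with the test used here, a two-sided permutation test for a difference in medians can be sketched as follows. This is a generic textbook version, not the implementation described in the Methods. It omits the adjustment for multiple comparisons, and the function name, the number of permutations, the seed and the SSR values are assumptions made for illustration.

```python
import random
from statistics import median
from typing import Sequence


def permutation_test_medians(
    a: Sequence[float],
    b: Sequence[float],
    n_perm: int = 5000,
    seed: int = 0,
) -> float:
    """Two-sided permutation P value for the difference in medians."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels while keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(median(pooled[: len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Illustrative SSR values for baby pairs (not data from the study).
same_group = [0.08, 0.10, 0.12, 0.09, 0.17, 0.11, 0.14, 0.20]
different_nursery = [0.0, 0.0, 0.0, 0.02, 0.0, 0.03, 0.0, 0.01]
p_value = permutation_test_medians(same_group, different_nursery)
```

Shuffling the group labels builds the null distribution of the median difference directly from the data, which avoids distributional assumptions when many pairwise SSR values are exactly zero.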
By extending the investigation to the second term of nursery, we found the baby–baby SSR within the same nursery (regardless of group) to reach a median 33.3% at the end of school year (TA; versus median 17.9% at T15; Wilcoxon signed-rank test, n = 58, W = 86, P = 6.2 × 10^−9; Fig. 3e), with a progressive increase occurring during the whole second term, as observed for the class that was densely sampled over such a period (group 1 of nursery A; Extended Data Fig. 7d). Although the baby–baby SSR decreased during the summer break (TB), it remained significantly higher compared with post-Christmas-break levels (T15; median 23.7% at TB versus 17.9% at T15; Wilcoxon signed-rank test, n = 31, W = 68, P = 2.0 × 10^−4). These results highlight that social relations outside of the household and continued spatial proximity are key determinants of infant microbiome transmission and development at levels that are substantially higher than what was recently observed for adults8.
Nursery strains match family contribution
The parent–baby SSR at T01 averaged 37.3% for mothers and 19.6% for fathers, consistent with available reports1,4,8,33,34,35. Such patterns persisted throughout the first term (Fig. 4a). The contribution of sibling strains to the baby was even higher (average SSR 56.2%; Fig. 4b). As expected, strain transmission between babies and individuals from different families remained negligible throughout the first term (Fig. 4a,b), a testament to the reliability of the strain-transmission-inference approach.
Fig. 4: Dynamics of strain transmission during the first term of nursery.
a, SSR and average number of strains shared between pairs of babies and parents (at T01) from the same versus different families at each time point. In a and b, statistical significance asterisks refer to two-sided permutation tests for medians (Methods) adjusted for multiple comparisons for same family versus different family across each family member type, with ***P < 0.001. Exact P values for a–e are provided in Supplementary Table 11. In all box plots, box edges indicate the lower and upper quartiles, the centre line represents the median, and whiskers extend to the most extreme data point within 1.5× the IQR. b, Strain-sharing between pairs of babies and siblings (T01) from the same versus different families at each baby time point. c, Proportion of strains acquired from group versus family, and corresponding cumulative relative abundance (bottom). For each baby time point, comparisons were performed against past or contemporaneous samples of the family and the nursery group (Methods). Statistical significance asterisks refer only to the proportion of strains acquired from the same group versus the family. In c–e, the two-sided Mann–Whitney U-test was used, with *P < 0.05, **P < 0.01 and ***P < 0.001. d, Association between having a sibling and the number of strains acquired from the nursery group. e, Breakdown between acquisition of new SGBs typed at the strain level and strain replacement for the strains acquired from the nursery (top) and association between either means of strain acquisition and having a sibling (bottom). Statistical significance asterisks refer to the comparison between SGB acquisition from nursery for babies with versus without siblings. f, Number of strains either donated (dark green) or acquired (light green) by each baby over the first term (left y axis), and ratio of donated strains to acquired strains (dashed line; right y axis).
To establish the relative contribution of strain transmission from the nursery with respect to strain transmission from the family, we computed, for each baby, the proportion of strains in the baby microbiome that were exclusively shared with, and hence putatively acquired from, either family members or other babies in the nursery group (Methods) and we refer to it as ‘proportion of strains acquired’. We found that the proportion of strains acquired from the nursery group, but not of strains acquired from the family, changed significantly over time. The proportion of strains acquired from family members fluctuated from an average of 24.0% per baby at T01 to 20.0% at the end of the first term of nursery (Wilcoxon signed-rank test, n = 25, W = 112, P = 0.18; Extended Data Fig. 7e), whereas those putatively acquired from the nursery group increased from an average of 6.5% to 28.4% at the end of the first term (Wilcoxon signed-rank test, n = 25, W = 0, P = 6.0 × 10^−8; Extended Data Fig. 7e), significantly surpassing the proportion of strains acquired from the family (Mann–Whitney U-test, n = 52, U = 463, P = 0.023; Fig. 4c). This indicates that after only 3 months of nursery attendance, babies had proportionally more strains acquired from nursery peers than from their family.
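The ‘proportion of strains acquired’ can be illustrated with set operations. The sketch assumes that strains are represented by identifiers, that the family and nursery sets pool the strains found in past or contemporaneous samples of those contacts, and that the denominator is the number of strains profiled in the baby sample. The function name and all strain labels are hypothetical.

```python
from typing import Dict, Set


def acquired_proportions(
    baby: Set[str], family: Set[str], nursery: Set[str]
) -> Dict[str, float]:
    """Split a baby's strains by the contacts they are exclusively shared with."""
    if not baby:
        return {"family": 0.0, "nursery": 0.0, "both": 0.0, "unassigned": 0.0}
    from_family = (baby & family) - nursery   # shared with family only
    from_nursery = (baby & nursery) - family  # shared with nursery group only
    both = baby & family & nursery            # shared with both sources
    unassigned = baby - family - nursery      # shared with neither
    n = len(baby)
    return {
        "family": len(from_family) / n,
        "nursery": len(from_nursery) / n,
        "both": len(both) / n,
        "unassigned": len(unassigned) / n,
    }


# Toy example: ten strains profiled in one baby sample (illustrative labels).
baby_strains = {f"s{i}" for i in range(10)}
family_strains = {"s0", "s1", "s5", "f_only"}
nursery_strains = {"s2", "s3", "s4", "s5", "n_only"}
props = acquired_proportions(baby_strains, family_strains, nursery_strains)
```

In this toy sample, two strains are shared only with the family, three only with the nursery group, one with both, and four with neither, giving proportions of 0.2, 0.3, 0.1 and 0.4.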
A similar trend was observed when quantifying the relative abundance of strains acquired from either the family or the nursery group (Fig. 4c). Family contribution slightly diminished over time (from an average 33.2% at T01 to 20.6% at T15; Wilcoxon signed-rank test, n = 25, W = 72, P = 0.014; Extended Data Fig. 7f) whereas the contribution from the nursery group greatly expanded (reaching an average of 39.6% at T15 from a starting 10.2%; Wilcoxon signed-rank test, n = 25, W = 18, P = 1.5 × 10^−5; Extended Data Fig. 7f). Strains shared with both family and group also increased significantly (from average 0.9% to 8.5%; Wilcoxon signed-rank test, n = 25, W = 0, P = 4.4 × 10^−4; Extended Data Fig. 7f), probably reflecting reciprocal transmission between family and nursery (Fig. 2a). Overall, this suggests that by the end of the first term the nursery collectively contributes more than the family to the strain composition of the gut microbiome of babies (39.6% versus 20.6% at T15; Mann–Whitney U-test, n = 52, U = 479, P = 0.01; Extended Data Fig. 7f).
Long-term nursery effect on transmission
The extended longitudinal analysis of group 1 of nursery A revealed that the proportion of strains acquired from nursery peers continued to gradually increase during the second term (Extended Data Fig. 8a). Samples from all babies across nurseries at year-end (TA) confirmed comparable contributions of family and nursery to the baby (17.6% median proportion of strains acquired from nursery versus 15% from family; Mann–Whitney U-test, n = 19, U = 218, P = 0.29; Extended Data Fig. 8b) that non-significantly tended toward a greater family contribution after summer nursery closure (8.7% median proportion of strains acquired from nursery versus 16.7% from family; Mann–Whitney U-test, n = 17, U = 122, P = 0.43; Extended Data Fig. 8b).
Babies showed lower strain retention and higher strain replacement across the summer break (that is, between TA and TB) compared with adults, despite no differences in the carriage of SGBs typed at the strain level (Extended Data Fig. 8c–f). Interestingly, family-acquired strains were significantly more retained and less replaced in babies over the summer break than nursery-acquired strains (Wilcoxon signed-rank test, n = 11, W = 5, P = 0.019 and P = 0.022 respectively; Extended Data Fig. 8g,h), suggesting that continuous seeding linked to continued contact is a factor behind long-term colonization.
Siblings affect baby strain acquisition
Predicting a potential role of siblings in the transmission patterns, we found that at T01, babies showed a higher SSR with their siblings (average 52.3%) than with their fathers (24.9%; Mann–Whitney U-test, n = 36, U = 147, P = 0.026) as well as with their mothers, although non-significantly (46.1%; Mann–Whitney U-test, n = 36, U = 120, P = 0.47; Extended Data Fig. 8i). Of note, an average of 10.4 strains were shared exclusively with siblings at T01, whereas only 2.0 and 2.4 were shared exclusively with the mother or the father (Extended Data Fig. 8j), possibly reflecting closer intestinal ecology, physical interaction and development stage, which are probably some of the same factors leading to the higher nursery strain acquisition observed in our cohort.
We further observed that having a sibling was associated with babies acquiring significantly fewer strains from their nursery group compared with babies without a sibling at T15 (Mann–Whitney U-test, n = 28, U = 117, P = 0.004; Fig. 4d). Although causality cannot be inferred, this might be linked to early acquisition from siblings ‘saturating’ the overall strain acquisition potential, which would be in line with babies with a sibling having higher alpha-diversity (Fig. 1c) and acquiring fewer new SGBs than only-children (Fig. 4e). Notably, although all babies both spread and acquired strains in the nursery, the ratio between acquired and donated strains varied widely between babies (Fig. 4f).
The most-transmissible species
We next assessed species-level transmissibility by counting the number of strain-sharing events for each SGB in our cohort over the total potential number of strain-sharing events (Methods). Microeukaryotic taxa were not found to be abundant enough in babies to try to infer transmission, with Blastocystis, the most common human gut microeukaryote36, identified in 9.18% of the samples but never in babies (Supplementary Table 12). Focusing thus on prokaryotic taxa, out of the 64 SGBs with highest transmissibility (henceforward ‘T’) over all participant categories (Extended Data Fig. 9a and Supplementary Table 13), many known SGBs encompassed aerotolerant (S. gallolyticus, Rothia mucilaginosa and B. pseudocatenulatum) and spore-forming species (for example, Tyzzerella nexilis and Clostridium fessum). We also identified the spore-forming Clostridioides difficile among the most-transmissible SGBs between baby–baby pairs only (T = 0.38, prevalence in babies 24% and in adults 0%), in line with widespread carriage in asymptomatic babies37,38. Exceptions to this trend were prevalent non-sporulating human gut anaerobes (such as Blautia wexlerae and Faecalibacterium prausnitzii).
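The transmissibility score T described above (observed strain-sharing events over the total potential number of events for an SGB) can be sketched as a per-SGB ratio. The sketch assumes that a potential event is any sample pair in which the SGB was typed at the strain level in both samples. The exact definition is in the Methods, and the record format, function name and SGB labels here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def transmissibility(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-SGB transmissibility from pairwise comparison records.

    records: one (sgb, same_strain) entry per sample pair in which the SGB
             was strain-profiled in both samples (assumed definition of a
             potential strain-sharing event).
    """
    shared: Dict[str, int] = defaultdict(int)
    potential: Dict[str, int] = defaultdict(int)
    for sgb, same_strain in records:
        potential[sgb] += 1
        shared[sgb] += int(same_strain)
    return {sgb: shared[sgb] / potential[sgb] for sgb in potential}


# Illustrative comparison records (not data from the study).
records = [
    ("sgb_a", True), ("sgb_a", True), ("sgb_a", False), ("sgb_a", True),
    ("sgb_b", False), ("sgb_b", True), ("sgb_b", False), ("sgb_b", False),
    ("sgb_b", False),
]
t_scores = transmissibility(records)
```

With these records, sgb_a shares a strain in three of four potential events (T = 0.75) and sgb_b in one of five (T = 0.2). A score of T = 1 means the same strain was found in every pair in which the SGB could be compared.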
SGB transmissibility correlated with SGB prevalence in both adults (Spearman’s test, n = 461, ρ = 0.35, Padj = 9.8 × 10−14) and babies (Spearman’s test, n = 461, ρ = 0.40, Padj = 1.2 × 10−17; Extended Data Fig. 9b,c). The highest transmissibility scores were highlighted for SGBs shared in baby–siblings pairs, namely, A. finegoldii, Bacteroides ovatus and Bacteroides caccae, the butyrate-producing Roseburia intestinalis and Agathobaculum butyriciproducens39,40, Bifidobacterium bifidum, and Bifidobacterium breve (all with T = 1; Extended Data Fig. 9a). B. caccae strains were also commonly transmitted between mothers and babies, alongside strains of two undescribed Clostridium spp., Phocaeicola vulgatus and the typically maternally derived B. bifidum and B. pseudocatenulatum32,41. Highly transmitted SGBs between fathers and babies included Clostridium sp. AM333, Lachnospira spp., and the aerotolerant and bile-resistant Sutterella wadsworthensis. Finally, with the exception of the microaerophilic Streptococcus salivarius and S. wadsworthensis (T = 0.83 and T = 0.82, respectively), highly transmitted SGBs between mother–father pairs included multiple bifidobacteria and Blautia spp. Interestingly, many of the species are fibre-degrading specialists in the gut42,43,44, with known beneficial effects on the host45, indicating that within-family microbial transmission may hold a favourable potential for health-associated microbiome development.
We looked further into our dataset to identify species differentially more transmitted baby-to-baby in the nursery setting compared with baby–mother and baby–father pairs (Supplementary Table 14). B. breve, a highly prevalent and health-promoting species in (breast-fed) babies6,46, was differentially more transmissible among baby pairs, compared with mother–baby pairs, as was the case also for Dorea formicigenerans, an age progression biomarker in babies47 (Extended Data Fig. 10a,b). Bifidobacterium longum subsp. infantis, a specialized gut colonizer of breast-fed babies46 with anti-inflammatory effects48, was detected exclusively in babies in our cohort (Methods and Extended Data Fig. 10c,d), with prevalence peaking at approximately 50% mid-term (T08), before declining (Extended Data Fig. 10e); its transmissibility was significantly higher than that of B. longum subsp. longum among baby–baby pairs (T = 85.3% versus T = 19.4%, respectively; Fisher’s exact test, n = 142, P = 5 × 10−12), suggesting that the acquisition of B. longum subsp. infantis may specifically occur via interpersonal transmission among babies.
Host factors and microbiome transmission
In addition to the effect of having a sibling, age also significantly affected strain donation (increasing frequency in older babies, Spearman’s test, n = 39, ρ = 0.43, P = 0.007), but not strain acquisition (n = 39, ρ = 0.24, P = 0.14; Extended Data Fig. 11a–c). Interestingly, potentially delayed microbial colonization at birth (owing to caesarean delivery or intrapartum antimicrobial prophylaxis) did not influence microbial strain acquisition for babies in the nursery (Extended Data Fig. 11d,e), in line with no T01 alpha-diversity differences (Extended Data Fig. 5a,b). Analysis of the influence of diet of babies on strain-sharing revealed that infants consuming milk at T01, particularly maternal milk, exhibited elevated albeit not statistically significant SSRs with their mothers at T01 (Extended Data Fig. 12a,b). Further exploration of dietary impacts on strain acquisition and donation patterns failed to identify significant associations (Extended Data Fig. 12c–k), suggesting an overall negligible impact of diet on interpersonal microbiome transmission; however, putative dietary effects on the establishment of specific strains in a recipient microbiome cannot be definitely excluded, given the limited granularity of our dietary data.
Antibiotics effect on strain acquisition
Finally, we assessed the impact of antibiotic interventions on adult and babies’ interpersonal transmission, exploiting the recorded antibiotic administration events that included amoxicillin alone (n = 7 events) and in combination with clavulanic acid (n = 13), betamethasone dipropionate (n = 6) and the macrolide azithromycin (n = 4), routine treatments for bronchitis, inflammatory skin conditions, and upper respiratory, ear and intestinal infections49,50.
Antibiotic treatment (ATB) significantly reduced the absolute number of retained strains between consecutive time points in both adults (average 86.4 control pre–post versus 60.1 ATB pre–post; Mann–Whitney U-test, n = 74, U = 729, P = 5.1 × 10−4) and babies (average 24.3 control pre–post versus 14.1 ATB pre-post; Mann–Whitney U-test, n = 77, U = 1,169, P = 1.3 × 10−5; Extended Data Fig. 12l). Even for SGBs typed at the strain level that were present at both time points, the strain retention rate was also significantly diminished after treatments in adults (average 93.8% control pre–post versus 88.4% ATB pre–post; Mann–Whitney U-test, n = 74, U = 631, P = 0.028; Fig. 5a) and in babies (average 90.6% control pre–post versus 70.2% ATB pre–post; Mann–Whitney U-test, n = 76, U = 1,215, P = 2.9 × 10−7; Fig. 5a), but to a greater extent in the latter (ATB pre–post average strain retention rate adults versus babies: 88.4% versus 70.2%; Mann–Whitney U-test, n = 53, U = 466, P = 0.001; Fig. 5a).
Fig. 5: Antibiotic use is associated with lower strain retention and, in babies only, higher strain acquisition.
a, Strain retention rate (that is, the within-individual SSR) in adult and baby participants (n = 69 and n = 41, respectively, in a–c) who underwent antibiotic treatment (ATB pre–post) versus untreated controls (control pre–post). In all box plots, box edges indicate the lower and upper quartiles, the centre line represents the median, and whiskers extend to the most extreme data point within 1.5× the IQR. Statistically significant P values in all panels refer to two-sided Mann–Whitney U-tests. All other comparisons are non-significant. b, Acquisition rate of SGBs typed at the strain level in babies and adults who underwent antibiotic treatment (ATB pre–post) versus untreated controls (control pre–post). c, Fraction of the strains present in pre samples replaced in post samples for babies and adults who underwent antibiotic treatment (ATB pre–post) versus untreated controls (control pre–post). Siblings and pets are excluded. Comparisons were performed between consecutive control pre–post and ATB pre–post time points (one per volunteer).
After antibiotic use, the gut microbiomes of babies were replenished with new strains (Fig. 5b,c). This was driven by both the acquisition of new SGBs (average SGB acquisition rate 30.4% control pre–post versus 49.2% ATB pre–post; Mann–Whitney U-test, n = 59, U = 164, P = 2.9 × 10−4; Fig. 5b and Extended Data Fig. 12m), and by the strain replacement within SGBs (average 2.1 replaced strains and 7.1% fraction of pre-ATB strains replaced control pre–post versus 3.9 and 13.6% ATB pre–post; Mann–Whitney U-test, n = 59, U = 209, P = 0.003 and P = 0.004; Fig. 5c and Extended Data Fig. 12n). In contrast, adult microbiomes appeared to be less prone to new colonizations after antibiotic treatment via either means of strain acquisition (ATB pre–post average SGB acquisition rate adults versus babies: 34.2% versus 49.2%; Mann–Whitney U-test, n = 33, U = 74, P = 0.041, Fig. 5b; ATB pre–post average fraction of pre-ATB strains replaced adults versus babies: 7.5% versus 13.6%; Mann–Whitney U-test, n = 33, U = 69, P = 0.026; Fig. 5c), suggesting that although infant microbiomes tend to be more impacted by antibiotic therapy, their richness is also more easily recovered.
Conclusion
Our longitudinal strain-resolved metagenomic framework revealed that the infant gut microbiome largely assembles, expands and modifies in the nursery via extensive baby–baby strain transmission, extending earlier work on family-to-baby transmission1,2,3,4,34,35 and overviews of the infant microbiome in nurseries14,51,52. After a few months of nursery attendance, the microbial strains acquired from peers in the same nursery group accounted for a proportion of the infant microbiome comparable to that from the mother and, more generally, family members (Fig. 4c), who are known to exert the greatest influence on babies’ microbiome in the first months of life. Contributions to the infant microbiome by family and nursery were not influenced by birth practices or feeding regimes (Extended Data Figs. 11d,e and 12a–k), and became comparable by the end of the first term, possibly indicative of strains being shared with both family and nursery. In addition to their already established effects in emotional and cognitive development53,54, social relations among peers in the nursery are thus a hub for microbial enrichment during infancy, particularly of key early-life gut colonizers such as B. longum subsp. infantis and B. breve6,46 (Extended Data Fig. 10a–e).
Horizontal infant microbiome transmission does not occur only in nursery settings, as we found that baby–sibling strain-sharing surpasses transmission between parent–baby pairs (Extended Data Fig. 5g) and correlates with a later decrease in infant microbial acquisition in the nursery (Fig. 4d). Even pets might contribute strains to the babies but not to the adults (Extended Data Fig. 6d,e), and although limited and somewhat conflicting evidence has been produced on the effect of having a pet on human microbiomes55,56,57, larger studies specifically focused on strain transmission and medium-term retention should be promoted. Overall, our data further reinforce the role of horizontal intra-generational (and possibly inter-host species) over vertical inter-generational transmission not only in adults8 and nurseries (the main point of the present work) but also within a family context.
In several cases, we observed very effective spread of a single strain within nurseries (Fig. 2b). Such diffusion patterns are akin to typical pathogenic outbreaks within closed communities58,59. However, although pathogenic spread typically elicits an acute immunological reaction and/or requires treatment, leading to somewhat rapid clearance after transmission, for gut microbiome members, colonization may be long-lasting as we reported in several cases in our cohort (Fig. 2b), even though it remains unsettled whether colonization persisted for many years after the end of nursery school. Moreover, further elucidation of the phenotypes linked to the propagation of fast-spreading strains may be highly relevant towards a better comprehension of the factors favouring the development of a healthy host–microbiome mutualism.
Among the factors that may influence microbial transmission in babies, we found antibiotic usage to be the strongest one. Despite the infant microbiome being highly perturbed by antibiotic treatment during the first year of life, as previously reported60,61,62, it is also fast-recovering via extensive strain acquisition (Fig. 5 and Extended Data Fig. 12m,n), consistent with antibiotic treatment before faecal microbiota transplantation increasing donor strain engraftment in adults63. However, we found evidence that the extent and the rate of post-antibiotic strain acquisition were higher in babies than in adults (Fig. 5), which reinforces the risks, but potentially also the opportunities, of infant antibiotic intake connected with a deep reprogramming of the structure of the infant microbiome induced by post-antibiotic strain acquisition. Whether the rapid acquisition of microbial diversity after antibiotic courses in babies is driven specifically by the high level of peer-to-peer interaction in the nursery environment should be investigated further, but it is reasonable to hypothesize that prolonged isolation within the family of antibiotic-treated babies might result in a slower microbiome recovery and acquisition of fewer baby-specific microbial species.
Methodologically, our strain-sharing pipeline models the genetics of the dominant strains of each species (SGBs) present in any given microbiome sample64 to enable identification of strain transmission events. Although recent surveys have pointed out the usual presence in the gut of a single strain of each species25, further advances in metagenomic strain-profiling tools could reveal the complexity of multiple coexisting conspecific strains and shed light on their role in influencing strain(s) transmission dynamics and long-term colonization in the gut microbiome.
Overall, our results reveal the centrality of social factors in shaping the infant microbiome via inter-individual microbial transmission, thus rebalancing social interactions as key to building a healthy microbiome, beyond their epidemiological role in the spread of (opportunistic) pathogens. Continued efforts on this topic should be focused on investigating the transmission of further microbiome components such as phages, plasmids and operons, as well as on applying experimental tools to profile the microbial features favouring diverse modes of transmission.
Methods
Cohort description and recruitment
A total of 134 volunteers comprising babies (4–15 months old at nursery start, median 10 months, 18 male, 25 female) about to attend the first year of nursery school, their parents (29–50 years old, median 36 years old, 30 male, 39 female), siblings (2–21 years old, median 2 years old, 3 male, 4 female) and house pets (n = 5, 2 cats and 3 dogs), and educators (34–56 years old, median 38.5 years old, 10 female) were recruited and enrolled across 3 nursery schools (here identified as A, B and C), each with 2 distinct classes, in the municipality of Trento (Italy) in June 2022. The classes within the same nursery shared few activities (that is, baby drop-off and pick-up) and spaces throughout the day, and were followed by different educators. The protocol of this study was approved by the Ethics Committee of the University of Trento (protocol number 2022-040) and by the Ethics Panel of the European Research Council Executive Agency after evaluation of the project (microTOUCH Grant agreement ID 101045015). Upon enrolment, volunteers were asked to provide informed consent and complete metadata questionnaires. Consent for participation of babies was obtained directly from parents.
Metadata collection and organization
Date of birth, sex, anthropometric data (weight, height), and antibiotic treatment in the 3 months preceding the start of the study or administered during its course, in addition to information regarding putative contacts with other volunteers preceding the beginning of nursery, were collected for volunteers of all ages. Metadata specifically collected for babies included gestation length, mode of delivery and general diet at nursery admission (breast or formula milk feeding and weaning start date). Adult participants were also required to provide information regarding past or ongoing chronic conditions and related treatments, and putative maternal anti-Streptococcus B prophylaxis during birth. Diet metadata for babies and adults are detailed in the next section.
Dietary information collection and analysis
In brief, most babies had begun weaning at T01 (weaned n = 38, not weaned n = 2, not available = 3) and received identical solid meals while in the nursery. The majority followed a mixed feeding approach during weaning, combining solid foods with any type of milk (mixed diet n = 24, exclusively solid food n = 14, not available = 5). Among those babies receiving milk supplementation, feeding types were relatively balanced (breast-fed n = 9, formula-fed n = 10, receiving both n = 5). Finally, adults detailed their long-term dietary habits by completing the EPIC Food Frequency Questionnaire (FFQ). FFQs were used to calculate the healthy Plant-based Diet Index65. Quality and quantity of plant-based foods were derived from FFQs for a total of 18 food groups, and divided into quintiles and assigned positive or negative scores. Participants whose intake exceeded the highest quintile received a score of 5, whereas those below the lowest quintile received a score of 1. Healthy plant-based foods received positive scores, whereas less healthy or unhealthy plant-based and animal-based foods received a negative score. A final score was derived by summing the scores of each participant. Metadata were collected and utilized after pseudonymization of volunteers’ IDs.
Sample collection
Sample collection began a week before the start of the first term of nursery (August 2022) and ended after the Christmas holidays (January 2023) for all volunteers. During the first 2 weeks the nursery organized a ‘settling-in phase’, in which babies were gradually introduced to the nursery and attended it for about 3 hours per weekday. In the following weeks, babies attended the nursery for about 8 hours per weekday. Throughout the term length (about 14 weeks), stool samples of infant participants were collected weekly (from T01, before nursery admission, to T15, at the end of the Christmas holidays) by the nursery staff or the researcher in the nursery from nappies stored at room temperature on the same day of use, using specimen collection tubes containing 9 ml of DNA/RNA Shield buffer (Zymo). Sample collection was extended until the end of the second term of the year (about 30 weeks, ending July 2023) for all donors in group 1 of nursery A, including babies, parents, educators and pets, maintaining sampling time-point frequencies and modalities. Two follow-up time points were collected for all participants enrolled, at the end of the year of nursery (July 2023, ‘TA’) and at the end of the summer break (August/September 2023, ‘TB’). The samples collected were moved to the lab and DNA-extracted within 2 weeks of delivery. Sample collection for babies at the summer and winter break time points, together with that for siblings and pets, was performed directly at home by the parents, and samples were stored at room temperature until the beginning of nursery (maximum 2 weeks later). All adult participants’ samples were self-collected following detailed instructions, delivered to the lab and processed as described above. Educators donated monthly, whereas parents collected one additional sample halfway through the study period, in addition to initial and final sample time points.
DNA extraction and sequencing
After vortex homogenization, DNA was extracted using the DNeasy PowerSoil Pro Kit (Qiagen), following the directions of the Human Microbiome Project protocol66. Additional homogenized aliquots were stored at −20 °C. DNA was quantified using Qubit 2.0 fluorometer (Thermo Fisher Scientific). Sequencing libraries were prepared using the Nextera DNA Library Preparation Kit (Illumina), as described by the manufacturer’s guidelines. The sequencing was performed on the Illumina NovaSeq 6000 platform following manufacturer’s protocols. The sequencing depth was set at 15 Gbp.
Metagenome quality control and preprocessing
Stool sample sequences were pre-processed using the pipeline described at https://github.com/SegataLab/preprocessing. In brief, metagenomic reads were quality-controlled and reads of low quality (quality score <Q20), short reads (<75 bp) and reads with >2 ambiguous nucleotides were removed with Trim Galore (v0.6.6). Contaminant and host DNA was identified with Bowtie2 (v2.3.4.3)67 using the --sensitive-local parameter, allowing confident removal of the phiX 174 Illumina spike-in and human-associated reads (hg19/GRCh37 human genome release). Remaining high-quality reads were sorted and split to create standard forward, reverse and unpaired reads output files for each metagenome. Metagenomes with at least 1 Gbp were included in the analysis (n = 1,021), whereas metagenomes with insufficient sequencing depth were excluded (n = 5).
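The three read-level criteria above can be illustrated with a minimal Python sketch. It is not the Trim Galore implementation: the function name passes_qc is hypothetical, and treating ‘quality score <Q20’ as the mean Phred score of a read is an assumption made here for simplicity.

```python
def passes_qc(seq, quals, min_q=20, min_len=75, max_n=2):
    """Return True if a read survives the three filters described in the text.

    seq   : nucleotide string of the read
    quals : per-base Phred scores (same length as seq)
    Assumption: the Q20 criterion is applied to the mean Phred score.
    """
    if len(seq) < min_len:            # short reads (<75 bp)
        return False
    if seq.upper().count("N") > max_n:  # more than 2 ambiguous nucleotides
        return False
    mean_q = sum(quals) / len(quals)
    return mean_q >= min_q            # low-quality reads (<Q20)
```

In this sketch the length check runs first, so an empty read is rejected before the mean quality is computed.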
Species-level profiling
Profiling at the resolution of SGBs was performed with MetaPhlAn (v4.1)16,68 using the vJun23_202307 markers database and using the --unclassified_estimation parameter (Supplementary Table 4). SGBs with <0.1% relative abundance in all stool samples were removed from taxonomic profiles for calculation of diversity indices.
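As an illustration of the abundance filter, the following Python sketch drops every SGB that never reaches 0.1% relative abundance in any sample. The function name drop_rare_sgbs and the nested-dictionary layout of the profiles are assumptions for this example, not part of MetaPhlAn.

```python
def drop_rare_sgbs(profiles, min_rel_ab=0.1):
    """Remove SGBs below min_rel_ab (in percent) in all samples.

    profiles: {sample_id: {sgb_id: relative abundance in percent}}
    """
    # An SGB is kept if it reaches the cutoff in at least one sample.
    keep = {
        sgb
        for profile in profiles.values()
        for sgb, rel_ab in profile.items()
        if rel_ab >= min_rel_ab
    }
    return {
        sample: {sgb: ab for sgb, ab in profile.items() if sgb in keep}
        for sample, profile in profiles.items()
    }
```

An SGB that is rare in one sample is still retained there if it passes the cutoff elsewhere, because the filter is applied across the whole dataset rather than per sample.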
Building strain-level phylogenetic trees
To reliably detect strain-sharing events, we augmented our dataset with oral samples from the same cohort not analysed in this study (n = 342) and additional samples from 16 public longitudinal cohorts. To do so, we queried the curatedMetagenomicData (v3.18)69 for stool samples sequenced at least at 1-Gbp depth from healthy westernized human individuals with at least 2 time points per individual and 3 such individuals per dataset. We went through the corresponding papers and excluded studies involving an intervention between the sampled time points. Thus we have included samples satisfying the above criteria from the following datasets: ShaoY_2019 (ref. 32), MehtaRS_2018 (ref. 70), VatanenT_2016 (ref. 71), HMP_2019_ibdmdb72,73, BackhedF_2015 (ref. 74), CosteaPI_2017 (ref. 75), YassourM_2018 (ref. 3), KosticAD_2015 (ref. 76), LouisS_2016 (ref. 77), FerrettiP_2018 (ref. 1), HallAB_2017 (ref. 78), WampachL_2018 (ref. 79), NielsenHB_2014 (ref. 80), Heitz-BuschartA_2016 (ref. 81), ChuDM_2017 (ref. 82) and AsnicarF_2017 (ref. 83). In total, phylogenetic trees were built using data from 1,405 samples from this cohort and 4,322 samples from additional cohorts.
For all the SGBs detected in our cohort, we queried MetaRefSGB, a microbial genomic database containing >156,000 isolate genomes and >952,000 metagenome-assembled genomes (MAGs) as of vJun23 (ref. 16), for isolate genomes or MAGs from food sources, and included them in the trees as references84.
To build each tree, we included all the samples for which the SGB was detected by MetaPhlAn. SGBs detected in fewer than 20 samples from our cohort were discarded, leaving 1,363 SGBs. The strain-level phylogenetic trees were built for each SGB with StrainPhlAn (v4.1)16,68. For 1,107 SGBs, we were able to build the phylogenetic tree; the remaining 256 SGBs did not have a sufficient number of samples with enough marker genes with minimal coverage as reported by StrainPhlAn.
Assessment of SSRs
We discarded trees built from alignments shorter than 1,000 nt (n = 111 SGBs discarded). In the phylogenetic trees with genomes from a food source, we considered the ANI on the marker genes (mutation rates in StrainPhlAn) and discarded samples closer than 99.85% ANI as probably coming from food. When more than 20% of the samples would be discarded, we dropped the SGB altogether (n = 7 SGBs discarded).
We calculated phylogenetic distances between all samples as the length of the shortest path between the samples along the tree branches. We normalized distance distributions within each tree by dividing by the median distance.
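The distance definition can be sketched in Python for a toy tree. The representation used here (a dictionary mapping each node to its parent and branch length) and all function names are assumptions for illustration; the actual trees are produced by StrainPhlAn.

```python
from itertools import combinations
from statistics import median


def path_to_root(node, parent):
    """Map each ancestor of node (and node itself) to its distance from node."""
    dist, out = 0.0, {node: 0.0}
    while node in parent:
        node, length = parent[node]
        dist += length
        out[node] = dist
    return out


def tree_distance(a, b, parent):
    """Length of the path between leaves a and b along the tree branches."""
    up_a, up_b = path_to_root(a, parent), path_to_root(b, parent)
    # The lowest common ancestor gives the smallest summed path length.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def normalized_distances(leaves, parent):
    """Pairwise distances divided by the median pairwise distance of the tree."""
    raw = {(a, b): tree_distance(a, b, parent) for a, b in combinations(leaves, 2)}
    med = median(raw.values())  # assumed non-zero in this sketch
    return {pair: d / med for pair, d in raw.items()}
```

Dividing by the median puts trees of different SGBs on a common scale, so that a single threshold-selection procedure can be applied to each of them.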
To define strain-sharing events, we calculated a threshold best separating the within-individual distribution from the across-individuals distribution of phylogenetic distances. For the within-individual distribution, pairs of samples from the same individual sampled at most 6 months apart were considered, using at most one pair per individual. Among the possible pairs, we chose the one maximizing the chances to type the strain of the SGB of interest in both samples, and we did so by choosing the pair for which the sample with the lowest coverage for the SGB between the two samples was the highest among all pairs. In case of ties, we maximized the higher estimated coverage of the two. The SGB coverage was estimated as the sequencing depth times the SGB’s relative abundance. For the across-individuals distribution, we picked pairs of samples coming from different datasets, one sample per individual maximizing the coverage. When there were fewer than 50 pairs in the across-individual distribution we discarded the SGB (n = 365 SGBs discarded). When there were fewer than 20 pairs in the within-individual distribution (n = 276 SGBs), we calculated the threshold as the 3rd percentile of the across-individual distribution, that is, setting the expected false discovery rate to 3%. When the within-individual distribution had at least 20 pairs (n = 309 SGBs), the threshold separating the distributions was calculated as maximizing Youden’s index, unless the expected false discovery rate exceeded 5%, in which case we set the threshold as the 5th percentile of the across-individual distribution (n = 61 SGBs). After manual curation of the trees, including outlier branches removal, a further 112 SGBs were discarded. In total, we assessed strain-sharing for 512 SGBs.
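The threshold rules above can be summarized in a short Python sketch. The function names are hypothetical, the percentile uses linear interpolation, and candidate thresholds are placed at observed distances with distances at or below the cut treated as within-individual; these are simplifying assumptions rather than the pipeline's exact choices.

```python
from statistics import quantiles


def percentile(values, p):
    """p-th percentile (1 <= p <= 99) with linear interpolation."""
    return quantiles(values, n=100, method="inclusive")[p - 1]


def strain_threshold(within, across, min_across=50, min_within=20):
    """Pick the normalized-distance threshold for one SGB, or None if discarded."""
    if len(across) < min_across:
        return None                       # too few across-individual pairs
    if len(within) < min_within:
        return percentile(across, 3)      # expected false discovery rate of 3%
    best_t, best_j = None, float("-inf")
    for t in sorted(set(within) | set(across)):
        sensitivity = sum(d <= t for d in within) / len(within)
        specificity = sum(d > t for d in across) / len(across)
        j = sensitivity + specificity - 1  # Youden's index
        if j > best_j:
            best_t, best_j = t, j
    expected_fdr = sum(d <= best_t for d in across) / len(across)
    if expected_fdr > 0.05:
        return percentile(across, 5)      # cap the expected rate at 5%
    return best_t
```

Here the expected false discovery rate is taken as the fraction of across-individual pairs that would fall at or below the threshold, which is why a percentile of that distribution fixes the rate directly.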
For each pair of samples, we called a strain-sharing event of an SGB when their phylogenetic distance in the corresponding tree was lower than the corresponding calculated threshold.
For each sample, we considered only the SGBs in whose trees the sample was placed, that is, those profiled by StrainPhlAn. Finally, we defined the SSR of a pair of samples as the number of strain-sharing events between them divided by the number of SGBs profiled at the strain level in both samples by StrainPhlAn. For pairs with fewer than five SGBs profiled in common, we set the SSR as undefined and such pairs were excluded from SSR analysis. Strain retention rate is defined as the within-individual SSR. The within-individual strain replacement rate is defined as the number of strains not retained longitudinally among retained SGBs over the number of retained SGBs.
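A minimal Python sketch of the SSR for one pair of samples follows. The function name and the input layout (one normalized distance and one threshold per SGB typed in both samples) are assumptions for illustration.

```python
def strain_sharing_rate(distances, thresholds, min_common=5):
    """SSR for a pair of samples, or None when it is undefined.

    distances : {sgb_id: normalized phylogenetic distance between the two
                 samples}, restricted to SGBs typed in both samples
    thresholds: {sgb_id: strain-identity threshold for that SGB}
    """
    common = [sgb for sgb in distances if sgb in thresholds]
    if len(common) < min_common:
        return None  # fewer than five SGBs profiled in common
    # A strain-sharing event is called when the distance is below the threshold.
    shared = sum(distances[sgb] < thresholds[sgb] for sgb in common)
    return shared / len(common)
```

Applied to two samples from the same individual, the same function gives the strain retention rate, and one minus that value gives the strain replacement rate as defined above.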
Samples found to be contaminated or mislabelled according to strain-sharing analysis (and validated with CrocoDeEL (v1.0.6)85) were removed post hoc (n = 8), reducing the dataset to 1,013 metagenomes. Samples collected during antibiotic treatment (n = 26) were excluded from all following analyses.
Within-SGB strain heterogeneity, strain-sharing networks and SGB transmissibility
Within-SGB strain heterogeneity (Extended Data Fig. 7a) was computed on a per-nursery basis as the number of strains of a given SGB present among babies at a given time point over the number of babies having the SGB at the same time point. A within-SGB strain heterogeneity of 1 indicates there is no strain-sharing among babies having the SGB (that is, they all have a different strain of the SGB).
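The heterogeneity ratio can be sketched in Python as follows. Counting distinct strains as connected components of the strain-sharing graph among carriers is an assumption of this example, as are the function name and inputs.

```python
def strain_heterogeneity(carriers, sharing_pairs):
    """Number of distinct strains over the number of babies carrying the SGB.

    carriers     : babies of one nursery having the SGB at one time point
    sharing_pairs: pairs of carriers with a called strain-sharing event
    Assumption: babies linked by sharing events carry the same strain.
    """
    parent = {b: b for b in carriers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in sharing_pairs:
        parent[find(a)] = find(b)
    n_strains = len({find(b) for b in carriers})
    return n_strains / len(carriers)
```

With no sharing events every baby counts as its own strain and the value is 1, matching the interpretation given in the text; the value decreases as a single strain spreads through the group.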
Strain-sharing matrices were used to build unsupervised strain-sharing networks (Fig. 3b) with the R packages ggraph (v2.2.1) and tidygraph (v1.3.1), where only nodes with degree >0 are shown.
SGB transmissibility (Extended Data Fig. 9a) was computed as the number of individual pairs sharing the strain over the total number of potentially callable strain-sharing events involving the SGB (that is, the number of pairs of individuals in which the SGB was present and typable according to StrainPhlAn)8. When the SGB was not present among at least three pairs within a category, the transmissibility of the SGB within the category was set as undefined. Differential transmissibility between baby–baby and baby–mother or baby–father pairs was assessed for all SGBs having at least 10 baby–baby pairs sharing it, with application of a Fisher’s exact test (including false discovery rate control) in case there were at least 10 baby–mother pairs or 10 baby–father pairs sharing the SGB; otherwise the number of SGB- and strain-sharing events were reported for each group without application of the test.
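The transmissibility ratio and the comparison between pair categories can be illustrated in Python. This is a sketch with hypothetical function names; the two-sided Fisher's exact test is written out with the standard library only, and the original analysis may have relied on a statistics package instead.

```python
from math import comb


def transmissibility(n_sharing_pairs, n_callable_pairs, min_pairs=3):
    """Sharing pairs over callable pairs; None when too few pairs carry the SGB."""
    if n_callable_pairs < min_pairs:
        return None
    return n_sharing_pairs / n_callable_pairs


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    For differential transmissibility, rows would be pair categories
    (for example baby-baby and baby-mother) and columns would be
    sharing versus non-sharing pairs.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tables as or less likely than observed
            p_value += p
    return min(p_value, 1.0)
```

When many SGBs are tested in this way, the resulting P values would then be corrected for multiple testing, as stated in the text.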
Comparison of the contribution of familial and nursery strains to the infant microbiome
To compare the contribution of familial and nursery strains to babies’ microbiome composition, for each baby, we computed at each baby time point (T01 to T15) the number of strains shared either with any member of the family or with any other infant of the nursery group, disregarding strains shared with both (unless otherwise stated). This allowed us to compute the proportion of strains for a given baby microbiome that was putatively acquired from either the nursery or family (referred to in the text and figures as the ‘proportion of strains acquired’), as the strains exclusively shared with either one of the two groups of individuals. Considering that family members were less densely sampled than infants, we considered a maximum of three samples for each of the other babies in the nursery group (baseline T01, halftime T08 and final T15, when available, emulating the sampling timeline of parents), adding other babies and family samples to the longitudinal analysis considering the time in which they were sampled (that is, looking for strain-sharing only with samples collected in the past or contemporaneous to the target time point of the target baby). This probably explains the nonlinear increases of proportion of strains acquired from the nursery group at baby T08 and T15 observed (for example, Fig. 4c). Moreover, as a negative control for the considerably larger number of individuals in the nursery group (average n = 7) compared with the family (average n = 2), we also analysed the strain-sharing dynamics with a random nursery group from another nursery.
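The exclusive-sharing logic can be sketched in Python with sets. The function name, the use of SGB identifiers to stand for the strains typed in a baby sample, and the choice of that count as the denominator are assumptions made for this example.

```python
def proportion_acquired(baby_strains, shared_with_family, shared_with_nursery):
    """Proportions of a baby's strains shared only with family or only with nursery.

    baby_strains       : set of SGBs typed at the strain level in the baby sample
    shared_with_family : subset whose strain is shared with any family member
    shared_with_nursery: subset whose strain is shared with any nursery peer
    Assumption: the denominator is the number of strains typed in the baby.
    """
    family = baby_strains & shared_with_family
    nursery = baby_strains & shared_with_nursery
    only_family = family - nursery    # strains shared with both are disregarded
    only_nursery = nursery - family
    n = len(baby_strains)
    return len(only_family) / n, len(only_nursery) / n
```

Disregarding strains shared with both groups keeps the two proportions from double-counting strains whose source cannot be attributed.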
Metagenomic assembly and CRISPR analysis
MAGs were generated through a previously validated metagenomic assembly pipeline17, including assembly of contigs with MEGAHIT (v1.1.1)86, calculation of contigs coverage with Bowtie2 (v2.2.9)67, binning of contigs with MetaBAT2 (v2.12.1)87 and quality-checking of bins with CheckM (v1.1.3)88. Medium- and high-quality MAGs were identified following the criteria previously proposed89 and low-quality MAGs were discarded. The ANI between MAGs was computed using skani (v.0.2.1)90.
To validate the chain of transmission events of the strain of A. muciniphila SGB9226 shown in Fig. 2a, CRISPR arrays were identified from MAGs using MinCED (v0.4.2, default parameters)91. CRISPR spacers and repeats were extracted from raw sequencing reads using Crass (v1.0.1, parameters -d 20 -D 55 -s 20 -S 55 --longDescription)92. Following the identification of 5 CRISPR arrays and 39 CRISPR spacers in this set of MAGs, we looked for the CRISPR spacers of this strain in the metagenomic reads of the whole dataset. Although single CRISPR spacers were found in the metagenomic reads of up to 16/19 samples containing the strain (Fig. 2a), none of the 39 CRISPR spacers could ever be found in the remaining 627 metagenomes, providing independent validation for the trajectory of the strain in the nursery group.
B. longum strains assignment to subspecies
To evaluate the transmissibility of distinct subspecies of B. longum SGB17248 in our dataset, we constructed a StrainPhlAn 4 phylogenetic tree of all 591 B. longum strains identified in our cohort, which revealed 2 distinct clusters (cluster_1 and cluster_2; Extended Data Fig. 10c–e). Cluster_1 consisted exclusively of strains present in babies and hence was hypothesized to belong to B. longum subsp. infantis, whereas cluster_2 contained strains from both babies and adults, possibly representing other subspecies of B. longum. To definitively assign these strains to subspecies, we succeeded in generating MAGs (as described above) for 262 of the 591 strains and then calculated the ANI between these MAGs and 15 reference genomes representing the 3 well-known B. longum subspecies93 (5 reference genomes each for subsp. longum, subsp. infantis and subsp. suis). The 24 MAGs from cluster_1 showed highest similarity to B. longum subsp. infantis reference genomes (median ANI 98.04%), compared with lower ANI values with subsp. longum (95.60%) and subsp. suis (96.13%). Conversely, the 238 MAGs from cluster_2 showed the highest genomic similarity to B. longum subsp. longum reference genomes (median ANI 98.87%), with substantially lower similarity to subsp. suis (96.82%) and subsp. infantis (95.39%). The 591 B. longum strains as profiled with StrainPhlAn 4 were then assigned according to their genome-level assignment.
Blastocystis detection and ST profiling
The presence of Blastocystis in metagenomic samples was assessed using a previously validated computational workflow94. In brief, metagenomic reads were mapped with Bowtie2 (v2.5) against nine reference genomes for eight distinct Blastocystis subtypes (that is, subtype 1 (ST1) (ST1_LXWW01), ST2 (ST2_JZRJ01), ST3 (ST3_JZRK01), ST4 (GCF_000743755 and ST4_BT1_JZRL01), ST6 (ST6_JZRM01), ST8 (ST8_JZRN01) and ST9 (ST9_JZRO01)). Then, SAMtools (v1.19) and bedtools (v2.30) were used to compute the breadth of coverage of each genome. We reported a sample to be positive for a Blastocystis ST if the respective genome had a breadth of coverage of at least 10%.
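The positivity rule can be sketched in Python from per-base depths. The function names and the in-memory depth lists are assumptions for illustration; in the workflow these values come from SAMtools and bedtools output.

```python
def breadth_of_coverage(depth_per_base):
    """Fraction of reference positions covered by at least one read."""
    if not depth_per_base:
        return 0.0
    return sum(1 for d in depth_per_base if d > 0) / len(depth_per_base)


def positive_subtypes(depths_by_genome, genome_to_subtype, min_breadth=0.10):
    """Subtypes whose reference genome reaches the breadth cutoff in a sample."""
    positives = set()
    for genome, depths in depths_by_genome.items():
        if breadth_of_coverage(depths) >= min_breadth:
            positives.add(genome_to_subtype[genome])
    return positives
```

Breadth, unlike mean depth, is not inflated by reads piling up on a few conserved regions, which makes it a more conservative presence criterion.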
PCR validation of transmission of an A. muciniphila SGB9226 strain
To further test how strongly limit-of-detection issues in the metagenomic approach could influence strain transmission inference, we implemented an SGB-specific PCR assay for A. muciniphila (SGB9226) and applied it to the representative example depicted in Fig. 2a. We designed the primers (F-5′-TGACTGGACTCTATTGCCTGAAG-3′ and R-5′-GCCTTTCAATATGCCCTTCGTAC-3′; amplicon length 101 bp) to recognize the SGB9226-specific core gene (UniRef90_A0A2N8IRV1; identified by MetaPhlAn 4 (ref. 16)), using ConsensusPrime (v1.0, with consensus similarity set to 0.8 and consensus threshold set to 0.95)95 and Primer3 (v2.6.1; with primer size set to 18–28 bp, optimal melting temperature 57–63 °C, GC content 40–60%, PRIMER_MAX_HAIRPIN_TH 24.0, PRIMER_INTERNAL_MAX_HAIRPIN_TH 24.0, PRIMER_MAX_END_STABILITY 9.0)96. Assay sensitivity was independently evaluated using a spike-in approach with DNA from the A. muciniphila type strain ATCC BAA-835 (from 1 million genome copies down to a single genome copy) into an A. muciniphila-negative faecal test sample. The assay achieved a limit of detection equivalent to a single genome copy of A. muciniphila (Extended Data Fig. 6a). PCRs were performed using GoTaq G2 Hot Start Green Master Mix (Promega) with 500 nM of each primer. The thermal cycling programme included an initial denaturation at 95 °C for 10 min, followed by 40 cycles of annealing at 62 °C. Positive bands were observed by electrophoresis on a 2% agarose gel.
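As a quick sanity check, the two primer sequences given above can be compared against the stated size and GC-content constraints. This sketch is illustrative only: the helper names are hypothetical, and melting temperature, hairpin and end-stability checks are left out because they depend on the thermodynamic model implemented in Primer3.

```python
PRIMERS = {
    "forward": "TGACTGGACTCTATTGCCTGAAG",
    "reverse": "GCCTTTCAATATGCCCTTCGTAC",
}


def gc_content(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def within_constraints(seq, min_len=18, max_len=28, min_gc=40.0, max_gc=60.0):
    """Check primer length (bp) and GC content (%) against the stated ranges."""
    return min_len <= len(seq) <= max_len and min_gc <= gc_content(seq) <= max_gc


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_content(seq), 1), within_constraints(seq))
```

Both primers are 23 nt long with 11 G or C bases each (about 47.8% GC), so they fall inside the 18–28 bp and 40–60% GC windows quoted in the text.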
Statistical analysis
Statistical analyses were performed in Python (v3.10.12) using the libraries scikit-bio (v0.5.9), scipy (v1.10.1) and statsmodels (v0.14.0). Cross-sectional comparisons between groups were performed using the Mann–Whitney U-test (two groups) or the Kruskal–Wallis test with post hoc Dunn tests (multiple groups) for independent observations. Cross-sectional dependent observations were compared with a permutation test for medians (or means, when comparing the number of shared strains), with P values calculated as the proportion of times (out of 1,000 permutations) that the difference in the median between groups with shuffled labels was equal to or more extreme than that observed in the correctly labelled data. Longitudinal comparisons between time points were performed using the Wilcoxon signed-rank test. Jaccard dissimilarity matrices were computed from taxonomic profiles, and compositional differences between groups were evaluated using permutational multivariate analysis of variance. When appropriate, correction for multiple testing was applied using the Benjamini–Hochberg procedure (Padj), with significance defined as Padj < 0.05.
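The permutation test and the multiple-testing correction described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the seed and the toy data are hypothetical, the test is two-sided, and labels are shuffled freely across the pooled observations (how the dependence structure was handled in the study is not specified here). In practice, `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` provides the same correction.

```python
import random
from statistics import median


def permutation_test_median(group_a, group_b, n_permutations=1000, seed=0):
    """Two-sided label-shuffling permutation test on the difference in medians."""
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:  # equal to or more extreme than observed
            hits += 1
    return hits / n_permutations


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): two clearly separated groups.
low = list(range(1, 11))
high = list(range(101, 111))
p_separated = permutation_test_median(low, high)
p_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_separated, p_adjusted)
```

With 1,000 permutations the smallest non-zero P value this procedure can report is 0.001, which bounds the resolution of any downstream Padj values.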
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Shotgun metagenomic data for microTOUCH-baby are available at the NCBI SRA under accession number PRJNA1140720, with respective sample-wise metadata information available in Supplementary Table 2 and at https://doi.org/10.5281/zenodo.17663257 (ref. 97). Accession numbers for sequencing data from additional longitudinal datasets can be found in the original publications. The human genome release used for host decontamination is available at NCBI RefSeq (GCF_000001405.13).
Code availability
All the software used in this study is available in the MetaPhlAn 4 package (which includes StrainPhlAn 4 and the script for strain transmission inference), available at http://segatalab.cibio.unitn.it/tools/metaphlan, with the open-source code at https://github.com/biobakery/MetaPhlAn.
References
Ferretti, P. et al. Mother-to-infant microbial transmission from different body sites shapes the developing infant gut microbiome. Cell Host Microbe 24, 133–145.e5 (2018).
Yang, B. et al. Development of gut microbiota and bifidobacterial communities of neonates in the first 6 weeks and their inheritance from mother. Gut Microbes 13, 1–13 (2021).
Yassour, M. et al. Strain-level analysis of mother-to-child bacterial transmission during the first few months of life. Cell Host Microbe 24, 146–154.e4 (2018).
Dubois, L. et al. Paternal and induced gut microbiota seeding complement mother-to-infant transmission. Cell Host Microbe 32, 1011–1024.e4 (2024).
Heidrich, V., Valles-Colomer, M. & Segata, N. Human microbiome acquisition and transmission. Nat. Rev. Microbiol. https://doi.org/10.1038/s41579-025-01166-x (2025).
Stewart, C. J. et al. Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562, 583–588 (2018).
Brito, I. L. et al. Transmission of human-associated microbiota along family and social networks. Nat. Microbiol. 4, 964–971 (2019).
Valles-Colomer, M. et al. The p