You could try filling in the gaps of the DTM. Then you’ve got a dataset that you can subtract from the DSM. There are several techniques to do this, each with its own pros and cons. I’ve had pretty good results with the interpolation tools of scipy.
Here is an example of an original surface model:
And here is the same area after subtracting the filled-in terrain model:
The raw values are now the approximate height of the objects relative to the surrounding ground level. The street level itself has a value of 0.
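For what it's worth, here is a minimal sketch of how the gap filling and subtraction could look with `scipy.interpolate.griddata`. It is not the exact script I used: the helper name `fill_gaps` and the toy arrays are made up for illustration, and it assumes both rasters are already loaded as aligned 2-D NumPy arrays with the gaps in the DTM marked as NaN (real data often uses a nodata value such as -9999 that you would convert first).

```python
import numpy as np
from scipy.interpolate import griddata


def fill_gaps(dtm, method="linear"):
    """Fill NaN cells of a 2-D array by interpolating from the valid cells.

    Hypothetical helper for illustration; assumes gaps are stored as NaN.
    """
    rows, cols = np.indices(dtm.shape).astype(float)
    valid = ~np.isnan(dtm)
    points = np.column_stack((rows[valid], cols[valid]))
    values = dtm[valid]

    # Interpolate over the whole grid from the known ground cells.
    filled = griddata(points, values, (rows, cols), method=method)

    # Linear interpolation returns NaN outside the convex hull of the
    # valid cells, so fall back to nearest-neighbour for anything left.
    missing = np.isnan(filled)
    if missing.any():
        filled[missing] = griddata(
            points, values, (rows[missing], cols[missing]), method="nearest"
        )
    return filled


# Toy data: gently sloping ground with a 5 m high "building" in the middle.
row_idx = np.indices((6, 6))[0]
ground = 10.0 + 0.5 * row_idx

dsm = ground.copy()
dsm[2:4, 2:4] += 5.0          # the surface model includes the building

dtm = ground.copy()
dtm[2:4, 2:4] = np.nan        # the terrain model has a gap under it

filled_dtm = fill_gaps(dtm)
heights = dsm - filled_dtm    # object height above the surrounding ground
```

On large rasters `griddata` can get slow and memory hungry, since it triangulates every valid cell, so you may want to work in tiles or only feed it the cells bordering each gap.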