Hi
I have come across a project that uses a lot of ETL and does the data warehousing the old classical way, i.e. generating integer surrogate keys. I believe they did this for performance, as integer joins perform better than varchar joins. The dev system has 57 million fact table rows, and the dim tables have anywhere from 100 K to 1 M rows, so the prod tables will be even more horrendously large.
So, leaving aside modelling optimization techniques like partition pruning, query pruning, etc., what is the best join strategy for just large tables?
The join fields are 10 characters. One option could be converting the alphanumeric characters to ASCII or Unicode code points, but 10 characters would blow up to a 20-byte integer, so a join on such a big field would be even worse.
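To put some numbers on that, here is a quick Python sketch (the key 'AB12CD34EF' is just a made-up example, not one of our real keys): a naive character-code conversion of a 10-character key needs 10 bytes in ASCII or 20 bytes in UTF-16, and either way the result is wider than an 8-byte BIGINT, so it would not behave like a normal integer join key.

```python
# Illustrative only: 'AB12CD34EF' is a made-up 10-character alphanumeric key.
key = "AB12CD34EF"

# Naive conversion: take the code point of each character and concatenate.
ascii_bytes = key.encode("ascii")      # 1 byte per character -> 10 bytes
utf16_bytes = key.encode("utf-16-be")  # 2 bytes per character -> 20 bytes

# Interpret the 20 UTF-16 bytes as one unsigned integer.
as_int = int.from_bytes(utf16_bytes, "big")

print(len(ascii_bytes), len(utf16_bytes))  # 10 20
# Far more than the 63 usable bits of a signed 8-byte BIGINT:
print(as_int.bit_length())
```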
Any ideas on join optimization?
PS: We will be using SDI, so we can do complex transformations in real time.