The development of computer vision systems that detect humans and recognize their activities is a broad effort, with applications in virtual reality, smart monitoring and surveillance systems, motion analysis in sports, medicine and choreography, and vision-based user interfaces. The understanding of human activity is a diverse and complex subject that includes tracking and modeling human activity and representing video events at the semantic level. Its scope ranges from understanding the actions of an isolated person to understanding the actions and interactions of a crowd, or the interactions of persons with objects such as pieces of luggage or cars.

At The University of Texas at Austin, we are pursuing a number of projects on human motion. Professor Aggarwal will present his research on modeling and recognition of human actions and interactions, and of human-object interactions. The work includes the study of interactions at the gross level as well as at the detailed level. The two levels present different problems in terms of observation and analysis. At the gross level we model persons as blobs, and at the detailed level we conceptualize human actions in terms of an operational triplet 'agent-motion-target', similar to the 'verb argument structure' in linguistics. We consider atomic actions, composite actions and interactions, and continued and recursive activities. In addition, we consider interactions between a person and an object, such as climbing a fence. The issues considered in these problems illustrate the richness and the difficulty of understanding human motion. Applications of this research to monitoring and surveillance will be discussed, together with actual examples.